
Teja Anupoju

Apex, NC

Summary

Senior Data Engineer and Analyst with experience in the Automobile, Media, Health Care, Software, and Engineering domains. Expertise in building data pipelines and dashboards that deliver insights for opportunity identification and process re-engineering, backed by a clear narrative. Specialized in Data Analytics, Data Engineering, and Data Visualization. Analytically minded professional with a proven ability to solve complex quantitative business challenges. Exceptional verbal and written communication skills, with a track record of effectively conveying insights to both business and technical teams. Adept at utilizing data to drive strategic decision-making and delivering impactful presentations.

  • 8+ years of IT experience in Big Data engineering, analysis, design, implementation, development, maintenance, and testing of large-scale applications using SQL, Hadoop, Python, Java, and other Big Data technologies.
  • Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using the Cloudera and Hortonworks distributions of Hadoop.
  • Experience working with Elasticsearch, Logstash, and Kibana.
  • Experience working with the AWS stack (S3, EMR, EC2, SQS, Glue, Athena, Redshift); designed AWS architectures, cloud migrations, EMR, DynamoDB, Redshift, and Lambda event processing.
  • Designed AWS Glue jobs to convert nested JSON objects to Parquet files and load them into S3 (a minimal sketch of this pattern follows this summary).
  • Experienced in developing PySpark programs, creating DataFrames, and working on transformations.
  • Created Hive, SQL, and HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Developed ETL processes to load data from multiple data sources into HDFS using Kafka and Sqoop.
  • Experience using Apache Kafka for collecting, aggregating, and moving large amounts of data.
  • Designed and implemented Oracle database systems to meet business requirements and performance goals.
  • Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Experience in data analysis using Hive and Impala, and in developing large-scale applications using Hadoop and other Big Data tools.
  • In-depth understanding of Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, NameNode, and DataNode.
  • Practical experience in all phases of the data management lifecycle, from analyzing initial business requirements and converting them to data requirements through creating and executing business procedures and data solutions for the manufacturing, healthcare, energy, and e-commerce industries.
  • Proven expertise in data governance, master data management, advanced analytics, healthcare data domains, data distribution, data warehousing, and relational data modeling.
  • Skilled at working with cross-functional teams to understand business requirements and transform them into scalable PySpark solutions.
  • Spearheaded end-to-end ETL processes using the Talend Big Data ETL tool, ensuring seamless data extraction, transformation, and loading; major expertise with SQL, PL/SQL, and Talend development and maintenance in a corporate-wide ETL solution on UNIX and Windows platforms.
  • Experienced in data manipulation using Python for loading and extraction, and with Python libraries such as NumPy, SciPy, and Pandas for data analysis and numerical computations.
  • Managed the development and delivery of Docker images using CI/CD pipelines, with Terraform handling the required infrastructure upgrades as part of the deployment procedure; created large-scale environments using Terraform as infrastructure as code (IaC).
  • Experience with SQL and NoSQL databases (HBase and Cassandra).
  • Performed structural modifications using Hive and analyzed data using visualization/reporting tools (Tableau); experience using the Hadoop ecosystem and presenting data with Tableau, QuickSight, and Power BI.
  • Used Django and Flask, two leading Python web frameworks, to power data-driven solutions for the automobile sector.
  • Took part in cross-functional meetings to gather requirements and provide technical guidance on API-related issues.
  • Experienced with Spark, improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data, using Hadoop infrastructure such as Hive and Sqoop.
  • Detail-oriented team player with strong organizational skills and the ability to handle multiple projects simultaneously with a high degree of accuracy.
  • Astute data engineer with a data-driven, technology-focused approach; communicates clearly with stakeholders and builds consensus around well-founded models; talented in writing applications and reformulating models.
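
A minimal sketch of the nested-JSON-to-Parquet pattern noted above, written in plain PySpark so it is self-contained; a production AWS Glue job would typically use GlueContext and DynamicFrames instead. The bucket paths and the flattened column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Hypothetical S3 locations -- placeholders, not real buckets.
RAW_PATH = "s3://example-raw-bucket/events/"
CURATED_PATH = "s3://example-curated-bucket/events_parquet/"

spark = SparkSession.builder.appName("nested-json-to-parquet").getOrCreate()

# Read nested JSON objects; Spark infers the nested schema.
raw_df = spark.read.json(RAW_PATH)

# Flatten a few nested fields into top-level columns (illustrative names).
flat_df = raw_df.select(
    col("id"),
    col("customer.name").alias("customer_name"),
    col("customer.address.state").alias("customer_state"),
    col("order.total").alias("order_total"),
)

# Write columnar Parquet back to S3 for Athena/Redshift-style analytics.
flat_df.write.mode("overwrite").parquet(CURATED_PATH)
```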

Overview

9 years of professional experience
1 certification

Work History

Sr Data Engineer

FORD
04.2022 - Current
  • Designed and implemented data pipelines using Hive, Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Used the AWS Glue ETL service to consume raw data from an S3 bucket, transform it per requirements, and write the output back to S3 in Parquet format for data analytics
  • Worked on ETL Migration services by developing and deploying AWS Lambda functions
  • Used AWS CloudWatch to monitor and manage AWS resources
  • Resolved data pipeline problems and performance slowdowns swiftly by organizing and analyzing log data in CloudWatch Logs
  • Ran extensive Athena queries on the data processed by Glue ETL jobs and built business intelligence reports on the results in QuickSight
  • Used AWS S3 as HDFS-like storage and ran EMR jobs on the data stored in S3
  • Hands-on experience with Hadoop ecosystem components such as HDFS, Cloudera, YARN, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Airflow, and with programming in Spark using Python and Scala
  • Created and implemented data governance guidelines and protocols to guarantee data security, quality, and adherence to industry standards
  • Established and provided integration points and techniques for master data management with big data platforms so that master data records could be shared with the distributed file system of Hadoop
  • Hands-on with AWS data migrations between database platforms, from local SQL Servers to Amazon RDS and EMR Hive
  • Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Redshift, and S3
  • Involved in writing Java and Node.js API for Amazon Lambda to manage some of the AWS services
  • Worked on AWS Lambda to run code without managing servers, triggered by S3 events and SNS notifications
  • Moved data from traditional databases such as MS SQL Server, MySQL, and Oracle into Hadoop HDFS
  • Developed Spark SQL scripts using PySpark to perform transformations and actions on DataFrames and Datasets for faster data processing
  • Worked on CI/CD solution, using Git, Jenkins, Docker, and Kubernetes to setup and configure big data architecture on AWS cloud platform
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list)
  • Integrated a Kafka publisher into Spark jobs to capture errors from the Spark application and push them into a database
  • Implemented monitoring solutions to track API usage, identify performance issues, and proactively address potential issues
  • Used the Tableau reporting tool with an Athena data source on S3 buckets for business dashboarding and querying
  • Created Informatica mappings and workflows for data-level validation of source-system records for ETL
  • Used Informatica PowerCenter to design and develop data integration workflows, ensuring smooth data flow from various sources to data warehouses
  • Created Informatica workflows to automate data processing operations while maximizing resource usage
  • Prepared and integrated data into Power BI for effective reporting and analysis
  • Used the Tableau data visualization tool for reports and integrated Tableau with Alteryx for data and analytics
  • Used JIRA as the Scrum Tool for Scrum Task board and work on user stories
  • Environment: AWS (S3, EMR, EC2, LAMBDA, GLUE, Athena, Cloud Watch), Hadoop Ecosystem, Hive, Pig, ETL, Python, Java, node.js, PowerBI, MongoDB, API Dev, MDM, Integration, Apache Airflow, Snowflake, CI/CD, Kubernetes, Sqoop, MSSQL, Git, Jenkins.
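
A minimal sketch of the Kafka-to-Spark streaming pattern referenced in the list above, using Spark Structured Streaming. It writes the parsed stream to Parquet on S3 rather than HBase, since an HBase sink requires an external connector; the broker address, topic, schema, and paths are hypothetical, and the spark-sql-kafka package is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical message schema for the Kafka topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Consume from Kafka (broker and topic names are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")
       .option("subscribe", "customer-events")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write the processed stream out; an HBase connector sink would replace this writer.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/streams/customer-events/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/customer-events/")
         .outputMode("append")
         .start())

query.awaitTermination()
```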

Sr Data Engineer

The Walt Disney Company
08.2021 - 03.2022
  • Worked on the migration of an existing application to the AWS cloud
  • Experienced with EC2, S3, CloudFormation, CloudFront, RDS, VPC, CloudWatch, IAM roles/policies, and SNS subscription services
  • Provisioned AWS resources using the Management Console and the command line interface (CLI)
  • Designed, built, and maintained several AWS infrastructures to support various financial applications
  • Developed data transition (ETL) programs from AWS DynamoDB to Amazon Redshift by writing Python functions for specific events based on use cases (see the sketch after this list)
  • Added support for Amazon AWS S3 and RDS to host files and the database into Amazon Cloud
  • Modeled and configured AWS EC2 instances using AWS Security Groups
  • Debugged applications and reviewed log file messages to identify and resolve errors
  • Exceptional proficiency in Talend tools, including data integration, big data, jobs, data mappers, metadata, and Talend components
  • Automated operations procedures using Lambda, CloudWatch Events, and schedules
  • Aided in the creation of Flask web applications
  • Used Git to manage API version control, making sure that branching strategies and versioning were clear to allow for backward compatibility
  • Experienced in implementing APIs to synchronize data between systems, guaranteeing consistency and real-time updates across platforms
  • Took part in API testing to confirm overall system reliability, error handling, and data accuracy
  • Overcame difficulties with versioning and integrating disparate JSON formats for data
  • Tracked API usage, identified performance problems, and proactively addressed potential issues by implementing monitoring solutions
  • Expertise in configuration, logging, and exception handling
  • Oversaw and evaluated Hadoop log files, analyzed SQL scripts, and designed solutions for the process using PySpark
  • Environment: AWS Services, API Integration, Big Data Tools, IBM, ETL, JSON, LAMBDA, Git, Jupyter, PyCharm, Pyspark, Data Flow, SQL, Cloud Storage.
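
A minimal sketch of the DynamoDB-to-Redshift transition pattern noted in the list above: scan a DynamoDB table, stage the rows in S3 as JSON lines, then issue a Redshift COPY through the Redshift Data API. Table, bucket, cluster, and IAM role names are hypothetical placeholders.

```python
import json
import boto3

# Hypothetical resource names -- placeholders only.
TABLE_NAME = "customer_profiles"
STAGING_BUCKET = "example-staging-bucket"
STAGING_KEY = "dynamodb/customer_profiles.jsonl"
CLUSTER_ID = "example-redshift-cluster"
DATABASE = "analytics"
DB_USER = "etl_user"
COPY_ROLE_ARN = "arn:aws:iam::123456789012:role/example-redshift-copy-role"

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")


def stage_table_to_s3() -> None:
    """Scan the DynamoDB table and write its items to S3 as JSON lines."""
    table = dynamodb.Table(TABLE_NAME)
    items, response = [], table.scan()
    items.extend(response["Items"])
    while "LastEvaluatedKey" in response:  # paginate through the full table
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
    body = "\n".join(json.dumps(item, default=str) for item in items)
    s3.put_object(Bucket=STAGING_BUCKET, Key=STAGING_KEY, Body=body.encode("utf-8"))


def copy_into_redshift() -> None:
    """Load the staged JSON lines into Redshift with a COPY statement."""
    sql = (
        f"COPY analytics.customer_profiles "
        f"FROM 's3://{STAGING_BUCKET}/{STAGING_KEY}' "
        f"IAM_ROLE '{COPY_ROLE_ARN}' FORMAT AS JSON 'auto';"
    )
    redshift_data.execute_statement(
        ClusterIdentifier=CLUSTER_ID, Database=DATABASE, DbUser=DB_USER, Sql=sql
    )


if __name__ == "__main__":
    stage_table_to_s3()
    copy_into_redshift()
```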

Data Engineer/Analyst

Anthem INC
10.2019 - 07.2021
  • Designed and built scalable distributed data solutions using AWS, and planned the migration of the existing on-premises Cloudera Hadoop distribution to AWS based on business requirements
  • Worked in AWS environment for development and deployment of custom Hadoop applications
  • Troubleshot and maintained ETL/ELT jobs running in Matillion
  • Responsible for maintaining quality reference data in source by performing operations such as cleaning, transformation and ensuring integrity in a relational environment by working closely with the stakeholders and solution architect
  • Maintained AWS Glue updates and best practices, offering suggestions for data engineering process optimization and continuous improvement
  • Used AWS EMR to process big data across Hadoop clusters of virtual servers, with data stored on Amazon Simple Storage Service (S3)
  • Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and Spark
  • Created multiple partitioned external tables using Hive, Athena, and Redshift
  • Built AWS data pipelines using various AWS resources, including API Gateway to receive responses from AWS Lambda functions that retrieve data from Snowflake and convert the responses into JSON, with Snowflake, DynamoDB, AWS Lambda, and AWS S3 as the underlying services
  • Extracted, transformed, and loaded (ETL) data from a variety of sources into formats appropriate for analysis in Tableau
  • Designed and deployed ETL processes to ensure data correctness, completeness, and timeliness
  • Solid experience building interactive reports and dashboards, integrating and modeling results, with strong data visualization experience in Tableau and Power BI
  • Technical and business background in EDI across the SDLC, with a particular focus on ANSI X12 standards, HIPAA security protocols, and healthcare transactions
  • Followed the naming conventions set by the business for Talend jobs, the flat-file structure, and the daily batches used to run the Talend jobs
  • Designed, developed, managed, and utilized the Tableau platform to extract meaningful insights, drill down into data, and prepare accurate reports using various visualization and data modeling methods
  • Used Terraform as infrastructure as code (IaC) to build and configure cloud infrastructure efficiently
  • Developed an ample amount of backend modules using Python Flask Web Framework using ORM models
  • Designed and developed RESTful APIs that exposed web applications' core functionality and enabled efficient interaction between components
  • Created API endpoints for CRUD functions, data retrieval, and authentication in Python using frameworks like Flask and Django (see the sketch after this list)
  • Implemented token-based authentication and authorization to guarantee API security
  • Worked on projects designed with waterfall and agile methodologies, delivered high-quality deliverables on time
  • Extensively worked on tuning and optimizing SQL queries to reduce run times
  • Designed and developed JAVA API (Commerce API) which provides functionality to connect to the Cassandra through Java services
  • Successfully designed and developed a Java multi-threading based collector, parser, and distributor process to collect, parse, and distribute data arriving at thousands of messages per second
  • Environment: AWS stack, SSIS & SSRS, JAVA API, RESTful APIs, MDM, Snowflake, Cassandra, Matillion, Alteryx, Tableau, PowerBI, Python, SQL, Sqoop.
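
A minimal sketch of the Flask-style CRUD endpoints with token-based authentication described in the list above. The route, token value, and in-memory store are hypothetical stand-ins for the real services and databases; a real deployment would validate JWT/OAuth tokens instead of a static string.

```python
from functools import wraps

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

API_TOKEN = "example-token"     # hypothetical placeholder token
MEMBERS = {}                    # in-memory stand-in for the real data store


def require_token(view):
    """Reject requests that do not carry the expected bearer token."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        auth = request.headers.get("Authorization", "")
        if auth != f"Bearer {API_TOKEN}":
            abort(401)
        return view(*args, **kwargs)
    return wrapper


@app.route("/members/<member_id>", methods=["GET"])
@require_token
def get_member(member_id):
    member = MEMBERS.get(member_id)
    if member is None:
        abort(404)
    return jsonify(member)


@app.route("/members/<member_id>", methods=["PUT"])
@require_token
def upsert_member(member_id):
    MEMBERS[member_id] = request.get_json(force=True)
    return jsonify({"id": member_id, "status": "saved"}), 201


if __name__ == "__main__":
    app.run(debug=True)
```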

Data Engineer/Analyst

National Grid
01.2018 - 09.2019
  • Migrated the existing SQL CODE to Data Lake and sent the extracted reports to the consumers
  • Created PySpark engines that process huge environmental data loads within minutes, implementing various business logic
  • Worked extensively on Data mapping to understand the source to target mapping rules
  • Developed data pipelines using python, PySpark, Hive, Pig and HBase
  • Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2)
  • Created documentation for SSIS packages and ETL procedures to support knowledge transfer and troubleshooting
  • Ensured ETL operations performed well by adjusting the ETL pipeline to match performance demands, choosing the appropriate SSIS components, and optimizing SQL queries
  • Optimized Snowflake performance by regularly tracking query performance, tweaking SQL queries, and employing Snowflake's clustering and indexing features to speed up query execution times
  • Created Informatica mappings and workflows for data-level validation of source-system records for ETL
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting
  • Developed PySpark scripts to hash (mask) specified sets of data using hashing algorithms (see the sketch after this list)
  • Designed and implemented a part of the product of knowledge lens which takes environmental data on a real time basis from all the industries
  • Performed data analysis and data profiling using SQL on various extracts
  • Created reports of analyzed and validated data using Apache Hue and Hive, generated graphs for data analytics
  • Worked on data migration into HDFS and Hive using Sqoop
  • Wrote multiple batch processes in Python and PySpark to process huge amounts of time-series data, generating reports and scheduling their delivery to industries
  • Created analytical reports on this real time environmental data using Tableau
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector
  • Responsible for HBase bulk load process, created HFiles using MapReduce and then loaded data to HBase tables using complete bulk load tool
  • Provided commit and rollback methods for transaction processing
  • Read and wrote data to/from HBase using MapReduce jobs
  • Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Agile Methodologies (Jira)
  • Environment: Python, Tableau, PySpark, SQL, Hadoop, databases, Hive, MapReduce, Sqoop, Data Analytics, Oracle, DB2, MongoDB, ADLS, ADF.
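
A minimal sketch of the column-hashing approach mentioned in the list above, using SHA-256 via pyspark.sql.functions.sha2. The source/target paths and the list of sensitive columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

# Hypothetical source/target paths and sensitive columns.
SOURCE_PATH = "s3://example-bucket/environmental/readings/"
TARGET_PATH = "s3://example-bucket/environmental/readings_masked/"
SENSITIVE_COLUMNS = ["site_owner", "contact_email"]

spark = SparkSession.builder.appName("hash-sensitive-columns").getOrCreate()

df = spark.read.parquet(SOURCE_PATH)

# Replace each sensitive column with its SHA-256 hash so downstream
# reports can still join and group on it without exposing the raw value.
for column in SENSITIVE_COLUMNS:
    df = df.withColumn(column, sha2(col(column).cast("string"), 256))

df.write.mode("overwrite").parquet(TARGET_PATH)
```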

Data Engineering Analyst

Infosys Pvt Ltd
11.2014 - 05.2017
  • Worked with large data sets and Big Data concepts, using Hadoop's method of distributed processing
  • Used YARN and MapReduce on Hadoop and performed Hadoop administration
  • Moved data into Hadoop
  • Query optimization through SQL tools for quick response time
  • Worked on Database views, tables, and objects
  • Worked on various phases of projects like design, development and testing and deployment phases
  • Used GitHub as a version control tool
  • Developed backend modules using the Tornado framework, later moving to the Flask framework
  • Learned Spark fundamentals and how it serves as an essential tool set for working with Big Data
  • Accessed Hadoop data using Hive (see the sketch after this list)
  • Experience evaluating master data problems with business units and process specialists to find solutions, ensuring that master data is accurate, compliant, and consistent across business systems
  • Designed and maintained data systems and databases, including fixing coding errors and other data-related problems
  • Mined data from primary and secondary sources and reorganized it into formats that can be easily read by either humans or machines
  • Used statistical tools to interpret data sets, paying particular attention to trends and patterns valuable for diagnostic and predictive analytics efforts
  • Demonstrated the significance of findings in the context of local, national, and global trends that impact both the organization and the industry
  • Collected data from various data sources and refined and prepared it for analytics
  • Defined new KPIs and measured them consistently across datasets
  • Tested published dashboards and scheduled report refreshes
  • Prepared reports for executive leadership that effectively communicate trends, patterns, and predictions using relevant data
  • Environment: GitHub, Python, SQL, Spark, Cassandra, Data Analytics, Tableau, Hadoop, Hive, Sqoop, Big Data tools.
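
A minimal sketch of accessing Hadoop data through Hive from Spark, as referenced in the list above. The database and table names are hypothetical, and the session assumes a configured Hive metastore.

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark read tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-access-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical database/table; a real query would target actual warehouse tables.
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM analytics_db.web_events
    GROUP BY event_date
    ORDER BY event_date
""")

daily_counts.show(20, truncate=False)
```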

Education

Skills

  • C
  • Java
  • Python
  • Scala
  • SQL
  • JavaScript
  • MVP
  • Struts
  • Spring
  • Hibernate
  • REST API
  • API Doc
  • Linux
  • Unix
  • Windows
  • Eclipse
  • IntelliJ
  • PyCharm
  • AWS
  • EMR
  • Glue
  • CloudWatch
  • AWS S3
  • SNS
  • Azure Snowflake
  • Kinesis
  • Shell Scripting
  • HiveQL
  • PL/SQL

Certification

  • AWS Solution Architect: [https://www.credly.com/badges/e96a9664-2ad6-4465-b952-e5d7860b9be4](https://www.credly.com/badges/e96a9664-2ad6-4465-b952-e5d7860b9be4)
  • GCP Collaboration Engineer

Timeline

Sr Data Engineer

FORD
04.2022 - Current

Sr Data Engineer

The Walt Disney Company
08.2021 - 03.2022

Data Engineer/Analyst

Anthem INC
10.2019 - 07.2021

Data Engineer/Analyst

National Grid
01.2018 - 09.2019

Data Engineering Analyst

Infosys Pvt Ltd
11.2014 - 05.2017
