
Teja Anupoju

Apex, NC

Summary

Senior Data Engineer and Analyst with experience in the Automobile, Media, Health Care, Software, and Engineering domains. Expertise in building data pipelines and dashboards that deliver insights for opportunity identification and process re-engineering, backed by a clear narrative. Specialized in Data Analytics, Data Engineering, and Data Visualization. Analytically minded professional with a proven ability to solve complex quantitative business challenges. Exceptional verbal and written communication skills, with a track record of effectively conveying insights to both business and technical teams. Adept at utilizing data to drive strategic decision-making and delivering impactful presentations.

  • 8+ years of IT experience in Big Data engineering, analysis, design, implementation, development, maintenance, and testing of large-scale applications using SQL, Hadoop, Python, Java, and other Big Data technologies.
  • Hands-on experience installing, configuring, supporting, and managing Hadoop clusters using the Cloudera and Hortonworks distributions of Hadoop.
  • Experience working with Elasticsearch, Logstash, and Kibana.
  • Experience working with the AWS stack (S3, EMR, EC2, SQS, Glue, Athena, Redshift); designed AWS architectures, cloud migrations, EMR, DynamoDB, Redshift, and Lambda event processing.
  • Designed AWS Glue jobs to convert nested JSON objects to Parquet files and load them into S3 (a minimal sketch of this pattern follows this summary).
  • Experienced in developing PySpark programs, creating DataFrames, and working on transformations.
  • Created Hive, SQL, and HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
  • Developed ETL processes to load data from multiple data sources into HDFS using Kafka and Sqoop.
  • Experience using Apache Kafka for collecting, aggregating, and moving large amounts of data.
  • Designed and implemented Oracle database systems to meet business requirements and performance goals.
  • Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Experience in data analysis using Hive and Impala, and in developing large-scale applications using Hadoop and other Big Data tools.
  • In-depth understanding of Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, NameNode, and DataNode.
  • Practical experience in all phases of the data management lifecycle, from analyzing initial business requirements and converting them to data requirements through creating and executing business procedures and data solutions for the manufacturing, healthcare, energy, and e-commerce industries.
  • Proven expertise in data governance, master data management, advanced analytics, healthcare data domains, data distribution, data warehousing, and relational data modeling.
  • Skilled at working with cross-functional teams to understand business requirements and transform them into scalable PySpark solutions.
  • Spearheaded end-to-end ETL processes using the Talend Big Data ETL tool, ensuring seamless data extraction, transformation, and loading; major expertise with SQL, PL/SQL, and Talend development and maintenance in a corporate-wide ETL solution on UNIX and Windows platforms.
  • Experienced in data manipulation using Python for loading and extraction, and with Python libraries such as NumPy, SciPy, and Pandas for data analysis and numerical computations.
  • Managed the development and delivery of Docker images using CI/CD pipelines, with Terraform handling the required infrastructure upgrades as part of the deployment procedure; created large-scale environments using Terraform as infrastructure as code (IaC).
  • Experience with SQL and NoSQL databases (HBase and Cassandra).
  • Performed structural modifications using Hive and analyzed data using visualization/reporting tools (Tableau); experience using the Hadoop ecosystem and presenting data with Tableau, QuickSight, and Power BI.
  • Used Django and Flask, two leading Python web frameworks, to power data-driven solutions for the automobile sector.
  • Took part in cross-functional meetings to gather requirements and provide technical guidance on API-related issues.
  • Experienced with Spark, improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Hands-on experience with Hadoop/Big Data technologies for the storage, querying, processing, and analysis of data, using Hadoop infrastructure such as Hive and Sqoop.
  • Detail-oriented team player with strong organizational skills and the ability to handle multiple projects simultaneously with a high degree of accuracy.
  • Astute data engineer with a data-driven, technology-focused approach; communicates clearly with stakeholders and builds consensus around well-founded models; talented in writing applications and reformulating models.
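
A minimal sketch of the nested-JSON-to-Parquet pattern noted above, written in plain PySpark so it is self-contained; a production AWS Glue job would typically use GlueContext and DynamicFrames instead. The bucket paths and the flattened column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Hypothetical S3 locations -- placeholders, not real buckets.
RAW_PATH = "s3://example-raw-bucket/events/"
CURATED_PATH = "s3://example-curated-bucket/events_parquet/"

spark = SparkSession.builder.appName("nested-json-to-parquet").getOrCreate()

# Read nested JSON objects; Spark infers the nested schema.
raw_df = spark.read.json(RAW_PATH)

# Flatten a few nested fields into top-level columns (illustrative names).
flat_df = raw_df.select(
    col("id"),
    col("customer.name").alias("customer_name"),
    col("customer.address.state").alias("customer_state"),
    col("order.total").alias("order_total"),
)

# Write columnar Parquet back to S3 for Athena/Redshift-style analytics.
flat_df.write.mode("overwrite").parquet(CURATED_PATH)
```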

Overview

9 years of professional experience
1 certification

Work History

Sr Data Engineer

FORD
04.2022 - Current
  • Designed and implemented data pipelines using Hive, Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Used the AWS Glue ETL service to consume raw data from an S3 bucket, transform it per requirements, and write the output back to S3 in Parquet format for data analytics
  • Worked on ETL Migration services by developing and deploying AWS Lambda functions
  • Used AWS CloudWatch to monitor and manage AWS resources
  • Resolved data pipeline problems and performance slowdowns swiftly by organizing and analyzing log data in CloudWatch Logs
  • Ran extensive Athena queries on the data processed by Glue ETL jobs and built business intelligence reports on the results in QuickSight
  • Used AWS S3 as HDFS-like storage and ran EMR jobs on the data stored in S3
  • Hands-on experience with Hadoop ecosystem components such as HDFS, Cloudera, YARN, Hive, HBase, Sqoop, Flume, Kafka, Impala, and Airflow, and with programming in Spark using Python and Scala
  • Created and implemented data governance guidelines and protocols to guarantee data security, quality, and adherence to industry standards
  • Established and provided integration points and techniques for master data management with big data platforms so that master data records could be shared with the distributed file system of Hadoop
  • Hands-on with AWS data migrations between database platforms, from local SQL Servers to Amazon RDS and EMR Hive
  • Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Redshift, and S3
  • Involved in writing Java and Node.js API for Amazon Lambda to manage some of the AWS services
  • Worked on AWS Lambda to run code without managing servers, triggered by S3 events and SNS notifications
  • Moved data from traditional databases such as MS SQL Server, MySQL, and Oracle into Hadoop HDFS
  • Developed Spark SQL scripts using PySpark to perform transformations and actions on DataFrames and Datasets for faster data processing
  • Worked on CI/CD solution, using Git, Jenkins, Docker, and Kubernetes to setup and configure big data architecture on AWS cloud platform
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch after this list)
  • Integrated a Kafka publisher into Spark jobs to capture errors from the Spark application and push them into a database
  • Implemented monitoring solutions to track API usage, identify performance issues, and proactively address potential issues
  • Used the Tableau reporting tool with an Athena data source on S3 buckets for business dashboarding and querying
  • Created Informatica mappings and workflows for data-level validation of source-system records for ETL
  • Used Informatica PowerCenter to design and develop data integration workflows, ensuring smooth data flow from various sources to data warehouses
  • Created Informatica workflows to automate data processing operations while maximizing resource usage
  • Prepared and integrated data into Power BI for effective reporting and analysis
  • Used the Tableau data visualization tool for reports and integrated Tableau with Alteryx for data and analytics
  • Used JIRA as the Scrum Tool for Scrum Task board and work on user stories
  • Environment: AWS (S3, EMR, EC2, LAMBDA, GLUE, Athena, Cloud Watch), Hadoop Ecosystem, Hive, Pig, ETL, Python, Java, node.js, PowerBI, MongoDB, API Dev, MDM, Integration, Apache Airflow, Snowflake, CI/CD, Kubernetes, Sqoop, MSSQL, Git, Jenkins.
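
A minimal sketch of the Kafka-to-Spark streaming pattern referenced in the list above, using Spark Structured Streaming. It writes the parsed stream to Parquet on S3 rather than HBase, since an HBase sink requires an external connector; the broker address, topic, schema, and paths are hypothetical, and the spark-sql-kafka package is assumed to be on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical message schema for the Kafka topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Consume from Kafka (broker and topic names are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")
       .option("subscribe", "customer-events")
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers bytes; cast the value to string and parse the JSON payload.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write the processed stream out; an HBase connector sink would replace this writer.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3://example-bucket/streams/customer-events/")
         .option("checkpointLocation", "s3://example-bucket/checkpoints/customer-events/")
         .outputMode("append")
         .start())

query.awaitTermination()
```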

Sr Data Engineer

The Walt Disney Company
08.2021 - 03.2022
  • Worked on the migration of an existing application to the AWS cloud
  • Experienced with EC2, S3, CloudFormation, CloudFront, RDS, VPC, CloudWatch, IAM roles/policies, and SNS subscription services
  • Provisioned AWS resources using the Management Console and the command line interface (CLI)
  • Designed, built, and maintained several AWS infrastructures to support various financial applications
  • Developed data transition (ETL) programs from AWS DynamoDB to Amazon Redshift by writing Python functions for specific events based on use cases (see the sketch after this list)
  • Added support for Amazon AWS S3 and RDS to host files and the database into Amazon Cloud
  • Modeled and configured AWS EC2 instances using AWS Security Groups
  • Debugged applications and reviewed log file messages to identify and resolve errors
  • Exceptional proficiency in Talend tools, including data integration, big data, jobs, data mappers, metadata, and Talend components
  • Automated operations procedures using Lambda, CloudWatch Events, and schedules
  • Aided in the creation of Flask web applications
  • Used Git to manage API version control, making sure that branching strategies and versioning were clear to allow for backward compatibility
  • Experienced in implementing APIs to synchronize data between systems, guaranteeing consistency and real-time updates across platforms
  • Took part in API testing to confirm overall system reliability, error handling, and data accuracy
  • Overcame difficulties with versioning and integrating disparate JSON formats for data
  • Tracked API usage, identified performance problems, and proactively addressed potential issues by implementing monitoring solutions
  • Expertise in configuration, logging, and exception handling
  • Oversaw and evaluated Hadoop log files, analyzed SQL scripts, and designed solutions for the process using PySpark
  • Environment: AWS Services, API Integration, Big Data Tools, IBM, ETL, JSON, LAMBDA, Git, Jupyter, PyCharm, Pyspark, Data Flow, SQL, Cloud Storage.
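
A minimal sketch of the DynamoDB-to-Redshift transition pattern noted in the list above: scan a DynamoDB table, stage the rows in S3 as JSON lines, then issue a Redshift COPY through the Redshift Data API. Table, bucket, cluster, and IAM role names are hypothetical placeholders.

```python
import json
import boto3

# Hypothetical resource names -- placeholders only.
TABLE_NAME = "customer_profiles"
STAGING_BUCKET = "example-staging-bucket"
STAGING_KEY = "dynamodb/customer_profiles.jsonl"
CLUSTER_ID = "example-redshift-cluster"
DATABASE = "analytics"
DB_USER = "etl_user"
COPY_ROLE_ARN = "arn:aws:iam::123456789012:role/example-redshift-copy-role"

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")


def stage_table_to_s3() -> None:
    """Scan the DynamoDB table and write its items to S3 as JSON lines."""
    table = dynamodb.Table(TABLE_NAME)
    items, response = [], table.scan()
    items.extend(response["Items"])
    while "LastEvaluatedKey" in response:  # paginate through the full table
        response = table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
    body = "\n".join(json.dumps(item, default=str) for item in items)
    s3.put_object(Bucket=STAGING_BUCKET, Key=STAGING_KEY, Body=body.encode("utf-8"))


def copy_into_redshift() -> None:
    """Load the staged JSON lines into Redshift with a COPY statement."""
    sql = (
        f"COPY analytics.customer_profiles "
        f"FROM 's3://{STAGING_BUCKET}/{STAGING_KEY}' "
        f"IAM_ROLE '{COPY_ROLE_ARN}' FORMAT AS JSON 'auto';"
    )
    redshift_data.execute_statement(
        ClusterIdentifier=CLUSTER_ID, Database=DATABASE, DbUser=DB_USER, Sql=sql
    )


if __name__ == "__main__":
    stage_table_to_s3()
    copy_into_redshift()
```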

Data Engineer/Analyst

Anthem INC
10.2019 - 07.2021
  • Designed and built scalable distributed data solutions using AWS, and planned the migration of the existing on-premises Cloudera Hadoop distribution to AWS based on business requirements
  • Worked in AWS environment for development and deployment of custom Hadoop applications
  • Troubleshot and maintained ETL/ELT jobs running in Matillion
  • Responsible for maintaining quality reference data in source by performing operations such as cleaning, transformation and ensuring integrity in a relational environment by working closely with the stakeholders and solution architect
  • Maintained AWS Glue updates and best practices, offering suggestions for data engineering process optimization and continuous improvement
  • Used AWS EMR to process big data across Hadoop clusters of virtual servers, with data stored on Amazon Simple Storage Service (S3)
  • Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and Spark
  • Created multiple partitioned external tables using Hive, Athena, and Redshift
  • Built AWS data pipelines using various AWS resources, including API Gateway to receive responses from AWS Lambda functions that retrieve data from Snowflake and convert the responses into JSON, with Snowflake, DynamoDB, AWS Lambda, and AWS S3 as the underlying services
  • Extracted, transformed, and loaded (ETL) data from a variety of sources into formats appropriate for analysis in Tableau
  • Designed and deployed ETL processes to ensure data correctness, completeness, and timeliness
  • Solid experience building interactive reports and dashboards, integrating and modeling results, with strong data visualization experience in Tableau and Power BI
  • Technical and business background in EDI across the SDLC, with a particular focus on ANSI X12 standards, HIPAA security protocols, and healthcare transactions
  • Followed the naming conventions set by the business for Talend jobs, the flat-file structure, and the daily batches used to run the Talend jobs
  • Designed, developed, managed, and utilized the Tableau platform to extract meaningful insights, drill down into data, and prepare accurate reports using various visualization and data modeling methods
  • Used Terraform as infrastructure as code (IaC) to build and configure cloud infrastructure efficiently
  • Developed an ample amount of backend modules using Python Flask Web Framework using ORM models
  • Designed and developed RESTful APIs that exposed web applications' core functionality and enabled efficient interaction between components
  • Created API endpoints for CRUD functions, data retrieval, and authentication in Python using frameworks like Flask and Django (see the sketch after this list)
  • Implemented token-based authentication and authorization to guarantee API security
  • Worked on projects designed with waterfall and agile methodologies, delivered high-quality deliverables on time
  • Extensively worked on tuning and optimizing SQL queries to reduce run times
  • Designed and developed JAVA API (Commerce API) which provides functionality to connect to the Cassandra through Java services
  • Successfully designed and developed a Java multi-threading based collector, parser, and distributor process to collect, parse, and distribute data arriving at thousands of messages per second
  • Environment: AWS stack, SSIS & SSRS, JAVA API, RESTful APIs, MDM, Snowflake, Cassandra, Matillion, Alteryx, Tableau, PowerBI, Python, SQL, Sqoop.
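
A minimal sketch of the Flask-style CRUD endpoints with token-based authentication described in the list above. The route, token value, and in-memory store are hypothetical stand-ins for the real services and databases; a real deployment would validate JWT/OAuth tokens instead of a static string.

```python
from functools import wraps

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

API_TOKEN = "example-token"     # hypothetical placeholder token
MEMBERS = {}                    # in-memory stand-in for the real data store


def require_token(view):
    """Reject requests that do not carry the expected bearer token."""
    @wraps(view)
    def wrapper(*args, **kwargs):
        auth = request.headers.get("Authorization", "")
        if auth != f"Bearer {API_TOKEN}":
            abort(401)
        return view(*args, **kwargs)
    return wrapper


@app.route("/members/<member_id>", methods=["GET"])
@require_token
def get_member(member_id):
    member = MEMBERS.get(member_id)
    if member is None:
        abort(404)
    return jsonify(member)


@app.route("/members/<member_id>", methods=["PUT"])
@require_token
def upsert_member(member_id):
    MEMBERS[member_id] = request.get_json(force=True)
    return jsonify({"id": member_id, "status": "saved"}), 201


if __name__ == "__main__":
    app.run(debug=True)
```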

Data Engineer/Analyst

National Grid
01.2018 - 09.2019
  • Migrated the existing SQL CODE to Data Lake and sent the extracted reports to the consumers
  • Created PySpark engines that process huge environmental data loads within minutes, implementing various business logic
  • Worked extensively on Data mapping to understand the source to target mapping rules
  • Developed data pipelines using python, PySpark, Hive, Pig and HBase
  • Migrated on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2)
  • Created documentation for SSIS packages and ETL procedures to support knowledge transfer and troubleshooting
  • Ensured ETL operations performed well by adjusting the ETL pipeline to match performance demands, choosing the appropriate SSIS components, and optimizing SQL queries
  • Optimized Snowflake performance by regularly tracking query performance, tweaking SQL queries, and employing Snowflake's clustering and indexing features to speed up query execution times
  • Created Informatica mappings and workflows for data-level validation of source-system records for ETL
  • Migrated an entire Oracle database to BigQuery and used Power BI for reporting
  • Developed PySpark scripts to hash (mask) specified sets of data using hashing algorithms (see the sketch after this list)
  • Designed and implemented a part of the product of knowledge lens which takes environmental data on a real time basis from all the industries
  • Performed data analysis and data profiling using SQL on various extracts
  • Created reports of analyzed and validated data using Apache Hue and Hive, generated graphs for data analytics
  • Worked on data migration into HDFS and Hive using Sqoop
  • Wrote multiple batch processes in Python and PySpark to process huge amounts of time-series data, generating reports and scheduling their delivery to industries
  • Created analytical reports on this real time environmental data using Tableau
  • Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using Hive ODBC connector
  • Responsible for HBase bulk load process, created HFiles using MapReduce and then loaded data to HBase tables using complete bulk load tool
  • Provided commit and rollback methods for transaction processing
  • Read and wrote data to/from HBase using MapReduce jobs
  • Worked in complete Software Development Life Cycle (analysis, design, development, testing, implementation and support) using Agile Methodologies (Jira)
  • Environment: Python, Tableau, PySpark, SQL, Hadoop, databases, Hive, MapReduce, Sqoop, Data Analytics, Oracle, DB2, MongoDB, ADLS, ADF.
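
A minimal sketch of the column-hashing approach mentioned in the list above, using SHA-256 via pyspark.sql.functions.sha2. The source/target paths and the list of sensitive columns are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sha2

# Hypothetical source/target paths and sensitive columns.
SOURCE_PATH = "s3://example-bucket/environmental/readings/"
TARGET_PATH = "s3://example-bucket/environmental/readings_masked/"
SENSITIVE_COLUMNS = ["site_owner", "contact_email"]

spark = SparkSession.builder.appName("hash-sensitive-columns").getOrCreate()

df = spark.read.parquet(SOURCE_PATH)

# Replace each sensitive column with its SHA-256 hash so downstream
# reports can still join and group on it without exposing the raw value.
for column in SENSITIVE_COLUMNS:
    df = df.withColumn(column, sha2(col(column).cast("string"), 256))

df.write.mode("overwrite").parquet(TARGET_PATH)
```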

Data Engineering Analyst

Infosys Pvt Ltd
11.2014 - 05.2017
  • Worked with large data sets and Big Data concepts, using Hadoop's method of distributed processing
  • Used YARN and MapReduce on Hadoop and performed Hadoop administration
  • Moved data into Hadoop
  • Query optimization through SQL tools for quick response time
  • Worked on Database views, tables, and objects
  • Worked on various phases of projects like design, development and testing and deployment phases
  • Used GitHub as a version control tool
  • Developed backend modules using the Tornado framework, later moving to the Flask framework
  • Learned Spark fundamentals and how it serves as an essential tool set for working with Big Data
  • Accessed Hadoop data using Hive (see the sketch after this list)
  • Experience evaluating master data problems with business units and process specialists to find solutions, ensuring that master data is accurate, compliant, and consistent across business systems
  • Designed and maintained data systems and databases, including fixing coding errors and other data-related problems
  • Mined data from primary and secondary sources and reorganized it into formats that can be easily read by either humans or machines
  • Used statistical tools to interpret data sets, paying particular attention to trends and patterns valuable for diagnostic and predictive analytics efforts
  • Demonstrated the significance of findings in the context of local, national, and global trends that impact both the organization and the industry
  • Collected data from various data sources and refined and prepared it for analytics
  • Defined new KPIs and measured them consistently across datasets
  • Tested published dashboards and scheduled report refreshes
  • Prepared reports for executive leadership that effectively communicate trends, patterns, and predictions using relevant data
  • Environment: GitHub, Python, SQL, Spark, Cassandra, Data Analytics, Tableau, Hadoop, Hive, Sqoop, Big Data tools.
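
A minimal sketch of accessing Hadoop data through Hive from Spark, as referenced in the list above. The database and table names are hypothetical, and the session assumes a configured Hive metastore.

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets Spark read tables registered in the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-access-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical database/table; a real query would target actual warehouse tables.
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM analytics_db.web_events
    GROUP BY event_date
    ORDER BY event_date
""")

daily_counts.show(20, truncate=False)
```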

Education

Skills

  • C
  • Java
  • Python
  • Scala
  • SQL
  • JavaScript
  • MVP
  • Struts
  • Spring
  • Hibernate
  • REST API
  • API Doc
  • Linux
  • Unix
  • Windows
  • Eclipse
  • IntelliJ
  • PyCharm
  • AWS
  • EMR
  • Glue
  • CloudWatch
  • AWS S3
  • SNS
  • Azure Snowflake
  • Kinesis
  • Shell Scripting
  • HiveQL
  • PL/SQL

Certification

  • AWS Solution Architect: [https://www.credly.com/badges/e96a9664-2ad6-4465-b952-e5d7860b9be4](https://www.credly.com/badges/e96a9664-2ad6-4465-b952-e5d7860b9be4)
  • GCP Collaboration Engineer

Timeline

Sr Data Engineer

FORD
04.2022 - Current

Sr Data Engineer

The Walt Disney Company
08.2021 - 03.2022

Data Engineer/Analyst

Anthem INC
10.2019 - 07.2021

Data Engineer/Analyst

National Grid
01.2018 - 09.2019

Data Engineering Analyst

Infosys Pvt Ltd
11.2014 - 05.2017
