Anurag Kethireddy

Cumming, GA

Summary

Senior Data Engineer with 8 years of experience in designing and developing scalable data solutions in cloud environments. Proficient in building and optimizing data pipelines, ETL frameworks, and data warehousing using a wide array of tools, including AWS, Azure, and Hadoop ecosystems. Adept at handling large datasets and delivering insights through real-time data processing and advanced analytics.

Key strengths include:

  • Cloud Expertise: Deep experience with AWS (Lambda, Glue, Kinesis, EMR) and Azure (Data Factory, Synapse, SQL Azure), utilizing over 25 AWS services to build and maintain end-to-end data pipelines, improving data ingestion efficiency by up to 40%.
  • Big Data & Analytics: Skilled in leveraging Hadoop, Spark, Kafka, and Flink to process and analyze large-scale data, resulting in a 30% reduction in ETL execution times and improved query performance.
  • Data Modeling & Warehousing: Expert in Dimensional and Relational Data Modeling, with extensive experience in data warehousing solutions like RedShift, Cassandra, and DynamoDB.
  • Programming: Proficient in Python (Pandas, NumPy), PySpark, Scala, and SQL, developing robust data transformations and automating data workflows to reduce manual intervention by 40%.
  • ETL & Stream Processing: Proven success in creating efficient ETL processes, batch processing, and real-time message ingestion, with improvements of 25% in data processing speed.
  • Reporting & Dashboards: Developed and optimized reports using Tableau and Power BI, enabling faster decision-making and improving data-driven insights by 20%.
  • DevOps & CI/CD: Experience with version control tools like Git, Bitbucket, and containerization technologies like Docker and Kubernetes, ensuring seamless deployments and reducing cloud resource management costs by 15%.

Strong collaborator with a demonstrated ability to work closely with data scientists, analysts, and business stakeholders to deliver high-impact data solutions, improving overall system efficiency and reducing operational costs.

Overview

9 years of professional experience

Work History

DATA ENGINEER II

AMAZON INC
06.2022 - 05.2023
  • Designed, implemented, and optimized scalable data pipelines and architectures using Hadoop, Spark, and EMR, processing over 5 TB of data per day, leading to a 30% improvement in data processing speed and reducing operational costs by 15%
  • Developed and implemented a comprehensive end-to-end data pipeline, utilizing AWS S3 and Apache Spark, which improved data ingestion efficiency by 40% and reduced manual intervention in pipeline management by 25%
  • Designed and developed tools for functional integration tests, enhancing test coverage by 20% and reducing critical production issues by 15%, leading to improved system reliability
  • Created Python AWS Lambda functions for EMR clusters, which processed large datasets (up to 2 TB) with a 25% increase in data processing speed, enabling near real-time analytics for critical business decisions
  • Built and maintained scalable data processing and ingestion pipelines, ensuring 99.9% uptime and reducing data ingestion latency by 20%, improving overall system efficiency
  • Developed and optimized distributed data processing workflows using Apache Spark and Apache Flink, resulting in a 25% improvement in query performance and a 30% reduction in ETL execution time
  • Deployed and managed Spark jobs using Airflow in an AWS environment, optimizing resource usage by 15% and reducing job execution time by 20%, leading to cost savings in cloud resource management
  • Integrated REST APIs to enhance connectivity between databases and the data access layers, reducing data retrieval times by 30% and ensuring seamless integration across multiple platforms
  • Collaborated closely with data analysts, data scientists, and stakeholders to deliver robust data solutions that reduced report generation time by 20% and improved the accuracy of analytics by 10%
  • Designed and implemented data frameworks in RedShift to automate data ingestion and transformation processes, decreasing manual intervention by 40% and reducing ETL error rates by 15%

SOFTWARE ENGINEER

LinkedIn Corporation
12.2020 - 05.2022
  • Developed and optimized Spark applications using Scala, resulting in a 25% reduction in processing time and increased scalability for handling 2 TB of data per day
  • Created scalable data pipelines using Azure Data Factory and Databricks, integrating 5+ data sources and reducing data processing time by 30%
  • Managed Azure Databricks data processing, improving data pipeline efficiency by 20% and reducing storage costs by 15% through optimized resource allocation
  • Deployed IAM roles and Azure Data Factory using Terraform, automating role creation and improving deployment time by 35%
  • Developed Spark streaming applications using Scala with optimized configurations, reducing execution time by 25% and improving system reliability
  • Developed Spark code using Scala and Spark-SQL, optimizing algorithms for faster data processing, resulting in 20% faster query execution and reduced resource use
  • Deployed Spark jobs via HDInsight in Azure, improving job execution efficiency by 15% and reducing cloud infrastructure costs
  • Designed and deployed POCs using Spark on YARN, demonstrating 30% performance improvement compared to traditional SQL processing for large datasets
  • Implemented text analytics using Spark's in-memory capabilities, reducing data processing times by 20% and enhancing analysis speed for text-heavy datasets
  • Led data collection and cleaning efforts, improving data accuracy by 15%, and developed predictive models that increased forecasting accuracy by 10%
  • Created Spark applications using DataFrames and Spark SQL APIs, improving query performance by 25% and enabling faster decision-making for business teams

Sr Hadoop Developer

Citigroup
11.2019 - 09.2020
  • Managed Hadoop ecosystems (Hive, HBase, Oozie, Zookeeper, Spark Streaming), improving data processing speeds by 25% and streamlining data storage processes
  • Implemented Spark Streaming to analyze 10M+ user events/day, improving visitor behavior analysis, leading to a 20% increase in user engagement insights
  • Developed Spark Streaming and MapReduce jobs for a large-scale data lake, improving data storage efficiency by 30% and reducing data retrieval time by 15%
  • Developed Spark Streaming jobs with RDDs and SparkSQL, optimizing processing performance by 20% and enabling real-time analytics on streaming data
  • Managed Oozie jobs for capacity planning, improving storage utilization by 10% and reducing processing delays in critical data workflows
  • Developed and optimized REST APIs using Python (Pandas, Django), reducing data retrieval time by 20% and improving data availability across applications
  • Implemented partitioning and bucketing in Hive, improving query performance by 30% and reducing storage costs by 15%
  • Developed external Hive tables and optimized HiveQL queries, improving data analysis speed by 25% and supporting business-critical reporting functions
  • Used Sqoop to transfer 500 GB+ of data daily between HDFS and relational databases, improving data integration processes and reducing latency by 20%
  • Used Apache Spark on YARN for large-scale data processing, improving performance by 30% and reducing the time for large batch processes by 20%

Sr Hadoop Developer

Visa Inc
06.2018 - 11.2019
  • Applied in-depth knowledge of Hadoop architecture (HDFS, MapReduce), leading to improved data processing efficiency and reduced cluster overhead
  • Managed Big Data ecosystems (Spark, Hive, Sqoop, Oozie), improving data processing speeds by 25% and ensuring seamless integration across platforms
  • Collaborated with business users to finalize technical requirements, resulting in more accurate project deliverables and reducing requirement clarification time by 15%
  • Converted ETL processes into optimized MapReduce jobs, reducing data processing times by 30% and increasing system efficiency for wholesale, market risk, and securitization
  • Extracted and transferred 1TB+ of data between Exadata and HDFS using Sqoop, improving data transfer efficiency by 20% and enabling faster reporting cycles
  • Optimized MapReduce jobs with compression mechanisms, reducing storage requirements by 30% and improving job runtime performance by 25%
  • Optimized MapReduce algorithms (Combiners, Partitioners, Distributed Cache), improving data processing speed by 20% and reducing job execution times by 25%
  • Reconciled data from MapReduce and ETL processes using Spark, reducing data discrepancies by 15% and improving reporting accuracy for financial stakeholders
  • Tuned Hive queries to improve query performance by 30%, reducing processing time for large datasets and enabling faster decision-making
  • Converted Hive tables to Avro and ORC formats, reducing storage usage by 40% and freeing up cluster space for additional workloads

Hadoop Developer

SpindleTop Technologies
12.2014 - 11.2015
  • Used Sqoop to load structured data from relational databases into HDFS
  • Loaded transactional data from Teradata using Sqoop and created Hive Tables
  • Worked on automation of delta feeds from Teradata using Sqoop and from FTP Servers to Hive using Flume
  • Performed transformations such as de-normalization, data-set cleansing, date conversions, and parsing of complex columns
  • Worked with different compression codecs like GZIP, SNAPPY and BZIP2 in MapReduce, Pig and Hive for better performance
  • Handled Avro, JSON and Apache log data in Hive using custom Hive SerDes
  • Worked on batch processing and scheduled workflows using Oozie
  • Fine-tuned Hive scripts to improve join performance and reduce skew in aggregate operations
  • Used HiveQL to create partitioned RC and Parquet tables, applying compression techniques to optimize data processing and speed up retrieval
  • Implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access

Software Engineer

SpindleTop Technologies
06.2014 - 11.2014
  • Contributed to project analysis, estimation, and development
  • Used the Struts MVC framework to manage interaction between JSP view layers
  • Participated in client meetings and calls, both individually and with the team
  • Delivered weekly status reports on project progress
  • Prepared release notes and handled all deployment activities
  • Wrote XML parsing code
  • Managed the build process to deploy applications
  • Created patches, maintained the code base, and performed sanity code checks for client releases
  • Completed development on time and within the scheduled plan
  • Performed QA testing against requirements
  • Conducted unit, system integration, and regression testing
  • Reviewed code and provided technical support to the team
  • Wrote test cases for unit, functional, and integration testing

Education

Master of Science - Computer Informa

New England College
Henniker, NH
04.2018

Bachelor of Science - Electronics And Communications Engineering

JNTUH
Hyderabad, India
04.2014

Skills

  • Spark Development
  • Data Warehousing
  • Hadoop Ecosystem
  • Data Pipeline Design
  • Data Modeling
  • Big Data Processing
  • ETL development
  • Python Programming
  • Real-time Analytics
  • Data integration
  • Database Design
  • Risk Analysis
