ARAVIND REDDY

Charlotte, NC

Summary

Experienced Data Engineer: Transforming Data with 5+ Years of Expertise

Dedicated, results-driven data engineer with over 5 years of hands-on experience in data analysis and ETL processes. Proficient in Python, SQL, cloud services, and the Spark API. Adept at designing, developing, and optimizing data pipelines for both real-time and batch processing, with a strong background in Hadoop, Spark, and cloud platforms such as Azure and AWS.

Overview

5+ years of professional experience

Work History

Data Engineer

Fifth Third Bank

11.2022 - Current

Led end-to-end data pipeline development in AWS, coordinating team tasks for data ingestion and transformation
Worked with Informatica and Impala-based ETL, and automated data migration using AWS Lambda and Step Functions
Developed real-time data processing apps in Scala and Python, integrating Kafka and Spark Streaming
Designed ETL pipelines for data warehousing, automated workflows, and ensured data accuracy

Data Engineer

Merck Pharma

10.2021 - 10.2022

Developed pipelines to extract data from various sources using Sqoop and Linux shell scripts, loading it into an HDFS Data Lake
Utilized Spark for efficient data processing
Employed Python and PySpark for large dataset analysis, enhancing data insights
Utilized Pandas for statistical analysis and designed data models for Redshift
Designed and implemented ETL pipelines for data ingestion from multiple sources using Spark and Hive
Utilized Informatica for data integration across systems
Managed AWS resources, implemented batch processing using Airflow for Snowflake, and participated in application migration to AWS

Data Engineer

Tiger Analytics

06.2018 - 08.2021

Developed pipelines to extract data from various sources using Sqoop and Linux shell scripts, loading it into an HDFS Data Lake. Utilized Spark for efficient data processing.
Employed Python and PySpark for large dataset analysis, enhancing data insights. Utilized Pandas for statistical analysis and designed data models for Redshift.
Designed and implemented ETL pipelines for data ingestion using Spark and Hive from multiple sources. Utilized Informatica for data integration across systems.
Managed AWS resources, implemented batch processing using Airflow for Snowflake, and participated in application migration to AWS.

Education

Master of Science - Computer Science

Pace University

New York, NY

2023

Skills

Big Data: Hadoop, HDFS, Pig, Hive, HBase, Oozie, Kafka, YARN, Apache Spark
Databases: Oracle, MySQL, SQL Server, MongoDB
Programming: Scala, Python, SQL, PL/SQL, HiveQL, Unix, Shell Scripting
Cloud Platforms: Azure, AWS

Automation/Orchestration: Jenkins, Apache Airflow
Data Services/Tools: Azure Data Factory, Databricks, Azure Synapse, Azure Data Lake, Snowflake, Tableau
Additional Skills: PySpark, Scala, Data Warehousing, Data Preparation, ETL, Agile, MS-SQL
