Data Engineer with 2+ years of experience in ETL and data processing at Honeywell. Expert in Apache Airflow, AWS, and Python, adept at developing scalable ETL workflows and real-time data pipelines. Skilled at collaborating with stakeholders to deliver insightful BI solutions, backed by strong problem-solving and analytical skills.
Overview
4 years of professional experience
1 Certification
Work History
Data Engineer
State of Illinois - Department of Corrections
Alpharetta, USA
02.2023 - Current
Designed and optimized scalable ETL workflows using Apache Airflow, AWS Glue, and Spark, ensuring efficient data ingestion, transformation, and loading of structured and semi-structured data (see the Airflow sketch below).
Developed real-time and batch data pipelines using Apache Spark on AWS EMR, enabling large-scale analytics and reducing processing time by 40%.
Built automated data pipeline monitoring systems using Python and SQL, implementing alert-based anomaly detection to improve data integrity and uptime (see the monitoring sketch below).
Developed API-based data ingestion frameworks using FastAPI, AWS API Gateway, and Lambda, streamlining data flow from external sources (see the ingestion sketch below).
Optimized query performance in AWS Redshift and Snowflake using clustering keys, materialized views, and partitioning strategies, reducing query execution times by 35% (see the tuning sketch below).
Implemented automated data validation frameworks using Python and SQL, reducing data inconsistencies by 30%.
Partnered with business stakeholders to build BI solutions, providing data insights through Looker, Tableau, and Power BI.
Leveraged Hadoop, Hive, and Spark for big data analytics, improving data processing efficiency by 50%.
Integrated AWS services (S3, Redshift Spectrum, Glue) to facilitate cost-effective data lake solutions.
Implemented role-based access control (RBAC) and data encryption to ensure secure and compliant data access across teams.
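
A minimal sketch of the kind of Airflow ETL workflow described above, assuming an AWS Glue transform followed by an S3-to-Redshift load; the DAG name, Glue job, bucket, schema, and table are hypothetical placeholders rather than actual project details.

```python
# Illustrative Airflow DAG: run an AWS Glue transformation job, then COPY the
# curated output from S3 into Redshift. All names (DAG, job, bucket, table)
# are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

default_args = {"owner": "data-eng", "retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_etl_pipeline",               # hypothetical DAG name
    start_date=datetime(2023, 2, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    transform = GlueJobOperator(
        task_id="run_glue_transform",
        job_name="clean_and_conform_events",   # hypothetical Glue job
        region_name="us-east-1",
    )

    load = S3ToRedshiftOperator(
        task_id="load_to_redshift",
        s3_bucket="example-curated-bucket",    # hypothetical bucket
        s3_key="events/",
        schema="analytics",
        table="events",
        copy_options=["FORMAT AS PARQUET"],
    )

    transform >> load
```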
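A minimal sketch of the alert-based monitoring idea above, assuming a warehouse table with a load_date column and a psycopg2 connection; the table name, threshold, and alerting mechanism are illustrative assumptions.

```python
# Illustrative row-count anomaly check: compare today's load volume against a
# trailing 7-day average and raise an alert when it deviates too far.
# Table and connection details are hypothetical.
import psycopg2

THRESHOLD = 0.5  # alert if today's volume deviates >50% from the 7-day average

CHECK_SQL = """
    SELECT
        SUM(CASE WHEN load_date = CURRENT_DATE THEN 1 ELSE 0 END)       AS today_rows,
        SUM(CASE WHEN load_date >= CURRENT_DATE - 7
                  AND load_date <  CURRENT_DATE THEN 1 ELSE 0 END) / 7.0 AS avg_rows
    FROM analytics.events;  -- hypothetical table
"""


def check_daily_volume(conn) -> None:
    with conn.cursor() as cur:
        cur.execute(CHECK_SQL)
        today_rows, avg_rows = cur.fetchone()
    if avg_rows and abs(today_rows - avg_rows) / avg_rows > THRESHOLD:
        # In practice this would page via SNS/Slack; print keeps the sketch self-contained.
        print(f"ALERT: today's row count {today_rows} deviates from 7-day average {avg_rows:.0f}")


if __name__ == "__main__":
    check_daily_volume(psycopg2.connect(dbname="analytics"))  # hypothetical connection
```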
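A minimal sketch of an API-based ingestion path like the one described above, assuming FastAPI writing raw events to an S3 landing zone (the API Gateway and Lambda wiring is omitted); the endpoint, bucket, and key layout are hypothetical.

```python
# Illustrative FastAPI ingestion endpoint: accepts JSON events from external
# sources and lands them in S3 as raw objects for downstream ETL.
# Bucket and path names are hypothetical.
import json
import uuid
from datetime import datetime, timezone

import boto3
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
s3 = boto3.client("s3")

RAW_BUCKET = "example-raw-ingest-bucket"  # hypothetical bucket name


class Event(BaseModel):
    source: str
    payload: dict


@app.post("/ingest")
def ingest(event: Event) -> dict:
    # Partition raw objects by source and ingestion date so Glue/Athena can prune them.
    now = datetime.now(timezone.utc)
    key = f"raw/{event.source}/dt={now:%Y-%m-%d}/{uuid.uuid4()}.json"
    s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=json.dumps(event.payload))
    return {"status": "accepted", "key": key}
```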
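A minimal sketch of the Redshift side of the query-tuning work above, assuming a psycopg2 connection and an orders fact table; the view, distribution/sort keys, and columns are hypothetical.

```python
# Illustrative Redshift tuning DDL: precompute a daily aggregate as a
# materialized view and lay out the fact table with DISTKEY/SORTKEY so joins
# and range filters prune efficiently. Table, view, and column names are
# hypothetical.
import psycopg2

DDL = [
    """
    CREATE TABLE IF NOT EXISTS analytics.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        order_date  DATE,
        amount      DECIMAL(12, 2)
    )
    DISTKEY (customer_id)
    SORTKEY (order_date);
    """,
    """
    CREATE MATERIALIZED VIEW analytics.mv_daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue
    FROM analytics.orders
    GROUP BY order_date;
    """,
]


def apply_ddl(conn) -> None:
    conn.autocommit = True  # run each DDL statement outside an explicit transaction
    with conn.cursor() as cur:
        for statement in DDL:
            cur.execute(statement)


if __name__ == "__main__":
    apply_ddl(psycopg2.connect(dbname="analytics"))  # hypothetical connection
```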
Database Administrator
TTEC Technologies
Hyderabad, India
10.2020 - 07.2021
Assisted in building and optimizing ETL pipelines using Apache Airflow and AWS Glue, improving pipeline efficiency by 30%.
Developed and maintained SQL-based data processing workflows, optimizing query execution and reducing runtime by 25%.
Designed data validation scripts in Python to automate data integrity checks, ensuring high-quality analytical reporting.
Helped ingest, clean, and transform raw data into structured formats using AWS S3, Redshift, and Snowflake, supporting BI and analytics teams.
Worked with Apache Spark and Hadoop to process large datasets, improving reporting performance and scalability (see the PySpark sketch below).
Created basic BI dashboards in Tableau and Power BI, enabling teams to access key business insights efficiently.
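
A minimal sketch of the kind of Spark clean-and-transform job referenced above, assuming raw CSV in S3 written back as partitioned Parquet for the warehouse and BI teams; paths and columns are hypothetical.

```python
# Illustrative PySpark job: read raw CSV from S3, standardize types, drop
# duplicates, and write partitioned Parquet back for Redshift/Snowflake and
# BI consumption. Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("clean_orders").getOrCreate()

raw = (
    spark.read
    .option("header", True)
    .csv("s3://example-raw-bucket/orders/")          # hypothetical input path
)

clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("order_id").isNotNull())
)

(
    clean.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/")  # hypothetical output path
)
```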
Education
Master of Science - Computer and Information Sciences
The University of Texas at Arlington
Arlington, Texas
12.2022
Bachelor of Technology - Computer Science and Technology
Jawaharlal Nehru Technological University
Hyderabad, India
09.2020
Skills
Programming: SQL, Python, Java, Scala
Databases: PostgreSQL, AWS Redshift, Snowflake, SQL Server
Big Data & Cloud: AWS (S3, Redshift, Glue, Lambda, EMR, Athena), Hadoop, Apache Spark, Databricks
ETL & Data Engineering: Apache Airflow, AWS Glue, dbt, Data Pipelines, Apache Kafka
Data Quality & Validation: Anomaly Detection, Data Profiling, Data Validation, Automated Data Monitoring
Business Intelligence & Reporting: Looker, Tableau, Power BI, SQL-based Reporting