
Detail-oriented Data Engineer with 5 years of experience in designing and optimizing scalable data pipelines and platforms. Specializes in ETL development, cloud data engineering, and data warehousing. Delivers high-performance data solutions across finance, retail, and healthcare sectors. Proficient in Python, SQL, Spark, AWS, and Azure, with a focus on enhancing data quality and supporting analytics initiatives.
- Developed data ingestion frameworks to integrate structured and semi-structured data from multiple sources
- Designed and implemented scalable ETL pipelines using AWS Glue and PySpark to process 500GB+ of healthcare data daily
- Optimized Spark jobs to enhance processing performance by 40%, resulting in faster data availability for analytics
- Built and maintained data warehouse solutions on Amazon Redshift to enable efficient data retrieval and reporting
- Implemented data quality validation, reducing data errors by 35%
- Automated workflows using Apache Airflow, reducing manual effort by 50%
- Collaborated with analytics teams to deliver reporting and dashboard requirements, supporting data-driven decision making
- Developed scalable data pipelines using Azure Data Factory and Databricks
- Processed large datasets exceeding 300GB daily using Spark
- Designed and implemented a data warehouse using Azure Synapse
- Improved data load performance by 30% through query optimization
- Built incremental data processing pipelines
- Implemented monitoring and alerting mechanisms
- Worked closely with BI teams to deliver reporting solutions
- Improved ETL reliability and reduced failures by 25%
- Developed ETL pipelines using Python and SQL
- Integrated multiple financial data sources into a centralized data warehouse
- Designed database schemas and data models
- Performed performance tuning on SQL queries
- Supported data migration and transformation activities
Big Data: Apache Spark, Hadoop, Hive, Kafka
Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, EMR), Azure (Data Factory, Synapse, Databricks)
ETL Tools: AWS Glue, Azure Data Factory, Informatica
Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse
Databases: PostgreSQL, MySQL, SQL Server, Oracle
Data Modeling: Star Schema, Snowflake Schema, Dimensional Modeling
Orchestration: Apache Airflow
Programming: Python, SQL, PySpark, Shell Scripting
Version Control: Git, GitHub, Bitbucket
CI/CD: Jenkins, Azure DevOps
Visualization: Power BI, Tableau
Operating Systems: Linux, Windows