Mohan CH

Summary

Data Engineer with 4+ years of experience designing and implementing scalable ETL/ELT pipelines, big data processing, and cloud-based data platforms. Proficient in Python, PySpark, and SQL for data transformation and automation. Skilled in AWS services (S3, Glue, EMR, Redshift, Lambda, Athena, CloudWatch), with strong expertise in Airflow and Step Functions for orchestration. Adept at data modeling, data lake architecture, and data warehousing, with a proven ability to deliver cost-effective, reliable, and business-focused data solutions. Collaborative team player experienced in Agile development, CI/CD pipelines, and Git-based workflows.

Overview

5 years of professional experience

Work History

Data Engineer

Johnson & Johnson
01.2023 - Current
  • Designed and implemented scalable ETL pipelines in Airflow and AWS Step Functions, improving reliability and reducing job failures by 30%.
  • Built PySpark jobs on EMR to process 1TB+ structured and semi-structured data daily, reducing processing time from 5 hours to 2 hours.
  • Developed AWS Glue jobs for schema discovery, data transformation, and partitioning in S3, cutting query costs by 25%.
  • Modeled and optimized Redshift schemas (star and snowflake models), enhancing BI reporting performance by 40%.
  • Integrated Athena for self-service querying and ad-hoc analytics, enabling business users to run queries without engineering support.
  • Implemented AWS Lambda functions for real-time ingestion and automated triggers, reducing batch dependencies.
  • Built CloudWatch dashboards and alarms for proactive monitoring of pipelines, improving SLA adherence by 20%.
  • Designed data validation scripts in Python and SQL to enforce quality checks, ensuring 99% accuracy in reporting datasets.
  • Optimized S3 storage with lifecycle policies, Parquet/ORC formats, and partitioning, reducing costs by 18%.
  • Supported cross-functional teams by delivering datasets for data science, ML pipelines, and BI dashboards.
  • Mentored junior engineers on AWS best practices, PySpark development, and SQL optimization.
  • Participated in Agile ceremonies, sprint planning, and code reviews to ensure timely and high-quality deliveries.

Environment: Python, PySpark, SQL, AWS (S3, Glue, EMR, Redshift, Lambda, Athena, CloudWatch), Airflow, Git, Docker, Terraform, Agile/Scrum.

Data Engineer

Discover Financial Services
11.2020 - 08.2022
  • Built Python automation scripts for ingestion from APIs, FTP, and databases, reducing manual effort by 40%.
  • Migrated on-prem SQL Server data warehouse to AWS (S3 + Redshift), improving scalability and reducing query time by 35%.
  • Developed ETL workflows to consolidate structured, semi-structured, and unstructured data sources into a unified data lake.
  • Created and optimized SQL queries, stored procedures, and triggers for reporting and validation.
  • Designed PySpark pipelines for batch processing and applied data cleansing, aggregation, and transformation rules.
  • Used Glue crawlers to catalog data assets, improving discoverability and schema management.
  • Optimized storage with columnar formats (Parquet/ORC), reducing query costs by 22%.
  • Built CloudWatch metrics and alarms to monitor ETL job performance and detect anomalies.
  • Partnered with BI teams to build Redshift-based analytics datasets for dashboards in Tableau and Power BI.
  • Assisted in building CI/CD pipelines for automated deployments of ETL jobs into AWS environments.
  • Documented data pipelines, workflows, and best practices to support onboarding and knowledge sharing.

Environment: Python, SQL, PySpark, AWS (S3, Glue, Redshift, CloudWatch), SQL Server, PostgreSQL, Airflow, Git, Jenkins, Agile/Scrum.

Skills

  • Programming & Big Data: Python, PySpark, SQL
  • Cloud Platforms: AWS (S3, Glue, EMR, Redshift, Lambda, Athena, CloudWatch), Databricks
  • Data Engineering: ETL/ELT, Data Lake Architecture, Data Warehousing, Data Modeling
  • Orchestration Tools: Apache Airflow, AWS Step Functions
  • Databases: PostgreSQL, MySQL, SQL Server, Redshift, DynamoDB, MongoDB
  • DevOps & Tools: Git, CI/CD pipelines, Docker, Terraform, Jenkins
  • Methodologies: Agile/Scrum, Problem-Solving, Cross-functional Collaboration

Timeline

Data Engineer

Johnson & Johnson
01.2023 - Current

Data Engineer

Discover Financial Services
11.2020 - 08.2022