Bhanu Prakash

Summary

Data Engineer with 4+ years of experience designing, building, and optimizing data pipelines across AWS cloud environments. Strong expertise in ETL/ELT workflows, PySpark, SQL, and distributed processing using Glue, EMR, and Databricks. Proven ability to migrate on-prem systems to cloud-native architectures, implement data quality frameworks, and deliver scalable ingestion solutions supporting analytics, dashboards, and machine learning use cases. Adept at collaborating across product, BI, and data science teams to translate business requirements into reliable, high-performance data solutions.

Overview

7 years of professional experience

Work History

Senior Data Engineer

Citi Bank
07.2025 - Current
  • Developed PySpark transformation scripts for cleansing, enrichment, deduplication, and incremental load logic.
  • Developed data pipelines to ensure seamless integration of financial data across systems.
  • Modernized legacy SQL/SSIS ETL pipelines by redesigning them into AWS Glue PySpark workflows.
  • Built a medallion-style (Bronze/Silver/Gold) data lake architecture using S3, Glue Catalog, and Delta Lake on Databricks.
  • Implemented CDC-based ingestion using MERGE strategies, partitioning, and file compaction techniques.
  • Built Airflow (MWAA) DAGs for orchestration, dependency scheduling, retry logic, and SLA monitoring.
  • Integrated Glue jobs with Lambda and Step Functions to automate validation, notifications, and shared error-handling libraries.
  • Tuned performance by optimizing Spark execution plans, handling skew, minimizing shuffle, and leveraging broadcast joins.
  • Automated CI/CD deployments of Glue jobs, job bookmarks, and Databricks notebooks using GitHub Actions.
  • Implemented complete monitoring and alerting using CloudWatch metrics, log filters, dashboards, and SNS notifications.
  • Collaborated with product, analytics, and ML teams to design schemas, ensure data contracts, and deliver high-quality curated datasets.
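The CDC-based MERGE ingestion above can be illustrated with a minimal pure-Python sketch (a simplified stand-in for a Delta Lake MERGE; the record layout, `op` flag, and field names are illustrative assumptions, not the production schema):

```python
def merge_cdc(target, changes, key="id", ts="updated_at"):
    """Upsert CDC change records into a target table keyed by `key`,
    keeping the latest version of each row by `ts` and removing rows
    whose change record is flagged as a delete."""
    merged = {row[key]: row for row in target}
    for change in sorted(changes, key=lambda r: r[ts]):
        if change.get("op") == "delete":
            merged.pop(change[key], None)
        else:
            current = merged.get(change[key])
            # Apply the change only if it is at least as new as what we hold.
            if current is None or change[ts] >= current[ts]:
                merged[change[key]] = {k: v for k, v in change.items() if k != "op"}
    return list(merged.values())
```

In a real pipeline the same keep-latest-by-timestamp logic runs as a `MERGE INTO` statement over partitioned Delta tables rather than in-memory dictionaries.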

Data Engineer

HSBC Finance Corporation [Loan IQ]
01.2021 - 03.2023
  • Designed and implemented ETL processes to streamline data integration across multiple platforms.
  • Developed data models to support analytics and reporting initiatives within financial services.
  • Designed and built end-to-end ETL pipelines using AWS Glue, PySpark, and SQL to ingest data from RDS, APIs, and S3.
  • Modeled curated datasets in S3 using Parquet partitioning, optimized for Athena and Redshift Spectrum.
  • Migrated stored-procedure-based batch jobs into Glue jobs with reusable transform functions.
  • Implemented automated schema validation, row-count checks, data reconciliation reports, and quality gates.
  • Built reusable Python helper modules for logging, exception handling, metadata management, and error routing.
  • Developed SQL transformations including CTEs, window functions, date logic, and complex joins for analytics use cases.
  • Implemented Airflow DAGs for orchestration, dependency management, backfills, and alerting via Slack/Email.
  • Optimized Glue job performance by adjusting worker types, tuning parallelism, caching, and optimizing file formats.
  • Integrated curated datasets into BI and ML pipelines by preparing dimension/fact models and exposing them to downstream consumers.
  • Documented datasets, lineage maps, API contracts, and runbook for smooth handover to operations and analytics teams.
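The schema-validation and row-count quality gates above can be sketched as a small standalone check (a hedged simplification: the real gates ran inside Glue jobs against reconciliation reports; the schema and field names here are invented for illustration):

```python
def validate_batch(rows, expected_schema, source_count):
    """Run lightweight quality gates on an extracted batch:
    a row-count reconciliation against the source system, then a
    per-row schema check (required fields present, correct type).
    Returns a list of human-readable error strings; empty means pass."""
    errors = []
    if len(rows) != source_count:
        errors.append(f"row-count mismatch: got {len(rows)}, expected {source_count}")
    for i, row in enumerate(rows):
        for field, ftype in expected_schema.items():
            if field not in row:
                errors.append(f"row {i}: missing field '{field}'")
            elif not isinstance(row[field], ftype):
                errors.append(
                    f"row {i}: '{field}' is {type(row[field]).__name__}, "
                    f"expected {ftype.__name__}"
                )
    return errors
```

A non-empty error list would fail the pipeline stage and route the batch to error handling instead of promoting it downstream.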

Junior Data Engineer

Tata Consultancy Services [TIAA-FSDF]
06.2019 - 03.2021
  • Assisted in migration from SQL Server/SSIS workloads into AWS Glue and ADF pipelines.
  • Developed SQL scripts, stored procedures, indexing strategies, and materialized views for performance improvements.
  • Created small Python utilities for data extraction, CSV preprocessing, and S3 file movement using Boto3.
  • Helped implement ETL logic including joins, aggregations, lookups, surrogate keys, and SCD-style updates.
  • Supported senior engineers in converting SSIS packages into Glue jobs and ADF pipelines.
  • Worked on daily/weekly batch job monitoring, reruns, issue triaging, and root cause analysis.
  • Created data profiling scripts to check duplicates, null percentages, and schema mismatches.
  • Built documentation including pipeline diagrams, table definitions, field mappings, and operational guides.
  • Helped configure IAM roles, S3 policies, Glue Catalog database/tables, and access management.
  • Participated in Agile ceremonies — backlog grooming, sprint planning, and release reviews.
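The data-profiling checks above (duplicates, null percentages) can be sketched as a minimal utility; the report structure and field names are illustrative assumptions:

```python
def profile(rows, key=None):
    """Profile a batch of records: per-field null percentage and,
    if a key column is given, the count of duplicated key values."""
    fields = sorted({f for row in rows for f in row})
    total = len(rows)
    null_pct = {
        f: round(100 * sum(1 for r in rows if r.get(f) is None) / total, 1)
        for f in fields
    }
    report = {"rows": total, "null_pct": null_pct}
    if key is not None:
        seen = [r.get(key) for r in rows]
        # Each repeated key value beyond the first counts as one duplicate.
        report["duplicate_keys"] = len(seen) - len(set(seen))
    return report
```

Profiling output like this feeds schema-mismatch triage: unexpected null rates or key duplication usually point at an upstream extract problem rather than a transform bug.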

Education

Master of Science - Information Technology

Lindsey Wilson College
Columbia, KY
04-2025

Bachelor of Science - Information Technology

KL University
Vijayawada, India
05-2018

Skills

    Cloud: AWS (S3, Glue, Lambda, EMR, EC2, IAM, Redshift Spectrum, Athena), Azure Data Factory (basics)
    ETL/Orchestration: AWS Glue, Airflow (MWAA), ADF, SSIS
    Big Data / Processing: PySpark, Databricks, Spark SQL, Spark Streaming, Kafka (basic)
    Programming: Python (Pandas, Boto3), SQL, Shell Scripting
    Data Modeling: Star/Snowflake schemas, Dimensional modeling, Medallion architecture
    Databases: PostgreSQL, MySQL, SQL Server, Redshift, Hive/Athena
    Tools/DevOps: Git, GitHub Actions, Bitbucket, Jenkins, Docker (intro), CI/CD
    Data Quality & Monitoring: AWS Glue DataBrew, Great Expectations (intro), CloudWatch, Logs/Alerts
    File Formats: Parquet, ORC, Avro, JSON, CSV
