Shanmukh Kurra

Summary

Data engineer with expertise in optimizing cloud-based data infrastructures and designing scalable data pipelines. Proficient in AWS, Google BigQuery, Python, SQL, Apache Airflow, Spark, and Kafka. Experienced in developing robust ETL processes and ensuring data governance across complex systems. Collaborates effectively with cross-functional teams to deliver data-driven solutions that enhance operational reporting and business insights.

Overview

4 years of professional experience

Work History

Data Engineer

CrestPoint Analytics
Boston, MA
01.2024 - Current
  • Designed and implemented data pipelines using Python, Apache Spark, and Kafka to process large volumes of customer behavioral data for marketing analytics.
  • Maintained and optimized cloud data warehouses in AWS Redshift and Snowflake, improving data query efficiency by 25%.
  • Developed and orchestrated ETL workflows using Apache Airflow, ensuring reliable and timely data delivery.
  • Partnered with data science teams to productionize ML models, creating automated pipelines for feature engineering and data validation.
  • Created monitoring solutions with Datadog and Grafana to proactively detect and resolve pipeline failures.
  • Implemented data quality checks and anomaly detection, reducing data inconsistencies by 30%.
  • Engaged in peer code reviews and helped establish scalable coding standards and documentation practices.
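
The data quality and anomaly checks above can be sketched in plain Python. This is a minimal, illustrative version; the function names, the null-rate check, and the z-score threshold are hypothetical stand-ins, not the production implementation:

```python
from statistics import mean, stdev

def null_rate(records, field):
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations
    from the mean -- a simple statistical anomaly check."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

In a pipeline, checks like these would run as a validation task after each load, failing the run (or raising an alert) when a metric crosses its threshold.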

Data Engineer

WVU Medicine
Morgantown, WV
09.2022 - 12.2023
  • Developed and deployed end-to-end ETL pipelines using Python, AWS Glue, and Amazon S3 to process high-volume clinical and operational data.
  • Built and maintained data models in Snowflake and Redshift, enabling efficient analytics for healthcare reporting and compliance teams.
  • Automated ingestion workflows from disparate sources (EHR systems, REST APIs, SFTP) using Apache Airflow, reducing manual intervention by 90%.
  • Designed and implemented robust data validation and anomaly detection checks using SQL, Great Expectations, and Slack alerts for data quality assurance.
  • Partnered with data scientists and clinical informatics teams to support real-time dashboards and predictive models in Tableau and Power BI.
  • Contributed to the modernization of legacy ETL infrastructure, improving reliability and reducing pipeline failures by over 40%.
  • Developed automated testing frameworks using Pytest and Great Expectations to validate pipeline outputs before deployment.
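
A pre-deployment output validation of the kind described above can be sketched as a Pytest-style assertion function. The field names and key column here are illustrative, not the actual clinical schema:

```python
def validate_output(rows, required_fields, key_field="record_id"):
    """Pytest-style checks a pipeline output might run before deployment:
    non-empty result, unique keys, no missing required fields.
    `record_id` and the field list are hypothetical examples."""
    assert rows, "pipeline produced no rows"
    keys = [r[key_field] for r in rows]
    assert len(keys) == len(set(keys)), "duplicate keys in output"
    for r in rows:
        missing = [f for f in required_fields if r.get(f) is None]
        assert not missing, f"row {r[key_field]} missing fields: {missing}"
    return True
```

Wired into a test suite, each assertion becomes a gate: a failing check blocks the deploy rather than letting bad data reach downstream consumers.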

Junior Data Engineer

DataNest Technologies Pvt. Ltd.
Hyderabad, Telangana
04.2021 - 07.2022
  • Created scalable data pipelines using Apache Airflow, Python, and AWS Lambda to handle high-frequency IoT device data.
  • Optimized PostgreSQL and Amazon Redshift data warehouses to support operational and analytics workloads.
  • Directed migration from on-premise MySQL databases to AWS RDS, achieving 50% reduction in downtime.
  • Established robust data validation frameworks with automated alerts for end-to-end data integrity in ETL processes.
  • Collaborated with data analysts and product teams to model new data schemas and improve data accessibility for reporting and ML use cases.
  • Tuned Redshift queries and applied partitioning, compression, and sort keys, resulting in 30% faster query performance and reduced compute cost.

Education

Master of Science - Computer Science

University of Missouri
Kansas City, MO
05.2024

Skills

Programming Skills:
Python, SQL, Bash, PySpark, Scala (basic), JavaScript (basic)

Databases:
PostgreSQL, MySQL, MongoDB, AWS Redshift, Snowflake, Google BigQuery

Web Technologies & Libraries:
Flask (API development), REST APIs, Pandas, NumPy, SQLAlchemy, Jupyter Notebooks, HTML/CSS (basic)

Cloud Platforms & Services:
Amazon Web Services (AWS): S3, RDS, Redshift, Lambda, Glue
Google Cloud Platform (GCP): BigQuery
Others: Azure (basic), Snowflake

Projects

Data Lake and ETL Framework for Operational Reporting

Company: DataNest Technologies Pvt. Ltd. – Hyderabad, India
Tech Stack: AWS S3, AWS Glue, Redshift, Python, Airflow, PostgreSQL

  • Designed and implemented a cloud-based data lake to store operational data from multiple business units.
  • Developed reusable ETL pipelines using AWS Glue and Airflow to process structured and unstructured data.
  • Built and optimized Redshift data models to support dashboarding and business intelligence.
  • Enabled real-time alerting for data quality issues using custom Python scripts and Slack integrations.
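
The Slack-based data quality alerting above can be sketched as a small payload builder. The webhook URL is a placeholder, and the message format is an illustrative assumption rather than the scripts actually deployed:

```python
import json

# Hypothetical placeholder -- a real Slack incoming-webhook URL would go here.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."

def build_alert(check_name, table, detail):
    """Build the JSON body for a Slack incoming-webhook message
    announcing a failed data quality check."""
    return json.dumps({
        "text": f":rotating_light: Data quality check `{check_name}` failed "
                f"on `{table}`: {detail}"
    })

# Posting the payload is one call with urllib.request (omitted here to keep
# the sketch side-effect free):
# urllib.request.urlopen(urllib.request.Request(
#     SLACK_WEBHOOK_URL, data=build_alert(...).encode(),
#     headers={"Content-Type": "application/json"}))
```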

Healthcare Data Pipeline for Clinical Analytics

Company: WVU Medicine – Morgantown, WV
Tech Stack: Python, AWS Lambda, Redshift, Snowflake, Airflow, Great Expectations

  • Built secure and HIPAA-compliant ETL pipelines to ingest and standardize patient records from EHR systems.
  • Automated data ingestion using AWS Lambda and batch workflows with Apache Airflow, supporting daily clinical reporting.
  • Implemented data validation using Great Expectations, ensuring high reliability for compliance reporting.
  • Integrated processed data into Snowflake for downstream use in clinical dashboards (Power BI, Tableau).

Real-Time Financial Transaction Monitoring System

Tech Stack: Kafka, Spark Streaming, MongoDB, Grafana

  • Built a real-time data pipeline to detect anomalous financial transactions for a fintech use case.
  • Ingested live event streams via Apache Kafka, processed them with Spark Streaming, and stored insights in MongoDB.
  • Monitored data throughput and pipeline performance using Grafana dashboards and real-time alerts.
  • Designed rules-based detection logic to flag potentially fraudulent transactions in real time.
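
The per-event rule logic behind that detection step can be sketched in plain Python; in the pipeline it would run inside the Spark Streaming job. Thresholds, field names, and rules here are illustrative stand-ins for the deployed rule set:

```python
def flag_transaction(txn, recent_txn_count, max_amount=10_000, velocity=5):
    """Apply simple fraud rules to one transaction dict and return the
    list of rule names it trips (empty list means no flags).
    All thresholds and field names are hypothetical examples."""
    reasons = []
    if txn["amount"] > max_amount:
        reasons.append("amount_over_limit")
    if recent_txn_count >= velocity:
        reasons.append("velocity_exceeded")
    if txn.get("country") not in (None, txn.get("home_country")):
        reasons.append("foreign_country")
    return reasons
```

Each flagged transaction, with its triggered rule names, would then be written to MongoDB and surfaced on the Grafana dashboards.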

Healthcare Data Warehouse Modernization

Tech Stack: AWS RDS, Redshift, Python, Pandas, Power BI

  • Migrated legacy hospital database systems from on-premise MySQL to AWS RDS and Redshift.
  • Designed new schema models and implemented Python-based ETL scripts to standardize patient records and appointment data.
  • Supported compliance reporting and clinical dashboards using Power BI, enhancing access to KPIs for non-technical users.
  • Improved data accessibility and reliability, reducing downtime by 40% post-migration.
