Shanmukh Kurra

Summary

Data engineer with expertise in optimizing cloud-based data infrastructures and designing scalable data pipelines. Proficient in AWS, Google BigQuery, Python, SQL, Apache Airflow, Spark, and Kafka. Experienced in developing robust ETL processes and ensuring data governance across complex systems. Collaborates effectively with cross-functional teams to deliver data-driven solutions that enhance operational reporting and business insights.

Overview

4 years of professional experience

Work History

Data Engineer

CrestPoint Analytics
Boston, MA
01.2024 - Current
  • Designed and implemented data pipelines using Python, Apache Spark, and Kafka to process large volumes of customer behavioral data for marketing analytics.
  • Maintained and optimized cloud data warehouses in AWS Redshift and Snowflake, improving data query efficiency by 25%.
  • Developed and orchestrated ETL workflows using Apache Airflow, ensuring reliable and timely data delivery.
  • Partnered with data science teams to productionize ML models, creating automated pipelines for feature engineering and data validation.
  • Created monitoring solutions with Datadog and Grafana to proactively detect and resolve pipeline failures.
  • Implemented data quality checks and anomaly detection, reducing data inconsistencies by 30%.
  • Engaged in peer code reviews and helped establish scalable coding standards and documentation practices.
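
The data quality and anomaly checks above can be sketched in plain Python. This is a minimal, illustrative version; the function names, the null-rate check, and the z-score threshold are hypothetical stand-ins, not the production implementation:

```python
from statistics import mean, stdev

def null_rate(records, field):
    """Fraction of records where `field` is missing or None."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)

def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations
    from the mean -- a simple statistical anomaly check."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

In a pipeline, checks like these would run as a validation task after each load, failing the run (or raising an alert) when a metric crosses its threshold.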

Data Engineer

WVU Medicine
Morgantown, WV
09.2022 - 12.2023
  • Developed and deployed end-to-end ETL pipelines using Python, AWS Glue, and Amazon S3 to process high-volume clinical and operational data.
  • Built and maintained data models in Snowflake and Redshift, enabling efficient analytics for healthcare reporting and compliance teams.
  • Automated ingestion workflows from disparate sources (EHR systems, REST APIs, SFTP) using Apache Airflow, reducing manual intervention by 90%.
  • Designed and implemented robust data validation and anomaly detection checks using SQL, Great Expectations, and Slack alerts for data quality assurance.
  • Partnered with data scientists and clinical informatics teams to support real-time dashboards and predictive models in Tableau and Power BI.
  • Contributed to the modernization of legacy ETL infrastructure, improving reliability and reducing pipeline failures by over 40%.
  • Developed automated testing frameworks using Pytest and Great Expectations to validate pipeline outputs before deployment.
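
A pre-deployment output validation of the kind described above can be sketched as a Pytest-style assertion function. The field names and key column here are illustrative, not the actual clinical schema:

```python
def validate_output(rows, required_fields, key_field="record_id"):
    """Pytest-style checks a pipeline output might run before deployment:
    non-empty result, unique keys, no missing required fields.
    `record_id` and the field list are hypothetical examples."""
    assert rows, "pipeline produced no rows"
    keys = [r[key_field] for r in rows]
    assert len(keys) == len(set(keys)), "duplicate keys in output"
    for r in rows:
        missing = [f for f in required_fields if r.get(f) is None]
        assert not missing, f"row {r[key_field]} missing fields: {missing}"
    return True
```

Wired into a test suite, each assertion becomes a gate: a failing check blocks the deploy rather than letting bad data reach downstream consumers.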

Junior Data Engineer

DataNest Technologies Pvt. Ltd.
Hyderabad, Telangana
04.2021 - 07.2022
  • Created scalable data pipelines using Apache Airflow, Python, and AWS Lambda to handle high-frequency IoT device data.
  • Optimized PostgreSQL and Amazon Redshift data warehouses to support operational and analytics workloads.
  • Directed migration from on-premise MySQL databases to AWS RDS, achieving 50% reduction in downtime.
  • Established robust data validation frameworks with automated alerts for end-to-end data integrity in ETL processes.
  • Collaborated with data analysts and product teams to model new data schemas and improve data accessibility for reporting and ML use cases.
  • Tuned Redshift queries and applied partitioning, compression, and sort keys, resulting in 30% faster query performance and reduced compute cost.

Education

Master of Science - Computer Science

University of Missouri
Kansas City, MO
05.2024

Skills

Programming Skills:
Python, SQL, Bash, PySpark, Scala (basic), JavaScript (basic)

Databases:
PostgreSQL, MySQL, MongoDB, AWS Redshift, Snowflake, Google BigQuery

Web Technologies & Libraries:
Flask (API development), REST APIs, Pandas, NumPy, SQLAlchemy, Jupyter Notebooks, HTML/CSS (basic)

Cloud Platforms & Services:
Amazon Web Services (AWS): S3, RDS, Redshift, Lambda, Glue
Google Cloud Platform (GCP): BigQuery
Others: Azure (basic), Snowflake

Projects

Data Lake and ETL Framework for Operational Reporting

Company: DataNest Technologies Pvt. Ltd. – Hyderabad, India
Tech Stack: AWS S3, AWS Glue, Redshift, Python, Airflow, PostgreSQL

  • Designed and implemented a cloud-based data lake to store operational data from multiple business units.
  • Developed reusable ETL pipelines using AWS Glue and Airflow to process structured and unstructured data.
  • Built and optimized Redshift data models to support dashboarding and business intelligence.
  • Enabled real-time alerting for data quality issues using custom Python scripts and Slack integrations.
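
The Slack-based data quality alerting above can be sketched as a small payload builder. The webhook URL is a placeholder, and the message format is an illustrative assumption rather than the scripts actually deployed:

```python
import json

# Hypothetical placeholder -- a real Slack incoming-webhook URL would go here.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."

def build_alert(check_name, table, detail):
    """Build the JSON body for a Slack incoming-webhook message
    announcing a failed data quality check."""
    return json.dumps({
        "text": f":rotating_light: Data quality check `{check_name}` failed "
                f"on `{table}`: {detail}"
    })

# Posting the payload is one call with urllib.request (omitted here to keep
# the sketch side-effect free):
# urllib.request.urlopen(urllib.request.Request(
#     SLACK_WEBHOOK_URL, data=build_alert(...).encode(),
#     headers={"Content-Type": "application/json"}))
```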

Healthcare Data Pipeline for Clinical Analytics

Company: WVU Medicine – Morgantown, WV
Tech Stack: Python, AWS Lambda, Redshift, Snowflake, Airflow, Great Expectations

  • Built secure and HIPAA-compliant ETL pipelines to ingest and standardize patient records from EHR systems.
  • Automated data ingestion using AWS Lambda and batch workflows with Apache Airflow, supporting daily clinical reporting.
  • Implemented data validation using Great Expectations, ensuring high reliability for compliance reporting.
  • Integrated processed data into Snowflake for downstream use in clinical dashboards (Power BI, Tableau).

Real-Time Financial Transaction Monitoring System

Tech Stack: Kafka, Spark Streaming, MongoDB, Grafana

  • Built a real-time data pipeline to detect anomalous financial transactions for a fintech use case.
  • Ingested live event streams via Apache Kafka, processed them with Spark Streaming, and stored insights in MongoDB.
  • Monitored data throughput and pipeline performance using Grafana dashboards and real-time alerts.
  • Designed rules-based detection logic to flag potentially fraudulent transactions in real time.
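
The per-event rule logic behind that detection step can be sketched in plain Python; in the pipeline it would run inside the Spark Streaming job. Thresholds, field names, and rules here are illustrative stand-ins for the deployed rule set:

```python
def flag_transaction(txn, recent_txn_count, max_amount=10_000, velocity=5):
    """Apply simple fraud rules to one transaction dict and return the
    list of rule names it trips (empty list means no flags).
    All thresholds and field names are hypothetical examples."""
    reasons = []
    if txn["amount"] > max_amount:
        reasons.append("amount_over_limit")
    if recent_txn_count >= velocity:
        reasons.append("velocity_exceeded")
    if txn.get("country") not in (None, txn.get("home_country")):
        reasons.append("foreign_country")
    return reasons
```

Each flagged transaction, with its triggered rule names, would then be written to MongoDB and surfaced on the Grafana dashboards.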

Healthcare Data Warehouse Modernization

Tech Stack: AWS RDS, Redshift, Python, Pandas, Power BI

  • Migrated legacy hospital database systems from on-premise MySQL to AWS RDS and Redshift.
  • Designed new schema models and implemented Python-based ETL scripts to standardize patient records and appointment data.
  • Supported compliance reporting and clinical dashboards using Power BI, enhancing access to KPIs for non-technical users.
  • Improved data accessibility and reliability, reducing downtime by 40% post-migration.
