Sravya Kanthuri

Summary

Senior Data Engineer with 7+ years of experience building and scaling production-grade data pipelines, distributed processing systems, and lakehouse architectures across banking, healthcare, and telecom domains.
Expert in Python, PySpark, and SQL for large-scale data transformation and distributed processing, delivering reliable batch and streaming pipelines across Azure and AWS cloud environments.
Hands-on experience implementing Bronze/Silver/Gold medallion lakehouse architectures using Delta Lake with ACID transactions and CDC-based incremental load strategies.
Strong AWS expertise with Glue, EMR, Redshift, Step Functions, Lambda, S3, and Athena, building scalable ETL/ELT pipelines and analytics-ready datasets.
Proven record of measurable impact: reduced pipeline failures by 35%, cut SLA breaches by 40%, improved Snowflake query performance by 30–45%, and increased data accuracy by 25% through structured validation and optimization frameworks.
Designed near real-time streaming solutions using Apache Kafka, Spark Streaming, Azure Event Hubs, and Stream Analytics to process high-velocity event data with low latency.
Advanced Snowflake and Redshift data warehousing experience including schema design, clustering key tuning, materialized views, and complex SQL query optimization.
Implemented enterprise data governance using Databricks Unity Catalog with row- and column-level security aligned with SOX, HIPAA, and GDPR compliance standards.
Proficient in dbt and CI/CD-driven deployment models using Terraform and Azure DevOps, reducing environment provisioning time by 60% and maintaining audit-ready data platforms.
Experienced in integrating 20+ heterogeneous data sources into governed analytics ecosystems using CDC-based incremental loads, reconciliation frameworks, and multi-tier data quality controls.

Overview

7+ years of professional experience

Work History

Data Engineer

Fifth Third Bank

01.2024 - Current

Reduced manual reporting effort by 40% by automating end-to-end data ingestion and transformation pipelines using Azure Data Factory and PySpark, delivering curated, Power BI-ready Gold-layer tables for finance and compliance stakeholders.
Eliminated duplicate and failed transactions by implementing SOX-compliant data validation frameworks, automated reconciliation checks, and multi-tier data quality checkpoints across the Databricks lakehouse pipeline, strengthening data integrity for downstream financial reporting.
Decreased pipeline failures by 35% by engineering structured error handling, exponential-backoff retry logic, and real-time Azure Monitor alerting dashboards that proactively prevented business-impacting SLA breaches.
Designed and implemented a scalable Databricks lakehouse using the Bronze/Silver/Gold medallion pattern with Delta Lake, enabling ACID transactions, schema enforcement, Z-ordering, and time-travel auditing to support reliable incremental batch processing.
Built near real-time streaming pipelines using Azure Event Hubs, Azure Functions, and Stream Analytics to process high-velocity transactional events and publish operational analytics to Azure Data Explorer (ADX).
Accelerated new data source onboarding by 50% by developing reusable, schema-aware Python utilities for ingestion and transformation of CSV, JSON, and XML sources via parameterized ADF pipelines.
Delivered self-serve analytics by building optimized SQL-based reporting layers and KPI datasets in Azure Synapse Analytics, reducing engineering dependency for recurring compliance and fraud reporting.
Maintained 100% audit readiness by managing infrastructure-as-code with Terraform and automating deployments through Azure DevOps CI/CD pipelines with full data lineage tracking.
Enforced enterprise data security by configuring Databricks Unity Catalog with role-based row- and column-level access controls aligned with GDPR and SOX compliance standards.

Data Engineer

Cencora

12.2020 - 08.2023

Reduced SLA failures and manual operations by 40% by redesigning recurring dataset delivery workflows and standardizing pipeline orchestration using AWS Step Functions with automated failure recovery, retry logic, and real-time alerting.
Improved Snowflake warehouse query performance by 30–45% by optimizing schema design, tuning clustering keys, refining materialized views, and rewriting high-cost SQL queries for critical analytics workloads.
Improved data accuracy and completeness by implementing CDC-based incremental load patterns in AWS Glue, applying multi-layer validation rules, and building automated reconciliation reports to proactively surface discrepancies across key datasets.
Integrated 20+ clinical, supply chain, and sales data sources into analytics-ready datasets using AWS S3, Glue ETL, Lambda, Athena, and Redshift within a scalable data lake and warehouse architecture.
Reduced average EMR PySpark job runtime by 30% and eliminated out-of-memory failures by applying strategic partitioning, broadcast joins, and in-memory caching for large-scale healthcare data processing.
Built a modular, version-controlled dbt transformation layer with automated data tests and lineage documentation, replacing fragmented ad-hoc SQL scripts and standardizing transformation logic across the team.
Improved operational data freshness from daily batch to near real-time by deploying Apache Kafka and Spark Streaming pipelines to process high-volume healthcare supply chain event streams with low end-to-end latency.
Strengthened enterprise data governance by configuring Databricks Unity Catalog with granular audit logging and attribute-based access controls to maintain HIPAA compliance across clinical and operational datasets.
Reduced infrastructure provisioning time by 60% by authoring reusable Terraform modules for repeatable AWS environment setup, integrated into CI/CD pipelines for automated deployment and environment parity.
Orchestrated ETL workflows using Apache Airflow, managing DAG scheduling, dependency handling, and retry policies for reliable pipeline execution.

Data Analyst

Accenture

05.2018 - 11.2020

Built and deployed predictive customer segmentation models using Python, scikit-learn, pandas, and SQL, enabling targeted marketing campaigns and contributing to a 20% improvement in campaign targeting accuracy for telecom clients.
Reduced dashboard load times by 40% by migrating 15+ legacy Excel reports to Power BI and rebuilding underlying SQL data models with optimized joins, indexing strategies, and pre-aggregated summary tables.
Developed scalable ETL pipelines using SQL, Python, and Informatica PowerCenter to consolidate telecom billing, usage, and subscriber data from 10+ source systems into a centralized analytics-ready data warehouse.
Accelerated nightly batch processing by 25% by modernizing SSIS jobs, performing root-cause analysis on long-running queries, and optimizing execution plans for sustained performance improvements.
Improved data pipeline reliability by automating validation, reconciliation, and exception-reporting processes using Bash scripting and Python, reducing manual QA effort by over 3 hours per weekly reporting cycle.
Designed and maintained PostgreSQL-based staging tables to support intermediate ETL transformations and data quality validation.
Collaborated with business analysts, product managers, and engineering teams to gather requirements, define KPI frameworks, and deliver analytics solutions aligned with telecom business objectives and compliance standards.

Education

Master of Science -

The University of Texas at Arlington

Arlington, TX

Skills

Programming: Python, SQL, PySpark, R, Bash/Shell Scripting
Big Data & Distributed Processing: Apache Spark, Spark Streaming, Apache Kafka, Hadoop
Cloud Platforms (AWS): S3, Glue, EMR, Redshift, Step Functions, Lambda, Athena, CloudWatch
Cloud Platforms (Azure): Data Factory (ADF), Databricks, ADLS Gen2, Synapse Analytics, Azure Data Explorer (ADX), Event Hubs, Azure Monitor
Databases & Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse SQL, PostgreSQL, SQL Server
Data Engineering & Architecture: ETL/ELT, Lakehouse Architecture (Delta Lake), Dimensional Modeling, Star/Snowflake Schema, CDC, Incremental Loads, Data Quality & Reconciliation
Orchestration & Transformation: dbt, Apache Airflow, AWS Step Functions, SSIS, Informatica PowerCenter
DevOps & Infrastructure: Terraform, Azure DevOps, CI/CD Pipelines, Git
Governance & Security: Databricks Unity Catalog, SOX, HIPAA, GDPR, Audit Logging
Monitoring & Observability: Azure Monitor, AWS CloudWatch, Datadog, Splunk, Grafana
BI & Analytics: Power BI, Tableau, DAX, Power Query

Accomplishments

Software Delivery Excellence Recognition – Best Employee, Accenture (2020)
Awarded for outstanding performance, consistent delivery, and contribution to high-impact telecom analytics and data engineering projects.
