Sravya Kanthuri

Dallas

Summary

Data Engineer with 7+ years building production-grade data platforms and Lakehouse architectures across banking, healthcare, and telecom, leading work end to end from requirements and design through release validation and production support.
Implemented Azure and AWS Lakehouse platforms end to end using Azure Databricks, Delta Live Tables, Unity Catalog, Auto Loader, and Medallion Architecture, establishing reusable pipeline and data modeling standards teams could build on confidently.
Built multi-cloud ETL and ELT pipelines across Azure (Data Factory, Databricks, ADLS Gen2, Synapse Analytics, Microsoft Fabric), AWS (Glue, EMR, Lambda, Step Functions, Redshift, Athena), and GCP (BigQuery), handling CDC, schema evolution, late-arriving data, and backfill strategies.
Developed Python and Scala backend services with REST APIs, GraphQL, OAuth2/JWT authentication, PostgreSQL state management, and Redis caching, owning production observability through Datadog monitoring, incident response, and postmortems.
Orchestrated complex workflows using Apache Airflow, Databricks Workflows, and AWS Step Functions, built governed ELT models using dbt and Snowflake, and partnered with analytics and business teams to define data contracts and data catalogs and to reduce recurring ad hoc requests.
Established data quality and governance programs using Delta Live Tables Expectations, Great Expectations, and Soda across HIPAA, SOX, and GDPR-regulated pipelines for audit readiness, data lineage, and regulatory compliance.
Translated ambiguous business requirements into well-scoped engineering deliverables, communicated progress clearly to finance, compliance, and clinical stakeholders, and managed competing priorities effectively across Agile sprint cycles.
Set pipeline, data modeling, and coding standards in PySpark, Scala, Python, and SQL, mentored engineers at different levels, conducted code reviews, and drove CI/CD through GitHub Actions, Azure DevOps, and Terraform.

Overview

7+ years of professional experience

Work History

Data Engineer

Fifth Third Bank

01.2024 - Current

Designed and built a finance and compliance Lakehouse on Azure Databricks, aligned with audit and business stakeholders on regulatory requirements, delivered certified Gold datasets to Azure Synapse Analytics, and designed dimensional KPI models for fraud detection and compliance reporting that eliminated recurring ad hoc requests.
Built incremental ELT ingestion using Auto Loader on ADLS Gen2 and CDC-based source feeds with Structured Streaming micro-batch processing, handling late-arriving data, schema evolution, and idempotent execution.
Modeled an audit-ready reporting framework covering financial transactions, regulatory aggregations, and compliance events, consuming from Azure Event Hubs and Azure Service Bus, partnering with finance and compliance SMEs to ensure end-to-end traceability.
Developed reusable pipeline components in PySpark, Python, and SQL with standardized validation, error handling, and structured logging, reducing onboarding time for new engineers and new data sources.
Built a Python backend service exposing REST APIs for pipeline reruns, reconciliation status, and dataset health dashboards, with PostgreSQL state management, Redis caching, and OpenAPI documentation integrated via Azure DevOps.
Migrated legacy on-premises SQL Server batch jobs to Azure Databricks and Azure Data Factory, re-architecting brittle overnight runs into reliable incremental pipelines and cutting batch windows to under two hours.
Implemented Delta Live Tables with quality Expectations and quarantine handling, orchestrated end to end through Azure Data Factory and Databricks Workflows, and built reconciliation controls using control totals, variance thresholds, and exception workflows.
Enforced Unity Catalog governance including RBAC, row-level filters, column masks, and data lineage with Microsoft Entra integration, owned Azure Monitor and Datadog observability, and maintained production stability through incident response and postmortems.
Led Agile sprint delivery including story breakdown, estimation, technical design, peer code reviews, and release validation, and automated deployments through Azure DevOps CI/CD, Databricks Asset Bundles, and Terraform.
Cut Spark workload runtime by applying Delta table OPTIMIZE, Z-ORDER, and Liquid Clustering on Azure Databricks, improving SLA adherence across large-scale finance datasets while reducing compute cost.

Data Engineer

Cencora

12.2020 - 08.2023

Built a cloud-native healthcare analytics platform on AWS, standardized clinical, supply chain, and claims data from disparate source systems into a governed data lake, and collaborated with product and analytics teams to define data contracts and SLAs that improved care-gap closure rates.
Designed and orchestrated batch and event-driven ELT pipelines in Python and PySpark using AWS Glue and Glue Data Catalog, and processed data through AWS EMR and Hadoop clusters using HDFS, Hive, and MapReduce patterns, with retries, idempotency, schema evolution, and backfill support built in from the start.
Developed governed ELT pipelines and dimensional data models using dbt, AWS Redshift, and AWS Athena, designed Redshift schemas with distribution and sort keys for petabyte-scale workloads, and reduced complex query runtimes in Snowflake through join optimization, partitioning, and clustering.
Extended the platform to GCP BigQuery for cross-cloud analytics, enabling federated queries across AWS and GCP so supply chain and clinical teams could run unified reporting without duplicating data movement between clouds.
Stood up near real-time Kafka and Spark Streaming pipelines for appointment and clinical event updates, handled late-arriving events gracefully, and validated event contracts with upstream teams before data reached clinical consumers.
Developed Python backend services with REST APIs, GraphQL endpoints, and AWS Lambda for high-throughput event state lookups and pipeline control tracking, integrated with AWS DynamoDB for low-latency state reads, and delivered a Python observability service with CloudWatch alerting for downstream dataset health visibility.
Migrated legacy Informatica ETL workflows and on-premises SQL Server batch jobs to AWS Glue and Spark on EMR, redesigning incremental load patterns and tuning query plans to cut end-to-end pipeline runtime from 3 hours to 45 minutes without increasing infrastructure cost.
Owned data quality end to end using Delta Live Tables Expectations and Soda for freshness, schema conformance, and duplicate detection, and enforced HIPAA-compliant PHI de-identification, IAM-based access controls, and AWS Lake Formation governance across all regulated healthcare data flows.
Established CI/CD with GitHub Actions and Terraform, wrote unit and integration tests using pytest, containerized workloads with Docker on Kubernetes, and maintained production stability through structured RCA, incident response, and postmortems.

Data Analyst

Accenture

05.2018 - 11.2020

Delivered telecom ETL, reporting, and analytics solutions consolidating billing, usage, and subscriber data from multiple source systems into a unified analytics platform, partnering with business and product stakeholders to define KPI logic and SLAs.
Developed ETL pipelines using SQL Server, Python, Informatica PowerCenter, and SSIS to ingest and transform billing and subscriber data, and designed PostgreSQL staging tables and transformation layers to support data quality checks and reporting readiness.
Migrated legacy Excel-based reporting workflows to Power BI, rebuilt the underlying SQL data models for performance, and resolved nightly batch bottlenecks through query tuning and execution plan analysis, improving dashboard load times and overnight batch reliability.
Built predictive customer segmentation models using Python, R, and SQL to support targeted marketing campaigns, and delivered outputs as dashboard-ready datasets in Power BI and Tableau validated against actual campaign outcomes.
Improved nightly batch performance by re-architecting slow ETL jobs through query plan optimization and execution tuning in SQL Server, reducing batch runtimes and freeing up compute windows for downstream reporting processes.
Built dashboard-ready reporting datasets and curated tables in Power BI, Tableau, DAX, and Power Query for operational KPI reporting, and coordinated releases with engineering and business teams to minimize disruption to live workflows.
Worked with business analysts, product teams, and engineering stakeholders in Agile environments to gather requirements, define KPI logic, and deliver analytics solutions aligned with business goals.

Education

Master of Science - Information Systems

The University of Texas at Arlington

Arlington, TX

Skills

Programming Languages: Python, SQL, Scala, Java, PySpark, Spark SQL, R, Bash/Shell Scripting, T-SQL
Cloud Platforms: Azure (Data Factory, Databricks, ADLS Gen2, Synapse Analytics, Event Hubs, Azure Functions, Azure Monitor, Microsoft Fabric), AWS (S3, Glue, EMR, Lambda, Step Functions, Athena, Redshift, CloudWatch, IAM, Lake Formation, KMS), GCP (BigQuery)
Databricks Lakehouse: Delta Lake, Unity Catalog, Delta Live Tables, Lakeflow Declarative Pipelines, Auto Loader, Databricks Workflows, Databricks SQL, Photon Engine, Databricks Asset Bundles
Data Engineering: ETL, ELT, Medallion Architecture, Lakehouse Architecture, CDC, APPLY CHANGES INTO, Incremental Processing, Schema Evolution, Backfills, Data Contracts, Data Catalog, Idempotency, Reconciliation, SLAs, dbt, Fivetran
Streaming and Orchestration: Apache Kafka, AWS Kinesis, Structured Streaming, Spark Streaming, Apache Airflow, Azure Data Factory, Databricks Workflows, AWS Step Functions, Auto Loader
Databases and Warehouses: Snowflake, AWS Redshift, Azure Synapse Analytics, BigQuery, PostgreSQL, SQL Server, AWS DynamoDB, Redis, NoSQL
APIs and Backend: REST APIs, GraphQL, FastAPI, Flask, OpenAPI, OAuth2/JWT, AWS Lambda, Databricks REST API and CLI
Data Quality and Observability: Delta Live Tables Expectations, Great Expectations, Soda, Datadog, Azure Monitor, AWS CloudWatch, Databricks System Tables
Data Governance: Unity Catalog, RBAC, Data Lineage, Audit Logging, AWS Lake Formation, Microsoft Entra, Data Masking, PHI De-identification
Compliance: HIPAA, PHI, FHIR, HL7, EHR, Healthcare Claims, SOX, GDPR
DevOps and Infrastructure: Git, CI/CD, GitHub Actions, Azure DevOps, Terraform, Docker, Kubernetes, Linux/Unix, pytest
Analytics and Visualization: Power BI, Tableau, DAX, Power Query, Looker, Presto, Trino, AWS Athena
Big Data: Apache Hadoop, HDFS, Hive, MapReduce, Apache Spark, AWS EMR

Accomplishments

Awarded “Best Employee” at Accenture for consistent delivery and contributions to key projects.
