Sravya Keesara

Summary

Results-driven AWS Data Engineer specializing in scalable ETL pipeline design and data model optimization. Uses AWS services and Apache Spark to improve query performance, including a 35% gain on S3, Redshift, and Glue data models. Leads code reviews and knowledge-sharing sessions to raise team productivity, and implements data governance and self-service analytics that support business decisions.

Overview

10 years of professional experience
1 Certification

Work History

AWS Data Engineer

Baxter
Chicago, USA
03.2024 - Current
  • Developed real-time data ingestion pipelines with Kafka, AWS Lambda, and Kinesis, reducing data latency from hours to minutes across safety and compliance analytics.
  • Designed and maintained end-to-end ETL/ELT pipelines using Databricks, Airflow, and DBT, delivering production-ready datasets for analytics and ML workloads hosted on AWS Cloud.
  • Built and optimized data lake and warehouse models using Amazon S3, Redshift, and Glue, improving query performance by 35% and lowering storage costs by 20%.
  • Integrated IoT sensor data from MES and manufacturing systems into AWS Lakehouse, facilitating predictive analytics and proactive equipment maintenance.
  • Collaborated with DevOps and architecture teams to optimize Spark cluster performance across Databricks and AWS EMR, reducing runtime variance by 30%.
  • Orchestrated infrastructure-as-code deployments via Terraform and CloudFormation, ensuring consistent and auditable AWS resource provisioning.
  • Implemented CI/CD pipelines using Jenkins, GitLab, and Terraform, enabling automated build, test, and deployment of Databricks notebooks, DBT models, and AWS Glue jobs.
  • Developed and optimized incremental and CDC-based ingestion pipelines using Kafka, DBT snapshots, and Snowflake-style stream patterns on AWS, reducing full reload dependency and improving efficiency.
  • Led code reviews and knowledge-sharing sessions, establishing AWS-native best practices that improved pipeline reliability and developer productivity.
  • Designed and managed metadata, lineage, and cataloging using AWS Glue Data Catalog, S3, and CloudWatch, improving data discoverability and governance across the AWS Lakehouse.
  • Built monitoring and alerting frameworks using CloudWatch, SNS, and Lambda, ensuring early detection of pipeline failures and SLA breaches in Databricks and Glue environments.
  • Collaborated with data science teams to operationalize ML feature pipelines using PySpark, Databricks, S3, and SageMaker, supporting scalable model training and inference workflows on AWS.
  • Environment: AWS Cloud (S3, EMR, Glue, Lambda, Kinesis, Redshift, Step Functions, CloudWatch, SNS, IAM, SageMaker), Databricks, Apache Spark / PySpark, Apache Kafka, Airflow, DBT, Terraform, CloudFormation, Jenkins, GitLab, AWS Deequ, Lakehouse Architecture, CI/CD, Data Governance & Observability.

Data Engineer / Software Engineer

JP Morgan Chase
Chicago, USA
05.2023 - 02.2024
  • Engineered and deployed end-to-end ETL pipelines on AWS EMR and Glue using PySpark, integrating large-scale structured and semi-structured financial data across multiple source systems.
  • Designed and maintained real-time streaming pipelines using Kafka, Redshift, and DynamoDB to deliver low-latency access to critical financial transaction data.
  • Automated infrastructure provisioning and CI/CD workflows using Terraform, Docker, and Jenkins, reducing manual deployment time by 70%.
  • Implemented data quality and validation frameworks in Python and Glue to ensure high data accuracy across production pipelines.
  • Partnered with business and analytics teams to translate functional requirements into data models and transformation logic, cutting onboarding time for new datasets from 3 weeks to 3 days.
  • Optimized Spark jobs for parallel processing and partitioning, improving pipeline performance by 40% and reducing EMR costs.
  • Built monitoring and alerting dashboards in CloudWatch and Grafana to enable proactive detection of pipeline failures and latency bottlenecks, improving operational oversight.
  • Designed and implemented incremental and CDC-based ingestion pipelines using PySpark, AWS Glue, and Kafka, minimizing full data reloads and improving pipeline efficiency.
  • Implemented end-to-end data security and access controls using AWS IAM, KMS, and VPC configurations, ensuring compliance with financial data governance standards.
  • Developed automated reconciliation and anomaly detection processes using Python, CloudWatch, and custom validation frameworks, improving trust in production datasets.
  • Orchestrated complex data workflows using AWS Step Functions and Glue triggers, enabling reliable scheduling and dependency management across batch pipelines.
  • Collaborated with DevOps teams to containerize Spark utilities using Docker and standardize deployment across EMR environments, improving portability and operational consistency.
  • Environment: AWS (EMR, Glue, Redshift, DynamoDB, S3, Lambda, Step Functions, CloudWatch, KMS, IAM, VPC), Apache Spark / PySpark, Apache Kafka, Docker, Terraform, Jenkins, Grafana, Python, SQL, CI/CD, Agile/Scrum.

Data Engineer

Early Warning
Scottsdale, USA
06.2021 - 04.2023
  • Designed and maintained scalable ETL workflows using Informatica PowerCenter and AWS Glue, supporting high-volume data migrations across enterprise banking systems.
  • Integrated Kafka-based event streaming pipelines into Snowflake and Redshift, enabling near real-time updates for fraud detection and transaction monitoring systems.
  • Optimized complex SQL queries, stored procedures, and transformation logic, reducing ETL processing time by 35% and improving data throughput.
  • Automated data ingestion from multiple financial systems to S3, Glue Catalog, and Redshift, enhancing schema validation and lineage tracking.
  • Implemented data quality checks and reconciliation processes in Python, minimizing discrepancies across production and staging environments.
  • Collaborated with QA, DevOps, and business stakeholders to ensure zero-defect releases, increasing system reliability and user confidence.
  • Designed and implemented incremental and CDC-based ingestion pipelines using Kafka, AWS Glue, and Python, reducing full refresh cycles and improving processing efficiency.
  • Built and optimized analytics-ready data models in Snowflake and Amazon Redshift, leveraging partitioning, clustering, and compression for high-performance reporting.
  • Implemented end-to-end data governance and security controls with AWS IAM, encryption, and role-based access, ensuring adherence to banking regulatory standards.
  • Developed automated reconciliation and audit validation frameworks in Python and SQL, improving data accuracy across upstream and downstream financial systems.
  • Created monitoring and alerting mechanisms using CloudWatch and custom logging frameworks, facilitating proactive detection of ETL failures and SLA breaches.
  • Partnered with architecture teams to modernize legacy Informatica workflows, transitioning critical pipelines toward cloud-native Glue-based ETL solutions.
  • Environment: Informatica PowerCenter, AWS Glue, Apache Kafka, Snowflake, Amazon Redshift, Amazon S3, AWS Glue Data Catalog, CloudWatch, AWS IAM, Python, SQL, ETL/ELT, CDC Pipelines, Data Governance & Lineage, Banking Data Systems.

Big Data Engineer

AbbVie
Chicago, USA
08.2018 - 05.2021
  • Migrated legacy on-prem databases into AWS Redshift and S3, establishing a cloud-first analytics platform for global data integration.
  • Designed and developed Spark SQL pipelines in Python for cleaning, wrangling, and transforming structured and unstructured datasets, improving ML model accuracy by 20%.
  • Tuned Spark workloads through dynamic partitioning, broadcast joins, and caching strategies, improving overall pipeline efficiency by 25%.
  • Automated data ingestion and transformation workflows using AWS Glue and EMR, enhancing scalability and reducing manual data processing effort.
  • Collaborated with business analysts to develop Tableau dashboards connected to Redshift, delivering near real-time insights into product performance and clinical metrics that informed strategic decisions.
  • Implemented data quality checks and validation scripts to ensure consistency across multiple data layers in S3 and Redshift.
  • Designed and implemented incremental and CDC-based ingestion pipelines using AWS Glue, PySpark, and S3, reducing full data reloads and improving pipeline reliability.
  • Built and optimized analytics-ready dimensional and fact models in Amazon Redshift, leveraging distribution styles, sort keys, and compression for high-performance querying.
  • Developed automated reconciliation and anomaly detection frameworks using Python and SQL, improving data accuracy across upstream and downstream data layers.
  • Implemented end-to-end data security and access controls using AWS IAM, KMS, and bucket policies, ensuring compliance with enterprise data governance standards.
  • Orchestrated complex data workflows using AWS Glue triggers and EMR scheduling, enabling reliable dependency management across batch processing pipelines.
  • Partnered with data science teams to operationalize feature engineering pipelines using Spark SQL, PySpark, and Redshift, supporting scalable ML training and inference.
  • Environment: AWS (Redshift, S3, Glue, EMR, IAM, KMS, CloudWatch), Apache Spark (Spark SQL, PySpark), Python, SQL, Tableau, Cloud Data Lake & Warehouse Architecture, Data Quality & Governance, CDC Pipelines, ML Feature Engineering.

Data Analyst

Tenpath Solutions
India
01.2016 - 07.2018
  • Ingested legacy SQL Server and Teradata data into Snowflake via S3 staging, enabling modernized analytics.
  • Developed Spark SQL and Scala jobs to process raw data into structured, analysis-ready datasets.
  • Conducted data profiling and validation to improve accuracy and consistency across multiple source systems.
  • Partnered with business stakeholders to translate requirements into actionable KPIs and dashboards, driving data-driven decision-making.
  • Automated recurring reporting workflows using Python and SQL, cutting manual effort by 40%.
  • Developed Tableau dashboards and Excel performance reports, cutting manual reporting time by 50%.
  • Performed ad-hoc analysis on large datasets to identify trends and anomalies, influencing strategic initiatives.
  • Optimized SQL queries and ETL processes, improving performance and reducing query runtime by 30%.
  • Designed interactive dashboards in Tableau/Power BI with drill-down capabilities, empowering self-service analytics for non-technical users.
  • Designed and implemented analytics-ready semantic layers on Snowflake and S3, enhancing KPI consistency and accelerating dashboard development.
  • Built and maintained automated data refresh and validation pipelines using Python, SQL, and Spark, ensuring reliable and timely delivery of BI datasets.
  • Collaborated with data engineering teams to optimize data models and source extracts for BI consumption, improving dashboard performance and reducing query latency.
  • Environment: Snowflake, Amazon S3, AWS Glacier, Apache Spark (Spark SQL, Scala), Python, SQL, Tableau, Power BI, Microsoft Excel, SQL Server, Teradata, BI & Analytics Platforms, Data Governance & Lineage.

Education

Bachelor's - Electronics and Communications Engineering

JNTU Hyderabad
05.2017

Skills

  • AWS services (EMR, S3, Redshift, RDS, DynamoDB, Lambda, Glue, Athena, Kinesis, Step Functions, CloudFormation, CloudWatch, Lake Formation, SNS, SQS, KMS, IAM, VPC)
  • Apache Spark and PySpark
  • Spark SQL and Streaming
  • Databricks and Hadoop
  • Hive and Kafka
  • Kafka Streams and Airflow
  • Python and Scala
  • Batch and real-time pipelines
  • Event-driven architectures
  • Snowflake and Amazon Redshift
  • Informatica PowerCenter and DBT
  • Shell scripting and Jinja
  • Git (GitLab CI/CD)
  • Terraform and Jenkins
  • Feature engineering pipelines
  • PostgreSQL and MySQL
  • SQL Server and MongoDB
  • Cassandra and HDFS
  • Data governance (HIPAA & GDPR)
  • Role-based access control
  • Tableau and Power BI
  • KPI and self-service analytics

Certification

AWS Certified Solutions Architect – Associate, https://www.credly.com/badges/d262f39f-b1b4-417e-bcd7-5589487194cc/
