Sravya Keesara

Summary

Results-driven AWS Data Engineer specializing in scalable ETL pipeline design and data model optimization. Leveraged AWS services and Apache Spark to enhance query performance significantly. Led code reviews and knowledge-sharing sessions to elevate team productivity, while implementing data governance and self-service analytics to generate impactful business insights.

Work History

AWS Data Engineer

Baxter

Chicago, USA

03.2024 - Current

Developed real-time data ingestion pipelines with Kafka, AWS Lambda, and Kinesis, reducing data latency from hours to minutes across safety and compliance analytics.
Designed and maintained end-to-end ETL/ELT pipelines using Databricks, Airflow, and DBT, delivering production-ready datasets for analytics and ML workloads hosted on AWS Cloud.
Built and optimized data lake and warehouse models using Amazon S3, Redshift, and Glue, improving query performance by 35% and lowering storage costs by 20%.
Integrated IoT sensor data from MES and manufacturing systems into AWS Lakehouse, facilitating predictive analytics and proactive equipment maintenance.
Collaborated with DevOps and architecture teams to optimize Databricks and AWS EMR cluster performance, reducing runtime variance by 30%.
Orchestrated infrastructure-as-code deployments via Terraform and CloudFormation, ensuring consistent and auditable AWS resource provisioning.
Implemented CI/CD pipelines using Jenkins, GitLab, and Terraform, enabling automated build, test, and deployment of Databricks notebooks, DBT models, and AWS Glue jobs.
Developed and optimized incremental and CDC-based ingestion pipelines using Kafka, DBT snapshots, and Snowflake-style stream patterns on AWS, reducing full reload dependency and improving efficiency.
Led code reviews and knowledge-sharing sessions, establishing AWS-native best practices that improved pipeline reliability and developer productivity.
Designed and managed metadata, lineage, and cataloging using AWS Glue Data Catalog, S3, and CloudWatch, improving data discoverability and governance across the AWS Lakehouse.
Built monitoring and alerting frameworks using CloudWatch, SNS, and Lambda, ensuring early detection of pipeline failures and SLA breaches in Databricks and Glue environments.
Collaborated with data science teams to operationalize ML feature pipelines using PySpark, Databricks, S3, and SageMaker, supporting scalable model training and inference workflows on AWS.
Environment: AWS Cloud (S3, EMR, Glue, Lambda, Kinesis, Redshift, Step Functions, CloudWatch, SNS, IAM, SageMaker), Databricks, Apache Spark / PySpark, Apache Kafka, Airflow, DBT, Terraform, CloudFormation, Jenkins, GitLab, AWS Deequ, Lakehouse Architecture, CI/CD, Data Governance & Observability.

Data Engineer / Software Engineer

JP Morgan Chase

Chicago, USA

05.2023 - 02.2024

Engineered and deployed end-to-end ETL pipelines on AWS EMR and Glue using PySpark, integrating large-scale structured and semi-structured financial data across multiple source systems.
Designed and maintained real-time streaming pipelines using Kafka, Redshift, and DynamoDB to deliver low-latency access to critical financial transaction data.
Automated infrastructure provisioning and CI/CD workflows using Terraform, Docker, and Jenkins, reducing manual deployment time by 70%.
Implemented data quality and validation frameworks in Python and Glue, improving data accuracy and reliability across production pipelines.
Partnered with business and analytics teams to translate functional requirements into data models and transformation logic, cutting onboarding time for new datasets from 3 weeks to 3 days.
Optimized Spark jobs for parallel processing and partitioning, improving pipeline performance by 40% and reducing EMR costs.
Built monitoring and alerting dashboards in CloudWatch and Grafana to enable proactive detection of pipeline failures and latency bottlenecks, improving operational oversight.
Designed and implemented incremental and CDC-based ingestion pipelines using PySpark, AWS Glue, and Kafka, minimizing full data reloads and improving pipeline efficiency.
Implemented end-to-end data security and access controls using AWS IAM, KMS, and VPC configurations, ensuring compliance with financial data governance standards.
Developed automated reconciliation and anomaly detection processes using Python, CloudWatch, and custom validation frameworks, improving trust in production datasets.
Orchestrated complex data workflows using AWS Step Functions and Glue triggers, enabling reliable scheduling and dependency management across batch pipelines.
Collaborated with DevOps teams to containerize Spark utilities using Docker and standardize deployment across EMR environments, improving portability and operational consistency.
Environment: AWS (EMR, Glue, Redshift, DynamoDB, S3, Lambda, Step Functions, CloudWatch, KMS, IAM, VPC), Apache Spark / PySpark, Apache Kafka, Docker, Terraform, Jenkins, Grafana, Python, SQL, CI/CD, Agile/Scrum.

Data Engineer

Early Warning

Scottsdale, USA

06.2021 - 04.2023

Designed and maintained scalable ETL workflows using Informatica PowerCenter and AWS Glue, supporting high-volume data migrations across enterprise banking systems.
Integrated Kafka-based event streaming pipelines into Snowflake and Redshift, enabling near real-time updates for fraud detection and transaction monitoring systems.
Optimized complex SQL queries, stored procedures, and transformation logic, reducing ETL processing time by 35% and improving data throughput.
Automated data ingestion from multiple financial systems to S3, Glue Catalog, and Redshift, enhancing schema validation and lineage tracking.
Implemented data quality checks and reconciliation processes in Python, minimizing discrepancies across production and staging environments.
Collaborated with QA, DevOps, and business stakeholders to ensure zero-defect releases, increasing system reliability and user confidence.
Designed and implemented incremental and CDC-based ingestion pipelines using Kafka, AWS Glue, and Python, reducing full refresh cycles and improving processing efficiency.
Built and optimized analytics-ready data models in Snowflake and Amazon Redshift, leveraging partitioning, clustering, and compression for high-performance reporting.
Implemented end-to-end data governance and security controls with AWS IAM, encryption, and role-based access, ensuring adherence to banking regulatory standards.
Developed automated reconciliation and audit validation frameworks in Python and SQL, improving data accuracy across upstream and downstream financial systems.
Created monitoring and alerting mechanisms using CloudWatch and custom logging frameworks, facilitating proactive detection of ETL failures and SLA breaches.
Partnered with architecture teams to modernize legacy Informatica workflows, transitioning critical pipelines toward cloud-native Glue-based ETL solutions.
Environment: Informatica PowerCenter, AWS Glue, Apache Kafka, Snowflake, Amazon Redshift, Amazon S3, AWS Glue Data Catalog, CloudWatch, AWS IAM, Python, SQL, ETL/ELT, CDC Pipelines, Data Governance & Lineage, Banking Data Systems.

Big Data Engineer

AbbVie

Chicago, USA

08.2018 - 05.2021

Migrated legacy on-prem databases into AWS Redshift and S3, establishing a cloud-first analytics platform for global data integration.
Designed and developed Spark SQL pipelines in Python for cleaning, wrangling, and transforming structured and unstructured datasets, improving ML model accuracy by 20%.
Tuned Spark workloads through dynamic partitioning, broadcast joins, and caching strategies, improving overall pipeline efficiency by 25%.
Automated data ingestion and transformation workflows using AWS Glue and EMR, enhancing scalability and reducing manual data processing effort.
Collaborated with business analysts to develop Tableau dashboards connected to Redshift, delivering near real-time insights into product performance and clinical metrics that informed strategic decisions.
Implemented data quality checks and validation scripts to ensure consistency across multiple data layers in S3 and Redshift.
Designed and implemented incremental and CDC-based ingestion pipelines using AWS Glue, PySpark, and S3, reducing full data reloads and improving pipeline reliability.
Built and optimized analytics-ready dimensional and fact models in Amazon Redshift, leveraging distribution styles, sort keys, and compression for high-performance querying.
Developed automated reconciliation and anomaly detection frameworks using Python and SQL, improving data accuracy across upstream and downstream data layers.
Implemented end-to-end data security and access controls using AWS IAM, KMS, and bucket policies, ensuring compliance with enterprise data governance standards.
Orchestrated complex data workflows using AWS Glue triggers and EMR scheduling, enabling reliable dependency management across batch processing pipelines.
Partnered with data science teams to operationalize feature engineering pipelines using Spark SQL, PySpark, and Redshift, supporting scalable ML training and inference.
Environment: AWS (Redshift, S3, Glue, EMR, IAM, KMS, CloudWatch), Apache Spark (Spark SQL, PySpark), Python, SQL, Tableau, Cloud Data Lake & Warehouse Architecture, Data Quality & Governance, CDC Pipelines, ML Feature Engineering.

Data Analyst

Tenpath Solutions

India

01.2016 - 07.2018

Ingested legacy SQL Server and Teradata data into Snowflake via S3 staging, enabling modernized analytics.
Developed Spark SQL and Scala jobs to process raw data into structured, analysis-ready datasets.
Conducted data profiling and validation to improve accuracy and consistency across multiple source systems.
Partnered with business stakeholders to translate requirements into actionable KPIs and dashboards, driving data-driven decision-making.
Automated recurring reporting workflows using Python and SQL, cutting manual effort by 40% and streamlining reporting processes.
Developed Tableau dashboards and Excel performance reports, cutting manual reporting time by 50%.
Performed ad-hoc analysis on large datasets to identify trends and anomalies, influencing strategic initiatives.
Optimized SQL queries and ETL processes, improving performance and reducing query runtime by 30%.
Designed interactive dashboards in Tableau/Power BI with drill-down capabilities, empowering self-service analytics for non-technical users.
Designed and implemented analytics-ready semantic layers on Snowflake and S3, enhancing KPI consistency and accelerating dashboard development.
Built and maintained automated data refresh and validation pipelines using Python, SQL, and Spark, ensuring reliable and timely delivery of BI datasets.
Collaborated with data engineering teams to optimize data models and source extracts for BI consumption, improving dashboard performance and reducing query latency.
Environment: Snowflake, Amazon S3, AWS Glacier, Apache Spark (Spark SQL, Scala), Python, SQL, Tableau, Power BI, Microsoft Excel, SQL Server, Teradata, BI & Analytics Platforms, Data Governance & Lineage.

Education

Bachelor's - Electronics and Communications Engineering

JNTU Hyderabad

05.2017

Skills

AWS services (EMR, S3, Redshift, RDS, DynamoDB, Lambda, Glue, Athena, Kinesis, Step Functions, CloudFormation, CloudWatch, Lake Formation, SNS, SQS, KMS, IAM, VPC)
Apache Spark and PySpark
Spark SQL and Streaming
Databricks and Hadoop
Hive and Kafka
Kafka Streams and Airflow
Python and Scala
Batch and real-time pipelines
Event-driven architectures
Snowflake and Amazon Redshift
Informatica PowerCenter and DBT

Shell scripting and Jinja
Git (GitLab CI/CD)
Terraform and Jenkins
Feature engineering pipelines
PostgreSQL and MySQL
SQL Server and MongoDB
Cassandra and HDFS
Data governance (HIPAA & GDPR)
Role-based access control
Tableau and Power BI
KPI and self-service analytics

Certification

AWS Certified Solutions Architect – Associate, https://www.credly.com/badges/d262f39f-b1b4-417e-bcd7-5589487194cc/

Timeline

AWS Data Engineer

Baxter

03.2024 - Current

Data Engineer / Software Engineer

JP Morgan Chase

05.2023 - 02.2024

Data Engineer

Early Warning

06.2021 - 04.2023

Big Data Engineer

AbbVie

08.2018 - 05.2021

Data Analyst

Tenpath Solutions

01.2016 - 07.2018

Bachelor's - Electronics and Communications Engineering

JNTU Hyderabad
