Yeswanth Prathipati

Dallas

Summary

Data Engineer with 8 years of experience building enterprise-grade data platforms that turn raw, high-volume data into reliable business intelligence. Specialized in real-time streaming architectures and cloud-native pipeline design, with a track record of reducing data processing times and enabling analytics at scale. Comfortable operating across the full data lifecycle — from ingestion and transformation to orchestration and observability — in fast-paced, cross-functional environments.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Capital One

Plano, TX

09.2023 - Current

Built a distributed data platform with PySpark and Spark Structured Streaming. Handled high-throughput ACH, Rewards, TSYS, and Worldpay feeds for scalable ingestion and transformation of multi-terabyte financial data.
Implemented and managed cloud-native data pipelines for large-scale financial data flows, orchestrated with AWS Glue, Step Functions, and EventBridge rules on S3, delivering curated, high-quality datasets to Snowflake analytics and enterprise reporting services.
Implemented and optimized real-time and batch data processing pipelines with Apache Spark, PySpark, Kafka, and SQL to ingest and transform streaming payment events, enabling near real-time analytics on multi-terabyte datasets.
Developed a distributed reconciliation engine that validates and reconciles source data from Sybase, Oracle, PostgreSQL, SQL Server, and Db2 against Athena using AWS Glue jobs, Step Functions orchestration, and EventBridge triggers, maintaining financial accuracy and cross-system consistency across enterprise pipelines.
Led Discover data integration into Capital One pipelines, handling schema evolution, transformation logic, and data contracts to enable migration of diverse financial data sources.
Developed a secure, cloud-native ingestion and processing pipeline using DynamoDB, Step Functions, Glue, KMS encryption, and Snowflake to manage critical financial data and applications, automating data organization, encrypted storage, and scalable analytics distribution.
Optimized partitioning, execution plans, and resource allocation for distributed Spark workloads, increasing pipeline throughput and supporting large-scale financial transaction processing.
Designed distributed data pipelines that move financial datasets from Oracle, SQL Server, PostgreSQL, Kafka, and files into an AWS S3 data lake. Implemented automated quality checks and loaded curated data into a Snowflake warehouse for analytics and reporting.
Tuned PySpark data pipelines integrated with Kafka and Snowflake, improving distributed processing efficiency by 30% and supporting high-throughput transformation.

Data Engineer

Walmart

Bentonville, AR

07.2024 - 09.2025

Developed, implemented, and maintained distributed Databricks PySpark data pipelines to ingest high-volume data from enterprise APIs, relational databases, and BigQuery, processing multi-terabyte financial datasets and increasing reporting pipeline throughput by 30% while orchestrating production workflows on Apache Airflow.
Designed Spark frameworks on Databricks to transform and validate financial datasets for SAP HANA. Improved partitioning and execution, reducing pipeline runtime by 35% and accelerating financial metric generation across reporting dimensions.
Led end-to-end integration of Pharmacy Rx data pipelines with pharmacy and financial reporting systems, using distributed Spark and PySpark processing to keep API workloads scaling consistently.
Built scalable Spring Boot microservices and REST APIs, backed by Cosmos DB, that serve Rx performance metrics to an Angular analytics dashboard, giving a near real-time view of key KPIs.
Enhanced data platform reliability and governance via automated validation, BigQuery and SAP HANA reconciliation, optimized PySpark workloads, and fault-tolerant Airflow DAGs with retries, SLA monitoring, and failure recovery.

Data Engineer

Bank of America

Addison, TX

10.2019 - 03.2021

Designed and implemented scalable data ingestion pipelines to process high-volume structured and streaming datasets from SQL Server, PostgreSQL, and Kafka, supplying reliable data to downstream analytics, reporting systems, and external partner integrations.
Developed data workflows with Apache Spark (Scala) to process large datasets. Optimized partitioning and execution, then loaded outputs into HDFS and Hive for analytical queries.
Created Hive models and transformation logic to generate datasets for BI dashboards, improving data quality, scalability, and query performance.

Data Engineer

Relus Cloud

Atlanta, GA

05.2018 - 07.2018

Contributed to the development of distributed data processing pipelines using Apache Spark (Scala) and Apache Airflow to process and orchestrate large-scale datasets, enabling reliable ETL workflows and scalable data processing within the AWS cloud data platform.
Managed cloud-native data workflows using AWS (EMR, EC2, Lambda, S3, Redshift). Transformed raw data into analytical layers, enhancing pipeline automation and reliability.

Education

Master of Science - Applied Computer Science

Northwest Missouri State University

Maryville, MO

12.2017

Bachelor of Technology - Computer Science and Engineering

KL University

India

05.2016

Skills

Languages: Python, SQL/PL-SQL, Java, Scala
Cloud: AWS, Azure, GCP
Data Platforms: Databricks, Snowflake, BigQuery
Frameworks: Spring Boot, Flask

Big Data & Streaming: Apache Spark, Spark Streaming, Kafka, Hive, HDFS, MapReduce, Sqoop
Databases: PostgreSQL, MySQL, Oracle, Teradata, SQL Server, MongoDB, Cosmos DB, Sybase, Db2
Orchestration & DevOps: Apache Airflow, Docker, Kubernetes, Git/Bitbucket, Maven, Gradle
Monitoring & Observability: Prometheus, Grafana, Splunk, OpenObserve, SonarQube
