Yeswanth Prathipati

Dallas

Summary

Data Engineer with 8 years of experience building enterprise-grade data platforms that turn raw, high-volume data into reliable business intelligence. Specialized in real-time streaming architectures and cloud-native pipeline design, with a track record of reducing data processing times and enabling analytics at scale. Comfortable operating across the full data lifecycle — from ingestion and transformation to orchestration and observability — in fast-paced, cross-functional environments.

Overview

8 years of professional experience

Work History

Senior Data Engineer

Capital One
Plano, TX
09.2023 - Current
  • Built a distributed data platform with PySpark and Spark Structured Streaming that handles high-throughput ACH, Rewards, TSYS, and Worldpay feeds, providing scalable ingestion and transformation of multi-terabyte financial data.
  • Implemented and managed cloud-native data pipelines with AWS Glue, Step Functions, EventBridge rules, and S3 to orchestrate large-scale financial data flows, delivering curated, high-quality datasets for Snowflake analytics and enterprise reporting services.
  • Implemented and optimized real-time and batch data processing pipelines with Apache Spark, PySpark, Kafka, and SQL to ingest and transform streaming payment events, enabling near real-time analytics on multi-terabyte datasets.
  • Developed a distributed reconciliation engine that validates and reconciles source data across Sybase, Oracle, PostgreSQL, SQL Server, and Db2 against Athena using AWS Glue jobs, Step Functions orchestration, and EventBridge triggers, maintaining financial accuracy and cross-system consistency across enterprise pipelines.
  • Led Discover data integration into Capital One pipelines, handling schema evolution, transformation logic, and data contracts to enable migration of diverse financial data sources.
  • Developed a secure cloud-native ingestion and processing pipeline using DynamoDB, Step Functions, Glue, KMS encryption, and Snowflake to manage critical financial data, automating organization, encrypted storage, and scalable distribution for analytics.
  • Optimized partitioning, execution plans, and resource allocation for distributed Spark workloads, increasing pipeline throughput and supporting large-scale financial transaction processing.
  • Designed distributed data pipelines that move financial datasets from Oracle, SQL Server, PostgreSQL, Kafka, and files into an AWS S3 data lake. Implemented automated quality checks and loaded curated data into a Snowflake warehouse for analytics and reporting.
  • Tuned PySpark pipelines that integrate Kafka and Snowflake, improving distributed processing efficiency by 30% and supporting high-throughput transformation.

Data Engineer

Walmart
Bentonville, AR
07.2024 - 09.2025
  • Developed, implemented, and maintained distributed Databricks PySpark data pipelines to ingest high-volume data from enterprise APIs, relational databases, and BigQuery, processing multi-terabyte financial datasets and increasing reporting pipeline throughput by 30% while orchestrating production workflows on Apache Airflow.
  • Designed Spark frameworks on Databricks to transform and validate financial datasets for SAP HANA. Improved partitioning and execution reduced pipeline runtime by 35%, accelerating financial metric generation across reporting dimensions.
  • Led end-to-end integration within Pharmacy Rx data pipelines, using distributed Spark and PySpark processing to keep API-driven workloads scaling consistently and ready to integrate with pharmacy and financial reporting systems.
  • Built scalable Spring Boot microservices and REST APIs, backed by Cosmos DB, that serve real-time Rx performance metrics to an Angular analytics dashboard, giving a near real-time view into KPIs.
  • Enhanced data platform reliability and governance via automated validation, BigQuery and SAP HANA reconciliation, optimized PySpark workloads, and fault-tolerant Airflow DAGs with retries, SLA monitoring, and failure recovery.

Data Engineer

Bank of America
Addison, TX
10.2019 - 03.2021
  • Designed and implemented scalable ingestion pipelines for high-volume structured and streaming datasets from SQL Server, PostgreSQL, and Kafka, providing reliable data delivery to downstream analytics, reporting systems, and external partner integrations.
  • Developed data workflows with Apache Spark (Scala) to process large datasets. Optimized partitioning and execution, then loaded outputs into HDFS and Hive for analytical queries.
  • Created Hive models and transformation logic to generate datasets for BI dashboards, improving data quality, scalability, and query performance.

Data Engineer

Relus Cloud
Atlanta, GA
05.2018 - 07.2018
  • Contributed to distributed data processing pipelines built with Apache Spark (Scala) and orchestrated with Apache Airflow, enabling reliable, scalable ETL workflows for large-scale datasets on the AWS cloud data platform.
  • Managed cloud-native data workflows on AWS (EMR, EC2, Lambda, S3, Redshift). Transformed raw data into analytical layers, improving pipeline automation and reliability.

Education

Master of Science - Applied Computer Science

Northwest Missouri State University
Maryville, MO
12-2017

Bachelor of Technology - Computer Science and Engineering

KL University
India
05-2016

Skills

  • Languages: Python, SQL/PL-SQL, Java, Scala
  • Cloud: AWS, Azure, GCP
  • Data Platforms: Databricks, Snowflake, BigQuery
  • Frameworks: Spring Boot, Flask
  • Big Data & Streaming: Apache Spark, Spark Streaming, Kafka, Hive, HDFS, MapReduce, Sqoop
  • Databases: PostgreSQL, MySQL, Oracle, Teradata, SQL Server, MongoDB, Cosmos DB, Sybase, Db2
  • Orchestration & DevOps: Apache Airflow, Docker, Kubernetes, Git/Bitbucket, Maven, Gradle
  • Monitoring & Observability: Prometheus, Grafana, Splunk, OpenObserve, SonarQube

Timeline

Data Engineer

Walmart
07.2024 - 09.2025

Senior Data Engineer

Capital One
09.2023 - Current

Data Engineer

Bank of America
10.2019 - 03.2021

Data Engineer

Relus Cloud
05.2018 - 07.2018

Bachelor of Technology - Computer Science and Engineering

KL University

Master of Science - Applied Computer Science

Northwest Missouri State University