
Vinay Sappidi

Herndon, USA

Summary

Experienced Data Engineer and Validation Engineer with over 6 years of expertise in designing, optimizing, and automating scalable ETL pipelines and workflows. Proficient in Apache Spark, Scala, Python, Databricks, and cloud platforms such as Google Cloud Platform (GCP) and Microsoft Azure. Adept at leveraging advanced data processing tools to support real-time analytics, machine learning models, and digital transformation. Strong focus on data integrity, performance optimization, and cross-functional collaboration to deliver secure, compliant data solutions.

Overview

7 years of professional experience
1 Certification

Work History

Data Engineer

Medica
Madison, WI
03.2023 - Current
  • Designed and maintained scalable ETL pipelines using Apache Spark and Scala, improving processing speed by 45% for large-scale healthcare data ingestion and transformation
  • Tuned Spark jobs for optimal performance using techniques like broadcast joins, partitioning, and caching, reducing execution time by 50% across critical batch workflows
  • Built and optimized data processing workflows in Databricks, enabling near real-time analytics and reducing batch processing times by 40%
  • Utilized Google Cloud Platform (GCP) services, including BigQuery for fast, SQL-based analytics; Google Cloud Storage (GCS) for cost-effective, scalable data storage; and Dataproc for distributed data processing, enabling scalable solutions across Medica's infrastructure
  • Developed and maintained unified data models and implemented robust data integration strategies across PostgreSQL and MongoDB
  • Automated data quality checks and validation frameworks using Python, PySpark, and Jupyter to enhance data reliability
  • Collaborated cross-functionally with data science, DevOps, analytics, and compliance teams to deliver high-performance, HIPAA-compliant data solutions supporting ML models, reporting, and regulatory audits
  • Led efforts to integrate legacy systems with modern cloud-native architecture, accelerating Medica's digital transformation and reducing infrastructure costs by 30%

Data Engineer

AgFirst Farm Credit Bank
Columbia, SC
06.2021 - 01.2023

Project Description: As part of the enterprise data engineering team, I designed and implemented robust, scalable data pipelines to support AgFirst’s internal and customer-facing banking platforms. This included critical services like Digital Banking, Loan Checks, Quick Code, Cash Services, and the Advanced Commercial Banking System.

Responsibilities:

  • Developed and maintained real-time and batch data pipelines using Scala, PySpark, and Python to handle high-volume banking transactions
  • Leveraged Apache Spark and Apache Kafka for real-time data processing, improving system responsiveness and transaction update accuracy
  • Managed secure and optimized data storage using PostgreSQL and MySQL, supporting analytics and operational reporting needs
  • Utilized Jupyter Notebooks for data exploration, testing, and cross-functional collaboration
  • Built and deployed scalable data workflows on Azure Databricks, integrated with Azure DevOps for CI/CD automation
  • Partnered with the Informatica team to design and implement reliable ETL processes, ensuring accurate and compliant data integration across cloud and legacy systems
  • Ensured high data quality, reduced processing times, and improved the performance of critical banking services
  • Collaborated with application teams and business analysts to deliver end-to-end data solutions that enhanced the bank’s digital offerings and customer experience

Data Engineer

Dropbox
San Francisco, CA
05.2019 - 04.2021

Project Description: As a Data Engineer at Dropbox, I led the development and optimization of data integration and analytics pipelines to support key business processes. Utilizing Dell Boomi, I designed and implemented ETL workflows to extract, transform, and load data from Salesforce, SAP, Oracle, and Master Data Management (MDM) systems into Google Cloud Platform (GCP) services such as BigQuery, Google Cloud Storage (GCS), and Dataproc for scalable analytics.

Responsibilities:

  • Designed and deployed scalable ETL pipelines using Dell Boomi to extract and transform data from Salesforce, SAP, and Oracle into Google Cloud Platform services
  • Developed distributed data processing jobs using Apache Spark and Scala on Google Cloud Dataproc to support real-time and batch analytics workflows
  • Leveraged Google Cloud Storage and BigQuery for secure, high-performance data storage and analytical processing of structured and semi-structured data
  • Utilized Pub/Sub for event-driven messaging and decoupled ingestion pipelines, enhancing reliability and fault tolerance
  • Wrote complex, optimized SQL queries in BigQuery to support interactive dashboards, ad hoc reporting, and machine learning pipelines
  • Integrated Boomi AtomSphere workflows with cloud-native tools and embedded custom Groovy/JavaScript logic for dynamic data transformations
  • Implemented error handling and alerting in Boomi; integrated with PagerDuty to auto-trigger incident alerts and emails on API and data processing failures
  • Ensured secure API connectivity through OAuth 2.0 and implemented both REST and SOAP-based interfaces for data synchronization
  • Collaborated closely with data analysts, architects, and business stakeholders to deliver high-impact, production-ready data solutions
  • Applied strong problem-solving, communication, and teamwork skills in agile, cross-functional teams

Connected Service Validation Engineer

Fiat Chrysler Automobiles
Auburn Hills, MI
02.2018 - 02.2019

Project Description: Contributed to the development and testing of a digital platform for Chrysler Mopar, focusing on automating the testing of web applications and APIs to ensure high-quality, seamless functionality and user experience across releases. The project involved enhancing both front-end and back-end testing, utilizing tools like Java Selenium, JavaScript, Swagger, and Jenkins for continuous integration and deployment.

Responsibilities:

  • Contributed to the digital platform by testing and automating websites integrated with the Chrysler Mopar user interface, ensuring seamless functionality and a superior user experience
  • Developed end-to-end test scenarios based on detailed Business Requirement Specifications (DBRS), including customer registration, secure login, product search, and purchase workflows, ensuring comprehensive coverage of business logic
  • Created a comprehensive Test Matrix that mapped all test scenarios to corresponding business requirements, ensuring full test coverage and alignment with project objectives
  • Implemented the Page Object Model (POM) design pattern for UI automation, defining both static and dynamic locators for efficient and maintainable test scripts
  • Designed reusable methods for UI automation, incorporating static and dynamic waits to optimize element handling and improve test execution speed and reliability
  • Specialized in automating web applications built with React, utilizing Java Selenium for web automation, to ensure robust testing and efficient execution of UI-related tasks
  • Enhanced automation capabilities with JavaScript to handle dynamic interactions within web applications, and implemented error handling that catches exceptions, logs failures, and keeps test execution running during unexpected events
  • Conducted System Integration Testing (SIT), Regression Testing, Functional Testing, and Smoke Testing to ensure application stability and functionality across multiple releases
  • Established and standardized a framework for testing REST/SOAP microservices, defining request/response templates, test data, service URLs, and supported HTTP methods (GET, POST, PATCH, DELETE)
  • Leveraged Swagger to validate API responses and understand API schemas, creating automated tests to verify API functionality and conformity to specifications
  • Configured Jenkins jobs for daily automated test runs as part of CI/CD pipelines, ensuring continuous testing and prompt feedback during development
  • Managed automation code repositories in GitHub, utilizing SourceTree for version control and smooth collaboration across the development and testing teams
  • Wrote SQL queries for data validation and retrieval, ensuring data integrity across test environments and supporting data-driven testing processes
  • Provided detailed test effort estimates, daily test execution updates, and reports to the Test Manager, ensuring clear communication of testing progress
  • Actively participated in Agile ceremonies such as sprint planning, daily stand-ups, sprint reviews, and retrospectives, ensuring alignment with the team and tracking progress against sprint goals
  • Analyzed and logged system defects using JIRA, ensuring proper defect tracking and timely resolution through collaboration with development teams

Education

Master of Science - Computer Science

CHICAGO STATE UNIVERSITY
Chicago, IL, USA
05.2017

Bachelor of Technology - Information Technology

GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING
Visakhapatnam, India
05.2014

Skills

  • Programming Languages: Scala, Python, Java, JavaScript, SQL
  • Data Engineering: Apache Spark, PySpark, Databricks, Apache Kafka
  • Cloud Platforms: Azure, GCP
  • Data Storage & Databases: PostgreSQL, MySQL, MongoDB, Google Cloud Storage (GCS), BigQuery
  • ETL Tools: Apache Spark, Dataproc, Dell Boomi, Informatica
  • Testing & Debugging: Selenium, JUnit, Swagger, RSpec, Postman
  • Version Control & Collaboration: GitHub, GitLab, SourceTree, JIRA
  • Monitoring & Logging: PagerDuty, System logs
  • Methodologies: Agile (Scrum), TDD, CI/CD, Test Automation

Certification

  • Dell Boomi Associate Developer
  • AZ-400: Designing and Implementing Microsoft DevOps Solutions

Timeline

Data Engineer

Medica
03.2023 - Current

Data Engineer

AgFirst Farm Credit Bank
06.2021 - 01.2023

Data Engineer

Dropbox
05.2019 - 04.2021

Connected Service Validation Engineer

Fiat Chrysler Automobiles
02.2018 - 02.2019

Master of Science - Computer Science

CHICAGO STATE UNIVERSITY

Bachelor of Technology - Information Technology

GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING