
Vyshnavi Gollapudi

Falls Church, USA

Summary

Accomplished Data Engineer with a proven track record at Clover, specializing in ETL optimization and cloud data migration. Expert in Python and Apache Airflow; enhanced system stability and reduced downtime, driving significant improvements in data accessibility and processing efficiency. Strong collaborator, adept at delivering impactful solutions in fast-paced environments.

Overview

5 years of professional experience
1 certification

Work History

Data Engineer

Clover, CA
01.2023 - Current
  • Led the migration from self-managed Apache Airflow to a managed BFD AFAAS instance, supporting ~311 pipelines (~8,000 jobs) in a single optimized environment, in collaboration with Astronomer's consulting services.
  • Contributed to the Data Foundation Access Management team, maintaining datasets that govern access for suppliers based on legal ownership—serving as Walmart’s single source of truth across internal and external data assets.
  • Maintained and optimized ETL pipelines built with Scala/Spark, managing item-related data (e.g., Global DUNS numbers, Company Names), and orchestrated workflows using the Atomic web interface.
  • Improved system stability and reduced downtime by optimizing Google Dataproc configurations; enhanced database design to support efficient storage, retrieval, and data integrity.
  • Partnered closely with Google to fine-tune performance settings (e.g., driver memory overhead, max submission rates) for improved resource utilization.
  • Designed and implemented a pipeline to transfer data from BigQuery to Azure Databricks, enabling dynamic table extraction and broader data accessibility for SCT-related analytics.
  • Introduced robust error propagation and alerting mechanisms using Python, including email and Slack notifications, to promptly detect and resolve job or platform failures (a minimal sketch follows this list).
  • Developed features and enhancements such as Historical Backfill Manual Processes and Databricks Deferrable Operators, which optimized memory usage and improved resource allocation.
  • Built a Kafka ingestion pipeline to land streaming data in SCT BigQuery, improving real-time data processing and integration across the ecosystem.
  • Migrated datasets and ETL/ELT workloads from on-premises to Google Cloud Platform (GCP); created Hive external tables over GCS-based datasets, automating aggregation pipelines for downstream use.
  • Developed CI/CD pipelines using tools like Looper (Jenkins), Git, and GKE, leveraging Docker and Kubernetes for scalable deployment and testing workflows.
  • Wrote Airflow DAGs and PySpark transformation jobs in GCP to compute daily/weekly benefits metrics; validated outputs using Jupyter Notebooks.
  • Built APIs using Denodo Design Studio and tested via Postman, fostering effective cross-functional collaboration and integration of analytics services.
  • Created dashboards to monitor Airflow DAG success rates and published Tableau reports to track dashboard usage, helping identify potential stakeholders across the organization.
  • Utilized Apache Cassandra to manage real-time data pipelines, optimizing NoSQL data ingestion and query performance for high-scale systems.
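
The alerting mechanism described in the bullets above can be illustrated with a minimal Airflow failure callback. This is a sketch only, assuming Airflow 2.4+ and the requests library; the DAG id, task, and Slack webhook URL are hypothetical placeholders, not the production pipeline.

```python
# Minimal sketch of a Slack failure alert wired in as an Airflow callback.
# The webhook URL, DAG id, and task below are illustrative placeholders.
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder


def notify_slack_on_failure(context):
    """Post the failed task's identity and log URL to a Slack channel."""
    ti = context["task_instance"]
    message = (
        f":red_circle: Task `{ti.task_id}` in DAG `{ti.dag_id}` failed "
        f"(try {ti.try_number}). Logs: {ti.log_url}"
    )
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)


def transform():
    raise RuntimeError("simulated failure to exercise the alert path")


with DAG(
    dag_id="alerting_demo",
    start_date=datetime(2023, 1, 1),
    schedule=None,
    catchup=False,
    default_args={"on_failure_callback": notify_slack_on_failure},
) as dag:
    PythonOperator(task_id="transform", python_callable=transform)
```

Attaching the callback through default_args applies it to every task in the DAG, so one hook covers all failure paths; an email alert would follow the same shape using smtplib or Airflow's email utilities.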

Data Engineer

Wipro
04.2021 - 12.2022
  • Supported Fund Accounting systems and worked on implementations of global investment and asset management systems, including pricing of equities, fixed-income securities, and funds, as well as large-scale application development for the financial services industry worldwide on initiatives of ~$50M+.
  • Worked on large initiatives such as DCT (Data Center Transformation), application retirement, and fund decoupling and migration, among others.
  • Assisted in the development of data pipelines and reports using AWS Redshift, contributing to improved data accuracy and reporting speed.
  • Conducted data analysis and generated actionable insights using SQL and Python, supporting various business functions.
  • Supported the migration of legacy data systems to modern cloud-based solutions, enhancing data accessibility and analysis capabilities.
  • Leveraged Amazon S3 for scalable data storage, enabling seamless access and analysis of big data.
  • Managed large-scale data processing workflows on AWS EMR, reducing processing time and improving data accuracy.
  • Developed and optimized ETL pipelines using AWS Glue and Amazon Redshift to ensure efficient data integration and transformation.
  • Contributed across the entire software development lifecycle (SDLC), working with multiple technologies and practices such as MF/Cognos, TIBCO, Java, VB.NET, Unix/Linux, AWS, DevOps, and shell scripting.
  • Executed SQL queries and stored procedures to retrieve data from databases and trigger timely actions.
  • Designed and developed various development and enhancement projects to implement business logic, following Agile methodology.
  • Implemented AWS Lambda functions to automatically trigger AWS Glue jobs upon file uploads to Amazon S3, streamlining ETL workflows within a serverless architecture (a minimal sketch follows this list).
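
The S3-to-Glue automation in the last bullet follows a common serverless pattern, sketched below. The Glue job name and argument key are illustrative assumptions, not the actual configuration.

```python
# Minimal sketch of an S3-triggered Lambda that starts a Glue job run.
# The job name and the --source_path argument are illustrative placeholders.
import urllib.parse

import boto3

glue = boto3.client("glue")

GLUE_JOB_NAME = "etl-ingest-job"  # placeholder


def lambda_handler(event, context):
    """Start one Glue job run per object uploaded to the bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded; decode before building the path.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        run = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        print(f"Started {GLUE_JOB_NAME} run {run['JobRunId']} for s3://{bucket}/{key}")
    return {"statusCode": 200}
```

In practice the function is wired to the bucket with an S3 ObjectCreated event notification, so each upload launches a job run with the new object's path passed as a job argument.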

Data Platform Engineer

Cognizant
08.2020 - 04.2021
  • Engineered real-time streaming solutions with Kafka, processing 50,000+ events/minute to power dashboards and downstream applications, improving data freshness for inventory and procurement analytics.
  • Developed Tableau dashboards to enhance market intelligence, reducing status update delays by 30%.
  • Collaborated with SAP MM and SAP BW teams to integrate financial and operational datasets into cloud analytics platforms.
  • Developed Tableau dashboards for cross-functional teams by modeling complex datasets into 10+ KPIs, reducing time-to-insight by 30% and enabling data-driven decisions for leadership.
  • Optimized data storage and query performance in Redshift through partitioning, indexing, and cost-efficient schema design, cutting warehouse costs by 15% while maintaining SLA compliance.
  • Proposed and implemented innovative ETL automation techniques using Apache Airflow and Terraform, reducing operational overhead.
  • Automated healthcare claims processing by building Python scripts, reducing manual effort by 30% and improving data accuracy for critical patient workflows.
  • Extracted and cleaned data from SQL databases, CSV files, and APIs using Python (Pandas, NumPy), preparing structured datasets for financial and operational reporting.
  • Utilized Python libraries (OpenPyXL) to transform raw data into structured formats, enabling seamless integration with relational databases and Power BI dashboards.
  • Contributed to the development of innovative features and solutions by leveraging a deep understanding of algorithms, data structures, and software engineering best practices.
  • Designed interactive Power BI dashboards for 10+ projects, reducing leadership decision-making time by 40 hours/project through real-time KPI tracking.
  • Optimized ETL workflows using PySpark, processing 10 TB of data weekly and reducing latency by 150 hours/year while improving workflow efficiency by 200 hours/month (a minimal sketch follows this list).
  • Performed statistical analysis, including time series forecasting (Statsmodels, Prophet), to predict trends for 5+ KPIs and support data-driven strategic planning.
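
The PySpark optimization work referenced above typically takes the shape of partitioned batch aggregations. Below is a minimal sketch; the paths, columns, and KPI definition are hypothetical placeholders, not the actual workload.

```python
# Minimal sketch of a partitioned PySpark aggregation; paths and columns
# are illustrative assumptions, not the production dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("weekly_kpi_etl").getOrCreate()

# Read raw events; in practice this would be a warehouse table or S3 path.
events = spark.read.parquet("s3://example-bucket/raw/events/")  # placeholder

# Roll events up to a weekly KPI per region.
weekly_kpi = (
    events
    .withColumn("week", F.date_trunc("week", F.col("event_ts")))
    .groupBy("week", "region")
    .agg(
        F.countDistinct("order_id").alias("orders"),
        F.sum("amount").alias("revenue"),
    )
)

# Partitioning the output by week keeps downstream scans cheap.
weekly_kpi.write.mode("overwrite").partitionBy("week").parquet(
    "s3://example-bucket/curated/weekly_kpi/"  # placeholder
)
```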

Education

Master of Science - Information Systems

Trine University
IN, USA
05.2024

Bachelor of Technology - Electronics and Communication

TKR College of Engineering and Technology
Hyderabad, India
05.2021

Skills

  • Programming and Scripting Languages: Python (NumPy, Pandas, PySpark, Scikit-learn, TensorFlow), SQL, HiveQL, Pig Latin, R, Java, TypeScript, C, C#, Shell Scripting (Linux/Unix)
  • Visualization Tools: Tableau, Power BI
  • Version Control Software & CI/CD: Git, GitLab, JIRA, Jenkins, Confluence
  • ERP Systems: SAP MM (Materials Management), SAP ChaRM
  • Data Security & Compliance: Data Encryption, GDPR Compliance
  • Data Modeling: Star Schema, Snowflake Schema, Dimensional Modeling
  • Databases: MySQL, PostgreSQL, MS SQL Server, MongoDB, Oracle DB
  • Machine Learning & Statistical Analysis: Supervised/Unsupervised Learning, Spark ML, Predictive Modeling, Statistical Analysis
  • Data Engineering & ETL Tools: Apache Airflow, AWS Glue, SSIS, Informatica, Snowflake, dbt, Redshift
  • Big Data Technologies: Hadoop (HDFS, Hive, MapReduce), PySpark, Apache Kafka, Spark Framework
  • Cloud Platforms: AWS (S3, Lambda, RDS, Glue, Athena, EC2, Kinesis, IAM, EMR), Azure (Data Factory, Databricks, Synapse Analytics, Blob Storage), Terraform
  • Data pipeline development
  • Apache Airflow
  • ETL optimization
  • Cloud data migration

Certification

  • HackerRank Certified: SQL (Advanced)
  • Python for Data Science
  • Understanding the Statistics for Data Science (Internshala)

Projects

Library Management System Database

Web Scraping and API Data Extraction
  • Developed a Python script to scrape data from websites and extract additional data using an API (see the sketch below).
  • Implemented MySQL database integration to store the collected data for further analysis and visualization.

Data Migration and Visualization
  • Created ETL workflows in Alteryx to extract data from multiple sources (SQL Server, XML, Excel, CSV) into HDFS, and scheduled the jobs.
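
A minimal sketch of the scrape, enrich, and store pattern from the Web Scraping and API Data Extraction project is below; the URL, API endpoint, markup, table schema, and credentials are hypothetical placeholders.

```python
# Minimal sketch: scrape a page, enrich via an API, store rows in MySQL.
# Every URL, selector, and credential here is an illustrative placeholder.
import requests
from bs4 import BeautifulSoup
import mysql.connector

# Scrape book titles from a page (placeholder URL and markup).
page = requests.get("https://example.com/books", timeout=10)
soup = BeautifulSoup(page.text, "html.parser")
titles = [h2.get_text(strip=True) for h2 in soup.select("h2.book-title")]

# Enrich each title via a hypothetical lookup API.
records = []
for title in titles:
    resp = requests.get(
        "https://api.example.com/books", params={"q": title}, timeout=10
    )
    data = resp.json()
    records.append((title, data.get("author"), data.get("year")))

# Store the combined records in MySQL for later analysis and visualization.
conn = mysql.connector.connect(
    host="localhost", user="etl", password="...", database="library"  # placeholders
)
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS books "
    "(title VARCHAR(255), author VARCHAR(255), year INT)"
)
cur.executemany(
    "INSERT INTO books (title, author, year) VALUES (%s, %s, %s)", records
)
conn.commit()
conn.close()
```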

Timeline

Data Engineer

Clover
01.2023 - Current

Data Engineer

Wipro
04.2021 - 12.2022

Data Platform Engineer

Cognizant
08.2020 - 04.2021

Master of Science - Information Systems

Trine University

Bachelor of Technology - Electronics and Communication

TKR College of Engineering and Technology