Senior Data Engineer with 9+ years of experience specializing in Databricks, Apache Spark, and cloud-native data platforms.
Expert in designing and delivering large-scale ETL/ELT pipelines, Delta Lake architectures, real-time streaming solutions, and data lakehouses for analytics.
Skilled in leveraging the full Databricks ecosystem including Unity Catalog, MLflow, Databricks SQL, Auto Loader, and Delta Live Tables for governance, ML enablement, and advanced analytics.
Proven success in migrating legacy ETL systems (Informatica, SSIS, Hadoop/Hive) to Databricks, modernizing pipelines and improving reliability.
Designed bronze/silver/gold lakehouse architectures that improved data consistency, traceability, and self-service analytics adoption.
Implemented robust CI/CD workflows with Terraform, Jenkins, and Databricks Repos, ensuring reliable deployments and version control.
Optimized Spark clusters through autoscaling, partition pruning, Z-Ordering, broadcast joins, and caching, reducing compute costs by up to 30% (tuning sketch after this list).
Built real-time streaming pipelines with Kafka, Event Hubs, and Structured Streaming to process 10M+ daily events with sub-second latency.
Developed and deployed machine learning pipelines with MLflow for model training, experiment tracking, and monitoring in production.
Enforced data governance, RBAC, PII tokenization, and audit trails via Unity Catalog to meet HIPAA, GDPR, and CCPA requirements.
Collaborated with cross-functional teams to deliver BI dashboards and Databricks SQL queries powering executive decision-making.
Conducted training sessions and workshops on Databricks, Delta Lake best practices, and data engineering methodologies, increasing adoption by 50%.
Recognized for delivering cost-optimized, secure, and high-performance data pipelines that enabled advanced analytics and AI/ML use cases.
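A minimal PySpark sketch of the cost-optimization levers listed above (adaptive query execution, partition pruning, broadcast joins, Z-Ordering, caching); the table and column names (sales_fact, region_dim, event_date, customer_id) are hypothetical placeholders, not any specific production workload.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("cost-optimized-etl")
    # Adaptive Query Execution re-plans joins and shuffle partitions at runtime
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

# Partition pruning: filtering on the partition column lets Spark skip files
fact = spark.read.table("sales_fact").where(F.col("event_date") >= "2024-01-01")

# Broadcast join: ship the small dimension table to every executor
dim = spark.read.table("region_dim")
joined = fact.join(F.broadcast(dim), "region_id")

# Cache a hot intermediate result reused by several downstream aggregations
joined.cache()

# Z-Ordering co-locates related values to improve data skipping on Delta tables
spark.sql("OPTIMIZE sales_fact ZORDER BY (customer_id)")
```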
Overview
9 years of professional experience
Work History
Senior Data Engineer – Databricks
Capital One
04.2024 - Current
Architected and managed large-scale ETL/ELT pipelines on Azure Databricks using PySpark, Delta Lake, and Auto Loader to process 20TB+ of financial data daily (ingestion sketch after this list).
Designed a Delta Lakehouse with bronze, silver, and gold layers to enforce governance and optimize analytics pipelines, reducing data duplication by 35%.
Implemented Unity Catalog for centralized governance, fine-grained RBAC, and audit trails across 100+ datasets, ensuring GDPR and CCPA compliance (grant example after this list).
Developed streaming pipelines using Kafka, Event Hubs, and Spark Structured Streaming, reducing fraud detection latency to under 2 seconds (streaming sketch after this list).
Optimized Spark workloads via autoscaling, adaptive query execution, Z-Ordering, and partition pruning, lowering compute costs by 30%.
Implemented Delta Live Tables for automated incremental processing, reducing ETL pipeline runtime by 40% (DLT sketch after this list).
Developed business-ready analytics dashboards using Databricks SQL and Power BI, improving decision-making for senior executives.
Integrated MLflow for model versioning, experiment tracking, and deployment pipelines for credit risk and fraud models (tracking sketch after this list).
Automated deployments using Terraform, GitHub Actions, and Databricks Repos to enforce DevOps best practices.
Implemented REST API–based monitoring integrated with Splunk and CloudWatch to proactively manage job health and incident alerts (monitoring sketch after this list).
Collaborated with security and compliance teams to establish PII tokenization, encryption at rest, and secure data sharing via Unity Catalog.
Conducted training sessions for analysts and data scientists on Databricks notebooks, SQL, and Delta Lake best practices, driving a 50% increase in adoption.
Migrated batch pipelines into Databricks Workflows and the Jobs API, improving execution reliability and visibility.
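Illustrative sketch of the Auto Loader ingestion pattern feeding the bronze layer; the paths, checkpoint locations, and the finance.bronze_transactions table are assumptions for illustration, not the production pipeline.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader incrementally discovers new files and tracks schema evolution
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/tx_schema")
    .load("/mnt/raw/transactions/")
)

(
    bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/tx_bronze")
    .trigger(availableNow=True)  # process all pending files, then stop
    .toTable("finance.bronze_transactions")
)
```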
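A hedged example of the Unity Catalog grants and PII masking described above, as run from a Databricks notebook where `spark` is predefined; the catalog, schema, group, and column names are invented.

```python
# Grant a group navigational access plus read access on one gold table
spark.sql("GRANT USE CATALOG ON CATALOG finance TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA finance.gold TO `analysts`")
spark.sql("GRANT SELECT ON TABLE finance.gold.daily_positions TO `analysts`")

# Column-level control via a dynamic view that masks PII for non-privileged groups
spark.sql("""
    CREATE OR REPLACE VIEW finance.gold.daily_positions_masked AS
    SELECT
      account_id,
      CASE WHEN is_account_group_member('pii_readers') THEN ssn ELSE '***' END AS ssn,
      balance
    FROM finance.gold.daily_positions
""")
```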
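A minimal sketch of the fraud-scoring stream; the broker address, topic name, and score_batch body are assumptions standing in for the real model-scoring logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "card-transactions")
    .load()
    .select(F.col("value").cast("string").alias("payload"), F.col("timestamp"))
)

def score_batch(batch_df, batch_id):
    # Placeholder: the real pipeline scores each micro-batch with the fraud
    # model; this sketch just appends the rows to a Delta table
    batch_df.write.format("delta").mode("append").saveAsTable(
        "finance.silver.scored_events"
    )

(
    events.writeStream
    .foreachBatch(score_batch)
    .option("checkpointLocation", "/mnt/checkpoints/fraud")
    .trigger(processingTime="1 second")  # keeps end-to-end latency low
    .start()
)
```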
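A sketch of the Delta Live Tables pattern for incremental processing; it runs only inside a DLT pipeline, where `dlt` and `spark` are provided, and the dataset names and expectation rule are illustrative.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw transactions ingested incrementally via Auto Loader")
def bronze_transactions():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/transactions/")
    )

@dlt.table(comment="Validated, deduplicated transactions")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop rows failing the check
def silver_transactions():
    return dlt.read_stream("bronze_transactions").dropDuplicates(["tx_id"])
```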
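Minimal MLflow tracking sketch showing the versioning and experiment-tracking pattern; the toy logistic-regression model and synthetic data are placeholders, not the production credit-risk pipeline.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real feature set
X, y = make_classification(n_samples=1000, random_state=42)

with mlflow.start_run(run_name="credit-risk-baseline"):
    model = LogisticRegression(max_iter=500).fit(X, y)
    mlflow.log_param("max_iter", 500)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")  # versioned artifact for deployment
```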
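A sketch of polling the documented Databricks Jobs API (/api/2.1/jobs/runs/list) for failed runs; the workspace URL, token, and the downstream Splunk/CloudWatch hookup are placeholders.

```python
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "dapiXXXX"                                # placeholder access token

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"completed_only": "true", "limit": 25},
    timeout=30,
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    state = run.get("state", {})
    if state.get("result_state") == "FAILED":
        # In production this event is forwarded to Splunk/CloudWatch
        print(f"ALERT: run {run['run_id']} failed: {state.get('state_message')}")
```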
Data Engineer – Databricks
Oracle Health
01.2021 - 07.2023
Data Engineer
Cognizant
06.2016 - 12.2020
Developed data lake ingestion pipelines using Spark and Databricks to centralize enterprise data across multiple lines of business.
Migrated Hadoop/Hive ETL jobs into Databricks workflows, reducing query runtimes by 35% and cutting operational overhead.
Implemented advanced optimization techniques including Delta Lake Z-Ordering and Bloom filter indexes alongside Spark bucketing, improving query performance (optimization sketch after this list).
Built reusable PySpark ETL frameworks for ingestion and transformation across multiple projects, improving delivery speed by 20% (framework sketch after this list).
Automated ingestion pipelines with Databricks Auto Loader and event-based triggers from Azure Event Hubs.
Developed real-time monitoring solutions with Databricks REST APIs, integrated into Splunk and Azure Monitor.
Created Databricks SQL dashboards for BI and Tableau integration, providing real-time insights to business stakeholders.
Collaborated with data science teams to deliver machine learning feature pipelines on Databricks.
Designed partitioning and retention strategies to optimize cloud storage costs while ensuring query performance.
Established robust version control of notebooks and data models using Git and Databricks Repos, enabling collaborative workflows.
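Illustrative Delta optimization commands, run from a Databricks notebook where `spark` is predefined; the events table, device_id column, and option values are hypothetical.

```python
# A Bloom filter index speeds selective point lookups on a high-cardinality column
spark.sql("""
    CREATE BLOOMFILTER INDEX ON TABLE events
    FOR COLUMNS (device_id OPTIONS (fpp = 0.1, numItems = 50000000))
""")

# Z-Ordering rewrites files so related device_id values are co-located,
# improving data skipping for the same lookups
spark.sql("OPTIMIZE events ZORDER BY (device_id)")
```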
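A toy version of a config-driven ingestion framework showing the reuse pattern; the config shape, helper name, and table names are invented for illustration, not the actual framework.

```python
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

def ingest(config: dict) -> DataFrame:
    """Read a source, apply declared column renames, and land a Delta table."""
    df = spark.read.format(config["format"]).load(config["path"])
    for old, new in config.get("renames", {}).items():
        df = df.withColumnRenamed(old, new)
    (
        df.write.format("delta")
        .mode(config.get("mode", "append"))
        .saveAsTable(config["target"])
    )
    return df

# Each project supplies a declarative config instead of bespoke pipeline code
ingest({
    "format": "parquet",
    "path": "/mnt/landing/orders/",
    "renames": {"ord_id": "order_id"},
    "target": "ops.bronze_orders",
})
```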
Education
Master of Science - Data Science
University of North Texas
Denton, TX
Bachelor of Technology - Computer Science
Kakatiya Institute of Technology And Science
Warangal, India
Skills
Programming & Scripting
Python, SQL, Scala, Java
Big Data & Processing
Databricks, Apache Spark, PySpark, Delta Lake, Databricks SQL, MLflow, Auto Loader, Delta Live Tables, Hadoop
Cloud Platforms & Data Warehousing
Azure (Databricks, Data Factory, Data Lake, Synapse, Event Hubs, Functions, Logic Apps)