Vyshnavi Annamaneni

Dallas, TX

Summary

  • Senior Data Engineer with 7+ years of experience specializing in Databricks, Apache Spark, and cloud-native data platforms.
  • Expert in designing and delivering large-scale ETL/ELT pipelines, Delta Lake architectures, real-time streaming solutions, and data lakehouses for analytics.
  • Skilled in leveraging the full Databricks ecosystem including Unity Catalog, MLflow, Databricks SQL, Auto Loader, and Delta Live Tables for governance, ML enablement, and advanced analytics.
  • Proven success in migrating legacy ETL systems (Informatica, SSIS, Hadoop/Hive) to Databricks, modernizing pipelines and improving reliability.
  • Designed bronze/silver/gold lakehouse architectures that improved data consistency, traceability, and self-service analytics adoption (see the sketch after this list).
  • Implemented robust CI/CD workflows with Terraform, Jenkins, and Databricks Repos, ensuring reliable deployments and version control.
  • Optimized Spark clusters through autoscaling, partition pruning, Z-Ordering, broadcast joins, and caching, reducing compute costs by up to 30%.
  • Built real-time streaming pipelines with Kafka, Event Hubs, and Structured Streaming to process 10M+ daily events with sub-second latency.
  • Developed and deployed machine learning pipelines with MLflow for model training, experiment tracking, and monitoring in production.
  • Enforced data governance, RBAC, PII tokenization, and audit trails via Unity Catalog to meet HIPAA, GDPR, and CCPA compliance.
  • Collaborated with cross-functional teams to deliver BI dashboards and Databricks SQL queries powering executive decision-making.
  • Conducted training sessions and workshops on Databricks, Delta Lake best practices, and data engineering methodologies to improve adoption by 50%.
  • Recognized for delivering cost-optimized, secure, and high-performance data pipelines that enabled advanced analytics and AI/ML use cases.
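
The bronze/silver/gold bullet above maps to a standard medallion flow; a minimal PySpark sketch follows, with hypothetical paths, table names, and keys rather than any actual production pipeline.

  # A minimal sketch, assuming hypothetical paths and table names, of a
  # bronze-to-silver Delta Lake flow using Databricks Auto Loader.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.getOrCreate()

  # Bronze: incrementally ingest raw files as they land (Auto Loader).
  bronze = (
      spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", "/mnt/lake/_schemas/events")  # placeholder path
      .load("/mnt/landing/events/")                                      # placeholder path
      .withColumn("ingested_at", F.current_timestamp())
  )
  (
      bronze.writeStream.format("delta")
      .option("checkpointLocation", "/mnt/lake/_checkpoints/bronze_events")
      .outputMode("append")
      .toTable("lakehouse.bronze_events")
  )

  # Silver (typically a separate scheduled job): cleanse and deduplicate.
  silver = (
      spark.read.table("lakehouse.bronze_events")
      .filter(F.col("event_id").isNotNull())    # assumed business key
      .dropDuplicates(["event_id"])
  )
  silver.write.format("delta").mode("overwrite").saveAsTable("lakehouse.silver_events")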

Overview

9 years of professional experience

Work History

Senior Data Engineer – Databricks

Capital One
04.2024 - Current
  • Architected and managed large-scale ETL/ELT pipelines on Azure Databricks using PySpark, Delta Lake, and Auto Loader to process 20TB+ of financial data daily.
  • Designed Delta Lakehouse with bronze, silver, and gold layers to enforce governance and optimize analytics pipelines, reducing data duplication by 35%.
  • Implemented Unity Catalog for centralized governance, fine-grained RBAC, and audit trails across 100+ datasets, ensuring GDPR and CCPA compliance.
  • Developed streaming pipelines using Kafka, Event Hubs, and Spark Structured Streaming, reducing fraud detection latency to under 2 seconds (see the sketch after this list).
  • Optimized Spark workloads via autoscaling, adaptive query execution, Z-Ordering, and partition pruning, lowering compute costs by 30%.
  • Implemented Delta Live Tables for automated incremental processing, reducing ETL pipeline runtime by 40%.
  • Developed business-ready analytics dashboards using Databricks SQL and Power BI, improving decision-making for senior executives.
  • Integrated MLflow for model versioning, experiment tracking, and deployment pipelines for credit risk and fraud models.
  • Automated deployments using Terraform, GitHub Actions, and Databricks Repos to enforce DevOps best practices.
  • Implemented REST API–based monitoring integrated with Splunk and CloudWatch to proactively manage job health and incident alerts.
  • Collaborated with security and compliance teams to establish PII tokenization, encryption at rest, and secure data sharing via Unity Catalog.
  • Conducted training sessions for analysts and data scientists on Databricks notebooks, SQL, and Delta Lake best practices, driving a 50% increase in adoption.
  • Migrated batch pipelines to Databricks Workflows and the Jobs API, improving reliability and visibility of execution.
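
The streaming bullet above follows a common Kafka-to-Delta pattern; the sketch below illustrates it with a hypothetical broker, topic, and payload schema, not the actual production pipeline.

  # A minimal sketch, assuming a hypothetical broker, topic, and payload schema,
  # of Spark Structured Streaming ingest from Kafka into a Delta table.
  from pyspark.sql import SparkSession, functions as F
  from pyspark.sql.types import (DoubleType, StringType, StructField,
                                 StructType, TimestampType)

  spark = SparkSession.builder.getOrCreate()

  payload_schema = StructType([
      StructField("txn_id", StringType()),
      StructField("account_id", StringType()),
      StructField("amount", DoubleType()),
      StructField("event_time", TimestampType()),
  ])

  raw = (
      spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
      .option("subscribe", "transactions")               # placeholder topic
      .option("startingOffsets", "latest")
      .load()
  )

  # Parse the Kafka value payload and bound late data with a watermark.
  parsed = (
      raw.select(F.from_json(F.col("value").cast("string"), payload_schema).alias("txn"))
      .select("txn.*")
      .withWatermark("event_time", "10 seconds")
  )

  # Land parsed events in Delta for downstream scoring and reporting jobs.
  (
      parsed.writeStream.format("delta")
      .option("checkpointLocation", "/mnt/lake/_checkpoints/transactions")
      .outputMode("append")
      .toTable("lakehouse.silver_transactions")
  )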

Data Engineer – Databricks

Oracle Health
01.2021 - 07.2023
  • Architected and managed large-scale ETL/ELT pipelines on Azure Databricks using PySpark, Delta Lake, and Auto Loader to process 20TB+ of financial data daily.
  • Designed Delta Lakehouse with bronze, silver, and gold layers to enforce governance and optimize analytics pipelines, reducing data duplication by 35%.
  • Implemented Unity Catalog for centralized governance, fine-grained RBAC, and audit trails across 100+ datasets, ensuring GDPR and CCPA compliance.
  • Developed streaming pipelines using Kafka, Event Hubs, and Spark Structured Streaming, reducing fraud detection latency to under 2 seconds.
  • Optimized Spark workloads via autoscaling, adaptive query execution, Z-Ordering, and partition pruning, lowering compute costs by 30%.
  • Implemented Delta Live Tables for automated incremental processing, reducing ETL pipeline runtime by 40%.
  • Developed business-ready analytics dashboards using Databricks SQL and Power BI, improving decision-making for senior executives.
  • Integrated MLflow for model versioning, experiment tracking, and deployment pipelines for credit risk and fraud models (see the sketch after this list).
  • Automated deployments using Terraform, GitHub Actions, and Databricks Repos to enforce DevOps best practices.
  • Implemented REST API–based monitoring integrated with Splunk and CloudWatch to proactively manage job health and incident alerts.
  • Collaborated with security and compliance teams to establish PII tokenization, encryption at rest, and secure data sharing via Unity Catalog.
  • Conducted training sessions for analysts and data scientists on Databricks notebooks, SQL, and Delta Lake best practices, driving a 50% increase in adoption.
  • Migrated batch pipelines to Databricks Workflows and the Jobs API, improving reliability and visibility of execution.
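
The MLflow bullet above refers to standard experiment tracking and model logging; the sketch below illustrates the pattern with a hypothetical experiment path and synthetic data, not the production credit-risk or fraud models.

  # A minimal sketch, assuming a hypothetical experiment path and synthetic data,
  # of MLflow experiment tracking and model logging.
  import mlflow
  import mlflow.sklearn
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import roc_auc_score
  from sklearn.model_selection import train_test_split

  X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

  mlflow.set_experiment("/Shared/risk-model-demo")  # placeholder experiment path

  with mlflow.start_run():
      model = LogisticRegression(max_iter=200)
      model.fit(X_train, y_train)
      auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

      # Log parameters, metrics, and the fitted model so runs stay comparable.
      mlflow.log_param("max_iter", 200)
      mlflow.log_metric("auc", auc)
      mlflow.sklearn.log_model(model, "model")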

Data Engineer

Cognizant
06.2016 - 12.2020
  • Developed data lake ingestion pipelines using Spark and Databricks to centralize enterprise data across multiple lines of business.
  • Migrated Hadoop/Hive ETL jobs into Databricks workflows, reducing query runtimes by 35% and cutting operational overhead.
  • Implemented advanced optimization techniques in Delta Lake including Z-Ordering, bucketing, and Bloom filters, improving query performance (see the sketch after this list).
  • Built PySpark ETL frameworks for reusable ingestion and transformation across multiple projects, improving delivery speed by 20%.
  • Automated ingestion pipelines with Databricks Auto Loader and event-based triggers from Azure Event Hubs.
  • Developed real-time monitoring solutions with Databricks REST APIs, integrated into Splunk and Azure Monitor.
  • Created Databricks SQL dashboards for BI and Tableau integration, providing real-time insights to business stakeholders.
  • Collaborated with data science teams to deliver machine learning feature pipelines on Databricks.
  • Designed partitioning and retention strategies to optimize cloud storage costs while ensuring query performance.
  • Established robust version control of notebooks and data models using Git and Databricks Repos, enabling collaborative workflows.
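
The Z-Ordering bullet above corresponds to Delta Lake's OPTIMIZE and ZORDER maintenance commands on Databricks; the sketch below shows the pattern against a hypothetical table, not an actual client dataset.

  # A minimal sketch, assuming a hypothetical table and filter columns, of
  # Delta Lake file compaction and Z-Ordering on Databricks.
  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Compact small files and co-locate rows by the columns queries filter on,
  # so file pruning skips more data at read time.
  spark.sql("""
      OPTIMIZE sales.silver_orders
      ZORDER BY (customer_id, order_date)
  """)

  # Remove data files no longer referenced by the table (default retention applies).
  spark.sql("VACUUM sales.silver_orders")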

Education

Master of Science - Data Science

University of North Texas
Denton, TX

Bachelor of Technology - Computer Science

Kakatiya Institute of Technology And Science
Warangal, India

Skills

Programming & Scripting

  • Python, SQL, Scala, Java

Big Data & Processing

  • Databricks, Apache Spark, PySpark, Delta Lake, Databricks SQL, MLflow, Auto Loader, Delta Live Tables, Hadoop

Cloud Platforms & Data Warehousing

  • Azure (Databricks, Data Factory, Data Lake, Synapse, Event Hubs, Functions, Logic Apps)
  • AWS (S3, Glue, Redshift, EMR, Lambda, Athena, Kinesis, IAM)
  • Snowflake, Azure Synapse, Redshift, Google BigQuery

Orchestration & Workflow

  • Apache Airflow, Apache NiFi, Azure Data Factory

Streaming & Messaging

  • Kafka, Event Hubs, Spark Structured Streaming

Data Visualization & BI

  • Power BI, Tableau, Databricks SQL Dashboards

DevOps & CI/CD

  • Jenkins, GitHub Actions, Terraform, Docker, Kubernetes

Data Governance & Monitoring

  • Unity Catalog, Apache Atlas, Collibra, AWS CloudWatch, Splunk
