Senior Data Engineer with 9+ years of experience specializing in Databricks, Apache Spark, and cloud-native data platforms.
Expert in designing and delivering large-scale ETL/ELT pipelines, Delta Lake architectures, real-time streaming solutions, and data lakehouses for analytics.
Skilled in leveraging the full Databricks ecosystem including Unity Catalog, MLflow, Databricks SQL, Auto Loader, and Delta Live Tables for governance, ML enablement, and advanced analytics.
Proven success in migrating legacy ETL systems (Informatica, SSIS, Hadoop/Hive) to Databricks, modernizing pipelines and improving reliability.
Designed bronze/silver/gold lakehouse architectures that improved data consistency, traceability, and self-service analytics adoption.
Implemented robust CI/CD workflows with Terraform, Jenkins, and Databricks Repos, ensuring reliable deployments and version control.
Optimized Spark clusters through autoscaling, partition pruning, Z-Ordering, broadcast joins, and caching, reducing compute costs by up to 30% (tuning sketch after this list).
Built real-time streaming pipelines with Kafka, Event Hubs, and Structured Streaming to process 10M+ daily events with sub-second latency.
Developed and deployed machine learning pipelines with MLflow for model training, experiment tracking, and monitoring in production.
Enforced data governance, RBAC, PII tokenization, and audit trails via Unity Catalog to meet HIPAA, GDPR, and CCPA requirements.
Collaborated with cross-functional teams to deliver BI dashboards and Databricks SQL queries powering executive decision-making.
Conducted training sessions and workshops on Databricks, Delta Lake best practices, and data engineering methodologies, increasing adoption by 50%.
Recognized for delivering cost-optimized, secure, and high-performance data pipelines that enabled advanced analytics and AI/ML use cases.
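A minimal PySpark sketch of the cost-optimization levers listed above (adaptive query execution, partition pruning, broadcast joins, Z-Ordering, caching); the table and column names (sales_fact, region_dim, event_date, customer_id) are hypothetical placeholders, not any specific production workload.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("cost-optimized-etl")
    # Adaptive Query Execution re-plans joins and shuffle partitions at runtime
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

# Partition pruning: filtering on the partition column lets Spark skip files
fact = spark.read.table("sales_fact").where(F.col("event_date") >= "2024-01-01")

# Broadcast join: ship the small dimension table to every executor
dim = spark.read.table("region_dim")
joined = fact.join(F.broadcast(dim), "region_id")

# Cache a hot intermediate result reused by several downstream aggregations
joined.cache()

# Z-Ordering co-locates related values to improve data skipping on Delta tables
spark.sql("OPTIMIZE sales_fact ZORDER BY (customer_id)")
```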
Overview
9 years of professional experience
Work History
Senior Data Engineer – Databricks
Capital One
04.2024 - Current
Architected and managed large-scale ETL/ELT pipelines on Azure Databricks using PySpark, Delta Lake, and Auto Loader to process 20TB+ of financial data daily (ingestion sketch after this list).
Designed a Delta Lakehouse with bronze, silver, and gold layers to enforce governance and optimize analytics pipelines, reducing data duplication by 35%.
Implemented Unity Catalog for centralized governance, fine-grained RBAC, and audit trails across 100+ datasets, ensuring GDPR and CCPA compliance (grant example after this list).
Developed streaming pipelines using Kafka, Event Hubs, and Spark Structured Streaming, reducing fraud detection latency to under 2 seconds (streaming sketch after this list).
Optimized Spark workloads via autoscaling, adaptive query execution, Z-Ordering, and partition pruning, lowering compute costs by 30%.
Implemented Delta Live Tables for automated incremental processing, reducing ETL pipeline runtime by 40% (DLT sketch after this list).
Developed business-ready analytics dashboards using Databricks SQL and Power BI, improving decision-making for senior executives.
Integrated MLflow for model versioning, experiment tracking, and deployment pipelines for credit risk and fraud models (tracking sketch after this list).
Automated deployments using Terraform, GitHub Actions, and Databricks Repos to enforce DevOps best practices.
Implemented REST API–based monitoring integrated with Splunk and CloudWatch to proactively manage job health and incident alerts (monitoring sketch after this list).
Collaborated with security and compliance teams to establish PII tokenization, encryption at rest, and secure data sharing via Unity Catalog.
Conducted training sessions for analysts and data scientists on Databricks notebooks, SQL, and Delta Lake best practices, driving a 50% increase in adoption.
Migrated batch pipelines into Databricks Workflows and the Jobs API, improving execution reliability and visibility.
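Illustrative sketch of the Auto Loader ingestion pattern feeding the bronze layer; the paths, checkpoint locations, and the finance.bronze_transactions table are assumptions for illustration, not the production pipeline.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader incrementally discovers new files and tracks schema evolution
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/tx_schema")
    .load("/mnt/raw/transactions/")
)

(
    bronze_stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/tx_bronze")
    .trigger(availableNow=True)  # process all pending files, then stop
    .toTable("finance.bronze_transactions")
)
```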
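A hedged example of the Unity Catalog grants and PII masking described above, as run from a Databricks notebook where `spark` is predefined; the catalog, schema, group, and column names are invented.

```python
# Grant a group navigational access plus read access on one gold table
spark.sql("GRANT USE CATALOG ON CATALOG finance TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA finance.gold TO `analysts`")
spark.sql("GRANT SELECT ON TABLE finance.gold.daily_positions TO `analysts`")

# Column-level control via a dynamic view that masks PII for non-privileged groups
spark.sql("""
    CREATE OR REPLACE VIEW finance.gold.daily_positions_masked AS
    SELECT
      account_id,
      CASE WHEN is_account_group_member('pii_readers') THEN ssn ELSE '***' END AS ssn,
      balance
    FROM finance.gold.daily_positions
""")
```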
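A minimal sketch of the fraud-scoring stream; the broker address, topic name, and score_batch body are assumptions standing in for the real model-scoring logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "card-transactions")
    .load()
    .select(F.col("value").cast("string").alias("payload"), F.col("timestamp"))
)

def score_batch(batch_df, batch_id):
    # Placeholder: the real pipeline scores each micro-batch with the fraud
    # model; this sketch just appends the rows to a Delta table
    batch_df.write.format("delta").mode("append").saveAsTable(
        "finance.silver.scored_events"
    )

(
    events.writeStream
    .foreachBatch(score_batch)
    .option("checkpointLocation", "/mnt/checkpoints/fraud")
    .trigger(processingTime="1 second")  # keeps end-to-end latency low
    .start()
)
```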
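A sketch of the Delta Live Tables pattern for incremental processing; it runs only inside a DLT pipeline, where `dlt` and `spark` are provided, and the dataset names and expectation rule are illustrative.

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw transactions ingested incrementally via Auto Loader")
def bronze_transactions():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/transactions/")
    )

@dlt.table(comment="Validated, deduplicated transactions")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop rows failing the check
def silver_transactions():
    return dlt.read_stream("bronze_transactions").dropDuplicates(["tx_id"])
```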
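Minimal MLflow tracking sketch showing the versioning and experiment-tracking pattern; the toy logistic-regression model and synthetic data are placeholders, not the production credit-risk pipeline.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real feature set
X, y = make_classification(n_samples=1000, random_state=42)

with mlflow.start_run(run_name="credit-risk-baseline"):
    model = LogisticRegression(max_iter=500).fit(X, y)
    mlflow.log_param("max_iter", 500)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")  # versioned artifact for deployment
```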
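A sketch of polling the documented Databricks Jobs API (/api/2.1/jobs/runs/list) for failed runs; the workspace URL, token, and the downstream Splunk/CloudWatch hookup are placeholders.

```python
import requests

HOST = "https://<workspace>.azuredatabricks.net"  # placeholder workspace URL
TOKEN = "dapiXXXX"                                # placeholder access token

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"completed_only": "true", "limit": 25},
    timeout=30,
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    state = run.get("state", {})
    if state.get("result_state") == "FAILED":
        # In production this event is forwarded to Splunk/CloudWatch
        print(f"ALERT: run {run['run_id']} failed: {state.get('state_message')}")
```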
Data Engineer – Databricks
Oracle Health
01.2021 - 07.2023
Data Engineer
Cognizant
06.2016 - 12.2020
Developed data lake ingestion pipelines using Spark and Databricks to centralize enterprise data across multiple lines of business.
Migrated Hadoop/Hive ETL jobs into Databricks workflows, reducing query runtimes by 35% and cutting operational overhead.
Implemented advanced optimization techniques including Delta Lake Z-Ordering and Bloom filter indexes alongside Spark bucketing, improving query performance (optimization sketch after this list).
Built reusable PySpark ETL frameworks for ingestion and transformation across multiple projects, improving delivery speed by 20% (framework sketch after this list).
Automated ingestion pipelines with Databricks Auto Loader and event-based triggers from Azure Event Hubs.
Developed real-time monitoring solutions with Databricks REST APIs, integrated into Splunk and Azure Monitor.
Created Databricks SQL dashboards for BI and Tableau integration, providing real-time insights to business stakeholders.
Collaborated with data science teams to deliver machine learning feature pipelines on Databricks.
Designed partitioning and retention strategies to optimize cloud storage costs while ensuring query performance.
Established robust version control of notebooks and data models using Git and Databricks Repos, enabling collaborative workflows.
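Illustrative Delta optimization commands, run from a Databricks notebook where `spark` is predefined; the events table, device_id column, and option values are hypothetical.

```python
# A Bloom filter index speeds selective point lookups on a high-cardinality column
spark.sql("""
    CREATE BLOOMFILTER INDEX ON TABLE events
    FOR COLUMNS (device_id OPTIONS (fpp = 0.1, numItems = 50000000))
""")

# Z-Ordering rewrites files so related device_id values are co-located,
# improving data skipping for the same lookups
spark.sql("OPTIMIZE events ZORDER BY (device_id)")
```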
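A toy version of a config-driven ingestion framework showing the reuse pattern; the config shape, helper name, and table names are invented for illustration, not the actual framework.

```python
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

def ingest(config: dict) -> DataFrame:
    """Read a source, apply declared column renames, and land a Delta table."""
    df = spark.read.format(config["format"]).load(config["path"])
    for old, new in config.get("renames", {}).items():
        df = df.withColumnRenamed(old, new)
    (
        df.write.format("delta")
        .mode(config.get("mode", "append"))
        .saveAsTable(config["target"])
    )
    return df

# Each project supplies a declarative config instead of bespoke pipeline code
ingest({
    "format": "parquet",
    "path": "/mnt/landing/orders/",
    "renames": {"ord_id": "order_id"},
    "target": "ops.bronze_orders",
})
```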
Education
Master of Science - Data Science
University of North Texas
Denton, TX
Bachelor of Technology - Computer Science
Kakatiya Institute of Technology And Science
Warangal, India
Skills
Programming & Scripting
Python, SQL, Scala, Java
Big Data & Processing
Databricks, Apache Spark, PySpark, Delta Lake, Databricks SQL, MLflow, Auto Loader, Delta Live Tables, Hadoop
Cloud Platforms & Data Warehousing
Azure (Databricks, Data Factory, Data Lake, Synapse, Event Hubs, Functions, Logic Apps)