Dheepa Natarajan

Washington

Summary

Senior Data Engineer with more than 10 years of experience architecting, developing, and optimizing scalable data platforms across GCP, Azure, and hybrid cloud infrastructures. Expert in designing Data Mesh architectures that enable decentralized, domain-oriented ownership with integrated governance via APIs and catalog services. Specialized in deploying high-performance, cloud-native solutions using BigQuery, Databricks, Synapse Analytics, and GKE Autopilot. Proven track record in developing fault-tolerant ETL/ELT pipelines using PySpark, Delta Lake, and Apache Hive, supporting multi-terabyte batch workloads and near-real-time streaming.

Overview

12 years of professional experience

Work History

Lead Data Engineer

Securian
Scottsdale
02.2025 - Current
  • Spearheaded design and rollout of GCP-based Data Mesh using BigQuery, Cloud Spanner, and Vertex AI pipelines, establishing domain-oriented data ownership and enhancing automated governance.
  • Engineered end-to-end data pipelines on Vertex AI and GKE Autopilot, automating feature extraction and data quality validation, and implementing canary deployments via Istio, while instituting SLO/SLI dashboards in Looker for real-time platform monitoring.
  • Optimized multi-terabyte analytical workloads by refactoring BigQuery SQL with partitioning, clustering, and materialized views, reducing query latency.
  • Automated cross-region infrastructure provisioning with Terraform and Anthos Config Management, embedding Binary Authorization and Forseti Security to enforce zero-trust networking and strengthen least-privilege IAM policies.

Senior Data Engineer – Platform Lead

Signify Health
Dallas
09.2023 - 01.2025
  • Architected and deployed an Azure Data Mesh framework that unified Azure Purview, Unity Catalog (Databricks), and Microsoft Fabric, delivering domain-oriented ownership, end-to-end lineage, and searchable metadata across the platform.
  • Built event-driven ingestion pipelines with Azure Durable Functions and Logic Apps, automating metadata catalog updates and real-time health notifications, accelerating issue detection and enabling quicker remediation.
  • Engineered reusable Terraform and Azure Bicep modules to provision Azure Private Link, Cosmos DB, and Microsoft Defender for Cloud, embedding Azure Policy for automated compliance and streamlining configuration processes.
  • Implemented Azure Monitor and Log Analytics dashboards with optimized KQL queries, enabling rapid identification of ingestion delays, schema drift, and anomalous query patterns, enhancing overall platform observability.

Senior Data Engineer

Virtusa Corporation
New Jersey
12.2021 - 08.2023
  • Constructed scalable PySpark transformation workflows in Azure Databricks, converting raw operational data into Delta Lake tables optimized for time-series analysis and reporting.
  • Engineered end-to-end ETL/ELT pipelines in Azure Data Factory V2, ingesting telemetry from Azure Event Hubs into Azure Data Lake Gen2 with partitioning, schema evolution, and incremental load strategies.
  • Automated provisioning of storage hierarchies, ADF pipelines, and Synapse schemas via Azure DevOps and ARM templates, enabling consistent and repeatable Infrastructure as Code deployments.
  • Integrated Application Insights and Azure Key Vault to centralize pipeline monitoring and enforce least-privilege access, improving reliability and security for batch and streaming processes.

Data Engineer

Cognizant
Bangalore
10.2018 - 08.2021
  • Designed and deployed real-time ingestion pipelines using Azure Stream Analytics and Event Grid, delivering high-velocity event logs to Azure Data Lake for AIOps anomaly detection models.
  • Engineered Azure Data Factory ETL workflows mapping schemas and performing incremental loads, streamlining telemetry ingestion from Blob Storage to Azure SQL Database for enhanced data accessibility.
  • Optimized PySpark transformations on HDInsight by refining partitioning and aggregation logic, accelerating downstream query performance.
  • Automated pipeline and infrastructure provisioning with Azure CLI and ARM templates and secured secrets via Azure Key Vault, ensuring repeatable, compliant deployments for improved operational reliability.

Data Engineer

Persistent Systems
Pune
02.2014 - 09.2018
  • Engineered end-to-end ETL pipelines in Python and Hive that processed 200–500 GB of operational and transactional data daily, applying Hive partitioning and bucketing to accelerate query execution.
  • Achieved a 30% reduction in query execution time through partitioning and aggregation optimizations, enhancing analyst productivity.
  • Automated data loading from MySQL/PostgreSQL to HDFS using Sqoop, enabling schema mapping and facilitating incremental loads.
  • Refactored legacy Pig jobs to HiveQL, delivering an approximately 30% runtime reduction and a roughly 40% decrease in maintenance effort.
  • Orchestrated real-time log ingestion with Apache Flume and Bash scripts, streaming events to HDFS for AIOps analysis.
  • Managed ingestion of 1–3 TB of telemetry data daily via Azure Data Factory.
  • Processed 2–5 TB daily with PySpark on HDInsight, delivering scalable transformations for downstream reporting.
  • Delivered real-time telemetry to four downstream consumers, supporting anomaly detection, analytics, and incident response for 25+ operational users, reducing end-to-end latency by 30–40% through optimized partitioning and Stream Analytics windows.
  • Enabled automated retention policies on Amazon S3 and applied compression to RDS snapshots before HDFS ingestion, contributing to reduced storage costs.

Education

B.Tech - Information Technology

Anna University

MIT - Data Science and Machine Learning Program

Skills

1. Cloud Platforms & Services: GCP (BigQuery, Cloud Spanner, Vertex AI, GKE Autopilot, Anthos Config Management, Forseti Security, Security Command Center), Azure (Azure Data Factory V2, Azure Synapse Analytics, Azure Databricks, Azure Cosmos DB, Azure Monitor, Azure Log Analytics, Azure Logic Apps, Azure Active Directory, Azure Policy, Azure Private Link)

2. Languages & Scripting: Python, SQL, Bash

3. Data Engineering & Processing: PySpark, Delta Lake, Apache Hive, Apache Sqoop, Apache Flume, Spark SQL, HDInsight

4. Infrastructure as Code & Orchestration: Terraform, Azure Bicep, ARM Templates, Azure DevOps, Jenkins

  • Data Warehousing

5. Security & Governance: RBAC, KQL, Azure Purview, Unity Catalog, Microsoft Fabric, Policy Intelligence, SLO/SLI Modeling

6. Networking & Service Mesh: Istio Service Mesh, GKE Autopilot, Azure Private Link, VPC Service Controls

7. Analytics & Visualization: Looker, Application Insights
