Sanjana Varadaraju Shantha

Austin, TX

Summary

Senior Data Engineer with 8+ years of experience designing, building, and scaling enterprise-grade data platforms across Big Tech and consulting environments, including Apple, Deloitte, and EY. Proven expertise in cloud-native data architectures, spanning Snowflake and dbt–driven analytics engineering, large-scale Spark and AWS EMR big data systems, and Azure SQL–based enterprise data warehousing.

At Apple, led the development of Snowflake-centric analytics platforms built on a medallion architecture over AWS S3, orchestrated with Airflow and integrated with Kafka, enabling low-latency, AI-driven ad decisioning at scale. At Deloitte, served as a technical lead for AWS-based big data platforms, modernizing legacy Hadoop workloads, optimizing Spark pipelines, and delivering highly reliable, cost-efficient data systems processing large financial datasets. At EY, specialized in Azure SQL–focused data warehouse solutions, implementing robust data models, high-performance T-SQL transformations, and governed analytics layers supporting enterprise reporting and Power BI consumption.

Recognized for strong ownership, deep technical rigor, and the ability to translate complex business requirements into scalable, high-performance data solutions. Experienced in partnering closely with data scientists, analysts, and product teams to deliver trusted datasets that power analytics, reporting, and AI/ML workflows in production.

Overview

8+ years of professional experience

Work History

Data Engineer, Ad Platform Engineering - Contract

Apple

Austin, Texas

11.2024 - Current

Led the design and implementation of a Snowflake-based Enterprise Data Warehouse for Apple Ads analytics, built on a medallion architecture (Bronze/Silver/Gold) over AWS S3, enabling scalable, governed, and analytics-ready data consumption.
Architected dbt-driven ELT pipelines transforming high-volume App Store search and engagement events from Kafka streams and batch sources into curated Snowflake fact and dimension models.
Orchestrated end-to-end data workflows using Apache Airflow, ensuring reliable scheduling, dependency management, SLA monitoring, and alerting for business-critical analytics pipelines.
Built and optimized Snowflake data models supporting ad performance reporting and AI/ML feature generation for real-time ad placement and ad selection systems.
Leveraged Snowflake micro-partitioning, clustering, secure views, and warehouse auto-scaling to handle high-concurrency workloads with predictable performance.
Implemented dbt incremental models, snapshots, and data quality tests, guaranteeing freshness, historical accuracy, and trust in downstream analytics and ML pipelines.
Reduced end-to-end analytics latency by ~30% through optimized Snowflake SQL, dbt materializations, and efficient S3-to-Snowflake ELT patterns.
Integrated Kafka-based ingestion with batch ELT pipelines to support near–real-time availability of App Store signals.
Partnered closely with ML engineers, data scientists, and product teams to align data models with model input requirements, improving feature quality for AI-driven ad targeting.
Established CI/CD best practices for dbt and Airflow using Git-based workflows and automated validation before production deployments.
Served as a Snowflake, dbt, and Airflow subject matter expert, driving analytics engineering standards and best practices across the org.

Lead Big Data Engineer

Deloitte

Bangalore, India

08.2021 - 07.2024

Led architecture and delivery of large-scale Big Data platforms on AWS, processing high-volume insurance and financial datasets using Spark (Scala), Hadoop, Oozie, and AWS EMR.
Designed and operated distributed data pipelines leveraging S3 as a data lake, enabling cost-efficient storage and scalable downstream analytics.
Orchestrated complex batch workflows using Oozie and Airflow, ensuring reliable scheduling, retries, and fault tolerance for mission-critical reporting pipelines.
Migrated legacy on-prem Hadoop workloads to AWS EMR, optimizing cluster sizing, resource utilization, and job parallelism, resulting in ~60% reduction in processing time.
Developed and optimized Spark and Spark SQL jobs to handle large-scale transformations, aggregations, and joins across terabytes of data.
Integrated Kafka-based streaming ingestion with batch pipelines to support near–real-time data availability for downstream analytics.
Implemented AWS-native services including S3, EMR, Glue, and Athena to build flexible, cloud-native big data solutions.
Automated 20+ production workflows, achieving a 60% reduction in manual effort and significantly improving pipeline reliability.
Led performance tuning, failure handling, and root-cause analysis across distributed systems, ensuring high availability and data accuracy.
Acted as technical lead for cross-functional teams, mentoring engineers and driving best practices in big data engineering, cloud adoption, and CI/CD.

Data Engineer

EY

Bangalore, India

10.2017 - 08.2021

Designed and implemented Azure SQL–centric enterprise data warehouses supporting 100+ recurring financial and operational reports for large U.S. insurance clients.
Architected relational and dimensional data models in Azure SQL Database and Azure SQL DW (Synapse), supporting datasets of hundreds of millions of rows across claims, policy, and finance domains.
Built and orchestrated ADF-based ETL pipelines ingesting data from 10+ heterogeneous source systems (SQL Server, Oracle, flat files), ensuring reliable daily and monthly refresh cycles.
Developed and optimized complex T-SQL stored procedures, views, and transformations, improving query performance by 30–40% through indexing, partitioning, and execution plan tuning.
Implemented data quality and reconciliation frameworks in Azure SQL, reducing downstream reporting discrepancies by ~25% and increasing stakeholder trust in analytics outputs.
Delivered analytics-ready Azure SQL datasets powering Power BI dashboards used by business, finance, and operations teams, enabling faster decision-making.
Led cloud migration initiatives, moving legacy SQL Server workloads to Azure SQL with zero data loss and minimal downtime.
Partnered with analysts and business stakeholders to translate requirements into scalable data models, serving as a go-to resource for Azure SQL performance and design best practices.

Education

Master of Science - Data Science

University of the Cumberlands

Williamsburg

12-2025

Bachelor of Science - Electrical, Electronics and Communications Engineering

PES College of Engineering

India

05-2017

Skills

Programming & Query: Python, Scala, SQL, UNIX Shell Scripting
Data Warehousing & Analytics: Snowflake, SQL Data Warehouse, dbt, Iceberg, Delta Lake
Big Data & Processing: Apache Spark, Hadoop, Hive, HBase
Orchestration & ETL: Apache Airflow, MWAA, Apache NiFi, Oozie, Azure Data Factory, AWS Glue
Databases: PostgreSQL, MySQL, Oracle, Microsoft SQL Server
Streaming: Kafka

Cloud Platforms: AWS, Azure
AWS Services: S3, EMR, Lambda, CloudWatch, CloudShell, Amazon Q
Azure Services: Azure SQL, Blob Storage, Data Lake Analytics, Databricks
DevOps & Version Control: Git, Rio
BI & Visualization: Power BI
