
Senior Data Engineer with 13+ years of experience architecting and optimizing large-scale, cloud-native data platforms across AWS, Azure, and GCP. Expert in building high-performance, fault-tolerant data ecosystems using Spark, Flink, Kafka, Delta Lake, and modern Data Mesh and Lakehouse architectures. Skilled in developing streaming and batch pipelines, implementing CDC frameworks, and enforcing metadata-driven governance, lineage, and observability at enterprise scale. Proficient in Python, Scala, Rust, SQL, and Terraform, with advanced expertise in schema evolution, data virtualization (Trino, Denodo, Starburst), and distributed query optimization. Adept at enabling feature stores, real-time analytics (Pinot, Druid, Materialize), and ML data pipelines, while driving platform scalability, cost efficiency, and DataOps automation across complex, federated data environments.
1. Multi-Cloud Lakehouse Integration Platform
Tech: Delta Lake, Apache Iceberg, Spark, Airflow, Dagster, AWS, Azure, Apache Atlas, OpenMetadata
Description:
Designed and built a unified multi-cloud Lakehouse platform enabling cross-cloud analytics across AWS and Azure. Implemented Delta Lake and Apache Iceberg for versioned storage, schema evolution, and ACID guarantees. Created metadata-driven ingestion workflows using Airflow and Dagster with full lineage tracking via Apache Atlas and OpenMetadata. Introduced modular data zones, governed schema propagation, and scalable compute layers for structured, semi-structured, and unstructured workloads.
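A minimal sketch of the metadata-driven ingestion pattern described above, using PySpark with Delta Lake; the zone_config dictionary, bucket paths, and table layout are illustrative placeholders, not the production configuration:

```python
# Sketch: metadata-driven ingestion into a Delta Lake zone (PySpark).
# Paths and the zone_config dict below are hypothetical examples.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-ingest")
    # Delta Lake extensions; versions must match the Spark build.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Zone layout driven by metadata rather than hard-coded paths, so the same
# job serves AWS (S3) and Azure (ADLS) targets.
zone_config = {
    "source_path": "s3://raw-zone/orders/",  # placeholder bucket
    "target_path": "abfss://curated@lake.dfs.core.windows.net/orders",  # placeholder
    "format": "json",
}

df = spark.read.format(zone_config["format"]).load(zone_config["source_path"])

# Append with schema evolution: new source columns are added to the Delta
# table as part of an ACID-guaranteed commit instead of failing the write.
(
    df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save(zone_config["target_path"])
)
```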
2. Real-Time Streaming & CDC Pipeline Framework
Tech: Apache Flink, Kafka Streams, Kafka Connect, Debezium, Kubernetes, Terraform
Description:
Engineered a distributed streaming framework delivering continuous data synchronization across operational and analytical systems. Implemented Debezium-based CDC flows for relational sources and built transformation layers with Flink and Kafka Streams. Containerized and deployed the entire stack on Kubernetes using Terraform for infrastructure provisioning. Added schema-registry-driven compatibility rules and event routing patterns for consistent, deterministic streaming behavior across microservices.
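As a hedged illustration of the CDC layer, the sketch below registers a Debezium Postgres connector through the Kafka Connect REST API. Host names, credentials, and topic prefixes are placeholders, and `topic.prefix` assumes Debezium 2.x (earlier releases used `database.server.name`):

```python
# Sketch: registering a Debezium Postgres CDC connector via the Kafka
# Connect REST API. All hosts, credentials, and names are placeholders.
import json
import requests

connector = {
    "name": "orders-cdc",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "orders-db.internal",  # placeholder host
        "database.port": "5432",
        "database.user": "cdc_user",                # placeholder user
        "database.password": "********",            # injected via a config provider in practice
        "database.dbname": "orders",
        "topic.prefix": "cdc.orders",
        # Route events through the schema registry so downstream Flink and
        # Kafka Streams jobs can enforce compatibility rules.
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://schema-registry:8081",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
    },
}

resp = requests.post(
    "http://kafka-connect:8083/connectors",  # Connect REST endpoint (placeholder host)
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```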
3. Data Quality & Observability Automation Layer
Tech: Spark, Python, Deequ, Great Expectations, Prometheus, OpenTelemetry, Grafana
Description:
Developed a comprehensive DataOps observability stack combining data validation, lineage propagation, and pipeline health monitoring. Automated quality checks using Deequ and Great Expectations integrated directly into Spark ETL and ELT workflows. Implemented OpenTelemetry-based tracing across pipelines and instrumented Prometheus exporters for system-level metrics. Built Grafana dashboards for operational visibility, schema drift detection, anomaly surfacing, and pipeline reliability insights.
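A minimal sketch of one such embedded quality gate, using PyDeequ (the Python wrapper for Deequ) inside a Spark job; the dataset path, column names, and constraints are illustrative assumptions:

```python
# Sketch: Deequ-style data quality gate embedded in a Spark ETL step.
import os
os.environ.setdefault("SPARK_VERSION", "3.3")  # PyDeequ reads this at import time

import pydeequ
from pyspark.sql import SparkSession
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

spark = (
    SparkSession.builder.appName("dq-gate")
    # Pull in the Deequ jar matching the running Spark version.
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate()
)

df = spark.read.parquet("s3://curated-zone/orders/")  # placeholder dataset

# Declarative constraints checked inside the ETL job itself.
check = (
    Check(spark, CheckLevel.Error, "orders quality gate")
    .isComplete("order_id")    # no NULL keys
    .isUnique("order_id")      # no duplicate events
    .isNonNegative("amount")   # simple business-rule sanity check
)

result = VerificationSuite(spark).onData(df).addCheck(check).run()
results_df = VerificationResult.checkResultsAsDataFrame(spark, result)

# Fail the pipeline task on any violated constraint; in production the same
# results would feed Prometheus exporters and Grafana alerting.
failures = results_df.filter(results_df.constraint_status != "Success")
if failures.count() > 0:
    failures.show(truncate=False)
    raise RuntimeError("Data quality gate failed for orders")
```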