
Data Engineer with 5+ years of experience building scalable distributed data systems and custom ETL frameworks in high-volume cloud environments. Strong expertise in Python, PySpark, Spark, Hive, and Redshift, with hands-on experience designing data ingestion systems that process millions to billions of records weekly. Experienced in building API-driven extraction pipelines, optimizing distributed workloads, and collaborating closely with Data Scientists and Product teams in SaaS environments. Passionate about writing production-grade code and designing resilient, fault-tolerant data infrastructure.
Programming:
Python (Advanced), PySpark, SQL, Scala, JavaScript (Node.js), TypeScript (Working Knowledge)
Big Data & Distributed Systems:
Apache Spark, Hive, Hadoop, Databricks, Spark Streaming
Databases & Storage:
Redshift, PostgreSQL, MongoDB, Elasticsearch, S3
ETL & Data Engineering:
Custom Python ETL frameworks, API ingestion, data validation pipelines
Cloud Platforms:
AWS (S3, Glue, Lambda, EMR, MSK, IAM), Azure
Orchestration & Monitoring:
Airflow, CloudWatch, Kibana
DevOps:
Terraform, Docker, CI/CD, Git
Databricks Certified Data Engineer Associate — Issued July 2025
Technologies: Spark, Kafka, Python, Redshift
Technologies: Python, REST APIs, S3, Redshift