
Senior Data Engineer with 8+ years of experience building scalable data pipelines and distributed systems for fraud detection, healthcare analytics, and enterprise data platforms. Proven track record of architecting multi-terabyte ETL workflows, improving Spark performance by more than 40%, cutting compute costs by 30-40%, and migrating legacy systems to modern frameworks with a 60% reduction in code complexity. Certified Databricks Developer with expertise in AWS EMR Serverless, Apache Spark, Kafka streaming, Azure Databricks, and ML feature engineering pipelines.
TECHNICAL SKILLS
Cloud Platforms: AWS (EMR Serverless, S3, Lambda, Redshift, DynamoDB, Glue, Athena, EC2, CloudWatch), Azure (Databricks, Data Factory, Synapse, Storage, Key Vault, Logic Apps)
Big Data Technologies: Apache Spark, PySpark, Spark SQL, Delta Lake, Databricks, Hadoop, Hive, HDFS, Kafka, EventBus, Airflow, Snowflake
ETL Tools: Azure Data Factory, AWS Glue, SSIS, Talend, QuickETL, Meghdoot, BPP (Batch Processing Pipeline)
Databases: PostgreSQL, SQL Server, Amazon Redshift, DynamoDB, HBase, Cassandra
Languages: Python, SQL, Scala, PL/SQL, T-SQL, HiveQL, Shell Scripting, HOCON
Data Modeling: Star Schema, Snowflake Schema, Dimensional Modeling, SCD Type-1/Type-2, CDC, Medallion Architecture
DevOps & Tools: Jenkins, CI/CD, Git, Docker, Terraform, CloudWatch, Unity Catalog
Methodologies: Agile, Scrum, Waterfall, Test-Driven Development (TDD)