
Data Engineer with 4+ years of experience designing, building, and optimizing data pipelines across AWS cloud environments. Strong expertise in ETL/ELT workflows, PySpark, SQL, and distributed processing using Glue, EMR, and Databricks. Proven ability to migrate on-prem systems to cloud-native architectures, implement data quality frameworks, and deliver scalable ingestion solutions supporting analytics, dashboards, and machine learning use cases. Adept at collaborating across product, BI, and data science teams to translate business requirements into reliable, high-performance data solutions.
Cloud: AWS (S3, Glue, Lambda, EMR, EC2, IAM, Redshift Spectrum, Athena), Azure Data Factory (basic)
ETL/Orchestration: AWS Glue, Airflow (MWAA), ADF, SSIS
Big Data/Processing: PySpark, Databricks, Spark SQL, Spark Streaming, Kafka (basic)
Programming: Python (Pandas, Boto3), SQL, Shell Scripting
Data Modeling: Star/Snowflake schemas, Dimensional modeling, Medallion architecture
Databases: PostgreSQL, MySQL, SQL Server, Redshift, Hive/Athena
Tools/DevOps: Git, GitHub Actions, Bitbucket, Jenkins, Docker (basic), CI/CD
Data Quality & Monitoring: AWS Glue DataBrew, Great Expectations (basic), CloudWatch (Logs, Alarms)
File Formats: Parquet, ORC, Avro, JSON, CSV