Shabby M

San Jose, CA

Summary

Senior Data Engineer with over a decade of experience building reliable, secure, and scalable data environments on AWS, Azure, and GCP. Skilled in ETL/ELT development, data modeling, streaming, governance, and automation using modern cloud tools such as Snowflake, Databricks, and dbt. Known for practical problem solving, clear documentation, and collaboration across data science, analytics, and DevOps teams. Familiar with compliance frameworks including HIPAA, SOC 2, and FedRAMP. Focused on quality, maintainability, and data-driven decision making.

Overview

11 years of professional experience
1 Certification

Work History

Principal Data Engineer

Corro Health
05.2022 - Current
  • Designed and implemented a Snowflake and Databricks data platform supporting healthcare analytics and reporting.
  • Developed ETL pipelines with dbt and Azure Data Factory to automate recurring data processes.
  • Introduced data governance practices using Collibra and Monte Carlo for lineage and documentation.
  • Built Kafka and Kinesis streams for event data and near-real-time dashboards.
  • Worked with analysts and data scientists to integrate SageMaker models for clinical and financial insights.
  • Guided a small engineering group on version control, Terraform automation, and deployment standards.

Lead Data Engineer

Halo Branded Solutions
04.2019 - 04.2022
  • Led design of an Azure Synapse warehouse integrating multiple sales and operations systems.
  • Created data pipelines in Azure Data Factory and Airflow for structured and semi-structured data sources.
  • Established CI/CD workflows with Jenkins and Terraform to standardize deployments.
  • Built reusable dbt models for analytics and reporting teams.
  • Collaborated with data scientists to deploy forecasting models through MLflow in Databricks.
  • Implemented data quality checks and alerting to maintain reliability of business dashboards.
  • Partnered with security and compliance teams on access control and auditing.

Data & Cloud Engineer

Employers
01.2016 - 03.2019
  • Migrated existing ETL jobs to AWS Glue and Airflow for better scalability and maintenance.
  • Designed data models using Kimball and Data Vault 2.0 techniques for insurance data domains.
  • Added automated testing and validation for new data pipelines.
  • Set up monitoring dashboards with Prometheus and Grafana for job health tracking.
  • Supported deployment of machine-learning models in AWS SageMaker for fraud detection use cases.
  • Worked closely with database administrators on partitioning, optimization, and access policies.

Data Engineer

GiftHealth
07.2014 - 12.2016
  • Built and maintained ETL processes in SSIS and Talend to move healthcare data into SQL Server.
  • Created Power BI dashboards for operations and compliance reporting.
  • Automated daily refresh jobs and notifications using Python scripts.
  • Supported containerization of ETL workloads with Docker and Jenkins.
  • Helped enforce HIPAA standards for sensitive data storage and access.
  • Worked closely with compliance teams to implement data masking and other HIPAA security measures.
  • Contributed to documentation and user training for business users and data stewards.

Education

Bachelor of Science - Computer Science

Punjab University

Skills

  • Programming & Scripting: Python (NumPy, Pandas, PySpark), SQL (T-SQL, PL/pgSQL, Spark SQL), Scala, Java, R, Go, C, Bash, PowerShell, TypeScript, JavaScript, YAML, JSON

  • Cloud Platforms & Services: AWS (S3, Redshift, Glue, Lambda, Kinesis, SageMaker, Aurora, CloudFormation); Azure (Synapse, Data Factory, Databricks, Cosmos DB, Event Hubs, AKS, Power BI Service); GCP (BigQuery, Dataflow, Cloud Composer, Vertex AI); Snowflake

  • ETL / ELT & Data Orchestration: dbt, Apache Airflow, Dagster, NiFi, Talend, Informatica, SSIS, ADF, Matillion, StreamSets, Alteryx, Fivetran, Airbyte, Prefect

  • Data Engineering & Streaming: Apache Spark (Python & Scala), Kafka, Flink, Beam, Hadoop, Hive, Delta Lake, Presto, Trino, Databricks SQL

  • Modeling & Warehousing: Dimensional Modeling (Kimball, Inmon), Data Vault 2.0, Star/Snowflake Schemas, Semantic Layers, Data Marts, Lakehouse Architecture

  • Databases (SQL / NoSQL / Graph): PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, Cassandra, Cosmos DB, DynamoDB, Redis, Elasticsearch, Neo4j, ArangoDB

  • Governance & Security: Collibra, Alation, Apache Atlas, Monte Carlo, Data Cataloging, Lineage, MDM, GDPR, HIPAA, SOC 2, RBAC, IAM, Encryption at Rest/In Transit

  • Business Intelligence & Visualization: Power BI (Advanced DAX), Tableau, Looker, Sigma, Qlik, QuickSight, Mode, Metabase, Grafana, Plotly, Matplotlib, Seaborn, Jupyter Notebooks

  • Machine Learning, AI & MLOps: SageMaker, MLflow, TensorFlow, Scikit-learn, Feature Stores, Model Deployment

  • DevOps, CI/CD & DataOps: Git, Jenkins, Terraform, Docker, Kubernetes, Azure DevOps

  • APIs, Integration & Event Architecture: REST APIs, GraphQL, Microservices, Event-Driven Architecture, Kafka Messaging

  • Knowledge Graphs & Semantic Web: Ontology Design, RDF, SPARQL, Neo4j

  • Domain Expertise: Healthcare Data, Financial Analytics, E-Commerce Analytics, Government Data Systems

  • Leadership & Professional Skills: Technical Mentorship, Agile/Scrum Delivery, Stakeholder Communication, Cross-Team Collaboration

Certification

  • Databricks Certified Data Engineer Professional

  • Microsoft Certified: Azure Data Engineer Associate (DP-203)

  • AWS Certified Data Engineer – Associate
