Lead Data Engineer | Cloud Engineer | ETL/ELT Specialist | AI Engineering
Results-driven and certified Lead Data Engineer with over 10 years of experience designing and implementing enterprise-grade data solutions across AWS, Azure, and GCP. Proven expertise in data architecture, building robust ETL/ELT pipelines, and developing metadata-driven data platforms using tools like Databricks, Snowflake, and DBT Labs.
Curious mind at the intersection of data engineering and AI—exploring how intelligent systems can shape the future of scalable, smart data platforms.
Overview
9 years of professional experience
1 Certification
Work History
Lead Data Engineer
Exeliq Consulting Inc
01.2017 - Current
Lead Data Engineer
Ascension Healthcare
05.2025 - Current
RTHE is a real-time data platform that helps Ascension healthcare teams make faster, more informed decisions, automate workflows, and improve patient care. It continuously updates data as events happen, ensuring access to the most current information. Broader use cases include real-time auto-suggested DRG coding, real-time prediction of hospital length of stay, and more.
Designed and deployed scalable Python-based microservices to process over 5 TB of healthcare data daily using Pub/Sub, transforming and ingesting data into Cloud Spanner, resulting in a 40% improvement in processing efficiency.
Developed and implemented CI/CD pipelines using Terraform and Jenkins to automate GCP infrastructure provisioning, reducing deployment latency by 80%.
Optimized Cloud Run and Pub/Sub architectures to enable parallel microservice processing, reducing infrastructure costs by 25%.
Implemented robust data validation workflows using JSON schemas stored in GCS, reducing data inconsistencies by 30% and ensuring compliance with Protected Health Information (PHI) standards (illustrative sketch below).
Utilized BigQuery for centralized log storage and analytics, cutting annual storage costs by $50,000 and boosting query performance by 20%.
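A minimal sketch of the GCS-backed schema-validation pattern referenced above. Bucket, schema path, and payload fields are hypothetical, and the production workflow included PHI-specific rules and dead-letter routing not shown here.

```python
import json

from google.cloud import storage          # GCS client
from jsonschema import Draft7Validator    # JSON Schema validation

# Hypothetical names for illustration only.
SCHEMA_BUCKET = "example-rthe-schemas"
SCHEMA_BLOB = "encounters/v1/encounter.schema.json"


def load_schema(bucket_name: str, blob_name: str) -> dict:
    """Download a JSON schema stored in GCS and parse it."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return json.loads(blob.download_as_text())


def validate_event(event: dict, schema: dict) -> list[str]:
    """Return human-readable validation errors (empty list if the event is valid)."""
    validator = Draft7Validator(schema)
    return [f"{'/'.join(map(str, e.path))}: {e.message}" for e in validator.iter_errors(event)]


if __name__ == "__main__":
    schema = load_schema(SCHEMA_BUCKET, SCHEMA_BLOB)
    sample_event = {"encounter_id": "E123", "admit_ts": "2025-05-01T10:00:00Z"}
    errors = validate_event(sample_event, schema)
    if errors:
        # In the real pipeline, invalid messages would be routed to a dead-letter topic.
        print("Validation failed:", errors)
    else:
        print("Event conforms to schema")
```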
Lead Data Engineer
Exeliq Professional Partners
07.2024 - 04.2025
This product addresses Spark-based data engineering framework and runtime data quality challenges on the Databricks platform. It aims to provide a scalable, efficient, and customizable solution for data ingestion, transformation, and data quality requirements, ensuring data integrity and supporting high-quality analytics and reporting. The solution is designed to integrate seamlessly with Databricks across cloud platforms.
Developed a metadata-driven data ingestion, transformation, and data quality framework for automated and standardized checks across Databricks (simplified sketch after this list).
Designed and implemented a metadata-driven framework using Databricks, Snowpark, and DBT Labs to enable seamless ingestion and scalable transformation of data from various sources, ensuring operational efficiency and standardized processing pipelines.
Integrated API-based data ingestion by extracting data from external REST APIs, processing it in PySpark, and storing it within a Medallion Architecture (Bronze, Silver, Gold layers) to ensure structured and optimized data management.
Integrated Great Expectations with Databricks to automate data validation throughout the ETL process, reducing manual effort and improving accuracy.
Implemented seamless interoperability between Snowflake and Databricks Unity Catalog by leveraging the Databricks Iceberg REST Catalog interface. This enabled direct querying of Unity Catalog tables from Snowflake, streamlining data access, reducing latency, and simplifying data engineering workflows.
Integrated the framework with a Databricks dashboard and Overwatch for real-time monitoring of data quality metrics, failed validations, pipeline health, and FinOps.
Built real-time custom alerting and logging to flag and log data quality issues for quick resolution and transparency.
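A simplified illustration of the metadata-driven quality-check pattern. The actual framework used Great Expectations and Databricks-native tooling; this sketch applies rules defined as plain metadata with PySpark only, and the table, column, and rule names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

# Metadata describing checks; in the real framework this lives in config/metadata tables.
DQ_RULES = [
    {"table": "bronze.orders", "column": "order_id", "check": "not_null"},
    {"table": "bronze.orders", "column": "amount",   "check": "min", "value": 0},
]


def run_checks(rules):
    """Evaluate each metadata rule and return pass/fail counts per rule."""
    results = []
    for rule in rules:
        df = spark.table(rule["table"])
        col = F.col(rule["column"])
        if rule["check"] == "not_null":
            failed = df.filter(col.isNull()).count()
        elif rule["check"] == "min":
            failed = df.filter(col < rule["value"]).count()
        else:
            raise ValueError(f"Unknown check: {rule['check']}")
        results.append({**rule, "failed_rows": failed, "passed": failed == 0})
    return results


for r in run_checks(DQ_RULES):
    # In the production framework these results feed the monitoring dashboard and alerting.
    print(r)
```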
Senior Data Engineer
FedEx Express
04.2023 - 06.2024
This project analyzes FedEx Express itinerary management and operations data from different source systems (handling unit trip coordination, shipment, load handling, and clearance domains) using Azure Data Explorer (ADX), T-SQL, Python, PySpark, and Databricks, and builds observability dashboards for reporting and analytics using Power BI and ML.
Created end-to-end, near-real-time structured streaming data pipelines using Azure Databricks and Azure Event Hubs.
Implemented API-driven data ingestion by retrieving shipment tracking and logistics data from FedEx APIs, cleaning and transforming nested JSON data using PySpark (illustrative sketch below), and organizing it within a structured architecture (Landing, Harmonization, and Consumption layers) to improve data accessibility, traceability, and analytical insights.
Created POC to integrate Collibra for metadata management and data cataloging, enabling automated lineage tracking and business glossary enrichment across data domains to improve data governance, compliance, and discoverability for operational analytics.
Maintained full responsibility for monitoring and supporting the end-to-end data pipelines and their operation after deployment.
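A minimal sketch of the nested-JSON flattening step described above, using a hypothetical, simplified shipment payload and schema rather than the actual FedEx API response.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, ArrayType

spark = SparkSession.builder.appName("flatten-sketch").getOrCreate()

# Hypothetical, simplified schema for a tracking payload.
schema = StructType([
    StructField("trackingNumber", StringType()),
    StructField("events", ArrayType(StructType([
        StructField("eventType", StringType()),
        StructField("timestamp", StringType()),
        StructField("location", StructType([
            StructField("city", StringType()),
            StructField("countryCode", StringType()),
        ])),
    ]))),
])

raw = spark.createDataFrame(
    [('{"trackingNumber":"123","events":[{"eventType":"PU","timestamp":"2024-01-01T08:00:00Z",'
      '"location":{"city":"Memphis","countryCode":"US"}}]}',)],
    ["payload"],
)

# Parse the JSON string, explode the nested event array, and project flat columns.
flat = (
    raw.select(F.from_json("payload", schema).alias("p"))
       .select("p.trackingNumber", F.explode("p.events").alias("e"))
       .select(
           "trackingNumber",
           F.col("e.eventType").alias("event_type"),
           F.col("e.timestamp").alias("event_ts"),
           F.col("e.location.city").alias("city"),
           F.col("e.location.countryCode").alias("country_code"),
       )
)
flat.show(truncate=False)
```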
Cloud Data Engineer
Advent Health
11.2022 - 03.2023
Company Overview: a leading heartcare provider
Worked on a critical healthcare data modernization initiative for Advent Health as part of their digital transformation journey. The project migrates different on-prem data sources (Oracle, MySQL, Salesforce, etc.) to Azure cloud/Snowflake: building an automated metadata-driven framework and pipelines using Azure Data Factory, creating a data lake in ADLS, and loading data into Snowflake for downstream reporting and analytics.
Built the Metazoo automation framework for Salesforce metadata generation.
Automated source/Salesforce schema extraction, schema processing, and job generation using a Python-based framework that maps Oracle, Salesforce, and MySQL data to Snowflake.
Built parameterized ADF pipelines driven by the extracted metadata and ingested data into Azure Data Lake Storage.
Used Azure Databricks to cleanse & transform data before loading into Snowflake.
Ingested extracted Parquet data into Snowflake tables and created views on top for further analysis.
Implemented different load strategies (full/initial load, incremental load, and Type 2) while loading data into Snowflake (simplified sketch after this list).
Built a DBT Labs and Snowpark-based transformation framework to standardize and optimize data processing after ingestion.
Automated data pipelines and CI/CD using GitLab.
Tested end-to-end ADF data pipelines and performed data validation on ingested data.
Documented the end-to-end process and performance analysis on Confluence.
Overall, the engagement delivered the following value to the client: seamless migration to Azure, a reusable and reliable ingestion framework and data pipelines, and strategic cloud enablement.
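A simplified sketch of the incremental (upsert) load strategy using the Snowflake Python connector. Table, column, and connection names are hypothetical placeholders; in the project the loads were driven by ADF and the metadata framework rather than a standalone script, and the Type 2 strategy adds an expire-old-row step not shown here.

```python
import snowflake.connector

# Hypothetical incremental upsert from a staging table into a target table.
MERGE_SQL = """
MERGE INTO analytics.dim_patient AS tgt
USING staging.patient_delta AS src
    ON tgt.patient_id = src.patient_id
WHEN MATCHED THEN UPDATE SET
    tgt.first_name = src.first_name,
    tgt.last_name  = src.last_name,
    tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (patient_id, first_name, last_name, updated_at)
    VALUES (src.patient_id, src.first_name, src.last_name, src.updated_at)
"""

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="LOAD_WH", database="ADVENT",
)
try:
    cur = conn.cursor()
    cur.execute(MERGE_SQL)          # run the upsert
    print("Rows merged:", cur.rowcount)
finally:
    conn.close()
```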
Sr. Data Engineer
MyFitnessPal
02.2022 - 10.2022
Company Overview: one of the best weight loss apps and fitness apps
MyFitnessPal is one of the best weight loss and fitness apps, helping nearly 1 million members reach their nutrition, health, and fitness goals every year. This project migrates their application data to a Snowflake data warehouse for BI needs, implementing ETL and data warehousing in Snowflake and orchestrating and automating the complete end-to-end flow with Airflow jobs.
Created and managed data pipelines using MWAA (Amazon Managed Workflows for Apache Airflow) DAGs to load data from Oracle, AWS S3, and Kafka topics into Snowflake.
Created ETL jobs in Snowflake to copy raw data into the landing schema.
Implemented delta/incremental loads with Type 2, overwrite, and append load strategies from the landing/raw layer to the staging layer.
Curated, cleansed, and transformed raw VARIANT data into a suitable structured format using Snowflake scripts.
Used Snowflake Streams to identify insert, update, and delete operations on raw data.
Created parameterized DAGs for different environments (PROD, DEV, and QA) to orchestrate and schedule the complete end-to-end ETL process.
Developed a metadata-driven process to create 'as of date' and 'as of month' tables, efficiently appending daily and monthly snapshots into respective historical tables. This streamlined data versioning process enhances historical data tracking and supports advanced time-based analytics.
Developed a custom Snowflake logging operator in Airflow for logging, debugging, and auditing of Airflow jobs (condensed sketch after this list).
Integrated Fivetran to automate data ingestion from cloud-based sources such as Salesforce and other SaaS platforms, streamlining ELT processes.
Configured HVR to connect and replicate data from on-premises systems, enabling secure and real-time data synchronization with the cloud environment.
Worked closely with stakeholders, BAs, solution architects, QA, and the BI team to achieve project goals and meet project timelines.
Worked on process flow, lineage, and SOP documentation.
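A condensed sketch of the custom logging operator and the environment-parameterized DAG pattern. The operator, audit table, connection ID, and stage names are hypothetical; the real operator captured richer audit metadata and the environment value came from Airflow Variables.

```python
from datetime import datetime

from airflow import DAG
from airflow.models import BaseOperator
from airflow.providers.snowflake.hooks.snowflake import SnowflakeHook


class LoggedSnowflakeOperator(BaseOperator):
    """Run a Snowflake statement and record the outcome in an audit table."""

    def __init__(self, sql: str, snowflake_conn_id: str = "snowflake_default", **kwargs):
        super().__init__(**kwargs)
        self.sql = sql
        self.snowflake_conn_id = snowflake_conn_id

    def execute(self, context):
        hook = SnowflakeHook(snowflake_conn_id=self.snowflake_conn_id)
        status = "SUCCESS"
        try:
            hook.run(self.sql)
        except Exception:
            status = "FAILED"
            raise
        finally:
            # Hypothetical audit table used for debugging and job auditing.
            hook.run(
                "INSERT INTO audit.etl_job_log (task_id, run_ts, status) "
                "VALUES (%(task_id)s, %(run_ts)s, %(status)s)",
                parameters={
                    "task_id": self.task_id,
                    "run_ts": datetime.utcnow().isoformat(),
                    "status": status,
                },
            )


ENV = "DEV"  # In practice supplied per environment (PROD / DEV / QA) via Airflow Variables.

with DAG(
    dag_id=f"mfp_daily_load_{ENV.lower()}",
    start_date=datetime(2022, 2, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_landing = LoggedSnowflakeOperator(
        task_id="copy_raw_to_landing",
        sql=f"COPY INTO {ENV}_DB.LANDING.EVENTS FROM @{ENV}_DB.LANDING.S3_STAGE",
    )
```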
Cloud Engineer
HSBC
03.2021 - 01.2022
This project migrates data from on-prem to Google Cloud and implements data ingestion strategies from GCS buckets to BigQuery using Airflow as the orchestration tool.
Migrated source files in different formats (.csv, COBOL, fixed-width, .avro) from on-prem servers to Google Cloud Storage using the Juniper data migration tool.
Created Juniper feeds for transferring files from on-prem virtual machines to GCS buckets.
Developed parameterized Python scripts to perform data conversion, auditing, and reconciliation before loading data into BigQuery tables.
Wrote a COBOL parser in Python to read fixed-width files and load them into target BigQuery tables (trimmed-down sketch after this list).
Replaced existing Control-M orchestration with Airflow.
Created Airflow DAGs to orchestrate complete end-to-end ingestion process and scheduling.
Performed data validations and unit testing using Python.
Created interdependent DAGs in Airflow using TriggerDagRunOperator and task sensors.
Created SOP documents for the complete end-to-end ingestion process on Confluence.
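A trimmed-down sketch of the fixed-width parsing approach. The field layout, file path, and target table are hypothetical; the real parser derived field offsets and types from COBOL copybooks and handled packed-decimal fields not shown here.

```python
import pandas as pd
from google.cloud import bigquery

# Hypothetical copybook-style layout: (field_name, start_offset, length).
LAYOUT = [
    ("account_id", 0, 10),
    ("txn_date",   10, 8),
    ("amount",     18, 12),
]


def parse_fixed_width(path: str) -> pd.DataFrame:
    """Slice each fixed-width record into fields according to the layout."""
    rows = []
    with open(path, encoding="ascii") as fh:
        for line in fh:
            rows.append({name: line[start:start + length].strip()
                         for name, start, length in LAYOUT})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    df = parse_fixed_width("daily_transactions.dat")
    client = bigquery.Client()
    # Appends into a hypothetical target table; schema handling is simplified here.
    job = client.load_table_from_dataframe(df, "hsbc_landing.daily_transactions")
    job.result()
    print(f"Loaded {len(df)} rows")
```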
Data Engineer
Exeliq Consulting Inc.
07.2020 - 02.2021
This project migrates Informatica ETL to Databricks PySpark and builds a Spark test automation framework.
Implemented the data model from existing PostgreSQL and Oracle sources in Databricks PySpark.
Converted RDBMS SQL stored procedures into Spark programs using Spark libraries.
Migrated Informatica ETL into Spark transformations and loaded data into PostgreSQL.
Used data from AWS S3 for processing and uploaded data back to AWS S3 with KMS security.
Processed input text files and dimension tables in CSV format to load into PostgreSQL.
Parsed and extracted data from COBOL files using PySpark jobs.
Implemented a testing framework to compare existing target file extracts processed by Informatica with the new PySpark-processed files (minimal sketch after this list).
Optimized the Spark code for large data processing using Spark-recommended performance tuning techniques.
Debugged the existing testing framework and made changes according to the requirements.
Migrated the complete local testing framework to Azure Databricks.
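A minimal sketch of the comparison step at the heart of the testing framework, assuming both extracts are available as files. Paths and key columns are illustrative only.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parity-check").getOrCreate()

# Hypothetical extract locations for the legacy (Informatica) and new (PySpark) outputs.
legacy = spark.read.option("header", True).csv("s3a://etl-migration/legacy/orders.csv")
rewrite = spark.read.option("header", True).csv("s3a://etl-migration/pyspark/orders.csv")

# Rows present in one extract but not the other (column order and types must match).
only_in_legacy = legacy.exceptAll(rewrite)
only_in_rewrite = rewrite.exceptAll(legacy)

mismatches = only_in_legacy.count() + only_in_rewrite.count()
if mismatches == 0:
    print("Extracts match row-for-row")
else:
    print(f"{mismatches} mismatched rows found")
    only_in_legacy.show(20, truncate=False)
```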
Big Data Engineer
Ingredion
05.2019 - 06.2020
Data Xform provides a seamless journey for data migration and transformation from a plethora of legacy databases to the cloud environment. It covers database discovery, assessment, and migration using an industry-specific architecture, ensuring minimal downtime and data loss while switching over to cloud-hosted providers. The tool also ensures that integration of data across various databases is done efficiently and effectively.
Created and managed a single automated hybrid data integration framework using Apache Spark.
Ingested data into Azure Data Lake from various sources (CSV, Excel, SQL Server, MongoDB, Kafka, etc.) using Azure Data Factory v2.
Performed data cleansing and data profiling on raw data using Spark/Scala in Azure Databricks.
Implemented an end-to-end automated ETL framework on the Azure Databricks platform.
Created Databricks templates to load data into data marts using different load strategies (append, upsert, overwrite, Type 2, etc.) in Spark (condensed upsert example after this list).
Optimized workflows and data pipelines in Azure.
Continuously monitored and managed data pipelines from a single console.
Delivered a cost-efficient, fully managed cloud data transformation tool that scales on demand and reduces overhead costs.
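A condensed example of the upsert load strategy used by the Databricks templates, written with the Delta Lake Python API. The source path, target table, and key column are hypothetical, and the actual templates parameterized all of these.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upsert-template").getOrCreate()

# Incoming batch for a hypothetical data mart dimension.
updates = spark.read.parquet("/mnt/raw/material_dim_increment")

target = DeltaTable.forName(spark, "datamart.material_dim")

(
    target.alias("t")
    .merge(updates.alias("s"), "t.material_id = s.material_id")
    .whenMatchedUpdateAll()      # upsert: refresh existing keys
    .whenNotMatchedInsertAll()   # insert new keys
    .execute()
)
```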
Snowflake Developer
Johnson and Johnson
08.2018 - 04.2019
Company Overview: a global leader in pharmaceuticals and consumer health products
Worked on a Bill of Material (BOM) management system for Johnson & Johnson, a global leader in pharmaceuticals and consumer health products. This project focused on building a centralized data framework to track and manage raw materials, active ingredients, and formulations used in manufacturing life-saving medical devices and pharmaceutical products.
As part of this project, we built the Bill of Material (BOM) process for various J&J facilities and created data structures for end-to-end tracking of the components, materials, formulas, and ingredients used to build J&J products, using a Snowflake data model.
Collaborated with Business SMEs to understand business problems and technical requirements.
Developed a scalable Medallion Architecture in Snowflake for Data Lake to enhance data management, ensure data reliability, efficient storage, and optimized query performance for analytics.
Engineered data resiliency processes for seamless data ingestion from both internal and external stages (S3) into Snowflake, utilizing Snowflake Stored Procedures and Tasks to guarantee consistent and reliable data loading.
Implemented incremental and full table refresh, significantly reducing manual intervention and ensuring up-to-date data for business insights.
Created a Proof of Concept (POC) for schema evolution with Iceberg (small sketch below), demonstrating enhanced flexibility and adaptability to evolving data requirements while maintaining data integrity.
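A small sketch of the schema-evolution POC, assuming a Spark session already configured with the Iceberg runtime and a hypothetical catalog, namespace, and table.

```python
from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime and a catalog named `poc` are already configured
# on the cluster (e.g., spark.sql.catalog.poc = org.apache.iceberg.spark.SparkCatalog).
spark = SparkSession.builder.appName("iceberg-schema-evolution").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS poc.bom.components (
        component_id STRING,
        material_code STRING
    ) USING iceberg
""")

# Evolve the schema in place: add a column without rewriting existing data files.
spark.sql("ALTER TABLE poc.bom.components ADD COLUMN supplier_id STRING")

# New writes include the column; earlier snapshots remain readable.
spark.sql("INSERT INTO poc.bom.components VALUES ('C-100', 'MAT-01', 'SUP-9')")
spark.sql("SELECT * FROM poc.bom.components").show()
```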
Data Engineer
Exeliq Consulting Inc. / Trustmark
02.2018 - 07.2018
The 'Cloud Governance' tool streamlines overall governance of the client's cloud environment after migration to the cloud. It ensures ease of compliance, enhanced security, optimum utilization of resources, cost optimization, and standardization of processes for seamless scaling of the environment.
Built and set up an end-to-end cloud governance framework for the client's Azure cloud environment.
Customized cloud governance as per the client's needs.
Created Python framework with specific local and global industry compliance standards.
Optimized workloads and resource allocations for significant cost savings.
Reviewed and tested insightful reports and recommendations for a continuous cloud cost and resource optimization process.
Automated centralized cloud monitoring, enabling audit, security, and compliance across the cloud platform.
Created and managed role-based access control for enhanced security compliance, granular security, and policy management using the Python framework (illustrative sketch after this list).
Used GitHub for version control and GitHub Actions for integration and deployment.
Built cloud resource and cost-monitoring customized dashboards using Tableau.
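An illustrative, plain-Python sketch of the kind of role-based access rules the governance framework enforced. Roles, actions, and resources are hypothetical; the real framework mapped to client-specific compliance standards and cloud-native policies.

```python
# Hypothetical role-to-permission mapping used for access checks and policy audits.
ROLE_PERMISSIONS = {
    "cloud_admin":   {"vm:create", "vm:delete", "storage:read", "storage:write"},
    "data_engineer": {"storage:read", "storage:write"},
    "auditor":       {"storage:read"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role's policy grants the requested action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def audit_request(user: str, role: str, action: str, resource: str) -> dict:
    """Build an audit record for every access decision (allowed or denied)."""
    return {
        "user": user,
        "role": role,
        "action": action,
        "resource": resource,
        "allowed": is_allowed(role, action),
    }


if __name__ == "__main__":
    print(audit_request("asmith", "auditor", "storage:write", "billing-exports"))
    # -> allowed: False, since auditors only hold storage:read
```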
Cloud Engineer
Caterpillar
08.2017 - 01.2018
This project automates and orchestrates the complete BOM and BOD pre- and post-validation process.
Environment: Google Cloud Composer, Databricks on Google Cloud, Google Cloud Storage, Google Cloud Functions, Google Compute Engine.
Associate Data Engineer
Numerator
01.2017 - 07.2017
This project implements a data warehouse using Pentaho and Snowflake and migrates Pentaho jobs to Airflow for distributed processing and automation.