
Senior Data Engineer with 10+ years of experience designing, building, and optimizing large-scale ELT/ETL pipelines and big data platforms on Microsoft Azure. Proven expertise across Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage Gen2, Delta Lake, PySpark, Python, SQL, Snowflake, and Unix shell scripting. Skilled in migrating legacy SQL Server, Teradata, and on-premises ETL workloads to Azure cloud, implementing CI/CD with GitHub Actions, and enforcing enterprise data governance using Unity Catalog, Microsoft Purview, Azure Active Directory, RBAC, and Azure Key Vault. Experienced in dimensional modeling, star schema design, partitioning, performance tuning, real-time and batch processing, and delivering analytics-ready datasets for Power BI dashboards, executive reporting, and data-driven decision-making across retail, gaming, and healthcare domains.
• Engineered an end-to-end retail data platform on Microsoft Azure, building data ingestion pipelines with Azure Data Factory (ADF) to extract, transform, and load (ETL) customer transaction data from point-of-sale (POS) systems into Azure Data Lake Storage Gen2 (ADLS Gen2) and processing optimized datasets in Delta Lake tables on Azure Databricks with Photon-enabled clusters for sales analytics, customer segmentation, and executive BI dashboards.
• Designed dimensional data models with star schema architecture in Azure Synapse Analytics dedicated SQL pools, and developed PySpark and Python notebooks in Azure Databricks for customer 360 analytics, shopping pattern analysis, sales forecasting, and product affinity modeling, unifying e-commerce data from Azure SQL Database with in-store purchase history to drive merchandising optimization and personalized marketing campaigns.
• Implemented Delta Lake change data capture (CDC) for incremental loading and merge operations on daily transaction updates; built real-time inventory tracking using Azure Stream Analytics and Azure Functions; established automated data quality validation frameworks in Azure Databricks using Python to verify completeness, accuracy, and consistency of sales, inventory, and customer data before publishing to downstream Delta Lake tables.
• Enforced enterprise data governance with Unity Catalog in Azure Databricks, managing role-based access controls (RBAC) over sensitive customer purchase history, product pricing, and sales performance metrics across Delta Lake tables to ensure compliance with retail data privacy regulations and organizational security standards.
• Engineered a Medallion Architecture on Azure Databricks, implementing Bronze, Silver, and Gold layers to automate data ingestion from multiple sources while ensuring data integrity through validation and transformation.
• Designed and implemented scalable ETL/ELT pipelines using Azure Data Factory (ADF), Azure Databricks, Python, PySpark, and Apache Spark, ingesting payroll transactions, employee records, timesheets, deductions, and compensation data from multiple HR and payroll source systems into Azure Data Lake Storage Gen2 with hierarchical namespaces, encryption, and lifecycle management.
• Built and optimized Azure Synapse Analytics data warehouses with star schemas, partitioning strategies, and SQL/Python query tuning; migrated legacy on-premises SQL Server databases and payroll ETL workflows to Azure Synapse, ADF, and ADLS Gen2, reducing processing time for multi-terabyte payroll datasets and improving scalability, maintainability, and overall platform performance.
• Automated end-to-end workflow orchestration using Azure Data Factory pipelines, Azure Logic Apps, and SQL Server Agent scheduling, integrating Azure Databricks, Synapse Analytics, and ADLS Gen2 for reliable batch and micro-batch ETL; implemented data quality validation frameworks with Great Expectations, Python, and PySpark to verify completeness, accuracy, and consistency of payroll and employee datasets before downstream analytics ingestion.
• Implemented enterprise data governance and metadata management using Microsoft Purview for cataloging, lineage tracking, and discoverability across payroll and HR datasets; enforced secure access controls using Azure Active Directory (Entra ID), Azure RBAC, and Azure Key Vault encryption; established automated backup, archival, and disaster recovery using ADLS Gen2 lifecycle policies.
• Built interactive dashboards and analytical reports using Power BI integrated with Azure Synapse Analytics, visualizing payroll KPIs, compensation summaries, employee data trends, and operational metrics; implemented logging and auditing with Azure Monitor, ADF, and Synapse Analytics to capture ETL execution details, data access patterns, and change history, ensuring full transparency and audit compliance for HR, finance, and audit stakeholders.
• Designed and implemented ETL/ELT pipelines using SQL Server Integration Services (SSIS), T-SQL, and Python, ingesting electronic health records (EHR) and claims data from source systems including a Teradata data warehouse; leveraged Teradata FastExport for high-volume extraction and transformed raw healthcare datasets into analytics-ready formats for reporting, operational monitoring, and HIPAA compliance.
• Developed batch ETL workflows for claims processing and patient data ingestion using SSIS packages, Python scripts, and Teradata Parallel Transporter (TPT); used Teradata MultiLoad (MLOAD) for high-volume batch inserts and updates to claims transaction tables and Teradata TPUMP for near-real-time incremental loading of patient registration updates into the Teradata enterprise data warehouse and SQL Server data marts.
• Implemented healthcare data staging and archival on SQL Server and Teradata using Teradata Archive/Recovery Utility (ARC), FastExport for large-scale offloading, table partitioning, and compression strategies to provide secure, compliant storage of protected health information (PHI) and claims datasets for analytics, regulatory reporting, and historical trend analysis.
• Automated workflow orchestration with SSIS package configurations, SQL Server Agent jobs, and Teradata scheduling utilities, coordinating BTEQ scripts, MLOAD jobs, FastExport extractions, and TPUMP incremental loads across EHR, claims, and billing datasets, reducing manual intervention and ensuring repeatable nightly and weekly batch processing.
• Developed data quality validation frameworks using Python, T-SQL stored procedures, and Teradata BTEQ scripts to validate completeness, accuracy, referential integrity, and consistency of patient records, claims, and billing datasets loaded via MLOAD and TPUMP, maintaining healthcare compliance standards and audit readiness.
• Mentored junior ETL developers on SSIS, Teradata SQL, BTEQ scripting, Teradata load utilities (MLOAD, FastExport, TPUMP, TPT), Python data engineering workflows, and healthcare-specific data processing patterns, fostering knowledge sharing while ensuring strict adherence to HIPAA, HITECH, and internal data governance policies.