
Experienced Data Engineer with 6+ years developing, optimizing, and automating complex ETL/ELT pipelines on AWS, Azure, Spark, and Snowflake. Skilled in Python and SQL for advanced data transformation and analysis. AWS Certified Solutions Architect – Associate with hands-on cloud experience. Proficient in data warehousing with Snowflake and Synapse, as well as relational and NoSQL databases. Expert in ETL tools such as dbt and Informatica, and in real-time data streaming and processing with Apache Kafka and Spark Streaming. Experienced in deploying containerized applications with Docker and Kubernetes and in managing infrastructure as code with Terraform. Strong background in project management and SDLC methodologies, using JIRA, Git, and Jenkins for CI/CD alongside Agile practices.
• Built and optimized scalable ETL pipelines for a healthcare project using PySpark in Databricks, T-SQL, and Azure Data Factory (ADF) to ingest and transform data from diverse sources for analytics and reporting.
• Managed data storage and retrieval across Azure Data Lake Storage (ADLS) and SQL Server databases via SQL Server Management Studio (SSMS).
• Leveraged Delta Lake on ADLS Gen2 to store and sync data from Azure SQL Hyperscale into Databricks.
• Designed and maintained data repositories in Azure Synapse Analytics for master and UI tables, improving data accessibility and self-service reporting for business users.
• Contributed to the design and management of end-to-end Medicaid claims submission workflows using Azure Data Factory and Azure Databricks, improving processing efficiency by 30% and ensuring 100% compliance with 10+ state regulations.
• Processed 5 million+ Medicaid claims daily, coordinating with cross-functional teams to submit critical healthcare data to Edifecs and CMS.
• Implemented robust data governance, security, and access controls across ADLS, Synapse, Snowflake, and Databricks environments, ensuring HIPAA and CMS regulatory compliance.
• Used AI-powered coding assistants such as Mosaic AI and Codium AI within Databricks notebooks and Git workflows to accelerate PySpark scripting, automate repetitive coding tasks, and improve code quality, shortening delivery cycles and reducing manual workload.
• Environment: Azure, Databricks, PySpark, Delta Lake, Azure Data Factory, Azure Data Lake Gen2, Azure Synapse Analytics, Azure SQL (Hyperscale), SQL, Snowflake, Git, Mosaic AI, Codium AI, HIPAA/CMS Compliance