Data Engineer
- Designed and built a high-performance ETL pipeline that processes 2+ million healthcare professional (HCP) emails weekly, improving processing efficiency by 40% while maintaining compliance with healthcare data regulations.
- Led three full-scale data pipeline migrations covering 100+ data sources, improving scalability by 50% and data consistency by 40%, speeding up data processing by 60%, and cutting operational costs by 30% while keeping integrations across multiple platforms intact.
- Processed 25+ million healthcare organization (HCO) and HCP records, including affiliations, National Provider Identifier (NPI), and specialty data, in compliance with DEA, OIG, and Ohio TDDD requirements; optimized large-scale processing with Spark SQL and PySpark to improve query performance and reporting efficiency.
- Developed an automated data quality framework that validates files before they reach production, enforcing data accuracy, integrity, and compliance and reducing errors.
- Built a 24/7 automation framework for file processing that runs daily and pushes processed files to production via APIs, reducing manual intervention by 90% and improving availability and reliability.
- Managed the production environment for Data Operations, maintaining system stability, high availability, and performance for critical data workflows.
- Implemented and optimized Apache Airflow DAGs for workflow orchestration, and automated data ingestion pipelines with Python and shell scripting, improving scheduling, monitoring, data availability, and operational efficiency.
- Developed automated Tableau reports and dynamic dashboards that deliver key business metrics to stakeholders and customers in real time.
