Rushi Patel - Data Engineer

Summary

Results-driven Data Engineer with 5 years of experience designing, building, and optimizing scalable data pipelines. Proficient in Apache Airflow, AWS, Linux, and Spark SQL, with expertise in ETL development, workflow automation, and healthcare reference data management. Passionate about data infrastructure, performance optimization, and continuous innovation.

Work History

Data Engineer

Veeva Systems Inc.

2020 - 2025

Designed and built a high-performance ETL pipeline that processes 2+ million healthcare professional (HCP) emails weekly, improving processing efficiency by 40% and ensuring compliance with healthcare data regulations.
Led the migration of 100+ data sources across three full-scale data pipeline transitions, improving scalability by 50%, data consistency by 40%, and processing speed by 60% while cutting operational costs by 30% and maintaining seamless integration across multiple platforms.
Processed 25+ million records for healthcare organizations (HCOs) and HCPs, including affiliations, NPI, and specialty data, maintaining compliance with DEA, OIG, and Ohio TDDD requirements; optimized large-scale processing with Spark SQL and PySpark to improve query performance and reporting efficiency.
Developed an automated data quality framework to validate files before production, ensuring data accuracy, integrity, and compliance while minimizing errors.
Built an always-on automation framework that processes files daily and automatically pushes them to production via APIs, reducing manual intervention by 90% and ensuring high availability and reliability.
Managed the production environment for Data Operations, ensuring system stability, high availability, and performance optimization for critical data workflows.
Implemented and optimized Apache Airflow DAGs for workflow orchestration, and automated data ingestion pipelines with Python and shell scripting, improving scheduling, monitoring, data availability, and operational efficiency.
Developed automated Tableau reports and dynamic dashboards, ensuring real-time distribution of key business metrics to stakeholders and customers.

Education

Bachelor of Science - Software Engineering

Drexel University

Philadelphia, PA

06-2020

Skills

Data Engineering: ETL Development, Data Pipelines, Data Modeling, Data Analysis, Data Warehousing, Data Governance
Technologies & Tools: Apache Airflow, AWS (S3, Lambda, EC2, RDS, Glue, Redshift), MongoDB, PostgreSQL, MySQL, Spark SQL, Visual Studio Code
Programming Languages: Python, SQL, Shell Scripting, Java, Scala
Infrastructure & DevOps: Linux, CI/CD (Jenkins, GitHub, GitLab)
Big Data & Analytics: Apache Spark, PySpark, Pandas
Reporting & Visualization: Tableau, Excel
