Data Engineer with 9+ years of experience specializing in developing and optimizing data pipelines using Python, PySpark, AWS, and Google Cloud. Demonstrates strong skills in machine learning model deployment and data governance, enhancing business outcomes through strategic data integration and automation. Known for effective collaboration with cross-functional teams to deliver robust, data-driven solutions that drive operational efficiency and growth.
Overview
9+ years of professional experience
Work History
Data Engineer
Better Being
07.2024 - Current
Develop and maintain scalable ETL pipelines, optimize database performance, and enforce robust data validation protocols to ensure data integrity
Design and deploy machine learning models for predictive analytics, with a focus on real-time data processing
Establish data governance frameworks and comprehensive process documentation in collaboration with cross-functional teams
Lead data integration projects, resolve data quality issues, and automate reporting to improve operational efficiency
Data Engineer
Tausight
01.2024 - 06.2024
Engineered a data pipeline on Google Cloud supporting the CrowdStrike Marketplace integration, contributing to a $700K increase in annual recurring revenue
Optimized ePHI data pipeline performance on GCP, halving processing time
Designed and implemented advanced telemetry logging systems backed by NoSQL databases to strengthen healthcare data protection
Refined data monitoring architecture using Falcon LogScale, enabling precise threat detection and maintaining regulatory compliance
Data Engineer Intern
Tausight
08.2023 - 12.2023
Built real-time data transformation pipelines in Google Cloud Dataflow, improving data analysis and strengthening ePHI security protocols
Developed robust monitoring on GCP, enabling proactive anomaly detection and improved tracking of ePHI files while maintaining regulatory compliance
Partnered with cross-functional teams to integrate monitoring solutions and establish data governance practices across platforms
Data Engineer
AT&T
01.2023 - 08.2023
Led a team building high-performance data pipelines with PySpark and Kafka, optimizing real-time data ingestion and processing for large-scale analytics
Delivered scalable, cloud-native data engineering solutions on AWS, applying best practices for data governance and cost optimization across enterprise systems
Reduced latency and improved data accessibility through advanced streaming architectures
Implemented data quality frameworks and monitoring systems to maintain data integrity across distributed platforms
Aligned data engineering initiatives with business objectives through clear, ongoing stakeholder communication
Data Engineer co-op
American Tire Distributors (ATD)
01.2022 - 08.2022
Led ETL development using Python, PySpark, and SQL, integrating data from 7 distinct sources into Snowflake
Enhanced SQL query performance by 25% during the Oracle and GCP to Snowflake migration
Built an automated data validation framework in Python and PySpark, reducing manual checks by 80% and accelerating deployment by 30+ hours per sprint
Orchestrated end-to-end ETL processes across multiple data sources, ensuring data accuracy and maintaining robust documentation standards
Data Engineer
Apollo TeleHealth
02.2016 - 12.2020
Led the technical design and implementation of data solutions aligned with business objectives
Developed ETL pipelines handling large data volumes with AWS Glue and Lambda
Streamlined real-time data ingestion and processing using Kafka and AWS Kinesis, reducing latency for healthcare monitoring systems
Optimized batch and real-time data processing jobs, improving query performance and data warehouse operations for clinical applications
Designed microservices-based data pipelines that modernized legacy systems and improved data accessibility across healthcare platforms
Partnered with cross-functional teams to implement data governance protocols and ensure compliance with healthcare data regulations
Education
Master of Science - Information Systems
Northeastern University
Boston, MA
12.2022
Bachelor of Technology - Computer Science and Engineering
Indian Institute of Technology
06.2014
Skills
Python
SQL
NoSQL
Unix
PySpark
Snowflake
Google Cloud
Airflow
AWS
Alteryx
Talend
Tableau
Power BI
MySQL
Kafka
Git
Jira
Docker
Data Analysis
Data Integration
Data Modeling
ETL Development
Data Warehousing
Data Pipeline Design
Data Migration
Big Data Processing
Performance Tuning