Dhruv Gupta

United States

Summary

  • Over 4 years of hands-on experience in data engineering, focusing on designing and optimizing complex data pipelines.
  • Proficient in ETL processes, data integration, and real-time data processing using tools like Apache Spark, Kafka, and AWS Glue.
  • Expertise in cloud platforms including AWS and Azure, with a strong background in deploying scalable data solutions.
  • Skilled in working with Big Data technologies such as Hadoop, Hive, and MapReduce to manage and process large datasets.
  • Extensive experience in SQL and Python for data manipulation, analysis, and building data-driven solutions.
  • Strong knowledge of data warehousing concepts, with hands-on experience in Redshift, Snowflake, and similar technologies.
  • Adept at building CI/CD pipelines using Jenkins, Docker, and Kubernetes, ensuring efficient and reliable deployment processes.
  • Proficient in using Control-M for batch processing and job scheduling, optimizing workflow automation.
  • Experienced in Unix/Linux environments, leveraging shell scripting for automation and system management.
  • Familiar with machine learning frameworks like Keras and TensorFlow, integrating ML models into data pipelines.
  • Strong problem-solving skills with a focus on improving data processing efficiency and reducing operational costs.
  • Excellent collaboration and communication skills, working effectively with cross-functional teams to deliver data-driven insights.

Overview

4 years of professional experience

Work History

Data Engineer

Meditab Software Inc.
03.2023 - 07.2024
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability
  • Collaborated with cross-functional teams to implement CI/CD pipelines using Jenkins and Terraform, streamlining the deployment process by 30%
  • Built and optimized Snowflake data models, enhancing data warehousing efficiency and improving query performance by 15%
  • Developed Spark applications using PySpark and optimized processing efficiency, resulting in a 20% reduction in processing time for large datasets
  • Orchestrated end-to-end AWS-based data pipelines, integrating data from various sources into Amazon S3, and performed data transformations using AWS Glue

Research Assistant

University of Pittsburgh Joseph M. Katz Graduate School of Business
09.2022 - 12.2023
  • Organized research materials, maintaining a well-ordered workspace conducive to productivity
  • Designed ML algorithms (TensorFlow, PyTorch, Bayesian Networks) for breast cancer severity assessment, achieving 95% accuracy and aiding early diagnosis
  • Streamlined ETL processes with Hive and designed scalable data pipelines using Apache Spark, boosting data processing efficiency by 30% and real-time analytics capabilities by 25%, enabling more informed business decisions
  • Integrated Linear Regression for sales forecasting and K-Means clustering for customer segmentation into marketing, sales, and customer service workflows, boosting departmental efficiency and overall operational effectiveness by 15%

Computer Vision Engineer

Human Engineering Research Laboratories
05.2023 - 09.2023
  • Enhanced YOLOv3 for real-time object detection to 99.93% accuracy, reducing navigation errors by 50%. Integrated ORB-SLAM for mapping and A* and RRT path planning algorithms, improving task completion time by 20%
  • Refined Random Forest models to boost precision in predicting user behaviors, achieving a 30% accuracy gain and a 25% efficiency improvement over prior algorithms, enhancing insights into user preferences and engagement patterns
  • Developed and integrated collaborative filtering and matrix factorization algorithms, increasing customer engagement by 18% and boosting sales by 15% through tailored recommendations

AI Developer

Dosepacker
04.2020 - 07.2022
  • Configured and worked with big data technologies (Hadoop, Sqoop, Spark, Hive, and HBase) on AWS EMR clusters, achieving a 20% improvement in performance
  • Assisted in the migration of legacy data systems to AWS Redshift, reducing infrastructure costs by 15%
  • Analyzed large datasets to identify trends and patterns in customer behavior
  • Implemented data pipelines with Apache Airflow for scheduling and monitoring ETL workflows, ensuring data accuracy and timeliness
  • Applied Snowflake features such as Snowpipe and stages to process 100 GB of data per day in real time and ingest it into Snowflake tables
  • Employed Apache Spark with Random Forest and Gradient Boosting to enhance model accuracy. Optimized data processing with AWS Glue, enhancing efficiency. Enhanced sales forecasting, inventory management, and customer behavior analysis, boosting supply chain productivity
  • Developed targeted marketing campaigns using K-means clustering for advanced customer segmentation, optimizing campaign effectiveness and personalizing customer interactions to achieve a 20% boost in ROI

Education

Master of Science - Information Science

University of Pittsburgh
Pittsburgh, PA
04.2024

Bachelor of Science - Information Technology

VIT University
Vellore, India
06.2022

Skills

  • Scripting Languages: Python, SQL, C, Java, ROS
  • Big Data Processing: Hadoop, MapReduce, Spark
  • Machine Learning: TensorFlow, Keras, PyTorch, NumPy, Pandas
  • ETL Tools: AWS Glue, Apache Airflow, Control-M
  • Cloud Platforms: AWS (S3, Lambda, Redshift, EMR, RDS), Azure (Databricks, Data Lake)
  • Databases: MySQL, MongoDB, Snowflake
  • OS: Unix/Linux
  • CI/CD Tools: Docker, Kubernetes, Terraform, AWS CodePipeline
  • Version Control: Git, GitHub, GitLab
