Dhruv R Gandhi

Dallas

Summary

Lead Data Engineer at Infogain with strong expertise in data architecture and team leadership. Designed scalable data pipelines that improved analytics trust and reduced data issues by 30%. Proficient in PySpark and Azure technologies, delivering impactful solutions in logistics and carbon emissions calculations.

Overview

8 years of professional experience
1 Certification

Work History

Lead Data Engineer

Infogain
Dallas
04.2023 - Current
  • Designed end-to-end data architecture for logistics applications, enhancing scalability and cost-efficiency across cloud and on-prem systems.
  • Collaborated with Product, Engineering, and Business teams to gather requirements and create robust data models.
  • Managed a team of 4 data engineers, ensuring development of high-quality, scalable data pipelines.
  • Built reusable data ingestion and transformation frameworks integrating over 10 upstream sources for near real-time data access.
  • Developed essential logic for logistics processes including route planning, shipment tracking, and carbon emission calculations.
  • Served as a liaison between offshore and onsite teams to facilitate knowledge transfer and reduce development time.
  • Implemented data quality checks and validation layers, decreasing downstream data issues by 30% and enhancing analytics trust.

Data Engineer

Infogain
Bangalore
09.2021 - 03.2023
  • Designed and implemented a real-time data pipeline integrating over 500 million raw records from multiple sources using Azure Event Hubs and PySpark.
  • Developed complex logic in Azure Databricks with PySpark to calculate carbon emissions at package level for subsidiary companies.
  • Established best practices for continuous process automation in data ingestion and pipeline workflows.
  • Executed partition pruning and query optimization on PostgreSQL database, enhancing performance by 80%.
  • Created a generic test framework for data comparison across systems.
  • Contributed to the project's architectural design, ensuring optimal data structure and flow.

Data Engineer

Accenture
Mumbai
01.2020 - 08.2021
  • Client: Linde, Project: Probabilistic Demand Forecasting
  • Analyzed business requirements
  • Worked on Azure Data Factory, Azure Databricks, ADLS Gen 2, Azure Blob Storage, Azure SQL
  • Designed and developed 100+ Azure Data Factory pipelines for data ingestion and transformation
  • Worked on SCD1 and SCD2 ADF pipelines
  • Automated ADF pipelines using event-based and schedule-based triggers
  • Created a Logic App for alert triggers and data refresh
  • Implemented a CI/CD pipeline using Azure DevOps

Informatica Data Quality Developer

Accenture
Mumbai
01.2019 - 08.2021
  • Client: Linde, Project: Vendor Duplicate
  • Used various IDQ transformations like Match, Standardization, Labeler, Address Validator
  • Used major components like parsers, mappers, and streamers in Data Transformation Studio to convert XML files to other formats
  • Used the Informatica Analyst tool to assess the content, quality, and structure of source data through auto profiling
  • Worked on Data Cleansing by developing complex logic using Regular functions and Java code
  • Developed complex Unix shell scripts to modify multiple data quality reports generated by the data quality tool

SQL/ETL Developer

Accenture
Mumbai
02.2017 - 08.2021
  • Client: Linde, Project: MaRS
  • Implemented slowly changing dimension methodology for accessing full history via Informatica PowerCenter
  • Performed incremental aggregation to load incremental data into aggregate tables
  • Used complex Informatica transformations like Normalizer, Aggregator, Java, Update Strategy, Lookup, and Expression
  • Developed Unix shell scripts for event waits, file waits, and pre-session tasks
  • Performed database defragmentation and optimized SQL queries, improving database performance and speed by 70%
  • Utilized joins and sub-queries to simplify complex queries involving multiple tables while optimizing procedures and triggers in production
  • Developed complex logic for scheduling Informatica jobs using Oracle queries and Unix shell scripts

Education

B.Tech - Electrical and Electronics Engineering

Vellore Institute of Technology
Vellore
06.2016

XII - Science

Central Board of Secondary Education (CBSE)
Kota, Rajasthan
04.2012

Skills

  • Team leadership
  • PySpark and Spark
  • Java programming
  • Data architecture
  • Azure PostgreSQL and Oracle
  • Microsoft SQL Server and DB2
  • Databricks and Data Factory
  • Event Hub and Blob storage
  • Delta Lake management
  • Unix shell scripting
  • Terraform automation
  • Hive querying
  • MapReduce processing

Certification

  • Cloud Data Quality & Data Governance
  • Microsoft AZ-900: Azure Fundamentals

Personal Information

Title: Data Engineer

Technical Profile

PySpark, Java, Azure PostgreSQL, Oracle 12g, Microsoft SQL Server, DB2, Databricks, Data Factory, Event Hub, Blob, Delta Lake, Unix shell scripting, Terraform, Hive, Sqoop, Spark, MapReduce

Timeline

Lead Data Engineer

Infogain
04.2023 - Current

Data Engineer

Infogain
09.2021 - 03.2023

Data Engineer

Accenture
01.2020 - 08.2021

Informatica Data Quality Developer

Accenture
01.2019 - 08.2021

SQL/ETL Developer

Accenture
02.2017 - 08.2021

B.Tech - Electrical and Electronics Engineering

Vellore Institute of Technology

XII - Science

Central Board of Secondary Education (CBSE)