Gourav Shrivastava

Houston

Summary

Senior Data Engineer specializing in Azure solutions, driving a 40% reduction in ETL process time through automation. Proficient in establishing data governance frameworks and enhancing data quality, supporting regulatory compliance and strategic analytics initiatives.

Overview

14 years of professional experience
1 Certification

Work History

Sr. Big Data Developer

TD Bank
09.2022 - Current
  • Established and supported end-to-end data lineage from source systems through Azure Data Factory, Databricks, and curated layers, supporting regulatory and audit requirements.
  • Metadata integration with the Azure platform: Integrated Collibra Insights extracts with Azure Databricks and Azure Data Factory, automating technical metadata ingestion and lineage synchronization across the SRZ, CZ, and AZ layers.
  • DQ ingestion to Rahona SRZ/CZ/AZ: Completed production deployment by building efficient, automated ADF ETL pipelines and workflows that curate TBSM data for analytical reports.
  • Automated ETL processes to simplify data wrangling and reduce processing time by 40%.
  • Used Scala to write streaming data to HDFS and implemented Spark jobs that processed data 40% faster.
  • Developed and managed scalable data environments (data lakes, data warehouses, lake houses) to support analytics, reporting, and system integration.
  • Developed clean, well-modeled datasets and self-service analytics tools for cross-departmental use.
  • Built and deployed dashboards, reports, and analytical models to support data-driven decision-making.
  • Collaborated with IT teams to ensure the sustainability, scalability, and security of the data architecture.
  • Monitored and improved data quality, data integrity, and interoperability across enterprise systems.
  • Automated data workflows to improve efficiency, consistency, and reliability.
  • Designed and built data pipelines to ingest, transform, and prepare data from internal and external sources.
  • Supported the development and enhancement of the organization's data strategy, governance framework, and analytics roadmap to align with digital transformation and organizational priorities.
  • Developed and maintained Metadata Management and Master Data Management (MDM) processes, including data dictionaries, to ensure consistent data standards.
  • Implemented and managed Data Catalog initiatives to enhance data accessibility, data discovery, and organizational understanding of data assets.
  • Improved data retrieval efficiency by approximately 30% through query optimization and indexing.
  • Implemented Cloud Security and Data Loss Protection.
  • Worked on KYC and Customer 360 initiatives, governing critical reference data and customer data assets and orchestrating ETL framework pipelines to ensure accuracy, consistency, and compliance.
  • Translated analytical insights into clear, actionable recommendations for leadership and decision-makers.

Big Data Engineer & Developer

Wipro Limited
06.2018 - 09.2022
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Developed extraction framework using Databricks for credit card, auto finance, and currency upgrade data, enhancing data accessibility.
  • Automated data loading into the Hadoop Distributed File System (HDFS) using Oozie, facilitating faster reviews and enabling first-mover advantage.
  • Developed an ingestion process using Podium for delimited and copybook files.
  • Created Hive queries that allowed market analysts to identify emerging trends through comparisons of fresh data with EDW reference tables and historical metrics.
  • Developed an extraction framework using HPF Python.
  • Tested raw data and executed performance scripts.

Associate Data Engineer

TEKsystems
12.2017 - 06.2018
  • Worked primarily on data migration across storage systems including Azure Blob Storage, ADLS Gen1, and ADLS Gen2.
  • Created ADF pipelines using linked services and datasets to efficiently extract data from multiple storage systems.
  • Deployed SSIS packages on Azure using Azure SSIS IR.
  • Analyzed and managed Slowly Changing Dimensions (SCD) to ensure accurate data tracking.
  • Implemented solutions on Azure using Azure data platform services (Azure Data Lake, Data Lake Analytics, Azure Synapse).
  • Implemented activities and transformations such as Copy, ForEach, Get Metadata, and custom Azure Data Factory activities.
  • Loaded data from source tables into the respective fact and dimension tables using transformations such as Join, Derived Column, Aggregate, and Conditional Split.
  • Deployed Azure Resource Manager JSON templates using the PowerShell module.
  • Contributed to Technical Architecture Documents and engaged in Project Design and Implementation Discussions.
  • Worked closely with peers and other teams (Support, Solution Architecture) to establish and follow best practices while solving customer problems.
  • Extracted data from OLTP and OLAP systems utilizing SSIS/ADF for comprehensive data analysis.
  • Developed SSIS packages to consolidate data from various sources into a unified dataset for Power BI reporting.
  • Used various sources to pull data into Power BI such as SQL Server, Excel, Oracle, Azure SQL.
  • Created Power BI visualizations and dashboards based on project specifications.
  • Created and managed SSAS Tabular models, including dimension and fact tables.
  • Created DAX expressions and implemented partitions in Tabular models.
  • Created shared dimension tables, measures, hierarchies, levels, cubes, and aggregations on MS OLAP/OLTP/Analysis Services (SSAS) in the Tabular model.
  • Published Power BI Desktop models to Power BI Service to create informative dashboards, collaborate through workspaces and apps, and get quick insights about datasets.
  • Incorporated filters to narrow the data presented, slicers for user interaction, and conditional formatting to highlight alarming or profitable numbers.
  • Utilized GitHub for version control and supported the creation of project documentation.
  • (Equifax)
  • Developed data pipelines to integrate diverse datasets for analysis.

Data Engineer

Tech Office
01.2017 - 12.2017
  • Integrated structured and unstructured data into Azure Data Lake Storage using Azure Data Factory, enabling comprehensive data analysis.
  • Used Azure Databricks to clean and transform unstructured datasets and combine them with structured data from operational databases or data warehouses.
  • Leveraged native connectors between Azure Databricks and Azure Synapse Analytics to access and move data at scale.
  • Streamlined ETL process with Azure Data Factory and Databricks to ensure reliable data availability for analysis.
  • Designed and developed ETL jobs to ingest data into the data lake, staging it through each phase of data processing to support advanced analytics.
  • Extracted data from OLTP systems, transformed it using data flows in Azure Data Factory, and loaded it into the data lake.
  • Used Azure Data Factory to create pipelines for orchestrating data.
  • Developed dynamic pipelines with parameters in Data Factory to automate data extraction from various systems and load into data lake for efficient transformation and analysis.
  • Worked with the Azure integration runtime and self-hosted integration runtime in Azure Data Factory to perform ETL in the cloud as well as from on-premises to cloud.
  • After the data was loaded to the data lake, used Databricks to create ETL pipelines for batch processing and streaming data.
  • Used Python to read data from sources such as CSV and Parquet files and created Spark DataFrames for further transformation and cleanup.
  • Worked with Spark DataFrames in Databricks to transform data at scale.
  • Converted DataFrames into temporary views to run SQL queries on the data, then converted the results back into DataFrames.
  • Handled corrupt data and improved the performance of Databricks notebooks.
  • Used Event Hubs and Azure Stream Analytics for real-time data streaming and analysis.
  • Used data flows in Azure Data Factory to transform data to meet business requirements.
  • Developed Python scripts in Databricks notebooks to perform big data transformations.
  • Used Spark to develop applications for extracting, transforming, and loading data.
  • Used Azure Cosmos DB to store data in JSON format.
  • Moved insights from Azure Databricks to Cosmos DB to make them accessible through Power BI.
  • Collaborated with different teams to understand requirements and worked with the Data Architect to design and implement the best solution.

Junior Hadoop Developer

Parkhya Solution India
07.2012 - 01.2014
  • Developed Hive tables to transform and analyze data in HDFS, facilitating data-driven insights.
  • Implemented partitioning, dynamic partitions, and buckets in Hive, enhancing data retrieval efficiency.
  • Supported Java MapReduce programs on the cluster, improving data processing capabilities.
  • Imported and exported data into HDFS and Hive using Sqoop and Flume.

Education

PG Diploma - Internet Programming and Database Management

Lambton College
Canada
06-2016

BE - Computer Science

Kailash Chandra Bansal College of Technology
Indore
04-2012

Skills

  • Hadoop
  • Spark
  • Spark SQL
  • Databricks
  • Azure Data Factory
  • Azure Synapse
  • Oozie
  • Hive
  • Sqoop
  • Flume
  • HDFS
  • MapReduce
  • Collibra
  • Talend
  • Python
  • Java
  • SQL
  • NoSQL
  • Oracle
  • GitHub
  • Bitbucket
  • GitHub Copilot
  • Confluence
  • Podium
  • Unix
  • S3
  • Azure Blob Storage
  • Apache
  • Cloudera
  • AutoSys
  • VS Code
  • Metadata management
  • HBase

Certification

  • Microsoft data engineer certification
  • Microsoft Azure Fundamentals
  • HDP Certified Developer (Hortonworks), awarded as a BadgeCert digital badge: http://bcert.me/szupoumo
