Sameer Shaik

West Chester, OH

Summary

Service-oriented data engineering professional with a commitment to excellence and more than 7 years of experience. Effective collaborator with strong software engineering and data engineering skills. Proven history of building and reliably managing efficient data pipelines. Skilled in batch and streaming data processing, on-premises and in the cloud, primarily for retail and e-commerce clients.

Overview

8 years of professional experience
2 certifications

Work History

Associate Manager Data Engineer

W.W. Grainger
San Jose
09.2023 - Current
  • Leading the offshore team for the order data product and managing offshore work and client interaction
  • Implementing micro-batch processing of Kafka streams from various topics, reading from an S3 location
  • Using Unity Catalog in Databricks to query different layers of data from the same shared workspace for analysis
  • Enabling workflows and job scheduling for the framework's medallion architecture to monitor jobs that read from SAP BODS and Kafka systems
  • Integrating data quality notebooks before writing to the curated layers of the Delta tables
  • Analyzing the cost and performance trade-offs of job compute and pool clusters in Databricks
  • Performing audit logging before loading the gold layer into Snowflake tables for consumption
  • Using Delta Live Tables (DLT) and Auto Loader to ingest data into live tables in real time

Senior Data Engineer

Kroger
Cincinnati
05.2023 - 08.2023
  • Worked as a data engineer on Kroger's fulfillment center data platform team
  • Worked with Azure Functions, Cosmos DB, ADX, and Event Hubs to consume orders, run aggregation steps, and push the data to downstream web applications at the fulfillment stores
  • Implemented an object-oriented Python framework for Azure Functions code that consumes real-time order events from topics

Associate Data Engineer

Publicis Sapient (Shell Energy Ltd)
05.2021 - 07.2022
  • Leveraged data and analytics to drive insights, improve decision making, and deliver value for the NES portfolio and Shell's carbon value chain
  • Used Azure services such as Data Factory, Databricks, and Logic Apps for data transformations, ingesting data from multiple sources (SharePoint, ZEMA API, markets, etc.)
  • Implemented Azure Functions to automate web scraping of market data from different webpages, downloading tables and uploading them to Blob Storage using Python's Beautiful Soup and Requests modules
  • Mounted ADLS storage in Databricks, read raw data, and wrote the aggregation logic feeding the current data model, improving data consumption by downstream applications
  • Implemented optimization techniques at the Python/SQL level as well as the Spark UI level for improved performance
  • Developed preprocessing jobs using Spark DataFrames to flatten JSON documents into a flat file format
  • Handled various load types using a meta-config table to incrementally load data into target systems

Infra Specialist (Airflow, Grafana)

IBM Pvt Ltd (DBS Bank, Singapore)
01.2021 - 05.2021
  • Used Airflow for source-system data process analysis and supported the development team on production tasks
  • Handled DAG failures and found the root cause of any failures in dependent tasks

Data Engineer

TCS Pvt Ltd (Proximus, Belgium)
02.2016 - 12.2020
  • Migrated Ab Initio ETL graphs to Spark SQL and Python scripts and saved the results to a Hive database
  • Integrated the scripts into a metadata-driven framework for automated data transformations, applying data structures and algorithmic approaches for optimization
  • Performed on-premises data validation, connecting to databases to extract data and using Spark and Scala with the HDFS file system to submit Spark jobs as JAR files to CDP
  • Worked with Scala recursion and anonymous functions and created various custom packages for reusability
  • Developed data pipelines in ADF for all available data sources (SQL Server, MySQL, Oracle, SFTP), triggered by a master pipeline
  • Used the Web activity to call third-party APIs, retrieve data as JSON and XML files, parse them, and save them as Parquet datasets
  • Used transformations such as Lookup, Conditional Split, Row Count, Derived Column, and Data Conversion
  • Involved in building data marts and multidimensional data models such as star and snowflake schemas
  • Collaborated with development and operations teams and BAs to understand customer needs, following agile methodology
  • Worked with Delta file formats in Databricks, writing incremental load scripts and using Auto Loader for streaming data
  • Worked on the OSIX streaming system to send data to Event Hubs and store processed data in Mongo collections, enabling downstream applications to consume the data and create reports
  • Worked with SQL concepts such as aggregates, views, database objects, and stored procedures
  • Created Databricks notebooks to streamline and curate data per the data modeling for various business use cases, and mounted Blob Storage on Databricks
  • Modified and used CI/CD pipelines in Azure DevOps with Terraform, integrating build and release pipelines through YAML files for continuous integration and deployment to various environments
  • Performed code reviews within the team; used VS Code on premises and Git integrated with Databricks in the cloud for source code versioning
  • Performed unit testing with the pytest module, and anomaly detection and constraint validation with the PyDeequ module
  • Used Azure Functions with Python modules such as Beautiful Soup, Requests, and urllib to scrape data from webpages and upload it to ADLS
  • Built interactive Power BI dashboards for a voluntary carbon team to track carbon offsets produced across geographies, with content changing dynamically based on filters

Education

Master of Science - Business Analytics

University of Cincinnati, Carl H. Lindner College of Business
Cincinnati, Ohio

Skills

Microsoft Azure

Certification

Certified Data Engineer Associate, Databricks, 01-2024