Dedicated data engineer with extensive experience building scalable data pipelines and optimizing data ingestion processes, eager to contribute expertise in big data technologies. Proven success in reducing data processing times and improving query performance, with a commitment to driving efficiency and innovation.
Overview
9 years of professional experience
Work History
Data Engineer (Client: PNC Bank)
Indotronix International Corporation (IIC)
Strongsville, OH
02.2022 - 03.2025
Analyzed, designed, and developed data ingestion into HDFS from legacy Oracle, Teradata, and DB2 systems using Python, PySpark, Sqoop, Hive, and Oozie
Worked with different file formats (CSV, ZIP, mainframe, and pipe-delimited) and built ingestion pipelines to load these files into Hive tables using the CA7 scheduler
Built a distributed, reliable, and scalable data pipeline framework in Python to ingest and process data from multiple sources (files, databases, HTTPS); defined and modeled schemas to create Hive tables
Developed and delivered test plan documentation containing scenarios, test cases, and expected results
Supported integrated and independent releases, software and hardware upgrades, and server upgrades
Data Engineer
Mitrayu Solutions Pvt Ltd
Hyderabad, India
08.2016 - 07.2019
Worked on building a centralized data lake by loading data into Apache Hive and Impala from heterogeneous databases (DB2, Oracle, and Teradata) using Apache Sqoop
Developed PySpark scripts to extract and process large-scale, sensitive user/client data from Hive tables for downstream risk analytics and modeling
Built a distributed, reliable, and scalable data pipeline framework to ingest and process data from multiple sources (files, databases); defined and modeled schemas to create Hive tables
Performed tuning on Hive queries
Wrote a daily workflow to perform data loads and create the final transformed tables
Took ownership of escalations and performed troubleshooting, analysis, research, and resolution using Sqoop, Hive, Oozie, and Hadoop skills