More than 11 years of professional IT experience across multiple technologies, demonstrating expertise in data engineering, design, and application development.
Experience in big data, data warehousing, data modeling, and automation for projects involving data ingestion, transformation, and processing.
Excellent programming skills in Scala and Python.
Experience in data processing with Hadoop ecosystem tools such as HDFS, Hive, Sqoop, and Spark with Scala, ETL tools such as DataStage 9.x, and Snowflake, alongside SQL Server, Oracle, Teradata, and Unix.
Automated scripts using shell scripting and Python; experienced in data manipulation using Python libraries such as Pandas, PySpark, and NumPy.
Experience in handling large datasets using partitioning, Spark in-memory capabilities, and effective, efficient joins and transformations applied during the ingestion process itself.
Developed REST APIs using Scala and the Akka framework.
Experience in Machine Learning, Deep Learning and Artificial Intelligence for user analytics.
Experience in advanced SQL with RDBMSs (SQL Server, Oracle, and Teradata) and in developing Hive scripts using Hive UDTFs and HQL for data processing and end-user analytics.
Well-versed with importing and exporting data using Sqoop from HDFS to Relational Database Management Systems (RDBMS) and vice-versa.
Worked on continuous integration and continuous delivery/deployment (CI/CD) tools such as TeamCity, OpsLogic, and GitHub.
Worked extensively with AWS technologies such as Redshift, S3, CloudWatch, Athena, Glue, DynamoDB, Lambda, ECS, EKS, EMR, Flink, Kinesis, RDS, and Kafka.
Involved in business/client meetings for design, development, and requirements discussions, providing solutions for complex scenarios.
Involved in all phases of Unit Testing, SIT, UAT and Support.
Experience in interacting with business users/stakeholders to analyze business rules and requirements in Banking and other domains.
Strong knowledge of the Spark architecture, Pair RDDs, and the Spark DataFrame API including Adaptive Query Execution, with profound experience using UDFs and Spark SQL functions to transform raw data into meaningful data for visualization; worked extensively with PySpark and Scala.
Experienced in developing ETL data pipelines in AWS Glue: transformed data using AWS Glue DynamicFrames with PySpark, cataloged the transformed data using crawlers, and scheduled the jobs and crawlers using the workflow feature (a brief sketch follows this list).
Expertise in data engineering and development of various data warehousing applications; experienced in fact/dimension modeling (Star Schema, Snowflake Schema), transactional modeling, and SCDs (Slowly Changing Dimensions).
Well-versed in ingesting data from different data sources into HDFS using Sqoop and managing Sqoop jobs with incremental loads to populate Hive external tables.
Worked with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HQL queries.
Experience with different file formats like ORC, Parquet, AVRO, JSON and XML.
Developed automation scripts using shell scripting and Python for projects involving data ingestion, transformation, and processing.
Experienced in advanced SQL (views/stored procedures/indexes), with hands-on experience handling database issues and connections for SQL databases such as Oracle, Teradata, SQL Server, and MySQL and NoSQL databases such as HBase, MongoDB, and DynamoDB.
Created and maintained CI/CD (continuous integration and deployment) pipelines using tools such as TeamCity/OpsLogic and GitHub/Bitbucket.
Expert in designing ETL data flows, creating mappings/workflows from heterogeneous source systems and transforming the data.
Experience in creating, debugging, scheduling, and monitoring jobs using orchestration tools such as Autosys, Control-M, and Oozie.
Worked extensively in agile methodology, including iteration planning, sprints, retrospectives, and backlog planning.
Experience in interacting with business users/stakeholders to analyze business rules and requirements and perform source-to-target data mapping; prepared LLDs and technical specification documents and provided solutions for complex scenarios.
Efficient cloud engineer with years of experience assembling cloud infrastructure; applies strong managerial skills by negotiating with vendors and coordinating tasks with other IT team members, and implements best practices to create cloud functions, applications, and databases.
Experienced Snowflake data engineer adept at integrating Snowflake's cloud-native data warehousing capabilities with AWS services; proficient in configuring secure data pipelines and optimizing storage solutions, ensuring seamless data access and analysis in the AWS environment.
Experienced in designing and optimizing data warehousing solutions in Snowflake. Proficient in creating scalable data pipelines, optimizing query performance, and ensuring data integrity within the Snowflake platform.
Basic knowledge of GCP, Ab Initio, and Apache Airflow.
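For illustration, a minimal AWS Glue job sketch of the DynamicFrame-based pipeline pattern referenced above; the database, table, bucket, and column names are hypothetical placeholders rather than actual project values.

```python
# Minimal AWS Glue (PySpark) job sketch: read from the Data Catalog, transform,
# write Parquet to S3. All names here are placeholders for illustration only.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw data registered in the Glue Data Catalog by a crawler.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="source_table"
)

# Apply a simple column-level transformation.
mapped = ApplyMapping.apply(
    frame=raw,
    mappings=[("id", "string", "id", "string"),
              ("amount", "string", "amount", "double")],
)

# Write the transformed data back to S3 as Parquet; a second crawler catalogs it,
# and a Glue workflow can schedule the job and crawlers end to end.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/refined/"},
    format="parquet",
)
job.commit()
```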
Overview
12 years of professional experience
1 Certification
Work History
Technical Lead | Senior Data Engineer
HCL Technologies Pvt. Ltd
09.2020 - 08.2023
Package Assurance is a CBA-initiated project
The objective of this project is to benefit both the customer and the bank by offering subsidized interest rates to customers applying for a Personal Loan, Home Loan, or Credit Card
Worked as a senior data engineer/squad lead, coordinating Joint Application Development (JAD) sessions with the Solution Designer, Business Analysts, and business stakeholders to perform data analysis and gather business requirements
Performed end-to-end architecture and implementation assessment of various AWS services like AWS EMR, Redshift, S3 and AWS Glue
Installed, configured and managed Hadoop Clusters and Data Science tools using AWS EMR
Worked on setting up the High-Availability for Hadoop Clusters components and Edge nodes
Designed and developed a Python parser to auto-convert HiveQL code into equivalent PySpark (Spark SQL) jobs to leverage Spark capabilities on AWS EMR, reducing conversion time by over 90%
Designed services for seamless monitoring, such as monitoring active EMR clusters running across all regions
Used the Boto3 library and deployed the solution on AWS Lambda
Configured business notifications via SES and scheduled them via CloudWatch
Worked on a framework for orchestration and monitoring of our core EMR cluster using Lambda, CloudWatch, and SNS (a brief sketch of this pattern follows this role)
Fetched data from various source systems such as SAP, HLS, CC and COMSSEE by building data pipelines using PySpark
Created PySpark DataFrames to profile, clean, and transform data from CSV files in an Amazon S3 bucket
Designed, developed, and implemented ETL pipelines using the Python API (PySpark) of Apache Spark on AWS Glue
Tuned the performance of PySpark scripts in AWS Glue
Collected batch files from customers, then extracted, unzipped, and loaded the files into S3 buckets
Moved the final refined tables from S3 to AWS Redshift and wrote various data normalization jobs for new data ingested into Redshift
Worked on CI/CD to facilitate seamless integration and deployment; achieved these goals using GitHub, TeamCity, and a control framework called Workflow Tables
Worked in an agile methodology, participating in daily standups, technical discussions with business counterparts, sprint planning, and scrum meetings, while adhering to agile principles and delivering quality code
Won CEO Award for completion of the project on time and with great accuracy.
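A minimal, hypothetical sketch of the Lambda-based EMR monitoring pattern described in this role; the regions, SNS topic ARN, account number, and message format are illustrative assumptions, not the actual implementation.

```python
# Hypothetical Lambda handler: list active EMR clusters across regions with Boto3
# and publish a summary to SNS. Regions and the topic ARN are placeholders.
import boto3

REGIONS = ["us-east-1", "us-west-2"]                                  # placeholder regions
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:emr-monitoring"       # placeholder ARN

def lambda_handler(event, context):
    """Collect active EMR clusters per region and notify via SNS."""
    lines = []
    for region in REGIONS:
        emr = boto3.client("emr", region_name=region)
        resp = emr.list_clusters(ClusterStates=["STARTING", "RUNNING", "WAITING"])
        for cluster in resp.get("Clusters", []):
            lines.append(f"{region}: {cluster['Name']} ({cluster['Id']}) - "
                         f"{cluster['Status']['State']}")
    message = "\n".join(lines) or "No active EMR clusters found."
    boto3.client("sns").publish(TopicArn=TOPIC_ARN,
                                Subject="Active EMR clusters",
                                Message=message)
    return {"active_clusters": len(lines)}
```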
Senior Software Engineer | Hadoop Developer
TVS NEXT Pvt. Ltd
08.2019 - 04.2020
Developed software modules for customer alerts and meter functioning for an energy client later acquired by Ormat Technologies
Performed data ingestion using Sqoop to ingest tables from sources including SQL Server and Oracle
Developed Hive tables on top of the resultant flattened data, storing data as Parquet files to enable quick read times
Created HiveQL queries to apply business rules and structural transformations and to ensure conformance to the refined database
Implemented Hive partitioning and bucketing techniques as part of code optimization (a brief sketch of this pattern follows this role)
Created shell scripts to run the HiveQL scripts (HQLs), capture reported errors, log error situations, and report them to the calling scripts
Named Best Performer for completing user module before deadline.
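A brief, hypothetical sketch of the Hive partitioning and bucketing pattern described in this role, driven from Python via PyHive for consistency with the other examples; the host, schema, table, and column names are placeholders, not the client's actual objects.

```python
# Hypothetical sketch: create a partitioned, bucketed Parquet Hive table and load
# one partition from a staging table. All identifiers below are placeholders.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000,
                       username="etl_user", database="refined")
cur = conn.cursor()

# Partitioning by date enables partition pruning; bucketing by meter_id aids joins.
cur.execute("""
    CREATE TABLE IF NOT EXISTS meter_readings (
        meter_id   STRING,
        reading_ts TIMESTAMP,
        kwh        DOUBLE
    )
    PARTITIONED BY (reading_date STRING)
    CLUSTERED BY (meter_id) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# Load one day of data into its partition from a staging table (hypothetical name).
cur.execute("""
    INSERT OVERWRITE TABLE meter_readings PARTITION (reading_date = '2020-01-15')
    SELECT meter_id, reading_ts, kwh
    FROM staging.meter_readings_raw
    WHERE to_date(reading_ts) = '2020-01-15'
""")
conn.close()
```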
Systems Analyst | Hadoop Developer
UST Global
10.2018 - 08.2019
Designed and modified Customer Alert application for Client Equifax
Designed and implemented application in Apache Spark with Scala
Served as module lead for end-to-end delivery to customer, including testing
Involved in loading data from SQL Server to HDFS and from HDFS to SQL Server, creating partitions in Hive, and loading data into Hive.
Senior Product Analyst | ETL Developer
Standard Chartered
05.2011 - 04.2017
Developed mobile-banking applications in DataStage for UK’s Standard Chartered Bank
Nearly eliminated customers’ 1-minute post-transaction delay in receiving alerts by removing multiple joins and modifying API services on existing application
Designed source-to-target mappings, assisted in designing selection criteria document, and developed technical specifications of the ETL process flow to proceed with development
Created mappings using various transformations like Aggregator, Expression, Filter, Router, Joiner, Lookup
Designed and documented validation rules, error handling and unit test strategy of ETL process
Tuned the performance of mappings and sessions by optimizing sources
Involved in writing UNIX shell scripts to run and schedule batch jobs.
Education
Bachelor of Technology in Electrical and Electronics Engineering - Technology
Jawaharlal Nehru Technological University
05.2006
Master of Science in Business Analytics - Information Technology
University of Louisville
Louisville, KY
07.2024
Skills
Hadoop
HDFS
Sqoop
NiFi
Hive
Oozie
Kafka
Zookeeper
YARN
Apache Spark
Cloudera
HBase
Oracle
MySQL
SQL Server
Teradata
Python
PySpark
Scala
R
Shell Script
SQL
Deep Learning
Machine Learning
Artificial Intelligence
SAS Viya
AWS
GCP
Rest API
PyCharm
Eclipse
IntelliJ
Visual Studio
SQL*Plus
SQL Developer
TOAD
SQL Navigator
Query Analyzer
SQL Server Management Studio
SQL Assistant
Hue
GIT
GitHub
Bitbucket
Linux - Ubuntu
Windows
Kerberos
Dimensional Modeling
ER Modeling
Star Schema Modeling
Snowflake Modeling
Erwin
Visio
Apache Airflow
Autosys
Control-M
Tivoli
Tableau
PowerBI
TeamCity
OpsLogic
Jenkins
Octopus
DataStage
Snowflake
Ab Initio
Putty
WinSCP
FileZilla
Git Bash
Zeppelin
Jupyter
MongoDB
Cassandra
DynamoDB
Certification
AWS Certified Solutions Architect - Associate
Snowflake SnowPro