Data Engineer with 4+ years of experience in data engineering, business intelligence, and ETL development on the AWS and Azure platforms. Proficient in Python and SQL, with strong expertise in distributed computing, data pipeline orchestration, and data warehouse management. Known for effective cross-functional collaboration and delivering data-driven insights in fast-paced environments.
Overview
4 years of professional experience
Work History
Data Engineer
Good One IT Solutions
08.2023 - 09.2024
Gained exposure to Amazon OpenSearch and created Kibana dashboards for Lambda log monitoring
Designed and developed an automated data pipeline for master data management (MDM) of customer data, integrating multiple data sources and applying data quality checks
Designed ETL processes using AWS Glue and Lambda to transfer data from landing zones to data lakes, meeting specific business criteria
Built and automated Snowflake processes for daily data loading, enabling real-time business intelligence insights
Worked closely with product managers, software developers, and BI engineers to gather data requirements and deliver data-driven insights for dashboards, reports, and recommendation engines
Applied dynamic data masking in Snowflake to protect PII, maintaining compliance with data privacy standards (see the masking sketch after this role's bullets)
Skilled in Python and SQL for data processing and analysis, with experience in developing Spark scripts for data transformation and business logic automation
Developed and implemented data pipelines with AWS (S3, SNS, SQS, Glue, Lambda) for big data processing, supporting operational and analytical needs
Managed source code with Git and Bitbucket; involved in agile sprints to support product development
Developed and implemented a scalable end-to-end data pipeline for MDM of customer data using PySpark, EMR, SQL, and AWS technologies, resulting in a 50% reduction in data processing time and a 30% increase in data accuracy
Implemented data security and compliance measures using IAM roles, dynamic data masking, and encryption to protect sensitive information and PII in alignment with industry standards
Worked with non-relational databases like MongoDB for efficient storage and retrieval of semi-structured data, as well as real-time processing using Kafka and Kinesis
Developed Snowflake stored procedures and tasks to automate data ingestion from S3, ensuring timely and accurate data processing (see the ingestion sketch below)
Built automation scripts in Python and Unix shell; gained exposure to Jenkins
Developed Glue pipelines to load data into the different data lake layers
Developed a Lambda function to process SQS events into MongoDB (see the Lambda sketch below)
Involved in requirements gathering, design, development, and testing
Generated reports using Spark SQL for ad hoc business requirements
Gained exposure to orchestration tools such as AWS Step Functions
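A minimal sketch of the Snowflake dynamic masking approach referenced above, using the snowflake-connector-python client; the account, role, table, and column names are illustrative assumptions, not the production values:

    import snowflake.connector

    # Hypothetical connection details; real credentials would come from a secrets manager.
    conn = snowflake.connector.connect(
        account="example_account", user="etl_user", password="***",
        warehouse="ETL_WH", database="CUSTOMER_DB", schema="MDM",
    )
    cur = conn.cursor()

    # Masking policy: privileged roles see raw PII, everyone else a redacted value.
    cur.execute("""
        CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING)
        RETURNS STRING ->
        CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '***MASKED***' END
    """)

    # Attach the policy to a PII column so masking is applied at query time.
    cur.execute("ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask")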
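A sketch of the scheduled S3-to-Snowflake ingestion pattern from the stored-procedures-and-tasks bullet; the stage, task, and table names are assumptions:

    import snowflake.connector

    conn = snowflake.connector.connect(account="example_account", user="etl_user",
                                       password="***", warehouse="ETL_WH")
    cur = conn.cursor()

    # A scheduled task that copies new files from an external S3 stage each morning.
    cur.execute("""
        CREATE OR REPLACE TASK load_customers_daily
          WAREHOUSE = ETL_WH
          SCHEDULE = 'USING CRON 0 6 * * * UTC'
        AS
          COPY INTO raw.customers
          FROM @s3_landing_stage/customers/
          FILE_FORMAT = (TYPE = PARQUET)
          MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)

    # Tasks are created suspended; resume to start the schedule.
    cur.execute("ALTER TASK load_customers_daily RESUME")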
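And a sketch of the SQS-to-MongoDB Lambda pattern; the environment variable, database, and collection names are hypothetical:

    import json
    import os
    from pymongo import MongoClient

    # Create the client once so warm Lambda invocations reuse the connection.
    client = MongoClient(os.environ["MONGO_URI"])
    collection = client["customer_db"]["events"]

    def handler(event, context):
        # SQS delivers a batch under "Records"; each record body is a JSON string.
        docs = [json.loads(record["body"]) for record in event.get("Records", [])]
        if docs:
            collection.insert_many(docs)
        return {"inserted": len(docs)}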
Data Engineer
Trivium India Software Pvt Ltd.
06.2020 - 07.2022
Collaborated closely with cross-functional teams, including product managers, analytics experts, and business stakeholders, to assess and define data and analytics requirements
Developed scalable end-to-end data pipelines for customer MDM using PySpark, EMR, and AWS, reducing processing time by 50% and increasing data accuracy by 30%
Created automated ETL workflows with Airflow and AWS Glue, efficiently loading data from various sources into Snowflake (see the Airflow sketch after this role's bullets)
Configured Snowflake stored procedures and tasks for timely, accurate data ingestion from S3
Integrated data from multiple sources, leveraging data quality checks to ensure accuracy for business-critical insights
Developed and supported end-to-end data solutions including data lakes, ETL pipelines, and data warehouses using AWS services like S3, Glue, Lambda, Redshift, and Snowflake to meet internal and external reporting needs
Generated real-time reports using Spark SQL for ad hoc business requests, enhancing responsiveness to BI needs (see the Spark SQL sketch below)
Designed and implemented ETL processes using AWS Glue and Lambda to transfer data from landing zones to the data lake, aligning with specific business needs and criteria
Leveraged Snowflake to build efficient procedures for loading data into dimensional and fact tables in the warehouse, ensuring optimal organization and accessibility
Streamlined and automated Snowflake procedures by developing Glue jobs, enabling systematic daily data loads to support real-time business intelligence needs
Empowered end users with access to the DataMart, enabling them to build BI reports that deepen understanding of critical business metrics
Extracted, transformed, and loaded data from multiple sources using SQL and scripting languages like Python
Leveraged dynamic data masking features within Snowflake to selectively conceal sensitive fields or personally identifiable information (PII), minimizing the risk of unauthorized access
Experienced in source code management with Git and Bitbucket repositories
Knowledge of agile methodologies for delivering software solutions
Developed and implemented data pipelines using AWS services such as S3, SQS, SNS, Lambda, Glue and its crawlers, Athena, and EC2 to process big data
Created change requests and led deployments, promoting code smoothly to higher environments to support ongoing development and optimization efforts
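A minimal sketch of the Airflow-plus-Glue orchestration pattern referenced above, assuming the Amazon and Snowflake Airflow provider packages; the DAG, Glue job, connection, stage, and table names are illustrative:

    from datetime import datetime
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator
    from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

    with DAG(
        dag_id="customer_etl_daily",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Run the Glue job that lands curated Parquet files in S3.
        run_glue = GlueJobOperator(task_id="run_glue_etl", job_name="customer_etl_job")

        # Load the curated files from the S3 stage into Snowflake.
        load_snowflake = SnowflakeOperator(
            task_id="load_into_snowflake",
            snowflake_conn_id="snowflake_default",
            sql="COPY INTO analytics.customers FROM @s3_curated_stage/customers/",
        )

        run_glue >> load_snowflake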
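And a sketch of the ad hoc Spark SQL reporting pattern; the S3 paths, view name, and columns are assumptions for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adhoc_report").getOrCreate()

    # Register curated Parquet data as a temporary view for SQL access.
    spark.read.parquet("s3://data-lake/curated/orders/").createOrReplaceTempView("orders")

    # Example ad hoc business question: monthly revenue by region.
    report = spark.sql("""
        SELECT region,
               date_trunc('month', order_date) AS month,
               SUM(amount) AS revenue
        FROM orders
        GROUP BY region, date_trunc('month', order_date)
        ORDER BY month, region
    """)
    report.write.mode("overwrite").parquet("s3://data-lake/reports/monthly_revenue/")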
Education
Master of Computer Information Systems and Information Technology
University of Central Missouri
12.2023
Skills
Hadoop
Spark
HDFS
MapReduce
Yarn
Sqoop
Hive
Glue
Python
SQL
Unix Shell
AWS
Microsoft Azure
Informatica PowerCenter
IICS
Airflow
Druid
Presto
Flink
Oracle
MS SQL Server
DB2
Snowflake
MongoDB
Redshift
PostgreSQL
PuTTY
WinSCP
Kafka
AWS Kinesis
Tableau
Superset
Looker