Data Engineering professional with 8+ years of experience across a variety of data platforms and hands-on experience in Big Data Engineering and Data Analytics. Practical Database Engineer with in-depth knowledge of data manipulation techniques and computer programming, paired with expertise in integrating and implementing new software packages and products into existing systems. Offers an 8-year background managing the development, design, and delivery of database solutions. Tech-savvy, independent professional with outstanding communication and organizational abilities.
Overview
9 years of professional experience
Work History
Sr. Data Engineer
Ecolab
St Paul, MN
06.2022 - Current
Responsible for provisioning key AWS cloud services and configuring them for scalability, flexibility, and cost optimization
Create VPCs, private and public subnets, and NAT gateways in a multi-region, multi-zone infrastructure landscape to support worldwide operations
Manage Amazon Web Services (AWS) infrastructure with orchestration tools such as CloudFormation (CFT), Terraform and Jenkins Pipelines
Create Terraform scripts to automate deployment of EC2 instances, S3, EFS, EBS, IAM roles, snapshots and the Jenkins server
Build cloud data stores in S3 with logical layers for raw, curated and transformed data management
Create data ingestion modules using AWS Glue to load data into the various S3 layers, with reporting via Athena and QuickSight
Create and manage S3 bucket policies and lifecycle rules per organizational and compliance guidelines
Create parameters and SSM documents using AWS Systems Manager
Established CI/CD tooling such as Jenkins and Bitbucket for code repository, build and deployment of the Python code base
Build Glue jobs for technical data cleansing such as deduplication, NULL value imputation and removal of redundant columns (a sketch appears after this role's bullets)
Build additional Glue jobs for standard data transformations (date, string and math operations) and business transformations required by business users
Used the Kinesis family (Kinesis Data Streams, Kinesis Data Firehose, Kinesis Data Analytics) to collect, process and analyze streaming data
Create Athena data sources on S3 buckets for ad hoc querying and business dashboarding using QuickSight and Tableau reporting tools
Copy fact/dimension and aggregate outputs from S3 to Redshift for historical data analysis using Tableau and QuickSight
Use Lambda functions and Step Functions to trigger Glue jobs and orchestrate the data pipeline (the second sketch after these bullets shows this trigger pattern).
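A minimal sketch of the cleansing-style Glue job described above, assuming a hypothetical catalog database (raw_db), table (orders_raw), key and column names, and curated bucket path; the real job parameters and column list would differ.

# Minimal AWS Glue PySpark sketch: deduplication, NULL imputation and column pruning.
# Database, table, column and bucket names are illustrative placeholders.
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw layer registered in the Glue Data Catalog
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders_raw"
).toDF()

# Technical cleansing: drop duplicates, impute NULLs, remove a redundant column
cleansed = (
    raw.dropDuplicates(["order_id"])
       .fillna({"quantity": 0, "status": "UNKNOWN"})
       .drop("legacy_flag")
)

# Write the curated layer back to S3 as Parquet
cleansed.write.mode("overwrite").parquet("s3://example-curated-bucket/orders/")
job.commit()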
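And a hedged sketch of the Lambda-to-Glue trigger pattern noted in the last bullet; the job name, argument keys and S3 event handling shown are assumptions, not the production values.

# Hypothetical Lambda handler that starts a Glue job when an S3 object lands.
# The job name and argument keys are placeholders.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    # Pull the bucket/key from the S3 event notification
    record = event["Records"][0]["s3"]
    run = glue.start_job_run(
        JobName="curate-orders-job",  # assumed job name
        Arguments={
            "--source_bucket": record["bucket"]["name"],
            "--source_key": record["object"]["key"],
        },
    )
    return {"JobRunId": run["JobRunId"]}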
Sr. Data Engineer
Ditech
Fort Washington, PA
10.2020 - 05.2022
Designed and set up an Enterprise Data Lake supporting use cases including storage, processing, analytics and reporting of voluminous, rapidly changing data using various AWS services
Used various AWS services including S3, EC2, AWS Glue, Athena, RedShift, EMR, SNS, SQS, DMS, Kinesis
Extracted data from multiple source systems (S3, Redshift, RDS) and created tables/databases in the Glue Data Catalog by creating Glue crawlers
Created AWS Glue crawlers for crawling the source data in S3 and RDS
Created multiple Glue ETL jobs in Glue Studio, applied various transformations and loaded the results into S3, Redshift and RDS
Created multiple Recipes in Glue Data Brew and then used in various Glue ETL Jobs
Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (Parquet/text files) into Amazon Redshift
Used the AWS Glue Data Catalog with crawlers to expose S3 data and ran SQL query operations against it using AWS Athena
Wrote PySpark jobs in AWS Glue to merge data from multiple tables, utilizing crawlers to populate the AWS Glue Data Catalog with metadata table definitions (a join sketch appears after this role's bullets)
Used AWS Glue for transformations and AWS Lambda to automate the process
Used AWS EMR to transform and move large amounts of data into and out of AWS S3
Created monitors, alarms, notifications and logs for Lambda functions, Glue Jobs using CloudWatch
Performed end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift and S3
Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB
Used Athena extensively to run queries on data processed by Glue ETL jobs, then used QuickSight to generate reports for business intelligence (an Athena query sketch appears after this role's bullets)
Used DMS to migrate tables from homogeneous and heterogeneous databases from on-premises to the AWS cloud
Created Kinesis Data Streams, Kinesis Data Firehose and Kinesis Data Analytics to capture and process streaming data and output it to S3, DynamoDB and Redshift for storage and analysis
Created Lambda functions to run the AWS Glue job based on the AWS S3 events.
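A minimal sketch of the multi-table merge pattern described in the bullets above, assuming hypothetical catalog, table, key and bucket names; it joins two cataloged tables and lands the result in the transformed S3 layer.

# Hypothetical Glue PySpark job joining two cataloged tables; all names are placeholders.
import sys
from awsglue.transforms import Join
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Both tables are assumed to have been crawled into the Glue Data Catalog
customers = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="customers")
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders")

# Merge on the shared key and write the result to S3 as Parquet
merged = Join.apply(customers, orders, "customer_id", "customer_id")
glue_context.write_dynamic_frame.from_options(
    frame=merged,
    connection_type="s3",
    connection_options={"path": "s3://example-transformed-bucket/customer_orders/"},
    format="parquet",
)
job.commit()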
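A second hedged sketch, showing the kind of ad hoc Athena query run against the processed data; the database, table, SQL and results location are illustrative assumptions.

# Hypothetical ad hoc Athena query via boto3; database, table and output location are placeholders.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT order_date, SUM(amount) AS revenue "
                "FROM curated_db.orders GROUP BY order_date",
    QueryExecutionContext={"Database": "curated_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# Check the query status; results land in the S3 output location above
query_id = response["QueryExecutionId"]
status = athena.get_query_execution(QueryExecutionId=query_id)
print(status["QueryExecution"]["Status"]["State"])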
Data Engineer
Fifth Third Bank
Evansville, IN
07.2018 - 09.2020
Built S3 buckets, managed their bucket policies and used S3 Glacier for storage and backup on AWS (a lifecycle sketch appears after this role's bullets)
Designed, built, and coordinated an automated build & release CI/CD process using GitLab, Jenkins and Puppet on hybrid IT infrastructure
Involved in designing and developing Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing, Amazon SWF, Amazon SQS, and other services of the AWS infrastructure
Running build jobs and integration tests on Jenkins Master/Slave configuration
Conduct systems design, feasibility and cost studies and recommend cost-effective cloud solutions such as Amazon Web Services (AWS)
Involved in maintaining the reliability, availability, and performance of Amazon Elastic Compute Cloud (Amazon EC2) instances
Managed Servers on the Amazon Web Services (AWS) platform instances using Puppet configuration management
Integrated services like GitHub, AWS Code pipeline, Jenkins, and AWS Elastic Beanstalk to create a deployment pipeline
Involved in complete SDLC life cycle - Designing, Coding, Testing, Debugging and Production Support
Coordinate/assist developers with establishing and applying appropriate branching, labeling/naming conventions using Git
Used Kubernetes to deploy, load balance, scale and manage Docker containers
Worked with JIRA for defect/issue logging and tracking and documented all work in Confluence
Performed branching, merging and release activities in the Git version control tool
Used GitHub as version control to store source code and implemented Git for branching and merging operations.
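A minimal sketch of the S3-to-Glacier backup pattern mentioned above; the bucket name, prefix and retention periods are assumed values, not the actual policy.

# Hypothetical lifecycle rule transitioning backups to S3 Glacier after 90 days.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-backups",
                "Filter": {"Prefix": "backups/"},
                "Status": "Enabled",
                # Move objects to Glacier after 90 days, expire after a year
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)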
Data Engineer
Amigos Software Solutions
Hyderabad, India
01.2017 - 04.2018
Responsible for building scalable distributed data solutions using Hadoop
Demonstrated a strong comprehension of project scope, data extraction, design of dependent and profile variables, logic and design of data cleaning, exploratory data analysis and statistical methods
Used Spark Streaming APIs to perform the transformations needed to build the common learner data model, which consumes data from Kafka in near real time and persists it into Hive (a streaming sketch appears after this role's bullets)
Developed Spark scripts by using Python as per the requirements
Developed real time data pipeline using Spark to ingest customer events/activity data into Hive and Cassandra from Kafka
Performed Spark job optimization and performance tuning to improve running times and resource utilization
Worked on reading and writing multiple data formats like JSON, AVRO, Parquet, ORC on HDFS using Pyspark
Designed, developed and maintained data integration across Hadoop and RDBMS environments, covering traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis
Involved in recovery of Hadoop clusters and worked on a cluster of 310 nodes
Worked on creating Hive tables, loading, and analyzing data using Hive queries
Experience in providing application support for Jenkins
Developed a data pipeline on AWS to extract data from weblogs and store it in HDFS
Used Hive QL to analyze the partitioned and bucketed data and compute various metrics for reporting
Used reporting tools like Tableau and Power BI to connect with Hive for generating daily reports of data.
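A minimal Structured Streaming sketch of the Kafka-to-Hive ingestion pattern described above; the broker, topic, event schema and table names are assumptions, and the original pipeline may have used the DStream API instead.

# Hypothetical streaming job reading customer events from Kafka and appending them to a Hive table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, TimestampType

spark = (SparkSession.builder
         .appName("customer-events-ingest")
         .enableHiveSupport()
         .getOrCreate())

# Assumed event schema
schema = (StructType()
          .add("customer_id", StringType())
          .add("event_type", StringType())
          .add("event_time", TimestampType()))

# Consume raw events from Kafka in near real time and parse the JSON payload
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
          .option("subscribe", "customer-events")             # placeholder topic
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Persist each micro-batch into a Hive-managed table
query = (events.writeStream
         .option("checkpointLocation", "/tmp/checkpoints/customer_events")
         .toTable("analytics.customer_events"))

query.awaitTermination()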
Software Associate
Careator Technologies Pvt Ltd
Hyderabad, India
08.2014 - 12.2016
Designed and Developed data integration/engineering workflows on big data technologies and platforms - Hadoop, Spark, MapReduce, Hive, HBase
Involved in gathering business requirements, logical modeling, physical database design, data sourcing and data transformation, data loading, SQL, and performance tuning
Used SSIS to populate data from various data sources, creating packages for different data loading operations for applications
Transformed and analyzed data using PySpark and Hive based on ETL mappings
Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation across multiple file formats (a Spark SQL sketch appears after this role's bullets)
Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift
Performed RDBMS performance tuning and optimization, addressing performance issues arising from inefficient queries, indexing, and other factors in databases managing large volumes of data
Created Data visualizations using Databricks' integrated visualization tools and third-party tool Power BI
Implemented and delivered MSBI platform solutions to develop and deploy ETL, data analytics, reporting, and scorecards/dashboards on SQL Server using SSIS and SSRS, alongside AWS and Azure analytics services
Extensively worked with SSIS tool suite, designed and created mapping using various SSIS transformations like OLEDB command, Conditional Split, Lookup, Aggregator, Multicast, and Derived Column
Scheduled and executed SSIS packages using SQL Server Agent and developed automated daily, weekly, and monthly system maintenance tasks such as database backups, database integrity verification, indexing, and statistics updates
Worked extensively on SQL, PL/SQL, Scala, and UNIX shell scripting
Expertise in creating PL/SQL procedures, functions, triggers, and cursors
Developing under scrum methodology and in a CI/CD environment using Jenkins
Designed and documented the entire Architecture of Power BI POC
Utilized Unix Shell Scripts for adding the header to the flat file targets
Performed deep analysis of SQL execution plans and recommended hints, query restructuring, indexes, or materialized views for better performance
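A minimal sketch of the Databricks-style Spark SQL extraction, transformation and aggregation mentioned above; the paths, columns and output table are illustrative placeholders rather than the actual project objects.

# Hypothetical Spark SQL aggregation over two file formats in a Databricks-style job.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi-format-aggregation").getOrCreate()

# Extract from two different file formats (placeholder mount paths)
orders = spark.read.parquet("/mnt/raw/orders/")
customers = spark.read.json("/mnt/raw/customers/")

orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")

# Transform and aggregate with Spark SQL
revenue_by_region = spark.sql("""
    SELECT c.region,
           SUM(o.amount) AS total_revenue,
           COUNT(DISTINCT o.order_id) AS order_count
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.region
""")

# Load the result as a managed table for downstream Power BI reporting
revenue_by_region.write.mode("overwrite").saveAsTable("analytics.revenue_by_region")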