Results-oriented Cloud Data Engineer/Scientist with a stakeholder focus, delivering specialized, robust solutions for cloud migrations, data lakes, machine learning models, and BI applications across marketing, telecom, and gaming clients
In-depth technical and business knowledge from 5 years of progressive professional experience in IT and data engineering (structured and unstructured/big data) delivery consulting
Built ground-up data lake, data warehousing, and machine learning solutions leveraging frameworks (Hadoop, Spark, Docker, PyTorch, TensorFlow), cloud services (AWS S3, Data Pipeline, Glue, EMR, Athena, Lambda, and others), and programming languages (SQL, Python, shell scripting, Java)
Extensive knowledge of Amazon Web Services (AWS) EC2, S3, and Elastic MapReduce (EMR), as well as Redshift and Identity and Access Management (IAM)
Designed and developed solutions for On-Premise Enterprise Data Warehouse and Business Intelligence Solutions leveraging databases (Oracle, SQL Server, PostgreSQL) and tools (Informatica, Power BI, Tableau, and others)
Proficient in SQL and PL/SQL programming, including triggers, stored procedures, functions, and packages, for application development
Experience with cloud databases and data warehouses (Amazon Redshift/RDS)
Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, SNS, SQS)
Developed Spark code using Python/Scala and Spark-SQL for faster testing and processing of data.
Experience developing and maintaining applications built on Amazon S3, AWS Elastic MapReduce, and Amazon CloudWatch
Experience writing downstream and upstream pipelines using Python
Good exposure to automating ETL processes using Python and shell scripts
Adept with Agile/Scrum and SDLC methodologies
Overview
6 years of professional experience
1 Certification
Work History
Data Analytics Engineer
WB Games
07.2022 - Current
Designed and developed ETL processes in AWS GLUE to source disparate gaming datasets from external sources and ingest into Data Lake and AWS Redshift data warehouse system
Created, debugged, and optimized ETL data pipelines using Apache Airflow to load data into Data Lake stores (Landing/Raw, Integration, and Curated) and into Redshift for data analytics workloads
Stored and retrieved data from data-warehouses using Amazon Redshift
Scheduled jobs using Airflow scripts written in Python, adding tasks to DAGs and defining dependencies between tasks
Wrote PostgreSQL stored procedures, functions, and triggers to implement business rules at the application level
Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in AWS S3
Collaborated with BI analysts and DBA in curating gameplay data into intuitive and efficient assets that drive insights
Created data visualization dashboards and reports through Looker
Designed and implemented ETL/Data Pipelines to capture reports into Redshift using Looker API to pull KPIs
Leveraged JIRA for Scrum, GitHub for source code control, Jenkins for CI/CD, Confluence for Documentation
Participated in code reviews with peers to ensure proper test coverage and consistent code standards
Collaborated with internal and external stakeholders to optimize data sourcing pipelines and verify data quality
Worked within Agile software lifecycle methodologies: created design documents as required and performed coding, debugging, and testing
Followed bi-weekly iterations to ensure delivery of high-quality work
Data Engineer
Kroger
01.2022 - 05.2022
Engineered programmatic advertising data and API integration solutions as part of the self-service SPMP (Smart Private Market Place) for campaigning and measuring media investments across Kroger inventory channels
Designed and implemented ETL/Data Pipelines to ingest disparate datasets from external SSP/DSP partners into Data Lake hosted in AWS cloud
Orchestrated ETL/Data Pipelines using AWS SNS and Lambda functions to automate campaign/deal configuration steps like invoke API calls, trigger AWS ECS tasks, Athena queries for sharing deals data between Audigent and SSP partners and impression/measurement data with Kroger client
Developed and optimized ETL Data Pipelines using technologies like AWS Data Pipelines, Airflow, Unix Shell Scripts, Python modules, Athena queries, AWS Lambda Functions, ECS tasks, JSON config, AWS DMS, Marketplace Qlik Attunity Replicator, and others
Migrated API, storage, and query engines from AWS Elastic Beanstalk, S3, and Athena to their Azure counterparts (App Service, Storage Accounts, and Synapse Analytics)
Created and managed Kubernetes Pods, using Jenkins pipelines to push all microservice builds to a Docker registry and then deploy them to Kubernetes
Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines
Big Data Engineer / Data Scientist
CENTREPOINT INFORMATICS PVT LTD.
05.2018 - 04.2021
Developed and optimized Spark Python (PySpark) ETL jobs on EMR/Glue to ingest disparate external data sources into the Data Lake and store transformed data in Redshift, speeding up complex analytical query workloads
Designed and deployed high performance data pipelines for Data Lake and Analytical applications
Familiar with Hadoop file system, AWS S3 storage, and big data formats such as Parquet, Avro, and JSON
Created AWS Athena tables and queries for ad hoc data analysis
Designed, developed, and tested ETL mappings and workflows using Informatica and analyzed systems for accuracy
Monitored and tuned end-to-end Informatica workflows to enhance productivity
Developed Informatica PowerCenter mappings and workflows to convert legacy SSIS ETL
Performed data analysis and visualization on survey data using Tableau Desktop, and compared respondents' demographic data with univariate analysis using Python (Pandas, NumPy, Seaborn, scikit-learn, and Matplotlib)
Automated Informatica processes to update status tables after successful mapping runs
Worked on Tableau to build customized interactive reports, worksheets, and dashboards
Reviewed basic SQL queries and edited inner, left, and right joins in Tableau by connecting to live/dynamic and static datasets
Used version control tools Git to update project with team members
Worked with Agile methodology to ensure delivery of high-quality work with monthly iterations
Performed all necessary day-to-day Git support for different projects; responsible for maintaining Git repositories and access control strategies
SQL Developer Intern
Eduquity Career Technologies Pvt Ltd
10.2017 - 04.2018
Worked in an Agile environment to ensure delivery of high-quality work
Designed database schemas and ensured their stability, reliability, and performance
Wrote and optimized complex SQL queries in SQL Server 2012
Applied RDBMS concepts including views, triggers, stored procedures, and indexes
Assisted in developing and implementing new technologies
Created and improved existing reports and data analytics visualizations using Power BI.
Education
Master's - Computer Science
Cleveland State University
Cleveland, OH
12.2022
Bachelor's - Computer Science
GITAM
Bangalore, India
06.2018
Skills
Big Data - Hadoop, Spark, Hive, NoSQL DynamoDB etc
Projects
Real-Time American Sign Language Detection
Implemented real-time detection of American Sign Language using a convolutional neural network.
Used the ASL letter database of hand gestures from Kaggle; used the ReLU activation function and the Adam optimizer.
Used cv2 to capture webcam frames and pass them through the model to predict the class of each frame.
Facial Expression Recognition Challenge
Predicted facial expressions using image data from a CSV file.
Used the facial expression dataset from Kaggle via the Kaggle API and saved it to Google Drive.
Trained and tested models using SVM, Decision Tree, and KNN classification algorithms to find the F1 score for each model.
Performed grid search to find the best hyperparameters for each algorithm and displayed the F1 score of each model using data visualization.
Business intelligence and Visualization
Built a business analytics data mining model using the Microsoft BI data mining tool and OLAP cubes.
Used the AdventureWorks sample database to design and build data warehouse cubes for the BI project.
Used Visual Studio SSDT to build and deploy the cube to SQL Server, and Microsoft SQL Server Management Studio to write MDX queries to retrieve data.
Visualized the data retrieved from each MDX query and from the data mining results.
Web Log Parser (Python, Flask, AWS)
Created a web application in Python in which the parser reads time-log data files and determines the time each author spent on each file.
Used AWS as the cloud platform and Flask to develop the web application.
Intentionally vulnerable web-application
This project is a web application developed with PHP, MySQL, and JavaScript that is made intentionally vulnerable.
Many students with attacking skills lack a place to test or practice them.
This project provides a platform to practice those skills legally and builds a better understanding of securing web applications.
Certification
AWS Certified Data Analytics – Specialty