Data engineering lead with 10+ years of professional experience and proven skills in data science, machine learning, and big data technologies, with an interest in software research and development.
Overview
14 years of professional experience
1 Certification
Work History
Data Engineering Lead
Google
04.2020 - Current
Designed and implemented a statistical data generation module to synthesize case/issue generation patterns using Python and SQL.
Performed query and parameter tuning to increase ETL pipeline throughput in Dataflow and Dataproc.
Designed and implemented a tool to rank bug priorities for faster resolution.
Implemented a Hadoop/Spark stress-testing module for GCP Dataproc clusters using PySpark.
Implemented tools and automation to improve the usability of Google Cloud products.
Write troubleshooting guides used by the frontline team to drive speedy resolution of common classes of cloud infrastructure problems.
Deliver onboarding training for new members of this worldwide team.
Handle technical issue escalations and operational responsibilities day to day.
Work with key accounts to improve the technical client experience and raise negative patterns or concerns with the product team.
Act as a data analyst: write queries, build dashboards for visualization, and generate business intelligence on client technical bottlenecks across GCP to better manage the team.
Identify training or process gaps. Highlight problem areas for each product and bring those to the product team.
Data Engineer
Procter & Gamble
12.2018 - 12.2019
Developed and applied a machine-learning-based approach to detect and mitigate real attack threats on the enterprise network.
Performed feature extraction and dimensionality reduction using PCA.
Applied K-means clustering for network node behavioral modeling and malicious activity detection.
Explored and implemented deep-learning-based feature extraction, such as an LSTM autoencoder for packet sequence data analysis, using Keras with TensorFlow 2.0.
Explored and implemented complex deep learning graph topologies using the Keras functional API in TensorFlow.
Leveraged AWS and PySpark for large-scale network log processing (1 TB/day).
Research Assistant
The University of Texas at Dallas
01.2015 - 12.2019
Large-scale data processing with Apache Spark and Hadoop. Machine learning and natural language processing applications.
Transfer learning using pre-trained deep learning models with Keras.
Keras deep learning model topology modification for custom classification.
Collected network data, covering both system calls and network packets.
Responsible for data identification, collection, exploration, and cleaning for modeling.
Gathered data from multiple web sources, external data sources, and APIs; wrote SQL to join and aggregate the data for analysis.
Hands-on experience implementing machine learning algorithms such as K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Decision Trees, and Random Forests.
Hands-on experience performing dimensionality reduction with techniques such as Principal Component Analysis.
Text analytics on threat report data using the Natural Language Toolkit (NLTK) and natural language processing techniques based on word vectors.
Created various types of data visualizations using Matplotlib.
Big Data/Software Engineering Development Intern
Verizon Labs
06.2016 - 08.2016
Tuned Cassandra and Kafka clusters to handle 250,000 writes/sec.
Optimized Kafka topic partitioning to increase throughput.
Tuned Cassandra CQL to optimize performance.
Set up a production-grade Docker, Mesos, and Marathon application framework for an Internet of Things (IoT) pipeline for large-scale message handling (250,000 messages per sec).
Identified bottlenecks and performed system tuning that improved IoT application message throughput.
Developed an IoT pipeline for more than 60 million virtual IoT devices using a microservices architecture.
Automated deployment scripting using Bash, Expect, and Python.
Summer Software Engineering Intern
FedEx
06.2014 - 08.2014
Implemented a mobile app with customer identification and an in-store service recommendation system.
iPhone/iPad development for the FedEx Office Print mobile application.
Big Data Tools: Apache Hadoop, MapReduce, Pig, Hive, and Spark
Databases: SQL (Oracle, MySQL, and MSSQL) and NoSQL
Certification
Google Cloud Professional Data Engineer https://google.accredible.com/7026e595-933c-46f1-b44e-f1151a561d2e
Google Cloud Professional Machine Learning Engineer https://google.accredible.com/476d0f8e-b86b-4f60-8090-0845b0b8a457
Google Cloud Platform Big Data and Machine Learning Fundamentals https://www.coursera.org/account/accomplishments/certificate/F6DMRZW3EM95
Google Cloud Platform Fundamentals: Core Infrastructure https://www.coursera.org/account/accomplishments/certificate/W2L89XZ5UCTE
Modernizing Data Lakes and Data Warehouses with GCP https://www.coursera.org/account/accomplishments/certificate/QMV2XZT5RNZB
Awards and Scholarship
Erik Jonsson full scholarship for PhD in computer science with emphasis on processing large datasets.
EGBA Leventis community scholarship for outstanding student.
Federal Government scholarship.
Teaching
Instructor, Winter Break Workshop 2017: Data Analysis using Python
Teaching Assistant: CS 6350 Big Data Management and Analytics, Fall 2014, Spring 2015, Fall 2015, and Spring 2016.
Teaching Assistant: CS 6396 Real-Time Systems, Fall 2014 and Fall 2015.