Data Engineer with 5+ years of experience designing and maintaining data pipelines, with a passion for learning. An autonomous worker committed to delivering high-quality services.
Overview
7
years of professional experience
2
years of post-secondary education
Work History
Senior Data Engineer
John Deere
San Francisco, CA
04.2020 - Current
Worked with product owners to design new products for automated data ingestion and serving.
Led architecture, design, and implementation of back-end features using Python, Spark, Scala, and AWS services.
Analyzed complex data and identified anomalies, trends, and risks, providing insights that improved feature-generation jobs.
Trained and mentored junior developers, engineers, and interns, teaching skills in Spark, Python, Scala, and geospatial data, and working to improve overall team performance.
Developed multiple CI/CD architectures and implemented them using Terraform, Jenkins and Drone for 5 repositories.
Implemented 10+ Spark extensions and custom aggregations to process geospatial data at scale and create features for machine learning.
Developed and supported multiple software products for external data sources such as weather, raster, and sensor data using PostGIS, Spark (Databricks), SQL, Step Functions, Lambda, and other AWS services.
Data Engineer
Walmart
02.2018 - 04.2020
Designed and developed a data-flow directed acyclic graph (DAG) in Airflow to manage petabyte-scale inventory data, writing 1+ TB of data per day into Hive tables through Tez and, later, through Spark.
Developed standalone PySpark and Scala applications mapping 100M+ product locations, using user-defined Java functions to consume data from a Kafka stream in batch fashion, making it inherently fault tolerant.
Reduced DAG execution time from 1.2 hours to 20 minutes by analyzing query execution plans and data distribution, then forcing optimal join conditions.
Utilized advanced SQL functionality such as windowing and modular WITH blocks to develop 1,000+ lines of monolithic SQL code for quicker analysis, later converted into PySpark.
Developed automated DataFrame-level unit and integration test cases to detect logical errors that silently creep into the code and pollute the data.
Created an automated integration-testing methodology using simple JSON configuration files and automated Docker deployment for REST web services built in JAX-RS.
Managed data migration to BigQuery using external and managed tables according to ad-hoc or reporting business cases.
Developed a Scala, Spark, and API-based rule validator using massively distributed processing to check the validity of 2 billion store, category, and department combinations.
Developed a streaming pipeline with processing-time windowing and a 20-minute trigger to ingest transaction logs from the store database.
Created an exploratory data audit report comparing the data lake and Teradata to audit sales data quality for forecasting and workforce management.
Software Engineering Intern
Walmart
06.2017 - 08.2017
Developed Java microservices for mobile-app gamification to promote and recognize the leading stores and departments with the best availability among 5,000+ US-based stores.
Developed a query-string and timestamp-based caching mechanism for bulk caching, improving the end-user (store associate) experience.
Created Swagger documentation for the orchestration layer to expose its functionality company-wide and prevent rework.
Software Engineer
Tech Mahindra
01.2014 - 01.2016
Developed back-end Google Cloud Endpoints in Java for an Android mobile application, connecting 200+ associates and keeping them informed of onsite developments.
Developed solutions and POCs in BigQuery, Datastore, GWT, and Servlets to handle streaming data and produce near-real-time analytics.