Kushal Juneja

Data Engineer
San Francisco, CA

Summary

Data Engineer with 5+ years of experience designing and maintaining data pipelines and a passion for learning. An autonomous worker committed to providing high-quality services.

Overview

7 years of professional experience
2 years of post-secondary education

Work History

Senior Data Engineer

John Deere
San Francisco, CA
04.2020 - Current
  • Worked with product owners to design new products for automated data ingestion and serving.
  • Led architecture, design and implementation of back-end features using Python, Spark, Scala and AWS services.
  • Analyzed complex data and identified anomalies, trends and risks to provide useful insights to improve feature generation jobs.
  • Trained and mentored junior developers, engineers, and interns in Spark, Python, Scala and geospatial data, working to improve overall team performance.
  • Developed multiple CI/CD architectures and implemented them using Terraform, Jenkins and Drone for 5 repositories.
  • Implemented 10+ Spark extensions and custom aggregations to process geospatial data at scale and create features for machine learning.
  • Developed and supported multiple software products for external data sources such as weather data, raster data and sensor data using PostGIS, Spark (Databricks), SQL, Step Functions, Lambda and other AWS services.

Data Engineer

Walmart
02.2018 - 04.2020
  • Designed and developed a data-flow directed acyclic graph (DAG) in Airflow to manage petabyte-scale inventory data, writing 1+ TB of data per day into Hive tables through Tez and, later, through Spark.
  • Developed a standalone PySpark and Scala application mapping 100M+ product locations, using user-defined Java functions to consume data from a Kafka stream in batch fashion, making it inherently fault tolerant.
  • Reduced DAG execution time from 1.2 hours to 20 minutes by analyzing query execution plans and data distribution to force optimal join conditions.
  • Used advanced SQL features such as window functions and modular WITH blocks to develop 1000+ lines of monolithic SQL code for quicker analysis, which was later converted to PySpark.
  • Developed automated DataFrame-level unit and integration tests to detect logical errors that silently creep into the code and pollute the data.
  • Created an automated integration testing methodology using simple JSON configuration files and automated Docker deployment for REST web services built with JAX-RS.
  • Managed data migration to BigQuery using external and managed tables, depending on the ad hoc or reporting business case.
  • Developed a Scala, Spark and API-based rule validator using massively distributed processing to check the validity of 2 billion store, category and department combinations.
  • Developed a streaming pipeline with processing-time windowing and a 20-minute trigger to ingest transaction logs from the store database.
  • Created an exploratory data audit report between the data lake and Teradata to audit sales data quality for forecasting and workforce management.

Software Engineering Intern

Walmart
06.2017 - 08.2017

  • Developed Java microservices for mobile app gamification to promote and recognize the leading stores and departments with the best availability among 5000+ US-based stores.
  • Developed a query-string and timestamp-based caching mechanism for bulk caching, improving the end-user (store associate) experience.
  • Created Swagger documentation for the orchestration layer to expose its functionality company-wide and prevent rework.

Software Engineer

Tech Mahindra
01.2014 - 01.2016

  • Developed backend Google Cloud Endpoints in Java for an Android mobile application, connecting 200+ associates by keeping them informed of onsite advancements.
  • Developed solutions and POCs in BigQuery, Datastore, GWT and Servlets to handle streaming data and produce near-real-time analytics.

Education

The University of Texas at Dallas
01.2016 - 12.2017

Skills

Hive/SQL

Certifications

  • Databricks Certified Data Engineer Associate
  • BigQuery Developer
  • App Engine Qualified Developer
  • Compute Engine Qualified Developer
  • Analytics Certification
  • AdWords Mobile Certification
  • Cloud SQL Developer
  • Cloud Storage Developer

Timeline

Senior Data Engineer

John Deere
04.2020 - Current

Data Engineer

Walmart
02.2018 - 04.2020

Software Engineering Intern

Walmart
06.2017 - 08.2017

The University of Texas at Dallas
01.2016 - 12.2017

Software Engineer

Tech Mahindra
01.2014 - 01.2016