Joshua Stern

Site Reliability Engineer
Mountain View, CA

Summary

Primary SRE managing and driving forward 100+ Apple Maps and FindMy microservices running in Kubernetes. The services range from APIs to UIs, run both on and off premise, and are supported through their full lifecycle, from initial inception through on-call to peaceful decommissioning, using CI/CD and GitOps.

Overview

7 years of professional experience
4 years of post-secondary education

Work History

Site Reliability Engineer

Apple
06.2019 - Current
  • Managed and drove forward the OpenStreetMap (OSM) migration to Kubernetes as a primary SRE. OSM is a collaborative project to create a free, editable map of the world, and comprises over a dozen microservices and developers.
  • Managing and driving forward over four dozen Maps Editor microservices in Kubernetes as a primary SRE, which are utilized to create high-quality map data.
  • Managing and driving forward half a dozen Machine Learning microservices in Kubernetes, both on and off premise, as a primary SRE, which are utilized by the Apple Ratings and Photos (ARP) feature in Maps to perform ML analysis on millions of vendor and user photo submissions per day.
  • Managing and driving forward FindMy as a primary SRE. FindMy is an asset tracking app and service that comprises a dozen microservices and developers, is tightly coupled with other iCloud services, and receives over half a million requests per second.
  • Spearheaded the FindMy CI/CD pipelines, which are utilized to automatically roll out changes into the production environment in a reliable, robust manner, decreasing time to production from days to hours.
  • Delivered improvements to FindMy on-call that resulted in a 40% decrease in alerts, participated in the on-call rotation, and handled critical user-facing incidents with cross-functional teams in a blame-free manner.
  • Scaled FindMy infrastructure alongside new user growth. This requires over a dozen action items, including but not limited to writing documentation to track progress, provisioning secrets, provisioning VIPs, provisioning source code configs, provisioning Kubernetes configs, provisioning Spinnaker pipelines, deploying, working with QA to validate before going live, debugging, troubleshooting, provisioning on-call alerts, and working with the cross-functional iCloud teams and services that FindMy depends on.
  • Designed and implemented a Slack application that fully automates deployment requests and status updates for all FindMy services and pipelines, significantly reducing manual work and saving dozens of hours each week for the Developer and SRE teams.
  • Productionized and improved the Next Gen Platform (NGP) gateway and Kafka microservice pipelines by adding automated smoke tests and robust checks. These pipelines are heavily utilized by Apple Business Connect, which is a free, web-based portal from Apple to help businesses easily manage, measure, and grow.
  • Designed and implemented a software capability that pre-downloads ML model data for the Maps Core ML services, resulting in a 1000%-5000% speedup in service deployment time and an increase in service availability. The work spanned multiple Maps vertical stacks cross-functionally, such as Data, Core, and Infra.

Site Reliability Engineer Intern

NVIDIA
06.2018 - 09.2018
  • Developed telemetry tools for DGX supercomputer clusters which are utilized to build enterprise AI infrastructure at scale.

Software Engineering Intern

Big Data Federation
07.2017 - 09.2017
  • Automated the ingestion of big data utilized by AI to predict the stock market.

Education

Bachelor of Science - Computer Science

University of California, Santa Cruz
3.78 GPA
09.2015 - 05.2019

Skills

  • Distributed Systems

  • Microservices

  • Kubernetes

  • Docker

  • Linux, Unix

  • Bash

  • Git

  • GitOps

  • CI/CD

  • Spinnaker

  • Helm

  • PagerDuty

  • Python

  • Flask, Django

  • Splunk

  • Jaeger, OpenTelemetry

  • Jenkins

  • Jira, Wrike

  • Quip

  • HTML, CSS

  • Kafka

  • TypeScript

  • Java

  • C

  • Go

  • Rust

Projects

Solved the Ethernaut Challenges (2022)

Ethereum, Solidity, Python


Distributed Key-Value Store (2018)

Distributed Systems, Python, Flask, Docker


Distributed Password Cracker (2018)

Distributed Systems, Multithreading, C++


Bitboard Cryptocurrency Tracker (2018)

HTML, CSS, Python, Django, Docker


Published Time-Based Persistence in Channel-Access Protocols with Carrier Sensing (2017)

IEEE

Timeline

Site Reliability Engineer Intern

NVIDIA
06.2018 - 09.2018

Software Engineering Intern

Big Data Federation
07.2017 - 09.2017

Bachelor of Science - Computer Science

University of California, Santa Cruz
09.2015 - 05.2019

Site Reliability Engineer

Apple
06.2019 - Current