Heemo Yang

Denver, CO

Summary

Site Reliability Engineer with hands-on experience operating and scaling distributed middleware systems for high-traffic streaming platforms. Experienced in improving production reliability, building observability workflows, and automating deployment and infrastructure processes. Focused on Kubernetes-based service delivery, AWS infrastructure management, performance testing, and operational excellence across multi-environment systems. Combines cloud infrastructure knowledge, software development, and scripting and automation skills to optimize system performance and keep the user experience reliable.

Overview

3 years of professional experience

Work History

Site Reliability Engineer

SlingTV, Dish Network
Denver
02.2023 - Current
  • Operate and support distributed middleware services handling 30K–200K requests per minute across 30+ microservices, ensuring high availability and reliability for streaming platforms serving millions of users
  • Participate in on-call rotation (1 week every ~2 months), triaging alerts, investigating latency/SLO breaches, and resolving production incidents across multiple environments
  • Led initiative to implement trace-to-log correlation by injecting trace identifiers into logs, significantly improving debugging speed and reducing time to root cause during incidents
  • Built and maintained Dynatrace dashboards, SLIs, and alerting rules, enabling proactive monitoring of latency, error rates, and service health across environments
  • Conducted load, stress, spike, and endurance testing using JMeter and BlazeMeter to validate system performance, identify bottlenecks, and guide capacity planning decisions
  • Performed right-sizing and HPA tuning based on performance test results, optimizing CPU/memory utilization and improving service scalability under varying traffic conditions
  • Onboarded and deployed multiple services to Kubernetes using Helm, ArgoCD, and GitLab CI/CD, configuring ingress routing, environment variables, and scaling policies
  • Investigated and escalated infrastructure issues (e.g., Istio misconfiguration causing unexpected scaling behavior), collaborating with platform teams to resolve system-wide inefficiencies
  • Contributed to AWS infrastructure provisioning using Terraform, creating and managing resources such as API Gateway, CloudFront, Lambda, DynamoDB, S3, and KMS
  • Designed Terraform configuration structure and deployment workflows, including environment-based config patterns, Makefile automation, and state recovery/import processes
  • Built internal automation tooling using Go and GitLab pipelines to manage repository approvals, branch rules, and operational tasks such as rolling restarts across namespaces
  • Led service onboarding and infrastructure setup for a new platform initiative (DANY project), delivering environments, pipelines, and deployment workflows across multiple clusters on tight timelines
  • Drove improvements in deployment automation and operational workflows, reducing manual effort and improving consistency across environments
  • Created documentation and led knowledge-sharing sessions on performance testing, capacity planning, and chaos/fault testing practices, mentoring engineers transitioning into SRE roles

Education

Advanced Software Engineering Certificate

Hack Reactor
Remote
08-2022

Bachelor of Science - Hospitality Management

University of Nevada, Las Vegas
06-2017

Skills

  • Reliability Engineering: Incident Response, On-call Operations, Production Debugging, Service Health Monitoring, Root Cause Analysis, High Availability
  • Observability: Dynatrace, Distributed Tracing, Metrics & Alerting, Log Analysis, Trace-to-Log Correlation, Performance Monitoring
  • Cloud & Infrastructure: AWS (API Gateway, CloudFront, Lambda, S3, DynamoDB, DAX, EventBridge, KMS, Keyspaces), Terraform, Infrastructure as Code
  • Kubernetes & Platform: Kubernetes, Helm, ArgoCD, Service Onboarding, HPA, Ingress Configuration
  • CI/CD & Automation: GitLab CI/CD, Pipeline Automation, Deployment Workflows, Go (automation scripting), Bash
  • Performance Engineering: Load Testing, Stress Testing, Spike Testing, Endurance Testing, Capacity Planning, Right-Sizing
  • Scripting Languages
  • Microservices Architecture
  • Incident Management

Languages

Korean
Native or Bilingual
