Heemo Yang - Site Reliability Engineer - SlingTV, Dish Network

Summary

Results-oriented Site Reliability Engineer with hands-on experience operating and scaling distributed middleware systems for high-traffic streaming platforms. Expertise in improving production reliability, building observability workflows, and automating deployment and infrastructure processes. Strong focus on Kubernetes-based service delivery, AWS infrastructure management, performance testing, and operational excellence across multi-environment systems. Committed to optimizing system performance and delivering seamless user experiences, drawing on a solid foundation in cloud computing, software development, scripting, and automation.

Overview

3 years of professional experience

Work History

Site Reliability Engineer

SlingTV, Dish Network

Denver

02.2023 - Current

Operate and support distributed middleware services handling 30K–200K requests per minute across 30+ microservices, ensuring high availability and reliability for streaming platforms serving millions of users
Participate in on-call rotation (1 week every ~2 months), triaging alerts, investigating latency/SLO breaches, and resolving production incidents across multiple environments
Led initiative to implement trace-to-log correlation by injecting trace identifiers into logs, significantly improving debugging speed and reducing time to root cause during incidents
Built and maintained Dynatrace dashboards, SLIs, and alerting rules, enabling proactive monitoring of latency, error rates, and service health across environments
Conducted load, stress, spike, and endurance testing using JMeter and BlazeMeter to validate system performance, identify bottlenecks, and guide capacity planning decisions
Performed right-sizing and HPA tuning based on performance test results, optimizing CPU/memory utilization and improving service scalability under varying traffic conditions
Onboarded and deployed multiple services to Kubernetes using Helm, ArgoCD, and GitLab CI/CD, configuring ingress routing, environment variables, and scaling policies
Investigated and escalated infrastructure issues (e.g., Istio misconfiguration causing unexpected scaling behavior), collaborating with platform teams to resolve system-wide inefficiencies
Contributed to AWS infrastructure provisioning using Terraform, creating and managing resources such as API Gateway, CloudFront, Lambda, DynamoDB, S3, and KMS
Designed Terraform configuration structure and deployment workflows, including environment-based config patterns, Makefile automation, and state recovery/import processes
Built internal automation tooling using Go and GitLab pipelines to manage repository approvals, branch rules, and operational tasks such as rolling restarts across namespaces
Led service onboarding and infrastructure setup for new platform initiative (DANY project), delivering environments, pipelines, and deployment workflows across multiple clusters within tight timelines
Drove improvements in deployment automation and operational workflows, reducing manual effort and improving consistency across environments
Created documentation and led knowledge-sharing sessions on performance testing, capacity planning, and chaos/fault testing practices, mentoring engineers transitioning into SRE roles

Education

Advanced Software Engineering Certificate

Hack Reactor

Remote

08-2022

Bachelor of Science - Hospitality Management

University of Nevada, Las Vegas

06-2017

Skills

Reliability Engineering: Incident Response, On-call Operations, Production Debugging, Service Health Monitoring, Root Cause Analysis, High Availability
Observability: Dynatrace, Distributed Tracing, Metrics & Alerting, Log Analysis, Trace-to-Log Correlation, Performance Monitoring
Cloud & Infrastructure: AWS (API Gateway, CloudFront, Lambda, S3, DynamoDB, DAX, EventBridge, KMS, Keyspaces), Terraform, Infrastructure as Code
Kubernetes & Platform: Kubernetes, Helm, ArgoCD, Service Onboarding, HPA, Ingress Configuration
CI/CD & Automation: GitLab CI/CD, Pipeline Automation, Deployment Workflows, Go (automation scripting), Bash
Performance Engineering: Load Testing, Stress Testing, Spike Testing, Endurance Testing, Capacity Planning, Right-Sizing
Scripting languages
Microservices architecture
Incident management

Languages

Korean

Native or Bilingual
