Michael Everett

San Clemente, CA

Summary

Dedicated, results-oriented Site Reliability Engineering leader with a strong track record of steering high-impact initiatives across diverse global teams. Adept at shaping engineering cultures and providing strategic technical guidance, I have led efforts to deploy business-critical infrastructure, implement Service Level Objectives (SLOs), and drive cost optimization strategies. Recognized for orchestrating the migration of monolithic applications to a microservices architecture, I have a proven ability to improve operational efficiency and reliability.

Work History

Manager, Site Reliability Engineering

Curology

10.2021 - 09.2024

Managed a team of Site Reliability Engineers (SREs), providing mentorship, coaching, and guidance to foster a culture of continuous learning and improvement.
Oversaw the implementation, management, and continued improvement of all Curology infrastructure, Infrastructure as Code (Terraform and ACK), the Kubernetes cluster (EKS), and deployment pipelines (GitHub Actions, ArgoCD, Helm).
Led the company-wide effort to establish and enforce customer-centric Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure system reliability, uptime, and customer experience.
Managed priorities, projects, and the overall workflow of the SRE team while communicating goals, status, risks, and impact of the team’s work to relevant stakeholders and to the larger tech organization. This included organizing regular feedback cycles and roadmap discussions with stakeholders.
Led the creation of a new incident management process, creating a culture of blameless root cause analyses and structured post-mortems.

Senior Site Reliability Engineer

ZEFR

10.2020 - 10.2021

Led initiatives to establish Service Level Objectives (SLOs) for all customer ML pipelines, enhancing visibility and accountability for the performance and reliability of our business-critical workflows.
Collaborated with engineering teams to ensure services are designed with reliability in mind, and provided guidance on the appropriate use of tooling and automation.
Served as team lead on the migration of all services out of Spinnaker and into Kubernetes. This included creating company-wide standards for writing Helm charts, working with development teams to define the appropriate deployment strategy for each service, and coordinating with external stakeholders to ensure zero-downtime migrations.
Provided bi-weekly mentoring sessions for SREs and developers on Kubernetes, Helm, and infrastructure-as-code tools (Terraform, ACK, Crossplane).

Lead Site Reliability Engineer

ModusBox

01.2020 - 10.2020

Managed and led a team of Site Reliability Engineers (SREs) in the US and Ukraine, shaping the engineering culture and providing technical guidance.
Collaborated with external stakeholders in West Africa, overseeing the deployment of all business-to-business payment infrastructure and services.
Led the effort to establish comprehensive Service Level Objectives (SLOs) for ModusBox and PortX services, working with development teams to define targets and error budgets for each service.
Defined quarterly departmental Objectives and Key Results (OKRs), ensuring a smooth implementation of the OKR process and successful completion of objectives in collaboration with internal stakeholders.
Implemented chaos engineering using Chaos Mesh to identify failure domains and bottlenecks within services and infrastructure.

Senior Site Reliability Engineer

Blizzard Entertainment

03.2017 - 03.2020

Designed and built immutable infrastructure and a zero-downtime green-blue deployment pipeline for StarCraft: Remastered.
Led a mentorship initiative for junior Site Reliability Engineers, combining targeted training sessions with pairing sessions to develop individual skills and strengthen the team.
Engineered Jenkins shared libraries to streamline the creation of immutable OpenStack infrastructure using Terraform, establishing a standardized approach for all Battle.net teams to deploy infrastructure seamlessly with Jenkins.
Devised and implemented a comprehensive load-testing strategy for all customer-facing websites and backend services, ensuring optimal performance and reliability.

DevOps Engineer

Blizzard Entertainment

05.2015 - 03.2017

Engineered and deployed an automated push-button system for deploying all Battle.net Java applications. The system orchestrates canary and green-blue deployments in both staging and production environments.
Designed and implemented Jenkins shared libraries to streamline building and publishing Docker images for Battle.net web services.
Collaborated with Battle.net leadership to cultivate a DevOps culture, improving communication and collaboration between developers and operations teams. This initiative increased delivery velocity and deployment frequency through automation in the CI/CD process.

System Administrator

Blizzard Entertainment

10.2010 - 05.2015

Provisioned and configured infrastructure using Puppet, supporting continuous deployment of new features and applications within Blizzard's OpenStack and bare-metal environment.
Helped identify the root causes of critical issues in production and development environments and recommended long-term, preferably permanent, fixes.
Offered mentoring to junior NOC technicians, incorporating coding challenges designed to introduce them to Python and providing guidance on basic troubleshooting techniques for our production Linux environments.

System Administrator Manager

LunarPages

04.2007 - 10.2010

Led a team of seven administrators responsible for the day-to-day administration of four data centers across the United States, supporting dedicated hosting customers through troubleshooting, configuration, and updates of all Red Hat Enterprise Linux (RHEL) based servers.
Conducted bi-yearly reviews, offering targeted feedback and setting goals to facilitate individual growth and development, aligning with the performance expectations of the role.
Participated in a 24x7 on-call rotation, serving as the primary point of contact for incident response.

Skills

Infrastructure-as-Code: Terraform, Terragrunt, ACK, Crossplane, Ansible, Puppet
Containerization: Docker, Kubernetes, Rancher (K3s)
Languages: Python, Go, Lua, PowerShell

Clouds: AWS, GCP, OpenStack
CI/CD: ArgoCD, Flux, Jenkins, GitHub Actions, Spinnaker, Octopus
Monitoring: Datadog, Grafana, New Relic
