Thomas Lynch

St. Louis

Summary

Principal Engineer with deep experience leading DevOps and Site Reliability transformations across large-scale, data-intensive, and mission-critical platforms. Proven production readiness steward with a track record of partnering across development, product, and operations teams to design, deploy, and operate highly available, secure, and observable systems. Expert in shifting reliability left through automation, CI/CD enablement, incident response maturity, and observability architecture. Known for calm, systematic incident leadership, blameless post-mortems, and improving platform resilience, velocity, and customer experience across globally distributed environments.

Work History

Lead Platform Engineer / BizOps

Nestlé Purina

01.2023 - Current

Served as production readiness steward for enterprise cloud platforms across AWS, Azure, and GCP supporting high-availability digital properties.
Led design and rollout of a developer experience and platform enablement ecosystem (Backstage, GitHub, GitHub Actions), increasing adoption of standardized automation by 45% and reducing time-to-delivery by ~30%.
Drove adoption of golden-path infrastructure patterns, improving API and microservice discoverability and reuse, saving thousands of engineering hours and approximately $1M annually in external agency development costs.
Architected and implemented a centralized observability platform, reducing mean time to detect incidents by ~3 days and mean time to recover by ~1 week across previously opaque platforms.
Enabled faster root cause analysis, compliance visibility, and health monitoring through unified metrics and logging, significantly improving operational confidence.
Designed and led refactoring of a legacy platform, delivering ~$500K in annual cost savings through improved efficiency and reduced operational overhead.
Architected and insourced multiple platforms and sites, optimizing delivery models and resource utilization, resulting in ~$3.5M in annualized cost savings.
Established enterprise incident response processes with runbooks, escalation paths, change management controls, and blameless post-mortems, shifting reliability from reactive to proactive.
Implemented ITSM feedback loops by analyzing incident and change data to identify resiliency gaps, inform platform improvements, and reduce repeat incidents.
Supported CI/CD operational gating and release readiness processes to ensure quality, stability, and compliance before promotion to higher environments.
Partnered with global development, product, and operations teams to shift reliability, change management, and operational requirements left into system design and delivery.

Senior Platform Engineer / Site Reliability Engineer

National Geospatial-Intelligence Agency (NGA)

01.2018 - 01.2023

Supported mission-critical, globally distributed AI/ML and computer vision platforms with strict availability, security, and performance requirements.
Improved production readiness and operational reliability, reducing recurring failure patterns and increasing platform stability.
Designed and operationalized MLOps workflows improving deployment reliability and reducing manual intervention.
Enhanced observability and monitoring, accelerating detection of failure conditions and root cause analysis.
Led and participated in incident response and post-incident reviews, contributing to improved recovery times and mission continuity.
Automated deployment and operational workflows under strict security and compliance constraints.
Partnered with engineers and stakeholders to integrate operational and reliability concerns earlier into system design.

Data Scientist / Geospatial Analytics Engineer

Patch Terra Geoexploration

01.2014 - 01.2018

Developed geospatial analytics and data science solutions supporting geoexploration initiatives.
Built data pipelines for large spatial datasets using Python.
Operationalized analytics workflows with a focus on reliability and reproducibility.

Geospatial Intelligence (GEOINT) Imagery Analyst

United States Army

01.2009 - 01.2014

Conducted geospatial and imagery analysis in support of operational and strategic missions.
Produced time-sensitive intelligence products under high-pressure conditions.
Maintained strict accuracy, reliability, and security standards.

Education

Bachelor of Arts - Anthropology

Florida Atlantic University

Boca Raton, FL

12.2014

Skills

Cloud infrastructure management
DevOps methodologies
Observability
Change management
SRE practices
