Steven J. Smith

Pahrump, NV

Overview

18
years of professional experience

Work History

Senior Systems Administrator

AllData (AutoZone)
Elk Grove, CA
06.2024 - 02.2025
  • Maintained and managed 1,300+ virtual and bare-metal Unix and Linux servers in a 32-rack datacenter and in GCP.
  • Handled software installation, patch application, file-system management, and performance monitoring on Sun Solaris, Red Hat, CentOS, Debian, and Ubuntu.
  • Created and updated SSL certificates on Apache, Tomcat, and Nginx web servers.
  • Troubleshot and resolved service and host alerts.
  • Performed off-hours maintenance activities as scheduled.
  • Provided 24x7 production support as part of the team on-call rotation.

SysOps Site Reliability Engineer II

Cayuse, LLC
03.2023 - 03.2024
  • Managed and maintained 1,000+ AWS EC2 instances in 20+ AWS accounts spanning 7 countries globally.
  • Supported over 600 customers and 15 different company applications on a variety of operating systems, including Amazon Linux, Ubuntu, Debian, and Windows.
  • Managed Oracle, Postgres, and MySQL databases.
  • Partnered directly with development, QA, and product teams for releases, patches, bug fixes, and updates to Cayuse software using BitBucket and Terraform.
  • Triaged and worked customer tickets using Jira.
  • Performed root cause analysis and incident response for outages.
  • Participated in on-call rotation to provide 24/7 support.
  • Documented troubleshooting and resolution processes for many common issues.
  • Performed installations, updates, and troubleshooting of various Cayuse software.

Customer Reliability Engineer

Astronomer, Inc
01.2021 - 01.2023
  • Functioned as an Apache Airflow and Infrastructure Engineer supporting 300+ customers running on Kubernetes clusters.
  • Triaged and prioritized Zendesk tickets from customers with SLAs in mind.
  • Collaborated directly via video and in writing with SaaS and enterprise customers to troubleshoot and resolve various application and network issues.
  • Partnered directly with the development, product, and field engineering teams on customer onboarding, feature releases, and bug fixes.
  • Familiar-to-expert proficiency with all three major cloud providers: AWS (EKS), GCP (GKE), and Azure (AKS).
  • Used alert monitoring and metrics software daily to assist in troubleshooting.
  • Documented troubleshooting and resolution processes for many common issues.
  • Performed installations, updates, and troubleshooting using Kubernetes, Helm, and Docker containers daily.
  • Assisted the QA department in testing new features and bug fixes.

Systems Engineer and Site Reliability Engineer

FRONTLINE EDUCATION
01.2016 - 10.2020
  • Functioned as Systems Engineer and Site Reliability Engineer at Teachscape after it was acquired by Frontline Education.
  • Expanded the footprint into AWS by building immutable infrastructure as code using Terraform.
  • Successfully consolidated and migrated two datacenters; designed and implemented a disaster recovery site.
  • Built systems on VMware ESX and partnered with the development team to create feature releases and bug fixes.
  • Composed and documented all processes and procedures pertaining to day-to-day operations and projects.

Technical Operations Engineer

TEACHSCAPE, INC
San Francisco, CA
04.2012 - 01.2016
  • Managed and maintained three datacenters and Amazon EC2, consisting of 250+ servers running Red Hat, CentOS, Debian, and Ubuntu.
  • Performed installations, configurations, and troubleshooting of application servers made up of Apache, Tomcat, JBoss, and Jetty.
  • Installed and maintained database servers, including Oracle 11g, MySQL 5.1/5.6, Neo4j, and MongoDB.
  • Wrote automation scripts to back up and archive all databases.
  • Acted as temporary DBA and was instrumental in developing the SOP and DR plans.
  • Controlled the network infrastructure, which consisted of Cisco CSS, ASA, and HP ProCurve switches at one datacenter and Riverbed Stingray Traffic Manager at the others.
  • Utilized Git, Jenkins, and Ansible for application and server provisioning and deployments.
  • Instituted monitoring and logging using Nagios, Splunk, and New Relic.
  • Partnered with development, product management, and QA for application upgrades, bug fixes, and releases.

Senior Unix Systems Administrator

SCIENTIFIC LEARNING CORPORATION
Oakland, CA
02.2007 - 03.2012
  • Commended by leadership for expanding all aspects of the 500+ server production datacenter, including hardware, software installations, and configuration of application servers running on Red Hat EL 6 with Apache, Java, and Tomcat.
  • Set up and expanded the SAN network with NetApp 3160 and 2050 filers.
  • Helped plan and implement the migration from EMC to NetApp storage.
  • Managed, upgraded, and migrated all production Linux systems from Red Hat 3 to 6 and applications from Java 5 to 6.
  • Built out new servers, including compiling and deploying Java application servers.
  • Spearheaded company’s initiative with design, implementation, and expansion into Amazon AWS.
  • Leveraged Amazon’s AMI, load balancer, S3, EC2, and EBS volumes for a highly available, redundant SaaS platform.
  • Played a key role in moving the company’s network from a single-core 100 Mbps network to a 1 Gbps multi-core redundant network utilizing Cisco ASA firewalls, F5 LTM load balancers, and 6509 switches.
  • Migrated data center application servers from physical to virtual using VMware vSphere 5.
  • Maintained the Aruba wireless network, Aventail SSLVPN, DNS names, and SSL certs.
  • Migrated domain registrars and SSL providers for centralized management.
  • Collaborated with the development, business systems, web, and marketing teams on upgrades and product releases.
  • Set up the company’s monitoring system using Nagios and Cacti.
  • Played an integral role in migrating the monitoring system to SolarWinds and Splunk.
  • Established automation for the synchronization of production servers using rsync and Puppet.

Timeline

Senior Systems Administrator

AllData (AutoZone)
06.2024 - 02.2025

SysOps Site Reliability Engineer II

Cayuse, LLC
03.2023 - 03.2024

Customer Reliability Engineer

Astronomer, Inc
01.2021 - 01.2023

Systems Engineer and Site Reliability Engineer

FRONTLINE EDUCATION
01.2016 - 10.2020

Technical Operations Engineer

TEACHSCAPE, INC
04.2012 - 01.2016

Senior Unix Systems Administrator

SCIENTIFIC LEARNING CORPORATION
02.2007 - 03.2012