Maintained and managed 1,300+ virtual and bare-metal Unix and Linux servers in a 32-rack datacenter and in GCP.
This included software installation, patch application, file-system management, and performance monitoring on Sun Solaris, Red Hat, CentOS, Debian, and Ubuntu.
Responsible for SSL certificate creation and updates on Apache, Tomcat, and Nginx web servers.
Troubleshot and resolved service and host alerts.
Performed off-hours maintenance activities as scheduled.
Provided 24x7 production support as part of the team on-call rotation.
SysOps Site Reliability Engineer II
Cayuse, LLC
03.2023 - 03.2024
Managed and maintained 1,000+ AWS EC2 instances in 20+ AWS accounts spanning 7 countries globally.
Supported over 600 customers and 15 different company applications on a variety of operating systems, including Amazon Linux, Ubuntu, Debian, and Windows.
Managed Oracle, Postgres, and MySQL databases.
Partnered directly with development, QA, and product teams on releases, patches, bug fixes, and updates to Cayuse software using Bitbucket and Terraform.
Triaged and worked customer tickets using Jira.
Performed root cause analysis and incident response for outages.
Participated in on-call rotation to provide 24/7 support.
Documented troubleshooting and resolution processes for many common issues.
Performed installations, updates, and troubleshooting of various Cayuse software.
Customer Reliability Engineer
Astronomer, Inc
01.2021 - 01.2023
Functioned as an Apache Airflow and Infrastructure Engineer supporting 300+ customers running on Kubernetes clusters.
Triaged and prioritized Zendesk tickets from customers with SLAs in mind.
Collaborated directly via video and in writing with SaaS and enterprise customers to troubleshoot and resolve various application and network issues.
Partnered directly with the development, product, and field engineering teams on customer onboarding, feature releases, and bug fixes.
Familiar-to-expert proficiency with all three major cloud providers: AWS (EKS), GCP (GKE), and Azure (AKS).
Used alert monitoring and metrics software daily to assist in troubleshooting.
Documented troubleshooting and resolution processes for many common issues.
Performed installations, updates, and troubleshooting using Kubernetes, Helm, and Docker containers daily.
Assisted the QA department in testing new features and bug fixes.
Systems Engineer and Site Reliability Engineer
FRONTLINE EDUCATION
01.2016 - 10.2020
Functioned as Systems Engineer and Site Reliability Engineer at Teachscape after it was acquired by Frontline Education.
Expanded the footprint into AWS by creating immutable infrastructure code using Terraform.
Successfully consolidated and migrated two datacenters; designed and implemented a disaster recovery site.
Built systems on VMware ESX and partnered with the development team to create feature releases and bug fixes.
Composed and documented all processes and procedures pertaining to day-to-day operations and projects.
Technical Operations Engineer
TEACHSCAPE, INC
San Francisco, CA
04.2012 - 01.2016
Managed and maintained three datacenters and Amazon EC2, comprising 250+ servers running Red Hat, CentOS, Debian, and Ubuntu.
Performed installations, configurations, and troubleshooting of application servers made up of Apache, Tomcat, JBoss, and Jetty.
Successfully installed and maintained database servers: Oracle 11g, MySQL 5.1/5.6, Neo4j, and MongoDB.
Wrote automation scripts to back up and archive all databases.
Acted as temporary DBA and was instrumental in developing the SOP and DR plans.
Controlled the network infrastructure, which consisted of Cisco CSS, ASA, and HP ProCurve switches at one datacenter and Riverbed Stingray Traffic Manager at the others.
Utilized Git, Jenkins, and Ansible for application and server provisioning and deployments.
Instituted monitoring and logging using Nagios, Splunk, and New Relic.
Partnered with development, product management, and QA for application upgrades, bug fixes, and releases.
Senior Unix Systems Administrator
SCIENTIFIC LEARNING CORPORATION
Oakland, CA
02.2007 - 03.2012
Commended by leadership for expanding all aspects of the 500+ server production datacenter, which included hardware, software installations, and configuration of application servers running on Red Hat EL 6 utilizing Apache, Java, and Tomcat.
Set up and expanded the SAN network with NetApp 3160 and 2050 filers.
Helped plan and implement the migration from EMC to NetApp storage.
Managed, upgraded, and migrated all production Linux systems from Red Hat 3 to 6 and applications from Java 5 to 6.
Built out new servers, including compiling and deploying Java application servers.
Spearheaded the company’s initiative to design, implement, and expand into AWS.
Leveraged Amazon’s AMI, load balancer, S3, EC2, and EBS volumes for a highly available, redundant SaaS platform.
Played a key role in moving the company’s network from a single-core 100 Mbps network to a 1 Gbps multi-core redundant network utilizing Cisco ASA firewalls, F5 LTM load balancers, and 6509 switches.
Migrated data center application servers from physical to virtual using VMware vSphere 5.
Maintained the Aruba wireless network, Aventail SSLVPN, DNS names, and SSL certs.
Migrated domain registrars and SSL providers for centralized management.
Collaborated with the development, business systems, web, and marketing teams on upgrades and product releases.
Successfully set up the company’s monitoring system utilizing Nagios and Cacti.
Integral in migrating the monitoring system to SolarWinds and Splunk.
Established automation for the synchronization of production servers using rsync and Puppet.
Web Marketing & Localization Program Manager at ALLDATA, AN AUTOZONE COMPANY