AMJAD SHAH

NJ

Summary

Senior Site Reliability Engineer with 8+ years of experience supporting enterprise Linux infrastructure and Kubernetes platforms (on-prem, AKS, EKS). Proven track record of building automation frameworks with Ansible and Python to operate large-scale, multi-datacenter environments, improve system reliability, and enforce security compliance. Experienced in incident response, root cause analysis, and zero-downtime infrastructure maintenance.

Overview

10 years of professional experience

Work History

Senior Site Reliability Engineer

Splunk
San Francisco, California
05.2022 - Current
  • Own reliability and operational readiness for Kubernetes clusters across on-prem, AKS, and EKS, driving stability, scalability, and security compliance for enterprise observability workloads.
  • Lead incident response (P0/P1) and perform deep-dive RCA, implementing durable fixes through Infrastructure-as-Code and standardized operational runbooks.
  • Design and operate stateful Kubernetes storage (PV/PVC) using NFS, AWS EBS, Azure Disk, and Ceph, ensuring resilient application data persistence.
  • Troubleshoot complex workload and node issues (CrashLoopBackOff, scheduling failures, kubelet/network connectivity) using kubectl, container/runtime logs, and Prometheus/Grafana signals.
  • Implement and govern Kubernetes access controls (RBAC, ClusterRoles, RoleBindings, ServiceAccounts) across namespaces to enforce least privilege and separation of duties.
  • Maintain platform security posture: Vault integrations, secrets management, admission controls, and image scanning/policy enforcement to meet internal security baselines.
  • Manage the Docker container lifecycle (build, tag, scan, push) across private registries (Harbor, ECR, ACR) and partner with teams to remediate base image vulnerabilities (Ubuntu, Alpine, Distroless).
  • Deliver zero-downtime upgrades and maintenance activities; validate API server, etcd, kube-proxy, and node health post-change.
  • Drive onboarding reviews for apps and vendors onto Kubernetes, assessing manifests, networking, storage, and RBAC designs for compliance with platform standards.
  • Build automation using Python and internal tooling (FLO workflows/CLI + Git/GitLab) to standardize diagnostics, reduce manual effort, and improve traceability in Jira.

Senior Infrastructure Automation Engineer

Citi Bank
Irving, TX
03.2020 - 05.2022
  • Built and standardized enterprise automation using the Ansible Automation Platform/Tower across 1,000+ Linux and Windows servers in a multi-datacenter environment.
  • Architected patch orchestration workflows with readiness checks, controlled reboot sequencing, and post-validation, improving patch consistency and reducing manual intervention.
  • Automated VMware lifecycle operations, including RHEL 6, 7, and 8 provisioning, templating, snapshots, disk expansion, and validation, using vSphere-aligned workflows.
  • Implemented security and compliance automation aligned with regulated standards (e.g., CIS hardening, baseline enforcement, configuration standardization).
  • Automated identity and access workflows (Linux auth standardization, AD integration patterns, and controlled user lifecycle operations) for audit readiness.
  • Developed operational visibility automation (health checks, inventory outputs, service validation) to support proactive remediation and faster troubleshooting.
  • Implemented CI/CD-style automation practices with Git/GitLab for version-controlled playbooks, repeatable deployments, and safer change delivery.
  • Supported AWS infrastructure activities (EC2, IAM, ELB, Route 53, CloudWatch) and aligned automation patterns across hybrid environments.

Linux Systems Engineer

Halliburton
Houston, TX
11.2017 - 12.2019
  • Managed the full Linux server lifecycle (provisioning, patching, migration, decommissioning) across VMware-backed enterprise environments.
  • Led RHEL platform migrations (RHEL 6 → RHEL 7) and supported production stability through structured change management.
  • Automated recurring operations using Ansible ad-hoc commands, shell scripting, and cron, reducing manual work and improving consistency.
  • Implemented performance and availability improvements (NIC bonding, kernel tuning, RAID/LVM) and performed deep troubleshooting using network/system tools.
  • Maintained operational documentation and knowledge base entries in Confluence to improve team efficiency and response quality.

Jr. Linux Admin

Dell Technologies
New York City, NY
10.2015 - 10.2017
  • Supported Linux and Windows servers across enterprise environments; performed OS installs, patching, routine maintenance, and backup verification.
  • Monitored infrastructure health across bare metal and ESXi and resolved incidents via BMC Remedy with strong documentation practices.
  • Assisted with storage operations, including basic SAN/LUN exposure and LVM troubleshooting under senior engineer guidance.

Education

Linux Essentials

C.T.T.C Cisco Academy
Pakistan
01.2015

BS - Computer Science

Al-Khair University
Pakistan
01.2008

Skills

SRE / Containers: Kubernetes (on-prem, AKS, EKS), RBAC, PV/PVC, Docker, Private registries (Harbor/ECR/ACR), Prometheus/Grafana
Automation / IaC: Ansible, Ansible Automation Platform / Tower, Bash, Python (automation), Git/GitLab, CI/CD (Jenkins)
Linux: Red Hat Enterprise Linux (RHEL 6/7/8), systemd, performance troubleshooting, package management, OS patching
Storage & Filesystems: LVM, RAID, NFS, XFS, EXT4
Virtualization: VMware vSphere, vCenter, ESXi, VM templates/cloning, snapshots, vMotion
Cloud: AWS (EC2, EBS, ELB, IAM, VPC, S3, CloudWatch, Route 53)
Networking (working knowledge): TCP/IP, DNS, DHCP, SSH, NTP, troubleshooting (tcpdump, traceroute, netstat)
ITSM / Documentation: Jira, ServiceNow, BMC Remedy, Confluence

Timeline

Senior Site Reliability Engineer

Splunk
05.2022 - Current

Senior Infrastructure Automation Engineer

Citi Bank
03.2020 - 05.2022

Linux Systems Engineer

Halliburton
11.2017 - 12.2019

Jr. Linux Admin

Dell Technologies
10.2015 - 10.2017

Linux Essentials

C.T.T.C Cisco Academy

BS - Computer Science

Al-Khair University