AMJAD SHAH

Summary

Senior Site Reliability Engineer with 8+ years of experience supporting enterprise Linux infrastructure and Kubernetes platforms (on-prem, AKS, EKS). Proven track record of building automation frameworks with Ansible and Python to operate large-scale, multi-datacenter environments, improve system reliability, and enforce security compliance. Experienced in incident response, root cause analysis, and zero-downtime infrastructure maintenance.

Work History

Senior Site Reliability Engineer

Splunk

San Francisco, California

05.2022 - Current

Own reliability and operational readiness for Kubernetes clusters across on-prem, AKS, and EKS, driving stability, scalability, and security compliance for enterprise observability workloads.
Lead incident response (P0/P1) and perform deep-dive RCA, implementing durable fixes through Infrastructure-as-Code and operational runbook standardization.
Design and operate stateful Kubernetes storage (PV/PVC) using NFS, AWS EBS, Azure Disk, and Ceph, ensuring resilient application data persistence.
Troubleshoot complex workload and node issues (CrashLoopBackOff, scheduling failures, kubelet/network connectivity) using kubectl, container/runtime logs, and Prometheus/Grafana signals.
Implement and govern Kubernetes access controls (RBAC, ClusterRoles, RoleBindings, ServiceAccounts) across namespaces to enforce least privilege and separation of duties.
Maintain platform security posture: Vault integrations, secrets management, admission controls, and image scanning/policy enforcement to meet internal security baselines.
Manage the Docker container lifecycle (build, tag, scan, push) across private registries (Harbor, ECR, ACR) and partner with teams to remediate base image vulnerabilities (Ubuntu, Alpine, Distroless).
Deliver zero-downtime upgrades and maintenance activities; validate API server, etcd, kube-proxy, and node health post-change.
Drive onboarding reviews for apps and vendors onto Kubernetes, assessing manifests, networking, storage, and RBAC designs for compliance with platform standards.
Build automation using Python and internal tooling (FLO workflows/CLI + Git/GitLab) to standardize diagnostics, reduce manual effort, and improve traceability in Jira.

Senior Infrastructure Automation Engineer

Citi Bank

Irving, TX

03.2020 - 05.2022

Built and standardized enterprise automation using the Ansible Automation Platform/Tower across 1,000+ Linux and Windows servers in a multi-datacenter environment.
Architected patch orchestration workflows with readiness checks, controlled reboot sequencing, and post-validation, improving patch consistency and reducing manual intervention.
Automated VMware lifecycle operations, including RHEL 6, 7, and 8 provisioning, templating, snapshots, disk expansion, and validation, using vSphere-aligned workflows.
Implemented security and compliance automation aligned with regulated standards (e.g., CIS hardening, baseline enforcement, configuration standardization).
Automated identity and access workflows, including Linux auth standardization, AD integration patterns, and controlled user lifecycle operations, for audit readiness.
Developed operational visibility automation (health checks, inventory outputs, service validation) to support proactive remediation and faster troubleshooting.
Implemented CI/CD-style automation practices with Git/GitLab for version-controlled playbooks, repeatable deployments, and safer change delivery.
Supported AWS infrastructure activities (EC2, IAM, ELB, Route 53, CloudWatch) and aligned automation patterns across hybrid environments.

Linux Systems Engineer

Halliburton

Houston, TX

11.2017 - 12.2019

Managed the full Linux server lifecycle (provisioning, patching, migration, decommissioning) across VMware-backed enterprise environments.
Led RHEL platform migrations (RHEL 6 → RHEL 7) and supported production stability through structured change management.
Automated recurring operations using Ansible ad-hoc commands, shell scripting, and cron, reducing manual work and improving consistency.
Implemented performance and availability improvements (NIC bonding, kernel tuning, RAID/LVM) and performed deep troubleshooting using network/system tools.
Maintained operational documentation and knowledge base entries in Confluence to improve team efficiency and response quality.

Jr. Linux Admin

Dell Technologies

New York City, NY

10.2015 - 10.2017

Supported Linux and Windows servers across enterprise environments; performed OS installs, patching, routine maintenance, and backup verification.
Monitored infrastructure health across bare metal and ESXi and resolved incidents via BMC Remedy with strong documentation practices.
Assisted with storage operations, including basic SAN/LUN exposure and LVM troubleshooting under senior engineer guidance.

Education

Linux Essentials

C.T.T.C Cisco Academy

Pakistan

01.2015

BS - Computer Science

Al-Khair University

Pakistan

01.2008

Skills

SRE / Containers: Kubernetes (on-prem, AKS, EKS), RBAC, PV/PVC, Docker, Private registries (Harbor/ECR/ACR), Prometheus/Grafana
Automation / IaC: Ansible, Ansible Automation Platform / Tower, Bash, Python (automation), Git/GitLab, CI/CD (Jenkins)
Linux: Red Hat Enterprise Linux (RHEL 6/7/8), systemd, performance troubleshooting, package management, OS patching
Storage & Filesystems: LVM, RAID, NFS, XFS, EXT4
Virtualization: VMware vSphere, vCenter, ESXi, VM templates/cloning, snapshots, vMotion
Cloud: AWS (EC2, EBS, ELB, IAM, VPC, S3, CloudWatch, Route 53)
Networking (working knowledge): TCP/IP, DNS, DHCP, SSH, NTP, troubleshooting (tcpdump, traceroute, netstat)
ITSM / Documentation: Jira, ServiceNow, BMC Remedy, Confluence
