

Senior Site Reliability Engineer with 8+ years of experience supporting enterprise Linux infrastructure and Kubernetes platforms (on-prem, AKS, EKS). Proven track record of building automation frameworks with Ansible and Python to operate large-scale, multi-datacenter environments, improving system reliability, and enforcing security compliance. Experienced in incident response, root cause analysis, and zero-downtime infrastructure maintenance.
SRE / Containers: Kubernetes (on-prem, AKS, EKS), RBAC, PV/PVC, Docker, private registries (Harbor/ECR/ACR), Prometheus/Grafana
Automation / IaC: Ansible, Ansible Automation Platform / Tower, Bash, Python (automation), Git/GitLab, CI/CD (Jenkins)
Linux: Red Hat Enterprise Linux (RHEL 6/7/8), systemd, performance troubleshooting, package management, OS patching
Storage & Filesystems: LVM, RAID, NFS, XFS, EXT4
Virtualization: VMware vSphere, vCenter, ESXi, VM templates/cloning, snapshots, vMotion
Cloud: AWS (EC2, EBS, ELB, IAM, VPC, S3, CloudWatch, Route53)
Networking (working knowledge): TCP/IP, DNS, DHCP, SSH, NTP, troubleshooting (tcpdump, traceroute, netstat)
ITSM / Documentation: Jira, ServiceNow, BMC Remedy, Confluence