
DevOps and Site Reliability Engineer with over 9 years of experience designing, automating, and managing large-scale cloud-native infrastructure across AWS, Azure, GCP, and on-premises Kubernetes clusters. Expertise in Kubernetes (200+ applications), Kafka, CI/CD, Infrastructure as Code (IaC), and SRE practices, applied to improve system reliability and reduce Mean Time to Recovery (MTTR). Proven track record of building scalable, highly available systems in hybrid environments, using AI-assisted DevOps automation platforms that integrate Jira, Jenkins, Kubernetes, PostgreSQL, and LLM-based intent parsing. Committed to improving production reliability through AI-driven monitoring and anomaly detection that streamline CI/CD operations.
Cloud Platforms: AWS (EKS, EC2, RDS, S3, VPC, IAM), Azure, GCP
Containerization & Orchestration: Docker, Kubernetes, Helm, Istio, OpenShift
Streaming & Messaging: Apache Kafka (topics, partitions, ACLs, consumer groups, lag monitoring)
CI/CD & DevOps Automation: Jenkins, Bitbucket, Azure DevOps, Argo CD (GitOps), GitHub Enterprise, CI/CD Pipelines, Deployment Automation
Build Tools: Maven, Gradle
Infrastructure as Code: Terraform, CloudFormation, Ansible, Chef, Helm Charts
Monitoring & Observability: Prometheus, Grafana, ELK Stack, OpenSearch, Datadog, AppDynamics
SRE: Incident Response, On-call Support, RCA, Runbooks, SLIs/SLOs
Security: DevSecOps, IAM, Secure CI/CD, Compliance Automation
Programming & Scripting: Python, Java, Shell Scripting, SQL
Databases: Oracle, PostgreSQL, MySQL, MongoDB
Systems & Networking: Linux, TCP/IP, DNS, Load Balancing, HTTP/HTTPS, TLS
Operating Systems: Linux, Unix, Windows, macOS
AI & Automation Tools: GitHub Copilot, OpenAI APIs, LLM Integration, NLP-based Service Parsing, AI-driven DevOps Automation, AIOps Monitoring
Incident Management