
Senior Site Reliability Engineering professional with over 19 years of experience in production reliability, incident management, and automation-first SRE practices across the telecom, global banking, and healthcare domains. Proven track record of leading cross-functional responses to critical incidents, driving root cause analysis, and mentoring engineering teams to improve operational maturity and resilience.
Expertise in Azure, AWS, and Kubernetes, with extensive experience designing and operating scalable, reliable, and compliant systems in highly regulated environments. Recognized for a proactive approach, adaptability, and strong problem-solving skills, and for applying new technologies to support team success and organizational growth.
Azure Cloud Solutions
Amazon Web Services
Kubernetes management
Infrastructure as code
System monitoring [AppDynamics, Splunk, Datadog]
Incident management [Jira, Remedy, ServiceNow]
Scripting [Python, Shell]
ITIL framework
Log analysis
Performance tuning
Continuous integration
Continuous deployment