
AMIT VARSHNEY

Lead SRE
Pune

Summary

Senior Site Reliability Engineering professional with over 19 years of experience specializing in production reliability, incident management, and automation-first SRE practices across telecom, global banking, and healthcare domains. Proven track record in leading cross-functional responses to critical incidents, driving root cause analysis, and mentoring engineering teams to enhance operational maturity and resilience.

Expertise in Azure, AWS, and Kubernetes, with extensive experience in designing and operating scalable, reliable, and compliant systems within highly regulated environments. Recognized for a proactive approach, adaptability, and exceptional problem-solving skills, consistently leveraging new technologies to foster team success and contribute to organizational growth.

Overview

15 years of professional experience

Work History

Lead SRE

Roche Information Solutions
12.2021 - Current
  • Owned end-to-end production reliability for large-scale systems, ensuring high availability, performance, and capacity planning across cloud-native platforms.
  • Led P1/Critical incident response as primary escalation point, coordinating cross-functional teams, driving RCA, and implementing long-term preventive measures.
  • Reduced operational toil through automation-first SRE practices, including alert-driven workflows, automated incident creation, and self-healing mechanisms.
  • Designed and deployed high signal-to-noise monitoring and observability across metrics, logs, and traces for proactive detection and faster MTTR.
  • Built and maintained Kubernetes platform reliability, including cluster stability, resource optimization, and upgrade readiness.
  • Developed custom automation tools and scripts to eliminate manual processes and improve engineering productivity at scale.
  • Optimized cloud infrastructure costs by improving resource utilization and eliminating inefficiencies across compute, storage, and networking.
  • Partnered with security and compliance teams to meet regulatory requirements while clearly communicating reliability risks and trade-offs.
  • Drove continuous improvement of incident management and SRE processes through runbooks, postmortems, and operational reviews.
  • Fostered strong collaboration between development, platform, and operations teams, influencing system design for reliability and scalability.

Technical Consultant (HSBC Bank)

Wipro Technology
03.2019 - 11.2021
  • Led 24×7 reliability operations for a mission-critical digital banking platform, serving as primary escalation point with on-call and Officer-in-Charge (OIC) responsibilities to ensure continuous availability.
  • Owned P1 / critical incident management, driving rapid triage, cross-functional coordination, RCA, and post-incident remediation in compliance with strict financial regulatory standards.
  • Reduced MTTR and operational risk by automating application health checks, operational workflows, and reporting processes.
  • Designed and implemented observability and alerting strategies using APM and log analytics platforms (e.g., AppDynamics, Splunk) for proactive issue detection.
  • Built high signal-to-noise dashboards and health rules to minimize service impact and prevent outages before customer-facing degradation.
  • Provided technical leadership for a distributed team of ~40 engineers, ensuring consistent production ownership and 24×7 operational coverage.
  • Mentored engineers across shift teams, improving incident response quality, compliance awareness, and reliability engineering maturity.
  • Established operational readiness and knowledge management practices, including runbooks, KB articles, and regular cross-shift knowledge transfer sessions.
  • Partnered with multiple platform and application teams during PIRs and RCA reviews, implementing safeguards to prevent incident recurrence.
  • Drove a culture of calm, data-driven decision-making during high-severity incidents, ensuring fast recovery and minimized customer impact.

Deputy Manager - Operations (Vodafone UK)

VOIS (Vodafone Intelligent Solutions)
05.2014 - 03.2019
  • Designed and operated highly available, scalable AWS infrastructure using EC2, Elastic Load Balancers, Auto Scaling, and infrastructure-as-code to meet enterprise reliability requirements.
  • Built and maintained end-to-end CI/CD pipelines across DEV, SIT, PAT, and PROD using Jenkins, Git, Maven, SonarQube, and Nexus for compliant and repeatable releases.
  • Led production release management for containerized microservices, executing automated deployments to Amazon ECS with predictable, low-risk go-lives.
  • Standardized infrastructure configuration and resiliency using centralized CloudFormation templates to manage multiple microservices.
  • Implemented data protection and backup strategies, including automated Amazon RDS snapshots and artifact backups to Amazon S3.
  • Owned observability and performance monitoring, creating CloudWatch dashboards, alerts, and metrics for early detection of infrastructure and application issues.
  • Implemented synthetic monitoring and user journey checks to continuously validate application functionality and availability post-deployment.
  • Played a key role in major incident triage and recovery, coordinating investigations and steering services to resolution during customer-impacting events.
  • Reviewed change records, technical implementation plans, and release documentation, driving continuous improvement in deployment and reliability practices.
  • Partnered with cross-vendor engineering teams (Accenture, Infosys, Tech Mahindra) on architecture design, requirements, and delivery for large-scale digital platforms.

Deputy Manager - Operations (Vodafone Spain)

VOIS (Vodafone Intelligent Solutions)
05.2014 - 03.2019
  • Led end-to-end service delivery and solution ownership for complex requirements impacting multiple enterprise platforms, ensuring reliability, scalability, and operational readiness.
  • Performed requirements gathering, impact analysis, and feasibility studies, translating business needs into resilient solutions across distributed OSS/BSS systems.
  • Designed end-to-end solution architecture and operational workflows, authoring HLDs, LLDs, sequence and activity diagrams, and test cases with formal stakeholder sign-off.
  • Coordinated closely with clients, architects, vendors, and cross-functional teams, balancing delivery constraints, quality, and timelines.
  • Owned project execution and delivery governance, tracking progress, managing risks, resolving dependencies, and escalating issues to ensure on-time delivery.
  • Developed and executed project plans covering gap analysis, process definition, pilot rollout, production launch, and steady-state operations.
  • Provided ITIL-aligned service management leadership, spanning service transition, incident management, steady-state operations, and continuous service improvement.
  • Owned production operations for billing and mediation platforms (including Arbor, Geneva, mediation and prepaid systems), handling monitoring, P1 incidents, RCA, and operational reporting.
  • Defined and tracked KPIs and operational metrics with customers to measure service effectiveness and drive performance improvements.
  • Led project transitions and go-lives, including KT, shadowing, reverse-shadowing, cross-training, and vendor due diligence to ensure stable production handover.

Package Solution Consultant (Vodafone Spain)

IBM India (P) Ltd
11.2011 - 05.2014
  • Served as L3 production support and design contributor, collaborating with architects and stakeholders on solutions impacting multiple enterprise systems.
  • Owned the end-to-end change lifecycle, from feasibility analysis through design, testing, CAB approval, and controlled production deployment.
  • Authored and maintained technical and business documentation, including HLDs, LLDs, sequence diagrams, activity diagrams, and test cases.
  • Coordinated change build, test, and release activities in alignment with governance requirements and release calendars.
  • Presented and managed RFCs through Change Advisory Board (CAB) reviews, driving scheduling and risk mitigation decisions.
  • Executed post-implementation reviews, validating change outcomes, system stability, and rollback effectiveness.
  • Delivered business-critical configuration and pricing changes (pricing plans, RC/NRC, contracts, discounts, credits, jurisdictions) with zero regression in production.
  • Independently handled projects end-to-end, from feasibility and design through production rollout and validation.

Package Solution Consultant (Maxis Malaysia)

IBM India (P) Ltd
11.2011 - 05.2014
  • Led requirements gathering and analysis, translating business needs into scalable technical solutions.
  • Designed end-to-end system architecture, including business processes and order-flow scenarios, in collaboration with architects and stakeholders.
  • Partnered closely with clients and solution architects to drive design decisions and align on technical direction.
  • Owned release and deployment strategy, defining release policies, rollout plans, and rollback mechanisms aligned with governance and compliance standards.
  • Coordinated build, test, and release teams to ensure predictable, low-risk, and on-time deployments.
  • Implemented secure and traceable release management, ensuring version control, authorization, verification, and rollback readiness.
  • Ensured only approved, tested, and authorized versions were deployed across environments.
  • Provided technical leadership and delivery oversight, tracking milestones, managing risks, and resolving blockers proactively.
  • Produced and shared project status and progress reports, ensuring transparency and on-time execution.

Senior Technical Associate (Optus Australia Telecom)

Tech Mahindra
09.2010 - 10.2011
  • Partnered with business stakeholders to gather requirements, perform impact analysis, and design production-ready solutions for revenue-critical billing systems.
  • Authored impact analysis and solution design documentation, translating business needs into scalable and maintainable technical implementations.
  • Designed and implemented custom data flows and rating logic, including CDF customization, usage-type mapping, hierarchy-based rating, and feed integrations.
  • Developed and optimized database triggers, procedures, and ingestion pipelines to support accurate usage processing and billing.
  • Led cross-organization design workshops with internal teams and external partners to align technical designs with business and operational requirements.
  • Implemented hierarchical rating solutions for SMB customers, ensuring correctness across complex account and usage structures.
  • Executed controlled production deployments, ensuring stability, data accuracy, and zero revenue leakage.
  • Presented solution designs, feasibility analysis, and delivery plans to business and BAU teams, enabling informed decision-making and smooth operational handover.

Education

Master of Science - Information Technology

Allahabad Agricultural University
Allahabad, Uttar Pradesh
01.2005

Skills

Azure Cloud Solutions

Amazon Web Services

Kubernetes management

Infrastructure as code

System monitoring [AppDynamics, Splunk, Datadog]

Incident management [Jira, Remedy, ServiceNow]

Scripting [Python, Shell]

ITIL framework

Log analysis

Performance tuning

Continuous integration

Continuous deployment
