Results-driven Senior Manager with over 18 years of expertise in management and technical domains. Demonstrated success in leading SRE, Service Delivery, Infrastructure Management, and Agile teams for industry leaders such as Apple Inc., Workday, and TCS. Recognized for transforming SRE teams into subject matter experts through effective coaching and mentoring, while excelling at operational strategy and financial management. Proven ability to enhance productivity and efficiency, combined with strong communication and problem-solving skills to meet organizational objectives.
· Successfully delivered on-prem and public cloud services for Workday by leading cross-functional teams, implementing site reliability measures, conducting meticulous testing, managing infrastructure, and automating critical processes. Achieved a 25% reduction in infrastructure costs through strategic automation initiatives.
· Experienced in program management of service delivery across multiple parallel projects, adhering to the timelines committed to Workday customers.
· Led cross-functional teams of SRE, Incident/Problem Management, and Testing for Apple’s POS Application, delivering exceptional performance over 10+ years in leadership roles. Managed team growth, process optimization, and resource allocation, resulting in enhanced application stability, a 40% reduction in incidents, and improved customer satisfaction.
· Managed a highly efficient global SRE team of 44+ members (includes Principal Engineers, Associate Managers and Leads) ensuring seamless support delivery and achieving a 30% reduction in incident response time and saving an average of 100+ hours per week for the team.
· Drove key performance metrics, including system availability, MTTR/MTTA/MTTD, and customer satisfaction, exceeding targets. Developed monitoring strategies, resulting in a 40% reduction in critical incidents.
· Spearheaded the implementation of the project management tool JSM (migration from Jira) for change management at Workday, leading to a 30% efficiency increase for weekly changes.
· Led the pilot implementation of Jira Align in the Workday Product and Technology group, ensuring an ahead-of-schedule rollout for 28 teams.
· Implemented data-driven presentations, comprehensive dashboards, and strategic KPIs to foster accountability, stabilize the platform, and instill confidence in service level agreements; continuously monitored and reported on SLAs, SLOs, error budget, and critical KPIs to maintain operational excellence.
· Led the successful launch of Apple's first retail store in Brazil, overseeing all aspects of the New Country Opening; facilitated a smooth opening event, resulting in a highly successful opening day.
· Streamlined maintenance, upgrades, and monitoring of all Workday customer environments and reduced system downtime by 25% while ensuring high levels of security and availability.
· Directed and empowered geographically dispersed teams in the USA (Atlanta, Austin, California), Ireland, India, London, and New Zealand; implemented a follow-the-sun model and agile methodologies, resulting in 35% faster project delivery and a 15% reduction in time-to-market.
· Orchestrated the seamless maintenance, upgrade, and monitoring of all Workday customer environments, ensuring 99.99% uptime and enhancing user experience for a client base of 7,000+ customers and 35,000+ tenant nodes.
· Led the effort to revamp the early build deploy process in Workday internal environments (Silver version), leading to cleaner deployments during the weekly maintenance window. This has helped reduce toil for engineers during patching.
· Spearheaded cross-functional collaboration between development, operations, and support teams, resulting in an increase in customer satisfaction ratings.
· Streamlined upgrade processes by implementing a comprehensive change management framework, resulting in a 40% reduction in downtime during system upgrades.
· Led the development and deployment of automation tools and processes, reducing SRE toil by 40% and saving an average of 30 hours per week for Workday.
· Led cross-functional teams to establish a customer-centric approach, transforming the organization's mindset towards product and service management; fostered a culture of innovation and problem-management.
· Implemented robust automation frameworks, reducing manual testing efforts by 50% and enabling faster time to market for critical updates.
· Managed employee performance by setting clear goals and objectives and delegating functional leader responsibilities; inspired and motivated teams to deliver exceptional performance against objectives, resulting in a 15% improvement in overall team productivity and exceeding quarterly targets.
· Led retrospectives and blameless postmortems across the SRE group, fostering a blameless culture, standardizing the process, and ensuring that action items were implemented on time.
· Played a pivotal role in identifying and implementing cost-saving measures, resulting in a significant reduction in operational expenses and increased profitability.
· Led and evangelized global service teams, overseeing Service Delivery Managers, Service Support Managers, Enterprise Architects, Program Managers & Consultants; optimized a departmental budget of $14-23 million, resulting in $2.5 million cost savings through strategic resource allocation and process optimization.
· Implemented a streamlined reporting process, consolidating data from multiple sources and presenting key metrics in an easy-to-understand format, leading to a 30% reduction in time spent on report preparation.
· Implemented Agile and Scrum methodologies, leading the transformation from waterfall to agile/sprint/Kanban methodologies. Increased team efficiency by 40% and accelerated project delivery by 20%.
Technical Experience
· Developed and executed long-term plans to maintain up-to-date infrastructure capacity (e.g., bare metal, hypervisor virtual machines, OpenStack, cluster management), ensuring seamless operations and scalability for future growth.
· Ensured application adherence to SOX standards through monthly audit report validations, mitigating risks and ensuring compliance with industry benchmarks.
· Planned and executed infrastructure upgrades, including OS (Linux, CentOS), security patches, firmware, SQL, and Oracle, resulting in enhanced system performance and reduced security vulnerabilities.
· Orchestrated network topologies and configuration of load balancers, ACL configurations, and secure (NATted) tunnels for external traffic, optimizing network performance and enhancing data security.
· Led capacity forecast initiatives, resulting in a 25% reduction in infrastructure costs and improved resource allocation for new VM deployments and datacenter migrations.
· Demonstrated expertise in Linux administration, software configuration, and system maintenance, ensuring optimal system functionality and uptime.
· Managed successful migration projects of legacy Java applications to Kafka, ZooKeeper, and the AWS platform, resulting in enhanced scalability and performance.
· Incorporated Python, SQL, Bash scripting, and Java standalone programs to drive data analysis, automation, and software development, optimizing business operations and driving innovation.
· Led the automation of deployment processes using Ansible, reducing deployment time by 40% and minimizing human error in the configuration management process.
· Implemented observability strategies using monitoring tools such as Splunk, Prometheus, and Telemetry, enhancing application performance and stability.
· Demonstrated proficiency in leveraging CI/CD tools and automation frameworks to drive continuous improvement and optimize software delivery pipelines.
· Led the design and implementation of deployment strategies using Jenkins Pipelines and AWS CodePipeline, streamlining development workflows and reducing time to market by 25%.
· Administered Splunk indexers, configurations, and clusters, optimizing log management and analysis processes, leading to a reduction in troubleshooting time.
· Managed and maintained MySQL databases, implementing data replication across nodes, resulting in improved data availability and disaster recovery.