Kalavathi Munasa

Redmond, WA

Summary

Experienced technology professional with 13+ years in site reliability engineering, cloud automation, software development, and applied AI/ML. Currently Lead SRE at Optum, driving automation, scalability, and ML-powered solutions. Pursuing a Postgraduate Program in AI/ML at UT Austin (expected Jan 2026), focusing on cloud-scale systems, AI/ML, and production-grade automation pipelines.

Overview

13 years of professional experience
1 Certification

Work History

Lead Site Reliability Engineer

Optum Global Solutions
03.2025 - 08.2025
  • Led the SRE team, ensuring system scalability and reliability.
  • Drove automation initiatives and CI/CD optimization.
  • Mentored engineers and implemented reliability best practices.
  • Led comprehensive application migration to GCP cloud to enhance operational efficiency and maintain system integrity.
  • Implemented robust monitoring solutions to enhance system reliability and performance.
  • Collaborated with cross-functional teams to troubleshoot complex incidents and optimize incident response times.
  • Mentored junior engineers on best practices for site reliability, fostering a culture of continuous learning.
  • Increased system scalability with deployment of cloud-based infrastructures tailored to project needs.
  • Note: Planned 1-year career break from Sep 2025

Lead / Senior Site Reliability Engineer

Optum Global Solutions
11.2022 - 02.2025
  • Built automation tools with Python, FastAPI, and Kubernetes on Azure/GCP.
  • Developed ML models for job runtime prediction and incident classification.
  • Built several machine learning models (XGBoost, logistic regression, random forest, decision trees, linear regression) to predict job runtime and load before each job is scheduled to run.
  • Worked on a second project to auto-classify ServiceNow incidents and assign them automatically to the respective domain teams.
  • Analysed production data with Python libraries (pandas, NumPy, Matplotlib, scikit-learn, Seaborn) and built custom visualization charts showing data trends, published to customers for daily validation.
  • Collaborated with other team members (developers, customers, testers) to ensure the system meets non-functional requirements such as performance, security, and availability.

Site Reliability Engineer

Optum Global Solutions
11.2018 - 10.2022

  • Created monitoring dashboards and proactive alerts.
  • Automated incident response and optimized IT operations.
  • Improved uptime via performance tuning and system optimization.
  • Optimized on-call rotations and processes, and handled complicated, cross-platform issues in production.
  • Built automation and software for any tasks or parts of the system that would benefit from replacing manual routines.
  • Created customised dashboards for the application teams to easily identify any breakages in the application flow.
  • Created and monitored alerts to identify issues before end users reported them.
  • Monitored application performance, took the steps needed to improve overall performance and stability, and created alerts and dashboards accordingly.

Senior Java Developer / Java Developer

TCS
03.2012 - 10.2018

  • Supported Java and RESTful web services in both technical and functional aspects, following Agile or DevOps methodology and reporting directly to the client.
  • Interacted with clients to understand requirements and translated them into functional requirements, design, and development.
  • Designed applications based on the identified architecture and supported implementation of the design by resolving complex technical issues during development and deployment.
  • Unit tested software using current test strategies and frameworks to ensure optimization, reliability, and proper functionality.
  • Performed performance optimizations on Spring Boot and Hibernate.
  • Coordinated with geographically distributed teams for successful development, testing, and deployment of a project.
  • Participated in code and design reviews, used the Sonar tool for code quality improvements, and fixed security vulnerabilities reported by Checkmarx.
  • Designed and developed RESTful web services using Spring MVC and Spring Boot microservices.
  • Performed unit and integration testing using tools such as Cucumber.

Education

Postgraduate Program in AI/ML - Artificial Intelligence And Machine Learning

University of Texas at Austin
USA
01.2026

B.Tech - Computer Science & Engineering

JNTU Kakinada
India
01.2011

Skills

  • Languages & Frameworks: Python, Java, Groovy, Spring Boot, ReactJS, SQL, PL/SQL, FastAPI, Jupyter Notebook
  • DevOps & Cloud: Kubernetes, Docker, Jenkins, OpenShift, Azure, GCP, CI/CD pipelines
  • Monitoring & Automation: Splunk, Grafana, Datadog, Dynatrace, Icinga, ServiceNow automation
  • AI/ML & Data Science: Machine Learning (Supervised & Unsupervised Learning), Deep Learning & Neural Networks, NLP, Data Analysis & Visualization, Agentic AI
  • Databases & Messaging: MongoDB, MySQL, PL/SQL, Kafka, RabbitMQ
  • Other Skills: Microservices architecture, REST APIs, Application Performance Monitoring

Certification

  • Google × Kaggle Agentic AI Program – Capstone Completed
  • Google Cloud Digital Leader – 2024
  • Oracle Certified Professional, Java SE 5 Programmer
  • Oracle Certified PL/SQL Developer
  • Oracle Certified Expert Web Services Developer
  • MongoDB Certified Java Developer
  • Oracle Linux Fundamentals Certified Implementation Specialist

Awards & Recognition

  • ‘Super-Hero’ Award – Automated customized reporting tool
  • ‘Live Award’ – Infra-level multi-tenancy improvements (FY 2022)
  • ‘Star Team’ Award – Team excellence recognition
  • UK Client E-Card – Timely delivery of BT NextGen TV project