
CHARITHA REDDY

Oakville

Summary

3+ years of experience supporting AWS-based production systems in high-availability microservice environments. Skilled in Kubernetes (EKS), Docker, Linux systems engineering, observability, and Python automation. Experienced in 24/7 production operations, incident management, Disaster Recovery (DR), and performance troubleshooting using Dynatrace and AWS CloudWatch. MBA candidate with strong business acumen focused on operational excellence, reliability, and risk mitigation. Expertise in scripting, database management, and application performance tuning.

Overview

5 years of professional experience

Work History

Production Support Engineer

MANULIFE BANK OF CANADA
07.2024 - Current
  • Provide 24/7 production support for AWS-based applications, ensuring SLA/SLO compliance.
  • Monitor, maintain, review, and fix bugs in Python-based applications using Python APIs, shell scripting, SQL, Git, Dynatrace, and AWS CloudWatch.
  • Monitored system performance, identifying areas for improvement and implementing solutions.
  • Lead P1/P2 incident triage calls, representing the application to ensure impact is mitigated smoothly.
  • Executed runbooks, coordinated escalations, and restored service within SLA timelines.
  • Documented processes and procedures to enhance knowledge sharing among team members.
  • Conducted root cause analysis on recurring incidents, driving long-term resolutions.
  • Implemented effective incident management strategies that minimized disruption to business operations during system outages or failures.
  • Follow Agile Scrum methodology with 3-week sprints and participate in sprint grooming and Project Implementation (PI) planning sessions.
  • Strong experience using Git.
  • Monitored application hosting services in the AWS console and optimized CPU/memory allocation to improve performance.
  • Investigated and proposed corrective actions for quality issues.
  • Automate operational health checks and monitoring workflows with Python, reducing manual effort by 30%.
  • Support CI/CD releases, deployment validation, rollback procedures, and release cutovers.
  • Execute Disaster Recovery (DR) testing, failover validation, and update recovery procedures to ensure operational readiness.
  • Monitor and manage SSL/TLS certificates proactively to prevent expirations and production impact.
  • Excellent communication and interpersonal skills, with the ability to handle multiple tasks; take initiative to handle responsibilities independently and as a proactive team member.

Site Reliability Engineer

INFRAMART REALTECH INDIA PVT LTD
09.2021 - 04.2024
  • Managed AWS cloud infrastructure: EC2 provisioning, IAM configuration, VPC networking.
  • Streamlined incident response processes, reducing mean time to recovery through effective root cause analysis.
  • Led on-call rotations, providing critical support during outages and ensuring rapid restoration of services.
  • Deployed and maintained Docker containers and Kubernetes workloads across development and production environments.
  • Implemented Dynatrace APM monitoring for performance analysis and proactive incident detection.
  • Conducted root-cause analyses after major incidents to identify process improvements and technical enhancements.
  • Implemented cost-saving measures by optimizing resource utilization across cloud-based infrastructure environments.
  • Optimized database performance, analyzing and restructuring data storage solutions.
  • Managed Linux server administration: patching, disk management, security hardening, access control.
  • Developed Python automation scripts to streamline operational workflows, reducing manual effort by 30%.
  • Assisted in CI/CD deployment troubleshooting and Git-based release workflows.
  • Monitored infrastructure health using AWS CloudWatch metrics, logs, and alerting mechanisms.
  • Conducted DR planning and testing to validate failover readiness and system resilience.

Education

Master of Business Administration (MBA)

01-2025

Bachelor of Commerce (B.Com)

01-2019

Skills

  • AWS Cloud and Infrastructure
  • Incident Management
  • Disaster Recovery
  • Reliability & Production Engineering
  • SLA/SLO Monitoring
  • Root Cause Analysis (RCA)
  • Release & Deployment Validation
  • Docker
  • Kubernetes (EKS)
  • Monitoring & Observability
  • Dynatrace (APM, Infrastructure, Logs, Alerting)
  • Proactive Alerting & Dashboarding
  • Operating Systems
  • Linux (Ubuntu, RHEL, Amazon Linux)
  • Bash/Shell Scripting
  • Automation & DevOps
  • Python (Automation Scripts, Health Checks, Log Parsing)
  • Git
  • CI/CD Support & Troubleshooting

Accomplishments

  • Achieved 99.95% production uptime through proactive reliability engineering.
  • Reduced MTTR by implementing structured RCA documentation and incident response improvements.
  • Improved deployment consistency by 40% through Docker standardization.
  • Reduced operational workload by 30% through Python automation initiatives.
  • Increased incident detection speed by 35% through optimized monitoring and alerting strategies.
  • Executed DR testing and implemented recovery procedures, improving operational resilience.

Additional Information

  • Experience supporting financial services production systems.
  • Strong troubleshooting and analytical skills in high-pressure environments.
  • Eligible to work in Canada.
  • Available for on-call and rotational shift support.
