9+ years of total experience, including 3+ years as a reliability engineer in the payments space.
Implemented and championed site reliability engineering practices to ensure the stability, performance, and scalability of software applications and systems.
Developed and maintained monitoring and observability solutions using Grafana, Prometheus, and Splunk, providing real-time insight into system performance and health.
Designed and implemented automation for deployment, monitoring, and alerting to streamline operations, reduce manual intervention, and enhance system reliability.
Led incident management, identifying, diagnosing, and resolving system issues promptly to minimize downtime and user impact.
Analyzed system performance data to identify bottlenecks and implemented solutions to improve application performance and efficiency.
Collaborated with development and operations teams on capacity planning to ensure systems can scale to meet growing demand.
Worked closely with cross-functional teams, including developers, operations, and product managers, to align on reliability goals and initiatives.
Created and maintained comprehensive documentation of reliability processes, tools, and best practices, and mentored team members on reliability engineering principles.
Proactively identified opportunities for process improvement and implemented changes to enhance system reliability and operational efficiency.
Conducted risk assessments to identify potential reliability issues and developed mitigation strategies to ensure system resilience.
Aligned reliability efforts with customer needs and expectations to deliver a seamless and reliable user experience.
Addressed user-reported tickets.
Participated in PTX process discussions at different levels (HLDD, NFRs), where logging was reviewed, and assessed application performance and resiliency.
Verified all documentation required for an application to move to the operate state.
Created an anomaly detection dashboard as part of eyes-on-glass monitoring across the payments space, where anomalies were highlighted before they could cause impact.

Seasoned Reliability Engineer with a background in identifying and implementing reliability practices and maintenance optimization projects. Notable strength in applying root-cause analysis to improve equipment lifespan and decrease downtime. Demonstrated success in executing fault detection programs and improving overall plant performance. Skilled at collaborating with cross-functional teams to achieve project goals, ensuring the reliability and maintainability of new installations.

Results-driven Reliability Engineer known for high productivity and efficient task completion. Specializes in failure mode and effects analysis, predictive maintenance strategies, and risk management processes. Excels in problem-solving, teamwork, and communication, ensuring successful project outcomes through effective collaboration and critical thinking.