ANURAG KANUMURUI

Redmond, WA

Summary

Senior Application Support Engineer with 15+ years of experience providing expert-level production support for mission-critical applications, data pipelines, and infrastructure across financial services, healthcare, and enterprise environments. Proven expertise in incident management, real-time debugging, automation, and cross-functional collaboration with development, infrastructure, and business teams. Skilled in monitoring tools (Prometheus, Grafana, LogicMonitor), job schedulers (Autosys, Control-M), and ITIL processes. Strong technical writer, able to create comprehensive runbooks, knowledge bases, and support documentation. Recognized for achieving 99.9% uptime, reducing MTTR by 40%, and driving continuous improvement through SRE principles and data-driven insights.

Support engineer with a thorough understanding of application systems and client needs. Proven ability to diagnose and resolve technical issues while keeping downtime to a minimum and performance high. Known for strong collaboration and adaptability, consistently meeting team goals and driving successful outcomes.

Overview

19 years of professional experience
3 Certifications

Work History

Senior Application Support Engineer / Azure DevOps/SRE Architect

Quadrant Technologies
09.2024 - Current
  • Production Support & Incident Management:
  • Monitor application and system performance for a HIPAA-compliant healthcare data platform processing 10M+ patient records daily, ensuring timely report generation and swift resolution of disruptions
  • Serve as incident manager for P1/P2 incidents, coordinating cross-functional teams and providing clear status updates to senior business leadership during critical outages
  • Reduced Mean Time to Resolution (MTTR) by 40% through proactive monitoring dashboards, automated alerting, and systematic root cause analysis
  • Perform advanced troubleshooting and real-time debugging for production applications built on Azure Data Factory (ADF), Azure Synapse, Databricks, and a microservices architecture
  • Resolved 95% of critical incidents within SLA timeframes through expert-level technical support and rapid issue escalation
  • Automation & Monitoring:
  • Developed and maintained automation tools using Python and PowerShell for reporting, log collection, and system health monitoring to ensure seamless operations and proactive issue detection
  • Built comprehensive monitoring dashboards using Prometheus, Grafana, and Azure Monitor to track system stability, performance metrics, and data pipeline health
  • Implemented automated alerting for 50+ microservices running on AKS clusters, reducing manual monitoring effort by 70%
  • Created automated runbooks and self-healing scripts that reduced recurring incidents by 35%
  • Problem Management & Documentation:
  • Lead problem management initiatives, performing advanced root cause analysis and driving permanent resolutions for recurring issues affecting data pipelines and infrastructure
  • Maintain and continuously improve support documentation, runbooks, and knowledge bases, ensuring consistency and audit readiness for regulatory compliance
  • Created comprehensive fix-logs, training materials, and technical documentation enabling junior engineers to resolve 80% of incidents independently
  • Champion knowledge management through facilitation of team learning sessions and collaborative problem-solving workshops
  • SRE Principles & Continuous Improvement:
  • Apply Site Reliability Engineering (SRE) principles including error budgeting, SLIs/SLOs, and blameless postmortems to improve system resilience
  • Lead service quality improvements through data-driven insights, stability analysis, and proactive remediation strategies
  • Conduct system stability and performance analysis, recommending architectural improvements that reduced infrastructure costs by 18%
  • Collaborate with development, infrastructure, and SRE teams to automate operational tasks, reduce manual toil, and enhance observability
  • Stakeholder Management & Compliance:
  • Interface effectively with business stakeholders and senior leadership, providing executive-level incident updates and post-incident reviews
  • Ensure compliance with HIPAA, SOC 2, and audit requirements through accurate documentation and incident tracking
  • Coordinate 24x6 support coverage across global teams, mentor junior engineers, and ensure smooth handoffs across shifts and geographies
  • Manage vendor relationships to ensure timely resolution of third-party issues and alignment with SLAs
  • Client: Cinq Care (Healthcare)
  • Led troubleshooting efforts for application issues, ensuring minimal disruption to business operations.
  • Mentored junior engineers, enhancing team skill set and knowledge retention.
  • Improved application performance by analyzing system logs to identify and resolve bottlenecks, optimizing code, and applying best practices.
  • Collaborated with cross-functional teams to implement software updates and patches.
  • Developed comprehensive documentation for application workflows and support procedures, improving team efficiency.
  • Implemented monitoring tools to proactively identify potential outages and reduce response times.
  • Conducted root cause analysis on recurring issues, driving long-term solutions across systems.
  • Spearheaded training sessions on new applications, increasing user adoption and satisfaction rates.
  • Established strong working relationships with vendors, enabling swift collaboration during critical incidents that required external assistance.

Application Support Engineer / Azure Data Architect

Quadrant Technologies
01.2024 - 09.2024
  • Application Support & Monitoring:
  • Monitored supply chain analytics platform processing 2M+ transactions weekly, ensuring data pipeline integrity and availability
  • Developed automated health check scripts using Python and PowerShell, reducing incident detection time from hours to minutes
  • Created stability reports and executive dashboards using Power BI, providing data-driven insights for service quality improvements
  • Led cross-functional collaboration with development and infrastructure teams to resolve production issues and optimize performance
  • Problem Resolution & Documentation:
  • Performed root cause analysis for recurring batch job failures, implementing permanent fixes that improved job success rate from 87% to 99%
  • Maintained comprehensive runbooks and knowledge base articles for 100+ scheduled jobs using Autosys and Azure Data Factory
  • Reduced reporting time from 3 days to 2 hours through automation and process optimization
  • Trained business users and support team members on new application features and troubleshooting procedures
  • Infrastructure & Cost Optimization:
  • Optimized cloud infrastructure costs by 22% through proactive monitoring, right-sizing recommendations, and automated scaling policies
  • Managed change control process for 50+ production deployments with zero rollbacks
  • Coordinated with vendor teams to resolve third-party integration issues affecting data pipeline operations
  • Client: Unilever
  • Provided technical support for application issues, ensuring timely resolution to enhance user experience.
  • Collaborated with development teams to identify and address software bugs, improving overall application functionality.
  • Monitored system performance using diagnostic tools, proactively resolving potential issues before impacting users.
  • Trained junior staff on best practices for application support, fostering a culture of continuous improvement and learning.

Senior Application Support Engineer / Platform Architect

Unatrac
03.2022 - 12.2023
  • Production Support & Real-time Monitoring:
  • Provided expert-level support for real-time telemetry pipelines processing 10TB+ data daily across manufacturing operations
  • Monitored job schedulers (Autosys, Control-M) managing 500+ daily batch jobs with 99.5% success rate
  • Implemented comprehensive monitoring using Azure Monitor, Application Insights, Prometheus, and Grafana for proactive issue detection
  • Debugged complex data pipeline failures involving Azure Data Factory, Databricks, Kafka, and downstream reporting systems
  • Incident & Problem Management:
  • Served as incident manager for critical production outages, coordinating response across development, infrastructure, and business teams
  • Reduced incident frequency by 45% through systematic problem management and implementation of permanent fixes
  • Created automated incident response workflows integrated with ServiceNow and PagerDuty
  • Conducted blameless postmortems and documented lessons learned to prevent recurring issues
  • Automation & Process Improvement:
  • Automated CI/CD pipelines using Azure DevOps, Jenkins, and GitHub Actions, reducing deployment time by 60%
  • Built automated log analysis tools using Python and Bash scripts for faster troubleshooting and pattern recognition
  • Enhanced observability through custom monitoring dashboards and alerting rules tailored to business-critical metrics
  • Led support team meetings to improve reliability, reduce incident frequency, and enhance cross-team collaboration
  • Data Governance & Compliance:
  • Implemented data governance frameworks using Unity Catalog and Azure Purview, ensuring regulatory compliance
  • Maintained audit trails and compliance documentation for manufacturing and supply chain applications
  • Interfaced with upstream and downstream data pipelines to ensure data integrity and availability
  • Client: Caterpillar Inc.

Application Support Lead / Data Architect

Unatrac
05.2017 - 03.2022
  • Production Support & Performance Optimization:
  • Managed production support for 150+ BI reports and data applications with 99.8% uptime
  • Monitored and optimized SQL Server, Oracle, and Snowflake databases, reducing query runtime by 75%
  • Performed real-time debugging and troubleshooting for API services, implementing Redis caching that improved response times by 40%
  • Built streaming data architecture with Kafka and Azure Service Bus ensuring real-time data availability
  • Documentation & Knowledge Management:
  • Created comprehensive technical documentation, runbooks, and troubleshooting guides for 200+ applications
  • Developed training materials and conducted knowledge transfer sessions for offshore support teams
  • Maintained knowledge base in Confluence covering common issues, resolutions, and best practices
  • Established documentation standards ensuring consistency and audit readiness
  • Infrastructure & Security:
  • Strengthened security posture through implementation of VNets, WAF, and Private Endpoints
  • Managed change control process for infrastructure upgrades and application deployments
  • Coordinated with infrastructure teams to ensure proper capacity planning and resource allocation
  • Client: Caterpillar Inc.
  • Led the application support team to resolve critical user issues efficiently.
  • Developed and implemented process improvements for application troubleshooting and resolution.
  • Mentored junior staff to enhance technical skills and customer service capabilities.

Production Support Lead / Data Manager

Emirates NBD Bank
01.2013 - 05.2017
  • Mission-Critical Application Support:
  • Maintained 99.95% uptime for mission-critical banking platforms processing $2B+ daily transactions
  • Provided 24x7 on-call support managing incident response for trading, risk management, and regulatory reporting systems
  • Resolved 95% of P1 incidents within 2-hour SLA using PagerDuty and ServiceNow for ticket management and escalation
  • Performed real-time debugging for SQL Server and Oracle databases supporting derivative trades and asset management
  • Problem Management & Automation:
  • Reduced batch processing windows by 30% through optimization of SSIS packages and stored procedures
  • Automated risk reporting workflows using Power BI and Azure Analysis Services
  • Conducted root cause analysis for recurring production issues, implementing permanent fixes that reduced repeat incidents by 50%
  • Created automated monitoring scripts for critical batch jobs and data pipelines
  • Regulatory Compliance & Reporting:
  • Ensured compliance with banking regulatory requirements through accurate incident tracking and audit documentation
  • Supported regulatory reporting applications with expertise in financial asset classes and derivative trades
  • Coordinated with compliance teams during audit reviews, providing detailed incident logs and resolution documentation
  • Managed vendor relationships for third-party trading and risk management systems
  • Team Leadership & Collaboration:
  • Led a team of four support engineers, providing mentorship and technical guidance
  • Coordinated 24x7 support coverage across multiple shifts and geographies
  • Facilitated effective communication between technical teams and business stakeholders during critical incidents
  • Conducted regular team meetings to share knowledge, review incidents, and drive continuous improvement

Project Lead

Nordea Bank (L&T Infotech)
01.2011 - 01.2013
  • Provided production support for Asset/Liability Management platforms serving Group ALM and LRM teams
  • Built ETL pipelines, reducing data processing outages by 20%
  • Debugged complex SQL queries and stored procedures for financial risk modeling applications
  • Created support documentation and runbooks for offshore support teams
  • Led cross-functional teams to streamline project execution and enhance delivery timelines.
  • Developed project plans and roadmaps aligned with organizational goals and stakeholder expectations.
  • Implemented risk management strategies to mitigate project risks and ensure compliance with regulatory standards.
  • Facilitated communication between stakeholders, fostering collaboration and addressing concerns promptly.
  • Spearheaded process improvement initiatives that increased operational efficiency and reduced costs.
  • Analyzed project performance data to identify areas for improvement and optimize resource allocation.
  • Mentored junior team members, encouraging professional growth and knowledge sharing across projects.

Project Lead

Barclays Capital/Lehman Brothers (L&T Infotech)
01.2009 - 01.2011
  • Managed production support for global trading platforms with 24x7 coverage
  • Coordinated 50+ production releases with zero rollbacks through rigorous change management
  • Served as incident manager for critical outages affecting trading operations
  • Performed real-time troubleshooting and debugging for Java-based trading applications

Software Engineer/Lead

L&T Infotech & Accenture
01.2007 - 01.2009
  • Provided application support for enterprise applications across multiple clients
  • Debugged production issues and coordinated with development teams for permanent fixes
  • Created technical documentation and user guides for support teams
  • Designed and implemented scalable software solutions using Agile methodologies.
  • Collaborated with cross-functional teams to define technical requirements and project scope.
  • Conducted code reviews to ensure adherence to best practices and enhance code quality.
  • Mentored junior engineers, fostering skill development and facilitating knowledge transfer.
  • Developed automated testing frameworks, increasing reliability of software releases.
  • Implemented CI/CD pipelines, enhancing deployment efficiency and reducing time-to-market for new features.
  • Evaluated proposed technical solutions against customer requirements.
  • Developed scalable and maintainable code, ensuring long-term stability of the software.
  • Collaborated with management and with internal and development partners on software design status and project progress.
  • Enhanced user experience with intuitive interface design and responsive web applications.
  • Integrated new technologies into existing systems, increasing capabilities and improving overall performance.
  • Implemented effective debugging strategies, resulting in fewer software defects and increased reliability.

Education

Bachelor of Technology (B.Tech)

JNTU
India
05.2003

Some College (No Degree) - Machine Learning in Business

MIT Sloan School
04.

Skills

  • Prometheus
  • Grafana
  • LogicMonitor
  • Azure Monitor
  • Application Insights
  • CloudWatch
  • Autosys
  • Control-M
  • Azure Data Factory
  • Airflow
  • SQL Server
  • Oracle
  • PostgreSQL
  • MySQL
  • MongoDB
  • Snowflake
  • Redis
  • Python
  • PowerShell
  • Bash
  • Perl
  • SQL
  • ServiceNow
  • Jira
  • PagerDuty
  • Confluence
  • Azure
  • AWS
  • Databricks
  • Apache Kafka
  • SSIS
  • Incident Management
  • Problem Management
  • Change Management
  • Service Quality
  • Microsoft Teams
  • Slack
  • Docker
  • Kubernetes
  • Terraform
  • HIPAA
  • SOC 2
  • API integration
  • Capacity planning
  • Scripting languages
  • Load balancing
  • Data migration
  • Performance tuning
  • System administration

Accomplishments

  • 99.95% Uptime: Maintained exceptional availability for mission-critical banking and healthcare platforms.
  • 40% MTTR Reduction: Through proactive monitoring, automation, and expert troubleshooting.
  • 95% P1 Resolution Rate: Within SLA timeframes across multiple production environments.
  • 45% Incident Reduction: Through systematic problem management and permanent fix implementation.
  • 60% Faster Deployments: Via CI/CD automation and streamlined change management.
  • 70% Reduction in Manual Monitoring: Through automated health checks and intelligent alerting.
  • 24x7 On-Call Experience: Across financial services, healthcare, and manufacturing sectors.
  • Cross-Functional Leadership: Successfully coordinated teams across development, infrastructure, QA, and business operations.

Certification

  • Microsoft Certified: Azure AI Engineer Associate
  • Microsoft Certified: Fabric Analytics Engineer Associate
  • DAMA CDMP Practitioner

Core Technical Skills

Prometheus, Grafana, LogicMonitor, Azure Monitor, Application Insights, CloudWatch, Autosys, Control-M, Azure Data Factory, Airflow, SQL Server, Oracle, PostgreSQL, MySQL, MongoDB, Snowflake, Redis, Python, PowerShell, Bash, Perl, SQL, ServiceNow (power user), Jira (power user), PagerDuty, Confluence, Azure (ADF, Synapse, Functions, AKS, Service Bus), AWS (EC2, S3, Lambda, Glue), Databricks, Apache Kafka, SSIS, Incident Management, Problem Management, Change Management, Service Quality, Microsoft Teams, Slack, cross-functional team coordination, stakeholder communication, Docker, Kubernetes, Terraform, Infrastructure as Code, HIPAA, SOC 2, audit compliance, regulatory reporting

Personal Information

  • Title: Senior Application Support Engineer
  • Work Permit: Authorized to work in the United States
  • Availability: Available for hybrid work arrangements in the Seattle, WA area. Open to 24x6 support coverage and on-call rotations.