• Review hardware resources and performance of critical customers to identify and fix issues proactively.
• Own customer escalations and work with internal and external teams to resolve them quickly.
• Member of a cross-team effort to review the past year's outages and their RCA documents, identify gaps, and define action plans to prevent root causes from recurring.
• Set up meetings and drive small projects by coordinating within and across multiple teams.
• Perform capacity planning for customer environments to support increased usage during training rollouts.
• Manage the ticketing queue, assign tasks, and guide the team on complex issues.
• Draw up plans for hot tickets, identify risks and resources, and follow up until closure.
• Manage product release readiness.
• Deploy customer websites and perform in-place upgrades and migrations.
• Set up meetings with stakeholders to gather project data and raise risks and blockers up front.
• Planned well ahead and listed out all the steps for handling critical deliverables.
• Drafted new SOPs and updated existing SOPs.
• Knowledge of IIS log parsing (Log Parser Lizard and Splunk).
• Participated in a product job-failure drive: reviewed recurring failures, identified whether each was an environment or a product issue, and referred it to the relevant team.
• Member of the product readiness group: proposed product enhancements and identified hosting bottlenecks and risks arising from new functionality or features.
• Review Engineering-provided performance stats for new features and ensure hosting SOPs and VM templates are up to date for upgraded products.
• Raised a flag whenever a new feature increased resource utilization.
• Train new team members on upgrades, migrations, and new site installation.
• Handle Tier 2 on-call; resolve site outages and deal with customer escalations.
• Perform RCAs in the Why format and provide preventive actions so that issues do not recur or incident time is reduced by 10%.
• Working on application-related and infrastructure issues.
• Creating Splunk alerts, dashboards, and reports using Splunk queries
• Working on Windows Server and IIS issues
• Participating in weekly Chaos Monkey testing for multiple applications and creating the necessary alerts for them
• Handling weekly Telsa Dry Runs: failover of the PROD environment to DR.
• Working on SQL Server to add and modify data and generate weekly reports
• Worked as shift lead, responsible for monitoring shift members' activities and sending the daily shift report.
• Creating VMs and Cloud services in Azure
• Participating in bridge calls for major incidents and in PIR calls
• Generating weekly Azure subscription audit reports using PowerShell scripts
• Monitoring Splunk alerts and Application Insights alerts
• Troubleshooting issues using Application Insights
• Creating Azure alerts using Azure Monitor
• Work on requirements gathering, planning, and deployment of Subtotal products for new customers.
• Follow the CTGL methodology for the deployment process.
• Work on process improvements and standardization to improve SLAs.
• Participate in new product releases and deployments.
• Contribute to the reality lab for new product version rollouts.
• Support the improvement of standard operating procedures for existing processes and devise procedures for new products.
• Work on decommissioning of old customer sites
• Experienced in analyzing Event Viewer logs to identify issues.
• Experienced in troubleshooting IIS issues using the console and command-line utilities.
• Deployed ASP.NET applications to IIS web farms.
• Providing 24/7 on-call support
• Troubleshooting Sev1, Sev2 and Sev3 issues within service level agreements (SLAs)
• Experienced in resolving incidents using the console, command-line tools, and log file analysis.
• Served in team roles such as shift leader and change coordinator.
• Good communication and interpersonal skills.
• Experienced in implementing changes smoothly during production releases.
• Good knowledge of ITIL v3 processes such as change management, incident management, and problem management.
• Experienced in working on problems and performing root cause analysis (RCA) to fix issues permanently.