Sanjit Lal

Stamford, CT

Summary

Senior Software Engineer with a proven track record in incident management, specializing in system reliability, rapid incident response, and performance optimization in high-stakes environments. Experienced in leading cross-functional teams, troubleshooting complex issues, and maintaining system availability. Seeking to transition into data engineering, leveraging skills in data pipeline development, ETL processes, and database management. Passionate about utilizing data to enhance system performance and drive business outcomes.

Overview

12 years of professional experience

Work History

Site Reliability Engineer (Incident Management)

Tata Consultancy Services - Apple Inc
03.2012 - 08.2024
  • Served as Incident Commander, driving incident resolution with urgency, accuracy, and cross-functional collaboration across a global, diverse set of teams, including Engineering, Development, QA, Product, Compliance, Legal, and executive leadership.
  • Managed a high volume of incidents (approximately 60 per hour) during major retail events, system upgrades, and infrastructure failures.
  • Led all user-facing incidents across domains including reliability, technical, security, network, retail, payments, financing, and data privacy.
  • Contributed to root cause analysis by conducting post-mortems and identifying remediations, and ensured problem management tasks met SLAs and user expectations.
  • Drove improvements to incident handling processes, incident management metrics, and tooling based on incident trends, incident data, and application metrics.
  • Identified, tracked, and resolved application bugs, ensuring timely fixes to maintain system stability and performance.
  • Planned, coordinated, and executed change management processes, minimizing downtime and ensuring seamless rollout of updates and system improvements.
  • Implemented and maintained observability and monitoring solutions using Splunk, Hubble, ServiceNow, and internal tools, enabling real-time system health tracking, proactive issue detection, and rapid incident response.
  • Analyzed application metrics and user data to identify performance bottlenecks, optimized system efficiency, and drove improvements in user experience and application performance.
  • Improved incident management workflows by creating comprehensive documentation on troubleshooting procedures and resolution steps for common issues.
  • Developed custom scripts and tools to automate routine tasks, increasing overall team productivity and efficiency.
  • Fine-tuned query performance and optimized database structures for faster, more accurate data retrieval and reporting.
  • Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.

Education

Bachelor of Technology - Electronics & Communication

Sikkim Manipal Institute of Technology
India
06.2011

Skills

  • Unix, PL/SQL, AWS, Python, Java, Unix shell scripting, Splunk, Git, Docker, ServiceNow, JIRA, Kafka, Apache Airflow, Solr, API architecture (SOAP/REST), Incident Management, Microservices Architecture

Awards

  • On-the-Spot Awards
  • Star Performer Award (Apple Inc.)