Debayan Majumdar

Engineering Leadership

Summary

Decisive engineering leader, highly effective at operating in large-scale, dynamic, high-pressure environments. Successfully handled multiple simultaneous responsibilities while exceeding objectives and satisfying tough customers. Skilled at mentoring and growing future leaders in technology. Published tech author and speaker. Thought leader.

Overview

13 years of professional experience
6 years of post-secondary education
2 Certificates
5 Languages

Work History

Sr. Engineering Manager, Observability

Roblox
11.2023 - Current
  • Led an engineering team of 12 in building scalable telemetry infrastructure and applications.
  • Spearheaded reliability efforts for a massive infrastructure processing 250M datapoints/sec across 160 VM clusters, reducing alerts by over 50% and completely eliminating high-severity incidents.
  • Directed the development of an Automated Canary Analysis (ACA) framework, preventing 500+ faulty deployments with an 82% success rate.
  • Introduced ACA ML, enhancing ACA's success rate by 10% using machine learning-based confidence scoring and a backtesting framework for global rules and thresholds.
  • Developed a comprehensive alert management system, integrating default, custom, and observability (O11y) alerts through "1 Alert Manager," a unified interface for silencing alerts across 160 VM clusters.
  • Built and maintained an LLM bot and workflows supporting 1,800+ engineers with telemetry and debugging queries.
  • Led the migration from S3 to Ceph Object Storage, reducing backup and recovery times from 30 minutes to 30 seconds per cluster.
  • Developed the Service Health Report, a root cause analysis product that visualizes service dependencies and anomalies, reducing mean time to detection (MTTD) by 50% and mean time to mitigation (MTTM) by 60%.
  • Pioneered the SLO/SLI framework for services, including creating product SLOs for boundary services and educating customers on SLO best practices.
  • Implemented Arize, an observability tool for machine learning (ML) and large language model (LLM) workloads.
  • Designed and implemented a cardinality and expensive query limiter to safeguard Victoria Metrics infrastructure from inefficient usage, establishing guardrails and a quota management system.
  • Created a Slack workflow for shoutouts, collaborated on onboarding presentations, and enhanced the employee onboarding experience through cross-team collaboration. Also initiated the #oneTeam Champion and #reliability awards.
  • Developed documentation on agile planning and prioritization, implemented Jira automation for roadmaps and sprint planning, and introduced Geekbot for automated standups and team wellbeing surveys.
  • Organized team activities, initiated bi-weekly knowledge transfer sessions, and actively participated in incident reviews to maintain high reliability standards.
  • Led yearly roadmap development, quarterly planning, and 1:1 coaching, helping to progress multiple team members and continuously improve team innovation and performance.
  • Negotiated a vendor contract, saving the company $7M.

Sr. Engineering Manager, SRE

ClickUp
11.2022 - 10.2023
  • Led and managed distributed team of SREs across US and Europe, driving collaboration, performance, and career development.
  • Spearheaded transformation of ClickUp's availability from zero nines to three nines, with a peak of four nines sustained for a week, improving service reliability and customer satisfaction.
  • Implemented and documented incident management process, resulting in improved overall reliability, faster response times, and efficient onboarding, saving company millions of dollars.
  • Developed and deployed incident bot and automated triage/remediation tooling, significantly reducing mean time to repair (MTTR) from hours to minutes.
  • Played pivotal role in sharding data and decommissioning our globally replicated tables from 9 AWS regions, resulting in approximately $20 million savings in AWS infrastructure costs and enhanced system stability.
  • Collaborated with stakeholders to educate, gather requirements and define clear SLIs and SLOs for services, ensuring alignment with business needs and customer expectations.
  • Established and maintained comprehensive service catalog, fostering a culture of service ownership within the Engineering department.
  • Worked closely with cross-functional teams, including product management, development, and operations, to integrate SLIs and SLOs into the service development lifecycle, promoting reliability and accountability.
  • Collaborated with service owners and teams to establish error budgets based on SLOs, facilitating effective prioritization of engineering efforts and balancing innovation with system stability.
  • Led performance reviews, coaching, and mentorship for team members, supporting their growth and development.
  • Managed and sponsored key projects, overseeing project management, resource allocation, and timeline adherence.
  • Led long-term roadmap planning for SRE team, aligning with organizational goals and strategic initiatives.
  • Established and tracked OKRs (Objectives and Key Results) for SRE team, driving focus, accountability, and continuous improvement.
  • Introduced SRE office hours to provide training and guidance on observability and infrastructure best practices to other teams.
  • Led transformation of observability, reducing Datadog costs and improving visibility of cost ownership by service.
  • Drove adoption of OpenTelemetry with bifurcation strategies to enhance observability and further reduce Datadog costs.
  • Implemented Yotascale to manage cloud cost observability and shift AWS cost ownership to service teams.
  • Led security and compliance initiatives, including EU data residency, SOC2 compliance, and ISO certification, ensuring adherence to regulatory requirements and protection of customer data.
  • Implemented successful reward systems to foster positive and performance-oriented environment, including introduction of Monthly Reliability Champion award recognizing individuals across the organization who led significant improvements in reliability and quality.
  • Established weekly Ops review process, delivering comprehensive reports on service reliability and surfacing issues needing immediate attention to product and engineering leadership, ensuring prompt action on critical issues and driving continuous improvement.
  • Created comprehensive interview rubrics and robust hiring framework for all individual contributor (IC) and engineering management (EM) roles within the infrastructure organization, ensuring consistency, objectivity, and alignment with organization's values and technical competencies.
  • Saved company over $1 million by skillfully renegotiating vendor contracts, optimizing costs, and maximizing value for the organization.
  • Established culture of continuous feedback by implementing wellness surveys, conducting regular 1:1s, and administering DX (Developer Experience) surveys, fostering open communication, employee well-being, and continuous improvement within the organization.

Sr. Engineering Manager, Core Engineering

Pandora
06.2021 - 09.2022
  • Led an org of 30+ engineers spanning multiple disciplines: client tools and infrastructure engineering, quality tools, backend paved path tooling, microservices platform engineering, cloud engineering, and SRE.
  • Helped develop the vision, charter and values for each team in the org.
  • Groomed and mentored multiple engineers into tech leads and managers, both within and outside the org.
  • Delivered 100% or more of my OKRs each year.
  • Owned the paved path of tools and practices that would empower thousands of engineers to deploy code with greater velocity, high reliability, better quality, higher security, better data to facilitate decision making and increased operational efficiency and productivity.
  • Incorporated mob programming into our workflow to achieve higher throughput, better code quality, fewer blockers, and more knowledge sharing across teams.
  • Worked with legal and security to develop a process for intaking open-source contributions for Pandora/SiriusXM.
  • Encouraged teams to develop homegrown tools that could be open-sourced. Sponsored multiple engineers to present at various worldwide conferences.
  • Had the lowest attrition among all engineering teams at SiriusXM/Pandora.
  • Preemptively re-skilled the entire team with AWS training during a period of uncertainty and chaos. Cleared the AWS Solutions Architect Associate certification along with the entire team.
  • Developed a brand new platform to migrate services, data and storage from our on-prem cloud to AWS.
  • Collaborated and coordinated closely with various engineering teams to achieve 100% paved path adoption.
  • Partnered closely with finance, procurement, legal, and enterprise security teams to onboard new vendors and roll out new processes.
  • Successfully brought about a cultural change and shifted testing and security to the left.

Engineering Manager, Developer Experience

Pandora
07.2019 - 06.2021
  • Hired and retained talent across multiple geographic locations.
  • Prepared budget documentation and business justification documents for vendor contracts.
  • Negotiated vendor contracts to save the company $1.5M/year.
  • Managed multiple teams of engineers: Platform Engineering, Client DevOps & Infra, DevSecOps, and Quality Tooling.
  • Helped shape the overall quality strategy of our app, which helped reduce outages by 90%.
  • Recognized as a deep thinker, disrupter and change agent for Infrastructure Engineering, challenging the status quo by not settling for incremental progress measured by quarterly goals or adhering to company norms.
  • Led transformation from a "hero culture" to a business-process-based operating culture.
  • Part of Pandora/SXM's Tiger team that leads our DE&I initiatives.

Technical Team Lead, Mobile Tools

Pandora
10.2016 - 05.2019
  • As the first engineer on the mobile tools team, I transformed our mobile infrastructure to become the best in class in the industry.
  • Reduced our app release cycle from 4 months to bi-weekly release cycles.
  • We were the first enterprise company to achieve continuous delivery on mobile, ahead of Spotify, Apple Music, Google, and Amazon, and the only enterprise app featured on iOS day 0 releases.
  • Represented the company at various conferences and became a thought leader in the industry.
  • Grew and mentored a set of top-quality engineers across Oakland and Atlanta. Interviewed and hired top talent to grow the Mobile Tools team to 6 engineers and the Dev Tools team to 5 engineers. Interviewed and hired the initial 5 engineers for the SRE team as well. Also served on the interview panel for QE Automation Engineers and for Mobile and QE managers.
  • Found innovative ways of conducting knowledge transfer with the mobile teams by conducting various pop quizzes and awarding prizes to the winners. This became a huge hit.
  • Helped develop 'Bob the Builder', Pandora's homegrown version of Backstage, Spotify's open-source DevEx platform, which paved the path for us to break from a monolithic architecture to a microservices architecture and achieve true continuous delivery for our backend services.

Lead DevOps Engineer

Asurion
09.2014 - 10.2016
  • Led cloud migration to Azure for .NET applications within 14 days.
  • Led migration of applications to AWS.
  • Developed the continuous delivery pipeline for multiple applications.
  • Incorporated continuous security and continuous testing in the delivery pipeline.

Co-Founder

Webscholarz
11.2012 - 03.2015
  • Co-founded an online digital marketing education platform.
  • Built and shaped the initial product offering.
  • Formulated strategic direction and marketing.
  • Grew the team to over 45 employees and built a subscriber base of over 10,000 subscribers.
  • Eventually merged with KreativeMachinez.

Systems Engineer

Visa Inc
12.2012 - 08.2014
  • Developed the platform and automation for Cybersource.
  • Performed release engineering activities for various clients.

Marketing Strategy and Project Management Intern

Stanford University
04.2011 - 08.2011
  • Researched under BJ Fogg on various techniques for persuasive marketing and subliminal messaging.

Education

Master of Science - Information Technology

Colorado Technical University
Aurora, CO
01.2014 - 10.2014

MBA - Digital Marketing

Hult International Business School
San Francisco, CA
08.2011 - 08.2012

B.Tech - Electronics And Communication Engineering

West Bengal University of Technology
Kolkata, India
05.2007 - 05.2011

Skills

Mentorship

Software

Cloud Engineering

Continuous Delivery

Internal Tools Development

Service Mesh

Continuous Testing

Continuous Security

Continuous Verification

Value Stream Dashboard

Mobile Tools Infrastructure and Tooling

Developer Experience

Distributed Systems

Observability

Reliability

Certification

AWS Solutions Architect Associate

Timeline

Sr. Engineering Manager, Observability

Roblox
11.2023 - Current

Sr. Engineering Manager, SRE

ClickUp
11.2022 - 10.2023

AWS Solutions Architect Associate

08-2022

Sr. Engineering Manager, Core Engineering

Pandora
06.2021 - 09.2022

Engineering Manager, Developer Experience

Pandora
07.2019 - 06.2021

Technical Team Lead, Mobile Tools

Pandora
10.2016 - 05.2019

Certified Jenkins Engineer

09-2015

Lead DevOps Engineer

Asurion
09.2014 - 10.2016

Master of Science - Information Technology

Colorado Technical University
01.2014 - 10.2014

Systems Engineer

Visa Inc
12.2012 - 08.2014

Co-Founder

Webscholarz
11.2012 - 03.2015

MBA - Digital Marketing

Hult International Business School
08.2011 - 08.2012

Marketing Strategy and Project Management Intern

Stanford University
04.2011 - 08.2011

B.Tech - Electronics And Communication Engineering

West Bengal University of Technology
05.2007 - 05.2011

Languages

Python
Very Good
Shell Scripting
Excellent
Java
Good
PHP
Good
Ruby
Good