
Samarnath Gande

St. Louis, MO

Summary

A results-driven Site Reliability Engineer with over 11 years of experience specializing in the design, automation, and management of scalable infrastructure across GCP, Azure, and on-premises data centers. With a strong foundation in Linux System Administration, I possess deep hands-on expertise in containerization and orchestration using Kubernetes (GKE) and advanced deployment strategies like Blue-Green and Canary. I am proficient in building robust CI/CD pipelines with Jenkins and automating infrastructure as code with Terraform and Ansible, complemented by strong skills in Python and Shell scripting. My experience extends to analyzing end-to-end architecture to identify performance bottlenecks, reduce technical debt, and implement comprehensive monitoring solutions with tools such as Datadog, Prometheus, and the ELK Stack.

Overview

13 years of professional experience
1 Certification

Work History

Site Reliability Engineer

LTIMindtree Ltd (Client: Equifax Inc)
09.2023 - Current
  • Deploy GCP resources using Terraform.
  • Participate in and maintain container image hygiene for microservices
  • Maintain 8 Kubernetes clusters for application deployments in multiple regions
  • Built Data Studio reports from Stackdriver logs to calculate golden metrics
  • Create and manage data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL
  • Implement data security and access controls using GCP's Identity and Access Management (IAM)
  • Monitor and troubleshoot pipelines and Microservices using GCP's Stackdriver and Datadog
  • Built Datadog dashboards for all APIs and web apps
  • Write automation primarily in Groovy, along with Python and Shell
  • Migrate to GitHub Actions.
  • Participate in code reviews and contribute to the development of best practices for engineering on GCP
  • Stay up to date with the latest GCP services and features and evaluate their potential use in the organization's data infrastructure.
  • Improved incident management workflows by creating comprehensive documentation on troubleshooting procedures and resolution steps for common issues.
  • Developed custom scripts/tools as needed to automate routine tasks, increasing overall team productivity and efficiency.
  • Evaluated new technologies and tools to enhance overall system performance, stability, and security.
  • Collaborated with cross-functional teams to develop, test, and deploy scalable software solutions.
  • Implemented cost-saving measures by optimizing resource utilization across cloud-based infrastructure environments.
  • Contributed to the ongoing refinement of internal processes and procedures within the site reliability engineering discipline through regular reviews, updates, and knowledge sharing activities.
  • Designed load testing scenarios to validate application scalability under various traffic patterns and conditions.
  • Fostered collaboration between development and operations teams through effective communication strategies during project lifecycles.
  • Enhanced system monitoring capabilities, integrating advanced tools for real-time performance tracking and anomaly detection.
  • Improved deployment efficiency, automating processes using CI/CD pipelines.
  • Managed capacity planning efforts to ensure optimal resource allocation based on current demand projections and future growth expectations.
  • Created work schedules and adjusted as needed to meet project deadlines and keep shifts properly staffed.

Site Reliability Engineer

Tek Yantra Inc (Client: Equifax Inc)
11.2022 - 09.2023
  • Deploy GCP resources using Terraform.
  • Participate in and maintain container image hygiene for microservices
  • Maintain 8 Kubernetes clusters for application deployments in multiple regions
  • Create and manage data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL
  • Implement data security and access controls using GCP's Identity and Access Management (IAM)
  • Monitor and troubleshoot pipelines and Microservices using GCP's Stackdriver and Datadog
  • Write automation primarily in Groovy, along with Python and Shell
  • Migrate to GitHub Actions.
  • Participate in code reviews and contribute to the development of best practices for engineering on GCP
  • Stay up to date with the latest GCP services and features and evaluate their potential use in the organization's data infrastructure.
  • Improved incident management workflows by creating comprehensive documentation on troubleshooting procedures and resolution steps for common issues.
  • Developed custom scripts/tools as needed to automate routine tasks, increasing overall team productivity and efficiency.
  • Evaluated new technologies and tools to enhance overall system performance, stability, and security.
  • Collaborated with cross-functional teams to develop, test, and deploy scalable software solutions.
  • Implemented cost-saving measures by optimizing resource utilization across cloud-based infrastructure environments.
  • Contributed to the ongoing refinement of internal processes and procedures within the site reliability engineering discipline through regular reviews, updates, and knowledge sharing activities.
  • Designed load testing scenarios to validate application scalability under various traffic patterns and conditions.
  • Fostered collaboration between development and operations teams through effective communication strategies during project lifecycles.
  • Enhanced system monitoring capabilities, integrating advanced tools for real-time performance tracking and anomaly detection.
  • Improved deployment efficiency, automating processes using CI/CD pipelines.
  • Managed capacity planning efforts to ensure optimal resource allocation based on current demand projections and future growth expectations.
  • Created work schedules and adjusted as needed to meet project deadlines and keep shifts properly staffed.

Senior DevOps/SRE Engineer

Tek Yantra Inc (Client: CDPH)
08.2022 - 11.2022
  • Created Azure DevOps pipelines and maintained the DevOps code base in Azure Repos
  • Wrote Azure Test Plans and maintained build artifacts in Azure Artifacts
  • Participated in the analysis and design of existing application components to be migrated to Azure PaaS
  • Made code changes to migrate the existing application to Azure PaaS offerings such as Azure Web Apps, Azure WebJobs, and Azure SQL Database
  • Configured an existing console application, previously run as Windows scheduled tasks, as an Azure WebJob
  • Implemented an Azure Repos branching strategy and Azure Pipelines (CI/CD) for deploying source code to Azure Web Apps and Azure SQL databases
  • Coordinated with multiple teams and vendors to troubleshoot issues at each phase of the project.
  • Very good knowledge of Azure services: Web Apps, Key Vault, SQL Server DB, Storage Account, Redis Cache
  • Environment: Account, Load Balancer, Key Vault, Azure DevOps, Service Bus, Function App, Web Apps, Event Hub and Notification Hub, Azure SQL, Postgres, Cosmos DB, Storage Account, Azure Active Directory, Monitoring, Kubectl, Blob Storage, Secret Manager, YAML, RHEL7, Ubuntu, CentOS, Ansible, Jenkins, Docker, Shell Scripting, GitHub Enterprise, Kubernetes, Snowflake DB.

Site Reliability Engineer

HCL Technologies Pvt Ltd (Client: Google)
02.2020 - 06.2022
  • Infrastructure & Automation:
    Managed and maintained the stability of more than 500 servers for an internal ticketing tool on Google Cloud Platform (GCP) using Puppet for configuration management.
  • Engineered and automated GCP infrastructure provisioning using Terraform (Infrastructure as Code), integrating core services like IAM, networking, logging, and container management.
  • Installed, configured, and administered multiple Linux distributions, including Oracle Enterprise Linux and Ubuntu.
  • CI/CD & Release Management:
    Developed, configured, and maintained CI/CD pipelines using Jenkins, Cloud Build, and Cloud Repo, enabling a touchless deployment process in collaboration with the development team.
  • Managed a weekly release cycle, deploying application updates to Kubernetes (GKE) and Borg clusters across multiple geographic locations.
  • Utilized advanced deployment strategies, including Blue-Green and Canary deployments, to minimize downtime and risk during releases.
  • Containerization & Kubernetes:
    Built and maintained Kubernetes infrastructure (GKE), including writing Dockerfiles, managing container images in GCR, and deploying applications to both Kubernetes and internal Borg systems.
  • Implemented comprehensive Kubernetes monitoring and observability using Google's internal monitoring tool to ensure cluster health and performance.
  • Cloud Services & Development:
    Leveraged Python scripting to automate manual tasks and improve operational efficiency wherever necessary.
    Implemented and managed cloud-native services, including Spanner for databases, and Cloud Pub/Sub for asynchronous messaging.
  • Configured and managed GCP networking components, including VPCs, security groups, and NAT, within development and sandbox environments.
  • Conducted root-cause analyses after major incidents to identify areas for process improvement or technical enhancement opportunities.
  • Environment: GCP Cloud Compute, Storage, Network, IAM, VPC, firewall rules, Load Balancer, Cloud Deployment Manager, Cloud Build, Cloud Repo, Cloud Run, Container Registry, GKE, Pub/Sub, Composer, Cloud Storage, Secret Manager, YAML, RHEL7, CentOS, Puppet, Docker, Shell Scripting, Borg

Senior DevOps/Cloud Engineer

Foray Software Pvt Ltd (Client: Xilinx (now AMD))
04.2018 - 01.2020
  • CI/CD & Automation:
    Established and maintained a CI/CD workflow using Jenkins to support the software team's daily and nightly builds.
    Automated infrastructure and application configurations using Ansible, ensuring consistency across a fleet of over 300 servers.
    Utilized Shell scripting to automate various operational tasks and streamline deployment processes.
  • Containerization & Deployment:
    Built application deployment pipelines using Docker, which involved writing Dockerfiles, building images, and managing containers in a private registry.
    Deployed and managed containerized applications on Kubernetes clusters.
  • System & Infrastructure Administration:
    Administered a hybrid infrastructure of over 300 servers across AWS and on-premises data centers.
    Installed, configured, and maintained various Linux distributions, including RHEL, Ubuntu, and CentOS.
    Built and managed virtual machines on physical hardware using VMware ESXi 6.5.
  • Monitoring & Database Management:
    Deployed and configured the ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging and system monitoring.
  • Assisted in migration projects from on-premises data centers to cloud environments, ensuring minimal disruption to business operations.
  • Used metrics to monitor application and infrastructure performance.
  • Evaluated new cloud technologies and recommended solutions that aligned with organizational goals and objectives.

Senior DevOps/NOC Engineer

DVL Groups Ltd
12.2013 - 01.2018
  • Infrastructure & System Administration:
    Administered a global infrastructure of over 200 servers, including AWS EC2 instances and on-premises bare-metal servers, across multiple data centers.
    Managed the complete lifecycle of Linux servers (RHEL/CentOS), including installation, configuration, routine patching, and system upgrades.
    Built and maintained virtual machines on physical hardware using virtualization platforms such as VMware ESXi and XenServer.
  • DevOps & Automation:
    Established and maintained a CI/CD workflow to support the deployment of Java-based applications, significantly improving release efficiency.
    Deployed applications using Docker and Kubernetes, which included writing Dockerfiles, building images, and managing containers in a private registry.
    Utilized Shell scripting to automate system administration tasks and streamline operational processes.
  • Monitoring & Database Management:
    Implemented and configured Nagios for comprehensive infrastructure monitoring, ensuring high availability and proactive issue resolution.
    Performed routine database administration, including installation, backups, and migrations, for various development and operational tools.
  • Conducted root cause analyses following incidents, implementing corrective actions to prevent recurrence and maintain service quality standards.
  • Exceeded SLA standards for response times and problem resolution.
  • Coordinated with technical support, service provisioning and sales teams to deliver network services at or above SLA requirements.
  • Reduced downtime with proactive identification of potential issues through regular network analysis and reporting.
  • Scheduled infrastructure upgrades and software update rollouts around high traffic times to maintain network availability.
  • Built servers, upgraded applications, and conducted hardware audits.
  • Environment: AWS Cloud, RHEL7, Ubuntu, CentOS, VMware, XenServer, Ansible, Jenkins, Docker, Shell Scripting, Git, Kubernetes, Jira, MySQL, PostgreSQL, Nagios.

Technical Associate

Sneha Synergy Solutions Pvt Ltd
05.2012 - 11.2013
  • Installed, configured, tested and maintained operating systems, application software, and system management tools.
  • Improved customer satisfaction by providing exceptional technical support and troubleshooting assistance.
  • Led successful software deployments, minimizing disruptions to end-users during critical system upgrades.
  • Collaborated with senior technical staff on larger projects, exceeding required contribution thresholds to project workflow.
  • Assisted in the evaluation of potential tools and technologies, facilitating informed decision-making for future investments.
  • Diagnosed and researched technical faults, escalating support issues when necessary.
  • Participated in new technology rollouts, following mandated technical change procedures to maintain smooth transitions.
  • Increased productivity by automating repetitive tasks using scripting languages.
  • Used ticketing systems to manage and process support actions and requests.
  • Patched software and installed new versions to eliminate security problems and protect data.
  • Environment: RHEL, Ubuntu, CentOS, VMware, XenServer, MySQL, PostgreSQL, Asterisk (VoIP).

Education

Master's - ECE

Osmania University
01.2014

Bachelor's - ECE

JNTU Hyderabad
01.2012

Skills

  • Cloud Platforms: Google Cloud Platform (GCP), Microsoft Azure, Amazon Web Services (AWS)
  • GCP Core: GKE, Compute Engine, Cloud Storage, VPC, IAM, Cloud Functions, BigQuery, Spanner, Composer, Dataflow, Pub/Sub
  • Azure Core: Azure DevOps, Key Vault, Service Bus, Function Apps, Web Apps, Event Hub
  • Infrastructure as Code (IaC): Terraform, Ansible, CloudFormation
  • Containerization & Orchestration: Docker, Kubernetes (GKE), Kubectl
  • CI/CD & Automation: Jenkins, Groovy, Cloud Build, GitHub
  • Scripting & Languages: Python, Shell Scripting, YAML
  • Monitoring & Observability: Datadog, Prometheus, ELK Stack (Elasticsearch, Logstash, Kibana), Stackdriver
  • Databases: Firestore, Spanner, Cloud SQL, MySQL, Oracle SQL
  • Operating Systems: RHEL, CentOS, Ubuntu

Certification

  • Red Hat Certified System Administrator in Red Hat OpenStack – EX210
  • Red Hat Certified Engineer – EX300
  • Red Hat Certified System Administrator – EX200
  • Link: https://rhtapps.redhat.com/verify?certId=170-138-960
