Samarnath Gande

St. Louis, MO

Summary

A results-driven Site Reliability Engineer with over 11 years of experience specializing in the design, automation, and management of scalable infrastructure across GCP, Azure, and on-premises data centers. With a strong foundation in Linux System Administration, I possess deep hands-on expertise in containerization and orchestration using Kubernetes (GKE) and advanced deployment strategies like Blue-Green and Canary. I am proficient in building robust CI/CD pipelines with Jenkins and automating infrastructure as code with Terraform and Ansible, complemented by strong skills in Python and Shell scripting. My experience extends to analyzing end-to-end architecture to identify performance bottlenecks, reduce technical debt, and implement comprehensive monitoring solutions with tools such as Datadog, Prometheus, and the ELK Stack.

Work History

Site Reliability Engineer

LTIMindtree Ltd (Client: Equifax Inc)

USA

09.2023 - Current

Deploy GCP resources using Terraform.
Participate in and maintain container image hygiene for microservices
Maintain 8 Kubernetes clusters for application deployments across multiple regions
Built Data Studio reports from Stackdriver logs to calculate golden metrics
Create and manage data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL
Implement data security and access controls using GCP's Identity and Access Management (IAM)
Monitor and troubleshoot pipelines and microservices using GCP's Stackdriver and Datadog
Built Datadog dashboards for all APIs and web apps
Write automation primarily in Groovy, with some Python and Shell
Migrate to GitHub Actions.
Participate in code reviews and contribute to the development of best practices for engineering on GCP
Stay up to date with the latest GCP services and features and evaluate their potential use in the organization's data infrastructure.
Improved incident management workflows by creating comprehensive documentation on troubleshooting procedures and resolution steps for common issues.
Developed custom scripts/tools as needed to automate routine tasks, increasing overall team productivity and efficiency.
Evaluated new technologies and tools to enhance overall system performance, stability, and security.
Collaborated with cross-functional teams to develop, test, and deploy scalable software solutions.
Implemented cost-saving measures by optimizing resource utilization across cloud-based infrastructure environments.
Contributed to the ongoing refinement of internal processes and procedures within the site reliability engineering discipline through regular reviews, updates, and knowledge sharing activities.
Designed load testing scenarios to validate application scalability under various traffic patterns and conditions.
Fostered collaboration between development and operations teams through effective communication strategies during project lifecycles.
Enhanced system monitoring capabilities, integrating advanced tools for real-time performance tracking and anomaly detection.
Improved deployment efficiency, automating processes using CI/CD pipelines.
Managed capacity planning efforts to ensure optimal resource allocation based on current demand projections and future growth expectations.
Created work schedules and adjusted as needed to meet project deadlines and keep shifts properly staffed.

Site Reliability Engineer

Tek Yantra Inc (Client: Equifax Inc)

USA

11.2022 - 09.2023

Deploy GCP resources using Terraform.
Participate in and maintain container image hygiene for microservices
Maintain 8 Kubernetes clusters for application deployments across multiple regions
Create and manage data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL
Implement data security and access controls using GCP's Identity and Access Management (IAM)
Monitor and troubleshoot pipelines and microservices using GCP's Stackdriver and Datadog
Write automation primarily in Groovy, with some Python and Shell
Migrate to GitHub Actions.
Participate in code reviews and contribute to the development of best practices for engineering on GCP
Stay up to date with the latest GCP services and features and evaluate their potential use in the organization's data infrastructure.
Improved incident management workflows by creating comprehensive documentation on troubleshooting procedures and resolution steps for common issues.
Developed custom scripts/tools as needed to automate routine tasks, increasing overall team productivity and efficiency.
Evaluated new technologies and tools to enhance overall system performance, stability, and security.
Collaborated with cross-functional teams to develop, test, and deploy scalable software solutions.
Implemented cost-saving measures by optimizing resource utilization across cloud-based infrastructure environments.
Contributed to the ongoing refinement of internal processes and procedures within the site reliability engineering discipline through regular reviews, updates, and knowledge sharing activities.
Designed load testing scenarios to validate application scalability under various traffic patterns and conditions.
Fostered collaboration between development and operations teams through effective communication strategies during project lifecycles.
Enhanced system monitoring capabilities, integrating advanced tools for real-time performance tracking and anomaly detection.
Improved deployment efficiency, automating processes using CI/CD pipelines.
Managed capacity planning efforts to ensure optimal resource allocation based on current demand projections and future growth expectations.
Created work schedules and adjusted as needed to meet project deadlines and keep shifts properly staffed.

Senior DevOps/SRE Engineer

Tek Yantra Inc (Client: CDPH)

USA

08.2022 - 11.2022

Created Azure DevOps pipelines and maintained the DevOps code base in Azure Repos
Wrote Azure Test Plans and maintained build artifacts in Azure Artifacts
Analyzed and designed the migration of existing application components to Azure PaaS
Made code changes to migrate the existing application to Azure PaaS offerings such as Azure Web Apps, Azure WebJobs, and Azure SQL Database
Reconfigured an existing console application, previously run as Windows scheduled tasks, as an Azure WebJob
Implemented an Azure Repos branching strategy and Azure Pipelines (CI/CD) for deploying source code to Azure Web Apps and Azure SQL databases
Coordinated with multiple teams and vendors to troubleshoot issues at each phase of the project.
Strong working knowledge of Azure services: Web Apps, Key Vault, SQL Server DB, Storage Account, and Redis Cache
Environment: Account, Load Balancer, Key Vault, Azure DevOps, Service Bus, Function App, Web Apps, Event Hub, Notification Hub, Azure SQL, PostgreSQL, Cosmos DB, Storage Account, Azure Active Directory, Monitoring, Kubectl, Blob Storage, Secret Manager, YAML, RHEL7, Ubuntu, CentOS, Ansible, Jenkins, Docker, Shell Scripting, GitHub Enterprise, Kubernetes, Snowflake DB.

Site Reliability Engineer

HCL Technologies Pvt Ltd (Client: Google)

Hyderabad, India

02.2020 - 06.2022

Infrastructure & Automation:
Managed and maintained the stability of more than 500 servers for an internal ticketing tool on Google Cloud Platform (GCP) using Puppet for configuration management.
Engineered and automated GCP infrastructure provisioning using Terraform (Infrastructure as Code), integrating core services like IAM, networking, logging, and container management.
Installed, configured, and administered multiple Linux distributions, including Oracle Enterprise Linux and Ubuntu.
CI/CD & Release Management:
Developed, configured, and maintained CI/CD pipelines using Jenkins, Cloud Build, and Cloud Repo, enabling a touchless deployment process in collaboration with the development team.
Managed a weekly release cycle, deploying application updates to Kubernetes (GKE) and Borg clusters across multiple geographic locations.
Utilized advanced deployment strategies, including Blue-Green and Canary deployments, to minimize downtime and risk during releases.
Containerization & Kubernetes:
Built and maintained Kubernetes infrastructure (GKE), including writing Dockerfiles, managing container images in GCR, and deploying applications to both Kubernetes and internal Borg systems.
Implemented comprehensive Kubernetes monitoring and observability using Google's internal monitoring tool to ensure cluster health and performance.
Cloud Services & Development:
Leveraged Python scripting to automate manual tasks and improve operational efficiency wherever necessary.
Implemented and managed cloud-native services, including Spanner for databases, and Cloud Pub/Sub for asynchronous messaging.
Configured and managed GCP networking components, including VPCs, firewall rules, and NAT, within development and sandbox environments.
Conducted root-cause analyses after major incidents to identify areas for process improvement or technical enhancement opportunities.
Environment: GCP Cloud Compute, Storage, Network, IAM, VPC, Firewall Rules, Load Balancer, Cloud Deployment Manager, Cloud Build, Cloud Repo, Cloud Run, Container Registry, GKE, Pub/Sub, Composer, Cloud Storage, Secret Manager, YAML, RHEL7, CentOS, Puppet, Docker, Shell Scripting, Borg

Senior DevOps/Cloud Engineer

Foray Software Pvt Ltd (Client: Xilinx, now AMD)

Hyderabad, India

04.2018 - 01.2020

CI/CD & Automation:
Established and maintained a CI/CD workflow using Jenkins to support the software team's daily and nightly builds.
Automated infrastructure and application configurations using Ansible, ensuring consistency across a fleet of over 300 servers.
Utilized Shell scripting to automate various operational tasks and streamline deployment processes.
Containerization & Deployment:
Built application deployment pipelines using Docker, which involved writing Dockerfiles, building images, and managing containers in a private registry.
Deployed and managed containerized applications on Kubernetes clusters.
System & Infrastructure Administration:
Administered a hybrid infrastructure of over 300 servers across AWS and on-premises data centers.
Installed, configured, and maintained various Linux distributions, including RHEL, Ubuntu, and CentOS.
Built and managed virtual machines on physical hardware using VMware ESXi 6.5.
Monitoring & Database Management:
Deployed and configured the ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging and system monitoring.
Assisted in migration projects from on-premises data centers to cloud environments, ensuring minimal disruption to business operations.
Used metrics to monitor application and infrastructure performance.
Evaluated new cloud technologies and recommended solutions that aligned with organizational goals and objectives.

Senior DevOps/NOC Engineer

DVL Groups Ltd

Hyderabad, India

12.2013 - 01.2018

Infrastructure & System Administration:
Administered a global infrastructure of over 200 servers, including AWS EC2 instances and on-premises bare-metal servers, across multiple data centers.
Managed the complete lifecycle of Linux servers (RHEL/CentOS), including installation, configuration, routine patching, and system upgrades.
Built and maintained virtual machines on physical hardware using virtualization platforms such as VMware ESXi and XenServer.
DevOps & Automation:
Established and maintained a CI/CD workflow to support the deployment of Java-based applications, significantly improving release efficiency.
Deployed applications using Docker and Kubernetes, which included writing Dockerfiles, building images, and managing containers in a private registry.
Utilized Shell scripting to automate system administration tasks and streamline operational processes.
Monitoring & Database Management:
Implemented and configured Nagios for comprehensive infrastructure monitoring, ensuring high availability and proactive issue resolution.
Performed routine database administration, including installation, backups, and migrations, for various development and operational tools.
Conducted root cause analyses following incidents, implementing corrective actions to prevent recurrence and maintain service quality standards.
Exceeded SLA standards for response times and problem resolution.
Coordinated with technical support, service provisioning and sales teams to deliver network services at or above SLA requirements.
Reduced downtime with proactive identification of potential issues through regular network analysis and reporting.
Scheduled infrastructure upgrades and software update rollouts around high traffic times to maintain network availability.
Built servers, upgraded applications, and conducted hardware audits.
Environment: AWS Cloud, RHEL7, Ubuntu, CentOS, VMware, XenServer, Ansible, Jenkins, Docker, Shell Scripting, Git, Kubernetes, Jira, MySQL, PostgreSQL, Nagios.

Technical Associate

Sneha Synergy Solutions Pvt Ltd

Hyderabad, India

05.2012 - 11.2013

Installed, configured, tested and maintained operating systems, application software, and system management tools.
Improved customer satisfaction by providing exceptional technical support and troubleshooting assistance.
Led successful software deployments, minimizing disruptions to end-users during critical system upgrades.
Collaborated with senior technical staff on larger projects, exceeding required contribution thresholds to project workflow.
Assisted in the evaluation of potential tools and technologies, facilitating informed decision-making for future investments.
Diagnosed and researched technical faults, escalating support issues when necessary.
Participated in new technology rollouts, following mandated technical change procedures to maintain smooth transitions.
Increased productivity by automating repetitive tasks using scripting languages.
Used ticketing systems to manage and process support actions and requests.
Patched software and installed new versions to eliminate security problems and protect data.
Environment: RHEL, Ubuntu, CentOS, VMware, XenServer, MySQL, PostgreSQL, Asterisk (VoIP).

Education

Master's - ECE

Osmania University

01.2014

Bachelor's - ECE

JNTU Hyderabad

01.2012

Skills

Cloud Platforms: Google Cloud Platform (GCP), Microsoft Azure, Amazon Web Services (AWS)
GCP Core: GKE, Compute Engine, Cloud Storage, VPC, IAM, Cloud Functions, BigQuery, Spanner, Composer, Dataflow, Pub/Sub
Azure Core: Azure DevOps, Key Vault, Service Bus, Function Apps, Web Apps, Event Hub
Infrastructure as Code (IaC): Terraform, Ansible, CloudFormation
Containerization & Orchestration: Docker, Kubernetes (GKE), Kubectl

CI/CD & Automation: Jenkins, Groovy, Cloud Build, GitHub
Scripting & Languages: Python, Shell Scripting, YAML
Monitoring & Observability: Datadog, Prometheus, ELK Stack (Elasticsearch, Logstash, Kibana), Stackdriver
Databases: Firestore, Spanner, Cloud SQL, MySQL, Oracle SQL
Operating Systems: RHEL, CentOS, Ubuntu

Certification

Red Hat Certified System Administrator in Red Hat OpenStack – EX210
Red Hat Certified Engineer – EX300
Red Hat Certified System Administrator – EX200
Link: https://rhtapps.redhat.com/verify?certId=170-138-960
