Avinash Gupta

Ashland, MA

Summary

Dedicated and skilled Senior DevOps/Site Reliability Engineer with a proven track record of implementing and managing robust, scalable, and highly available cloud-based systems. Seeking to contribute Cloud Engineering expertise to a dynamic team, drawing on a strong background in SRE principles, automation, and cloud technologies. More than 15 years of IT experience in the e-commerce and insurance domains, including 9+ years as an SRE/Senior DevOps/Cloud Engineer across the three major cloud providers: GCP, AWS, and Azure.

Overview

15 years of professional experience

Work History

Senior DevOps Engineer

Staples
11.2023 - Current
  • Developed CI/CD pipelines for APIM automation, integrating Infrastructure-as-Code (IaC) tools such as Terraform and Kubernetes operators such as Crossplane.
  • Designed and implemented Azure API Management (APIM) solutions to streamline API governance, security, and lifecycle management.
  • Deployed and managed self-hosted Azure APIM gateways on Google Kubernetes Engine (GKE) to facilitate hybrid cloud API traffic management.
  • Designed and deployed the Azure API Management (APIM) Developer Portal from scratch, enabling seamless API discovery, onboarding, and self-service capabilities for developers.
  • Implemented Managed Identities for authentication with backend provider APIs, securing access to Azure services without managing credentials.
  • Configured Azure Service Principals to enable secure authentication for external consumers interacting with Azure resources and APIs.
  • Optimized API security and performance by implementing rate limiting, JWT validation, OAuth, role-based access control (RBAC), and caching strategies in APIM.
  • Designed and implemented centralized logging and monitoring solutions using Splunk for real-time observability and incident response.
  • Integrated Azure Event Hubs with Splunk, enabling efficient ingestion and analysis of application, API, and infrastructure logs.

Senior DevOps Engineer

Lula
11.2022 - 11.2023
  • Managed and maintained Kubernetes clusters on Google Kubernetes Engine (GKE), ensuring high availability, scalability, and security
  • Implemented CI/CD pipeline frameworks, such as shared (reusable) workflows in GitHub Actions, to automate builds, tests, and deployments for microservices hosted on Kubernetes in GCP
  • Developed and implemented a release strategy with integrated code quality checks, security scans, and gated release flows
  • Worked extensively with Docker and containerization
  • Automated infrastructure provisioning using Terraform and Deployment Manager, ensuring consistency and reliability across multi-region deployments
  • Implemented and managed secrets and sensitive data using HashiCorp Vault, enhancing overall security measures
  • Administered and optimized PostgreSQL databases for performance and scalability
  • Implemented and maintained Firebase services and Cloud Functions for serverless applications
  • Configured networking components including Cloud DNS, VPCs, and VPNs for optimal performance and security
  • Used static analysis and security scanning tools such as SonarQube and Snyk
  • Utilized EventStore database for event sourcing and stream processing, ensuring efficient data management
  • Collaborated with development teams to integrate GitHub workflows for version control, code reviews, and collaboration, ensuring high-quality code delivery and efficient development cycles
  • Built and supported a gRPC-based microservice architecture
  • Worked with GCP managed services such as Cloud SQL databases, Pub/Sub, BigQuery, and Firebase
  • Collaborated with cross-functional teams to improve deployment processes, reducing manual interventions and enhancing automation using DevOps best practices
  • Conducted regular security audits, enforced access controls with IAM, and implemented encryption solutions using KMS
  • Assisted in disaster recovery planning, including data backups, off-site storage, and failover strategies
  • Assisted in setting up VPCs, load balancers, and VPNs for secure and efficient networking in GCP
  • Collaborated with development teams to optimize applications for GCP, leveraging services like Cloud SQL, Datastore, and BigQuery for improved performance and cost-efficiency
  • Helped build the infrastructure side of the disaster recovery plan
  • Designed and maintained monitoring and logging systems using Stackdriver, ensuring high availability and performance of applications
  • Operated real-time streaming and data pipeline frameworks such as Kafka and Pub/Sub
  • Performance-tuned database schemas, databases, SQL queries, ETL jobs, and related scripts
  • Utilized Honeycomb for advanced tracing and debugging, significantly reducing incident response time and improving overall system reliability
  • Configured and customized Datadog to monitor application performance, system metrics, and logs

Senior Site Reliability Engineer

Fabric Inc
07.2021 - 11.2022
  • Orchestrated the deployment and maintenance of containerized applications on Amazon ECS, optimizing resource utilization and improving overall system performance
  • Utilized Amazon ECR for secure and efficient storage of Docker images, implementing versioning with tags and access controls to meet security and compliance requirements
  • Architected and deployed scalable, fault-tolerant solutions on AWS, incorporating services such as EC2, S3, RDS, Lambda, and API Gateway
  • Created and maintained AWS components such as CloudFront, Lambda@Edge, and WAF
  • Used Terraform to automate the provisioning and management of AWS infrastructure
  • Implemented customer-specific redirects by deploying Lambda@Edge functions on CloudFront
  • Managed and monitored multiple e-commerce sites deployed on AWS
  • Used automated scripts to enable detailed metrics, logging, and throttling settings for APIs deployed behind API Gateway across multiple AWS accounts
  • Worked with monitoring and alerting tools such as Datadog, Sedai, Epsagon, and PagerDuty
  • Integrated all application logs from AWS CloudWatch into Datadog and monitored each application through dashboards and alerts
  • Performed code reviews with development teams and helped maintain coding standards so applications emitted proper logging information to Datadog for effective monitoring
  • Identified bugs and issues in the existing system that could lead to security compliance violations and got them addressed by the development team
  • Worked with databases such as MongoDB and Couchbase, and with Redis for caching
  • Helped customers move their sites from their previous infrastructure to the new Fabric infrastructure, supporting the full site migration, including DNS changes, certificates, and sitemaps
  • Implemented caching of static assets on CloudFront to improve site performance
  • Worked closely on integrating sites with third-party APIs (tax, address validation, login, payment, etc.) and monitored the uptime of these APIs
  • Worked with development teams to implement Real User Monitoring (RUM) using Datadog and FullStory for multiple customers
  • Analyzed Core Web Vitals (LCP, FID, CLS) for customer sites and recommended improvements to site performance metrics
  • Monitored detailed Google Analytics metrics such as conversion rate and bounce rate
  • Ensured 100% site uptime during peak (holiday) season and 99% uptime for the rest of the year

Site Reliability Engineer

Staples Inc
04.2015 - 07.2021
  • Implemented and managed Azure infrastructure for critical applications, optimizing resource utilization and improving scalability
  • Implemented Infrastructure as Code (IaC) using Azure Resource Manager (ARM) templates and Terraform, automating resource provisioning
  • Conducted regular chaos engineering exercises to identify and address potential points of failure in the microservices environment
  • Managed and optimized Azure virtual machines and services, ensuring high availability and cost efficiency
  • Designed and implemented API Gateway solutions on Azure, enhancing the scalability and security of microservices architectures
  • Integrated API Gateway policies for authentication, authorization, and rate limiting to ensure secure and controlled API access
  • Implemented Azure Load Balancer and Azure Traffic Manager for distributing incoming network traffic and improving application availability
  • Implemented and managed Azure Blob Storage solutions, including snapshot storage, optimizing storage performance and ensuring high availability for critical data
  • Led the migration of a critical application from on-premises data centers to Microsoft Azure, ensuring minimal downtime and seamless transition
  • Developed Jenkins pipelines to automate the build, test, and deployment processes, resulting in a 60% reduction in manual intervention and faster time-to-market
  • Orchestrated containerized applications using Docker and Kubernetes for scalable and efficient deployment
  • Collaborated with cross-functional teams to troubleshoot and resolve networking issues, ensuring seamless connectivity
  • Worked with business users to resolve high-priority issues, performed root cause analysis, and coordinated with the respective teams to implement fixes
  • Automated daily tasks, including Akamai cache clearing and database import/export, using shell scripting
  • Worked on the migration of legacy WebSphere Application Server applications to a microservices architecture
  • Worked on migrating the Apache-based IBM web server to open-source Nginx
  • Implemented SiteMinder as the authentication module in Nginx
  • Worked on permanent fixes for high-priority production issues that had prevented users from shopping on the site, predominantly cart and checkout issues
  • Worked with the performance team to run performance tests for upcoming releases and triage performance issues
  • Built a disaster recovery (DR) data center for production so traffic could be failed over to the DR instance in the event of a disaster
  • Worked with monitoring tools such as Splunk, New Relic, PagerDuty, and SOASTA, and created dashboards and alerts for newly enabled microservices
  • Improved site performance by identifying improvement areas and driving the process for continuous improvement
  • Worked with application stakeholders to define non-functional requirements (NFRs) covering performance, scalability, availability, resilience, and reliability, including service level objectives (SLOs), service level indicators (SLIs), and error budgets
  • Owned NFR-related incidents: updated SOPs to capture the right metrics and logs for root cause analysis (RCA), performed RCA, identified solutions, analyzed production utilization and incident patterns, and implemented automation to improve productivity and eliminate manual tasks and recurring incidents
  • Maintained and enhanced content management systems (CMS) used to publish new content
  • Provided application and post-production support for go-lives and enhancements

Middleware Admin

The Home Depot Inc
02.2014 - 04.2015
  • Administered and maintained middleware infrastructure, including Active MQ, WebSphere Message Broker (WMB), and Service Bus
  • Installed, configured, and optimized Active MQ to ensure reliable and efficient message queuing for various applications
  • Managed WebSphere Message Broker, handling the design, deployment, and troubleshooting of message flows to facilitate seamless integration between diverse systems
  • Conducted MQ installations, upgrades, and patching, ensuring the stability and security of the messaging environment
  • Set up MQ in all development and production environments
  • Created and managed MQ objects such as queues, channels, and listeners
  • Configured queue managers, their objects, and OAM (Object Authority Manager) security
  • Administered MQ objects, including queue managers, queues, and channels

SCM Developer

The Home Depot Inc
02.2010 - 02.2014
  • Gathered client specific requirements based on business requirement documents
  • Prepared high-level design (HLD) documents describing the design changes required by the requirements
  • Prepared estimates based on analysis of the Business Requirement Document (BRD)
  • Developed the component design based on the requirements and HLD
  • Wrote complex business logic for mainframe and CICS applications
  • Developed and maintained code according to the standards specified in the technical documents
  • Adhered to the quality standards and processes for code development
  • Analyzed existing COBOL code and fixed bugs
  • Wrote new code to implement new client requirements in the SCM flow
  • Documented resolved issues and bugs
  • Proficient in COBOL, VSAM, JCL, and CICS

Education

Bachelor of Technology

Veer Surendra Sai University of Technology
01.2009

Skills

  • GCP
  • AWS
  • Azure
  • Microservice
  • Docker Container/GCE
  • Kubernetes/GKE/AKS
  • IaC using Terraform, Ansible, Helm
  • Service Operators (Flux, ASO, Crossplane)
  • Cloud DNS
  • Cloud Run
  • Cloud Build
  • ECS with Fargate
  • ECR
  • Lambda
  • CloudFront
  • WAF
  • API Gateway/Azure API Management
  • Cloud SQL (PostgreSQL, MySQL, SQL Server)
  • Firestore
  • Bigtable
  • Pub/Sub
  • Cloud Scheduler
  • Artifact Registry
  • Azure VM
  • Traffic Manager
  • Blob Storage
  • HashiCorp Vault
  • Automation Tools
  • Puppet
  • CI/CD (GitOps, Jenkins, GitHub, Bitbucket)
  • ArgoCD
  • Python/Node.js
  • YAML/JSON
  • Datadog/New Relic/Honeycomb/Splunk/Sedai
  • Epsagon
  • NoSQL (MongoDB, Couchbase, Redis)
  • Snyk/SonarQube/Terrascan
  • Event-Driven Database
  • DNS/HTTP/HTTP/2
  • TLS/SSL/FTP/SFTP
  • Visual Studio/SourceTree/JetBrains
  • Linux/Ubuntu/IBM AIX
  • Nginx R20
  • IBM WebSphere
  • Tomcat
  • pgAdmin/PgBouncer/DBeaver

Accomplishments

  • Advanced Terraform
  • Advanced Kubernetes: 1 Core Concepts
  • AZ-300 Microsoft Azure Architect Technologies
  • Google Cloud Platform Essential Training for Administrators
  • Azure for Developers: API Management
  • GitOps Foundations
  • Azure: Understanding the Big Picture
  • IBM DB2 Certified
  • Teradata Basic Certified
  • Pursuing AWS Solutions Architect certification
  • Pursuing GCP certification
