
Nagaraja Adda

Frisco, TX

Summary

Highly skilled and motivated Site Reliability Engineer (SRE) with 8 years of experience designing, building, and maintaining highly scalable, reliable systems, and developing and executing DevOps strategies in hybrid cloud environments. Seeking a challenging position where I can apply my expertise in automation, monitoring, continuous integration and deployment, incident response, and infrastructure management to ensure the availability, performance, and efficiency of critical applications and services.

Overview

8 years of professional experience
1 certification

Work History

Site Reliability Engineer (SRE)

T-Mobile
01.2022 - Current
  • Managed and secured hybrid AWS and Azure infrastructure, leveraging advanced IAM, RBAC, and Microsoft Entra ID (formerly Azure AD) to centralize and automate identity and access management, including integration with on-premises AD and external SaaS vendors for seamless SSO and governance. Led the creation and enforcement of granular IAM policies and custom RBAC roles across cloud environments, supporting compliance and least-privilege principles for critical services like EC2, S3, RDS, Lambda, Glue, and bespoke business use cases.
  • Designed cloud networking with robust VPCs, subnets, security groups, and VPN configurations, ensuring secure, scalable interconnectivity between platforms. Implemented and maintained reliable ETL pipelines using AWS Glue for automated data transformation from S3 to RDS, optimizing business data workflows. Established cross-account IAM roles to share resources between isolated AWS environments, and architected secure app authentication via Azure application registrations, service principals, and managed identities.
  • Drove business process automation by deploying Azure Logic Apps and Function Apps, and improved the reliability and scalability of third-party integrations by troubleshooting issues and collaborating with key vendors. Ensured high application uptime and compliance by configuring T-Mobile Kubernetes Engine (TKE) clusters with autoscaling, rolling updates, and robust CI/CD pipelines; deployed to dev, staging, and production, and enforced zero-downtime releases through infrastructure as code and dynamic GitLab and Jenkins workflows.
  • Reduced system outages by quickly diagnosing and resolving REST API and IIS server issues, leveraging in-depth log analysis, Event Viewer, and real-time monitoring. Automated operational tasks across Linux and Windows using custom shell/PowerShell scripts for patching, installations, and service restarts. Architected end-to-end monitoring and incident response with Splunk and Datadog, integrating tailored alerts directly into Teams, PagerDuty, or email for maximum incident visibility. Troubleshot system-level issues across Linux and network stack components (CPU, memory, disk, NFS, DNS, TCP/IP, SSH).

Cloud/DevOps Engineer

Capital One
01.2018 - 01.2022
  • Administered Linux and Windows servers and executed cloud-first strategies across hybrid environments. Deployed and automated scalable, highly available AWS applications using CloudFormation and Terraform, provisioning core services (EC2, S3, RDS, ElastiCache, SNS, IAM) with a focus on resilience and auto-scaling. Leveraged Docker and AWS ECS to enable portable microservices architectures. Automated IBM TM1 analytics stacks on AWS EC2 and managed Alfresco for secure, high-performance document management.
  • Configured Apache Airflow to orchestrate complex workflows, and built robust CI/CD pipelines with Jenkins, Bogie (Capital One’s Jenkins framework), and Groovy-based Jenkinsfiles for end-to-end infrastructure automation. Integrated Git webhooks for seamless, automated deployment pipelines. Enhanced system observability and resilience via Splunk, ELK Stack, and Datadog dashboards, automating ingestion and monitoring through scripting and IaC. Developed browser monitors with New Relic Synthetics, tracking application health and JVM metrics.
  • Orchestrated server “rehydration” using continuous integration with Terraform, Jenkins, and CloudFormation, ensuring reliable server lifecycle management. Collaborated with L1/L2 support, providing interim fixes and rapid escalations, and delivered 24/7 support for P1 resolutions and production incidents. Recognized for driving operational excellence, rapid incident response, and delivering reliable business-critical cloud environments.

Education

Master's - Global Blockchain Technology

University of the Cumberlands
USA
01.2022

Master's - Computer Science

Northwestern Polytechnic University
USA
01.2017

Bachelor's - Computer Science and Engineering

Andhra University
Visakhapatnam, INDIA
01.2014

Skills

    Cloud Technologies: AWS, Azure

    Infrastructure as Code: CloudFormation, Terraform

    CI/CD: Jenkins, GitHub, GitLab, Artifactory, SonarQube, JUnit

    Configuration Management: Ansible

    Containerization Tools: Docker, Kubernetes

    Databases: MySQL, PostgreSQL, SQL Server, Oracle DB

    Web/App Servers: Apache, IIS, Tomcat, WebSphere

    Scripting: Shell scripting, PowerShell, Python

    Monitoring/APM Tools: Splunk, ELK, Datadog, New Relic

    Incident Management: Jira, ServiceNow

Certification

AWS Certified DevOps Engineer - Professional - 01/2023


https://www.credly.com/badges/7b31588e-d6a6-44aa-9eb3-a7e7af43ab06/public_url
