Hi, I’m

Kalyan Ambati

Dallas, TX
Summary

Dynamic Platform Engineer with over 9 years of experience in DevOps, cloud infrastructure, and NVIDIA H100 GPU orchestration across AWS, Azure, and GCP environments. Proven track record of enhancing operational efficiency through advanced automation and Infrastructure as Code practices, achieving a 70% reduction in deployment times and maintaining 99.9% system uptime. Expertise in deploying high-performance AI/ML workloads on OpenShift platforms, along with a strong foundation in Kubernetes, Docker, and multi-cloud management. Recognized for driving significant improvements in security compliance and incident detection time while fostering collaborative team dynamics to adapt to evolving project needs.

Overview

9
years of professional experience
3
Certification

Work History

Verizon

Platform Engineer
03.2025 - Current

Job overview

  • Implemented NVIDIA H100 GPU infrastructure leveraging the Hopper architecture with 4th-generation Tensor Cores, achieving up to 9x higher AI training performance and up to 30x faster inference compared to previous-generation GPUs.
  • Orchestrated GPU resource allocation and scheduling using OpenShift's enhanced Kubernetes capabilities to efficiently distribute GPU-accelerated workloads across multi-node clusters.
  • Administered multi-cluster Kubernetes environments across development, staging, and production with centralized management, policy enforcement, and disaster recovery procedures.
  • Managed cluster lifecycle operations, including version upgrades, node maintenance, certificate rotation, and backup/restore procedures, with zero-downtime strategies.
  • Developed KubeKit, an application to install and configure Kubernetes clusters on multiple platforms and clouds. KubeKit is used by Teradata to install Teradata Vantage on more than 50 production Kubernetes clusters, most with 6-8 nodes (maximum 24) and 185-230 Pods each for Teradata applications and Kubernetes plugins/services, all on AWS EKS, Azure AKS, vSphere, and bare-metal IFX.
  • Implemented security scanning and compliance using tools like Falco, OPA Gatekeeper, and admission controllers to enforce security policies and detect runtime threats.
  • Developed Terraformer (aka Terranova), a Go package that uses Terraform Go code, and integrated it with KubeKit so that KubeKit can provision multiple Kubernetes clusters, nodes, and resources on AWS, EKS, AKS, vSphere, OpenStack, and bare metal through Stacki using Terraform templates.
  • Designed and deployed Kubernetes clusters using kubeadm for non-production workloads, with high availability and cluster autoscaling enabled.
  • Set up Kubernetes nodes with CoreOS for NVIDIA GPUs using cloud-config.
  • Set up the EFK stack (Elasticsearch, Fluentd, and Kibana) on Kubernetes.
  • Deployed ConfigMaps, Secrets, and Volumes on Kubernetes.
  • Designed and implemented IaC solutions in Terraform to provision and manage AWS infrastructure, reducing environment setup time by 70%.
  • Designed sidecar proxy configurations with Envoy proxies for advanced traffic routing, circuit breaking, and retry policies, achieving 99.9% service availability during peak loads.
  • Engineered and troubleshot Kubernetes ingress controllers (NGINX, HAProxy, Traefik) to manage external traffic routing, SSL termination, and path-based routing for 100+ microservice applications.
  • Implemented advanced ingress configurations with host-based and path-based routing rules, custom annotations, and rate-limiting policies to optimize traffic distribution and mitigate DDoS attacks.
  • Diagnosed and resolved complex pod issues, including CrashLoopBackOff, ImagePullBackOff, and InitContainerError states, reducing mean time to resolution (MTTR) by 45%.
  • Implemented and managed Istio service mesh within complex microservices architectures, reducing service-to-service communication latency by 40% and improving overall system stability.
  • Led CI/CD pipeline development using GitLab CI and Argo CD, enabling fully automated build, test, and deployment workflows for microservices on Kubernetes clusters.
  • Automated large-scale Kubernetes deployments with custom Helm charts and operators, enhancing system scalability and reducing manual deployment tasks by 60%.
  • Designed and implemented Kubernetes-native CI/CD pipelines using OpenShift Pipelines (Tekton), integrated with Jenkins and GitOps workflows, to accelerate software release cycles by 30%.
  • Configured dynamic volume provisioning with storage classes and persistent volume claims, enabling automatic storage allocation and management for containerized applications.
  • Implemented OpenShift security best practices, including Security Context Constraints (SCCs), Pod Security Standards, and RBAC policies, resulting in a 20% improvement in system security and compliance.
  • Automated project creation workflows using OpenShift templates and operators, streamlining application onboarding and reducing provisioning time from days to hours.
  • Deployed and maintained production-grade OpenShift 4.12+ clusters using both Installer Provisioned Infrastructure (IPI) and User Provisioned Infrastructure (UPI) methodologies across AWS, Azure, and on-premises environments.
  • Configured NVIDIA Multi-Instance GPU (MIG) partitioning on H100 GPUs, dividing single GPUs into up to 7 independent instances with dedicated memory, compute cores, and cache for optimal resource utilization.
  • Implemented GPU monitoring and observability using DCGM-based metrics collection and cluster-level resource tracking to optimize performance and troubleshoot GPU utilization issues.
  • Architected and deployed cloud-native microservices on Kubernetes, implementing SRE principles to achieve 99.99% uptime for production workloads.
  • Implemented monitoring and alerting solutions (Prometheus, Grafana) to detect production issues early and proactively ensure service availability and performance.
  • Automated GPU driver deployment and configuration across OpenShift worker nodes using operator-based lifecycle management, ensuring consistent CUDA enablement and container runtime integration.
  • Environment: AWS, KubeKit, OpenShift, bare metal, Kubernetes, Docker, NVIDIA GPU Operator, NVIDIA GPUs, MIG slicing, inference, AI/ML workloads, DCGM, Go, Prometheus, Grafana, CUDA, Argo CD, GitLab, Istio, service mesh, Kubernetes Service, RHEL 8, Fluentd, Kibana, NGINX, HAProxy, Traefik.

Nagarro

AWS/DevOps Engineer
03.2023 - 04.2025

Job overview

  • Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables, S3 buckets, and HTTP requests via Amazon API Gateway.
  • Wrote Python scripts to automate AWS services, including web servers, ELB, CloudFront distributions, databases, EC2, database security groups, and S3 buckets.
  • Used AWS IAM to grant users fine-grained access to AWS resources, and managed user roles and permissions for the AWS account.
  • Integrated AWS CloudTrail with Amazon EC2 instances to monitor EC2 instance usage using AWS Instance Scheduler.
  • Architected and maintained a secure, multi-zone AWS environment with Ansible, Terraform, and Jenkins for CI/CD. Migrated services, automated deployments, monitored databases, and executed a SQL Server migration to AWS via DMS, including schema conversion to Aurora.
  • Designed and implemented microservices architectures on OpenShift, breaking down monolithic applications into smaller, decoupled services for easier management, scalability, and resilience.
  • Massively scaled up an ECS-based application during downtime using AWS Auto Scaling groups, writing custom Lambda functions in Python, Node.js, and Ruby that auto-deploy the application using Amazon S3.
  • Wrote custom Lambda functions that pick up account and resource information and can be referenced as custom resources in AWS CloudFormation.
  • Managed microservices using Docker to quickly spin them up in production and auto-scale them, with orchestration through Amazon EC2 Container Service (ECS), and deployed them to Amazon EC2 instances using launch configuration templates.
  • Used Elasticsearch, Logstash, and Kibana (ELK stack) for centralized logging and analytics in the continuous delivery pipeline, storing logs and metrics in an S3 bucket using a Lambda function.
  • Implemented a completely containerized, automated process to set up Jenkins servers across various AWS accounts, using Docker and CloudFormation scripts deployed in AWS ECS.
  • Implemented container-based deployments using Docker images and registries, pulling the required images from Docker Hub and Nexus. Used Docker to avoid environment inconsistencies between development and production.
  • Designed and deployed a secure infrastructure using AWS resources: IAM, Elastic IP, Elastic Storage, Auto Scaling, VPC (NAT, peering, VPN), EC2, EBS, APIs, ELB, Route 53, RDS, SES, SNS, SQS, OpsWorks, Redshift, Glacier, CloudFront, KMS, S3, Elastic MapReduce (EMR), Lambda (serverless), AWS IoT, Elastic Beanstalk, ECS, EKS, CloudTrail, API Gateway, Snowball.
  • Created and developed Deployments, namespaces, Pods, Services, health checks, and persistent volumes for Kubernetes in YAML.
  • Deployed the infrastructure on AWS, utilizing services such as EC2, S3, VPC, ELB, EBS, IAM, ECS, Auto Scaling, RDS, subnets, Elastic IP, Route 53, CloudWatch, CloudFront, Lambda, CloudFormation, ElastiCache, and CloudTrail. Created user accounts for the Dev, Testing, QA, and production teams, added them to different groups, and assigned respective roles to each group using AWS IAM.
  • Deployed JAR and J2EE applications on Apache Tomcat using Jenkins for automated deployment. Worked with development and testing teams to create fully automated CI/CD pipelines using AWS and Jenkins with Groovy scripting, including a pipeline that creates Jenkins jobs. Set up continuous integration with Jenkins, used multiple plugins to build smooth, developer-friendly workflows, and educated developers on how to commit their work and use the CI/CD pipelines in place.
  • Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation, using Jenkins Pipelines along with Python and shell scripts to automate routine jobs.
  • Used Ansible and Ansible Tower as configuration management tools to automate repetitive tasks, quickly deploy critical applications, and proactively manage changes in the AWS environment.
  • Developed the presentation layer using JSP, HTML, Node.js, XHTML, and CSS, with client-side validation using JavaScript and the DOM.
  • Created EBS volumes for EC2 instances and moved snapshots in a timely manner to an S3 bucket using custom shell scripts.
  • Environment: AWS, EC2, RDS, Elastic Load Balancing (ELB), S3, CloudWatch, Elasticsearch, ECS, EKS, CloudFormation, Route 53, Lambda, BigQuery, Ansible, Terraform, Jenkins, Shell, Confluence, JIRA, OpenShift, Kafka, JFrog, Docker, Kubernetes, Maven, Git, GitHub, Python, Ruby, Nexus, SonarQube, Nagios, NGINX.

Nagarro

Associate Staff Engineer
11.2020 - 12.2021

Job overview

  • Implemented and maintained monitoring and alerting for production and corporate servers and storage using AWS CloudWatch and New Relic.
  • Managed cloud services using AWS CloudFormation and Terraform, giving developers and businesses an easy way to create a collection of related AWS resources and provision them in an orderly and predictable fashion.
  • Designed and developed a continuous integration and deployment pipeline using Git, Jenkins, Chef, and Docker across geographically separated hosted zones in AWS.
  • Used GitHub to store code and integrated it with Ansible to run playbooks that deployed microservices, including provisioning AWS environments using Ansible playbooks.
  • Implemented monitoring and logging solutions using CloudWatch, ELK Stack, and Prometheus to gain insights into system performance and troubleshoot issues.
  • Deployed Kubernetes clusters on Amazon EC2 instances using kOps, managed local Kubernetes deployments by creating local clusters, built and maintained Docker container clusters managed by Kubernetes, and deployed to Kubernetes using Helm charts.
  • Developed and managed cloud VMs with AWS EC2 CLI clients and the management console. Configured Amazon Route 53 to route traffic between regions, and created alarms and notifications for EC2 instances using CloudWatch.
  • Hands-on experience with Amazon Web Services (AWS), implementing solutions using EC2, S3, and RDS in CloudFormation JSON templates, along with EBS, Elastic Load Balancer, Auto Scaling groups, and Auto Scaling lifecycle hooks.
  • Responsible for ensuring system and network security, maintaining performance, and setting up monitoring using CloudWatch and Prometheus, visualized in Grafana dashboards.
  • Used Elasticsearch, Logstash, and Kibana (ELK Stack) for centralized logging and analytics in the continuous delivery pipeline, storing logs and metrics in an S3 bucket using a Lambda function.
  • Configured EC2, S3, Elastic Load Balancing, IAM, and security groups in public and private subnets in a VPC, along with other AWS services.
  • Created S3 buckets and managed their policies, and used S3 and Glacier for storage and backup on AWS.
  • Built and maintained Docker container clusters managed by Kubernetes, using Linux, Bash, Git, and Docker.
  • Experience implementing system health and performance monitoring tools such as Nagios, Splunk, CloudWatch, New Relic, Elasticsearch, Kibana, and AppDynamics.
  • Wrote Dockerfiles to build customized images for creating containers; also worked on Docker container snapshots, removing images, and managing Docker volumes.
  • Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing.
  • Integrated Amazon DynamoDB with Lambda to store item values and back up DynamoDB streams.
  • Implemented CI/CD pipelines using OpenShift's built-in capabilities or integrations with tools like Jenkins, GitLab CI/CD, and Tekton to streamline application development, testing, and deployment workflows.
  • Administered and engineered Jenkins pipelines for managing the weekly build, test, and deploy chain, using SVN/Git with a Dev/Test/Prod branching model for weekly releases.
  • Evaluated Kubernetes for Docker container orchestration. Managed Kubernetes charts using Helm: created reproducible builds of Kubernetes applications, templated Kubernetes manifests, provided configuration parameters to customize deployments, and managed releases of Helm packages.
  • Environment: AWS, EC2, RDS, Elastic Load Balancing (ELB), S3, CloudWatch, Elasticsearch, ECS, EKS, CloudFormation, Route 53, Lambda, BigQuery, Ansible, Terraform, Jenkins, Shell, Confluence, JIRA, OpenShift, Kafka, JFrog, Docker, Kubernetes, Maven, Git, GitHub, Python, Ruby, Nexus, SonarQube, Nagios, NGINX.

Accenture

DevOps Engineer
03.2018 - 10.2020

Job overview

  • Created AWS CloudFormation templates to create custom-sized VPCs, subnets, and NAT instances to ensure successful deployment of web applications and database templates.
  • Created Python scripts to fully automate AWS services, including ELB, CloudFront distributions, databases, EC2 and database security groups, S3 buckets, and application configuration; the scripts create stacks and single servers, or join web servers to stacks.
  • Used the AWS CLI to automate backups of ephemeral data stores to S3 buckets and EBS, and to create nightly AMIs of mission-critical production servers as backups.
  • Designed AWS CloudFormation templates in JSON to create custom-sized VPCs, subnets, and NAT to deploy web applications and database templates.
  • Good knowledge of and experience using Elasticsearch, Kibana, CloudWatch, Nagios, Splunk, Prometheus, and Grafana for logging and monitoring.
  • Provided highly durable and available data using the S3 data store, versioning, and lifecycle policies, and created AMIs of mission-critical production servers for backup.
  • Installed and configured web application servers such as Apache Tomcat and JBoss for deploying artifacts, and deployed applications on AWS using Elastic Beanstalk.
  • Managed AWS EC2 instances utilizing Auto Scaling and Elastic Load Balancing for QA and UAT environments as well as infrastructure servers for Git and Chef.
  • Configured Amazon S3 event notifications or Amazon SNS topics to send alerts to integrated Slack channels when new files are uploaded or specific events occur in S3 buckets.
  • Used AWS Elastic Beanstalk for deploying and scaling web applications and services developed with Java, PHP, Node.js, Python, Ruby, and Docker on familiar servers such as Apache and IIS.
  • Used build tools such as Ant, Maven, Gradle, and MSBuild, and monitored applications using New Relic after deployment to production.
  • Created and managed test environments using Docker and Kubernetes, initializing instances depending on development team requirements.
  • Proficient in using Amazon Web Services, including EC2, EBS, IAM, S3, and ELB; applied Groovy scripting best practices for various kinds of applications.
  • Implemented a Jenkins master/slave setup via the Jenkins dashboard. Deployed various databases and applications using Kubernetes cluster management, including services such as Redis, a Node.js app, and NGINX.
  • Set up Jenkins Pipeline and Bamboo to build pipelines automatically using Groovy scripts; onboarded and supported several mobile applications on the Jenkins CI/CD pipeline.
  • Created and owned the build and continuous integration environment with Ant, Maven, Visual Studio, and Jenkins Pipeline. Built Docker images and pushed them to JFrog Artifactory.
  • Triggered Ansible Tower templates from Jenkins and Bamboo to deploy applications to different environments (e.g., AWS, VMs).
  • Environment: SVN, Git, Jenkins, Maven, Nexus, AWS (EC2, EBS, S3, VPC, RDS, SES, ELB, EMR, ECS, EKS, CloudFront, CloudFormation, ElastiCache, CloudWatch), Kubernetes, Chef, BigQuery, Splunk, WebSphere, Bitbucket, Puppet, Java/J2EE, Python scripts, JFrog, Docker, XML, Unix (Red Hat Enterprise Linux, CentOS).

Deloitte

Build and Release Engineer
01.2015 - 02.2018

Job overview

  • Performed system administration of UNIX servers running Solaris; managed Sun Solaris, Compaq, and Linux workstations and servers.
  • Involved in the design, configuration, installation, implementation, management, maintenance, and support of corporate Linux servers (RHEL 3, 4, 5; CentOS 5; Ubuntu).
  • Involved in system administration, system builds, server builds, installs, upgrades, patches, migration, troubleshooting, security, backup, disaster recovery, and performance monitoring on UNIX (Red Hat Linux) systems.
  • Built and implemented a Linux firewall to secure the network. Performed performance tuning, preventive maintenance, and daily backups using shell and Python scripts.
  • Designed a customized status reporting tool, currently in use, based on specific requirements, using J2EE/Struts and WebSphere Application Server with DB2 as the database.
  • Installed and configured IBM WebSphere Application Server 5.0 and IBM HTTP Server on AIX and CentOS.
  • Configured distributed file systems, administered NFS servers and clients, and edited auto-mount maps per system and user requirements.
  • Experienced with Linux internals, virtual machines, and open-source tools/platforms; improved system performance by working with the development team to analyze, identify, and resolve issues quickly.
  • Remotely copied files using SFTP, FTP, SCP, WinSCP, and FileZilla, and regularly managed the backup process for server and client data.
  • Environment: NFS, FTP, Linux, UNIX, CentOS, Ubuntu, Telnet, Nagios, SSH, vSphere, VMware, VirtualBox, RPM, and YUM.

Education

Lewis University
Romeoville, IL

Master of Science in Computer Science
01.2023

Skills

  • Operating Systems: Red Hat, CentOS, Fedora, SUSE, Ubuntu, Solaris, Debian, macOS, Windows
  • AI/ML: NVIDIA GPU Operator, NVIDIA H100 GPUs, MIG (Multi-Instance GPU) Slicing, CUDA, DCGM, Tensor Cores
  • CI/CD Tools: Jenkins, Bamboo, GitLab CI, uDeploy, Travis CI, Octopus
  • Cloud Environments: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP)
  • Framework/Tools: Apache Struts, JUnit, Hibernate, Spring Boot, Spring Batch, Ant, Web Services, AJAX, JSF, JSON
  • Infrastructure as Code: Terraform, CloudFormation, Ansible, Puppet, Chef, ARM, Bicep
  • AWS Services: VPC, IAM, S3, Elastic Beanstalk, CloudFront, Redshift, Lambda, Kinesis, DynamoDB, Direct Connect, Storage Gateway, EKS, DMS, SMS, SNS, and SWF
  • Scripting: Shell scripting, Groovy, Python, Ruby, Perl, YAML, and PowerShell
  • Version Control Tools: Git, GitHub, GitLab, Subversion (SVN), Bitbucket, TFS, and Azure DevOps Server
  • Build Tools: Maven, Selenium, Gradle, SonarQube, Nexus, Ant, JUnit
  • Containerization Tools: Docker, Kubernetes, ECS/EKS, Apache Mesos, AKS, OpenShift, Rancher, Marathon
  • Application Servers: Apache Tomcat, Nginx, Httpd, WebSphere Application Server, Kafka, JBoss, WebLogic
  • Network Protocols: DNS, DHCP, TCP/IP, Cisco Routers/Switches, WAN, LAN, FTP/TFTP, SMTP
  • Monitoring Tools: Nagios, AWS CloudWatch, Splunk, ELK, Grafana, Prometheus
  • Bug Tracking Tools: JIRA, Confluence, ServiceNow, Bugzilla, Redmine
  • Serverless computing
  • Cloud infrastructure management
  • Linux system administration
  • DevOps methodologies
  • Container orchestration
  • API design and development
  • Web services
  • Continuous integration and deployment
  • Cross-platform development
  • Disaster recovery planning
  • Microservices architecture
  • Virtualization technologies

Certification

  • Cloud Fundamentals by GCP
  • Udemy Certification on OpenShift AI

Timeline

Platform Engineer
Verizon
03.2025 - Current
AWS/DevOps Engineer
Nagarro
03.2023 - 04.2025
Associate Staff Engineer
Nagarro
11.2020 - 12.2021
DevOps Engineer
Accenture
03.2018 - 10.2020
Build and Release Engineer
Deloitte
01.2015 - 02.2018
Lewis University
Master of Science in Computer Science