Srikanya P

Pflugerville, TX

Summary

With over 7 years of technical expertise in Linux, AWS, DevOps, and Kubernetes administration and support, I have honed my skills in resolving complex technical issues and implementing efficient troubleshooting processes. My ability to collaborate with cross-functional teams has allowed me to enhance product stability and drive customer success. I am adept at providing innovative solutions and applying strong problem-solving skills that contribute to revenue growth.

Overview

8 years of professional experience

Work History

Cloud Administrator

HireArt
10.2023 - Current
  • Build and deploy highly available environments for customers and manage them throughout their life cycle
  • Access management of multiple AWS accounts
  • Set up monitoring & logging and integrate with AWS for the application team to access them from a central place
  • Deploy infra components onto all k8s clusters
  • Create & manage OPA Gatekeeper policies to help customers enforce organization-level policies
  • Enable network policies to restrict traffic in multi-tenant environments
  • Perform k8s version upgrades for all EKS clusters
  • Enable monitoring by deploying Prometheus & Alertmanager
  • Documented all infrastructure configurations, deployment processes, and troubleshooting procedures to enhance team efficiency
  • Managed network setup and integration of sites running AEM, ensuring seamless communication between environments
  • Configure and deploy Helm- or k8s manifest-based applications onto the clusters
  • Deploy OpenCost to collect cost metrics for effective utilization of the resources
  • Create Jenkins pipelines to trigger application deployment
  • Set up IdP integration with AWS
  • Prepare custom Helm charts based on demand
  • Review & help with the network configuration of Kubernetes clusters in multiple cloud environments
  • Set up Velero backups & restores for DR
  • Designed and deployed AEM infrastructure using Terraform and AWS CloudFormation, ensuring high availability and scalability
  • Implemented CI/CD pipelines with Jenkins and GitLab CI/CD for automated AEM application builds, testing, and deployments
  • Containerized AEM instances using Docker and orchestrated with Kubernetes for consistency across development and production environments
  • Automated AEM configurations using Ansible to ensure consistency and rapid environment provisioning
  • Suggest k8s best practices
  • Adapt to the new features
  • Install/Configure ExternalDNS for automatic DNS management
  • Install/Configure NGINX Ingress Controller or AWS Load Balancer Controller
  • Install/Configure Cluster Autoscaler or Karpenter for node autoscaling
  • Configure HPA for pod autoscaling to support dynamic load
  • Install/Configure Vault & AWS Secrets Manager for k8s secret management
  • Troubleshoot cluster DNS or networking issues
  • Manage/Upgrade all infrastructure add-ons such as Prometheus, ExternalDNS, AWS Load Balancer Controller & OPA Gatekeeper along with k8s upgrades

Site Reliability Engineer

Marriott
03.2022 - 10.2023
  • Create development & production environments for multiple application teams to deploy and test their code
  • Create automation processes in GoCD for automatic deployment of their applications
  • Troubleshoot all infrastructure issues and ensure high availability of applications
  • Create customer environments with Terraform on AWS
  • Configure private/public API endpoints for the EKS cluster
  • Create IAM policies to restrict access to AWS resources for development & testing teams
  • Install Prometheus & Alertmanager for observability
  • Install Elasticsearch, Kibana & Fluentd for log aggregation
  • Customize Alertmanager configuration to send alerts to required customer endpoints or tools
  • Configure monitoring operator to control resource allocation to monitoring stack components
  • Configure logging stack components based on customer use cases
  • Create knowledge base articles that help support engineers resolve known issues faster
  • Maintain a proactive approach to customer empathy, identifying customer satisfaction concerns, and managing customer expectations
  • Create on-demand pipelines in GoCD to deploy applications on required EC2 instances or clusters
  • Developed scripts for automated content migration between AEM environments, reducing manual efforts and deployment time
  • Integrated monitoring and logging tools (Prometheus, Grafana, ELK Stack) to track AEM instance performance and health
  • Conducted security scanning within CI/CD pipelines to identify and mitigate vulnerabilities in AEM applications
  • Established automated backup and disaster recovery strategies for AEM instances to ensure data integrity
  • Conduct in-depth diagnostics on EKS clusters and work effectively with the Engineering group
  • Respond to and resolve critical customer issues
  • Create IAM service accounts to provide access to AWS resources for pods
  • Manage customer relationships efficiently and document their cases thoroughly
  • Create knowledge articles about frequent issues
  • Access management for AWS accounts, EKS clusters & EC2 instances
  • Upgrade k8s versions & OS versions

Associate Consultant

Wipro
06.2017 - 02.2022
  • Migrate on-prem servers and databases to the cloud
  • Help customers set up their environments in the cloud
  • Provide suggestions for setting up a network for high availability and configure monitoring & alerting using CloudWatch & SNS
  • OS patching on a regular basis
  • Server provisioning with AWS EC2
  • Configure High Availability with Load balancing & Autoscaling
  • Provision databases using AWS RDS
  • Configure monitoring and enable alerts using AWS CloudWatch and SNS
  • Server hardening based on AWS Inspector findings
  • Configure and manage S3 buckets
  • Create life cycle policies for snapshot management
  • File system expansion
  • Server Build/Decommission
  • Participated in DR (Disaster Recovery) drills
  • Performing Change Requests
  • Production deployments for the Adobe AEM application
  • Build and deploy code into different environments using pipelines
  • Managing Kubernetes clusters
  • Managing pods, ReplicaSets, deployments & jobs
  • Scheduling cron jobs in Kubernetes
  • Build and maintain Docker images
  • Working on private Docker registries
  • Provisioning new environments using the GoCD pipeline tool and Terraform in AWS
  • Managing Incidents/Service requests/Change management using ServiceNow
  • Tomcat web server restart based on request from application team
  • Managing AWS IAM users
  • User and group management on Red Hat Linux servers
  • File System Management and Logical Volume Management
  • Software Package Administration
  • Install and configure CloudWatch agent to get additional metrics
  • Create and mount EFS shares as per request
  • Job scheduling and process automation using cron
  • Development and Maintenance of Shell Scripts
  • Implementation of SSH Authentication through Private/Public SSH keys
  • Managing the DNS with Route 53
  • Taking snapshots of EBS volumes
  • Set snapshot lifecycle policies for automatic snapshot creation & deletion after the retention period
  • Take AMI backups of the servers
  • Creating/Providing access to S3 buckets based on user requests
  • Creating custom IAM roles & policies to restrict user access
  • Setting S3 bucket policies and ACLs to restrict access to S3 buckets
  • Create site-to-site VPN connections from datacenters to AWS (e.g., from a customer's datacenter to the customer's own AWS resources)
  • Triggering build pipelines to generate and push artifacts into Nexus
  • Create and manage users in Bitbucket
  • Creating feature branches based on requests from the development team
  • Creating Jenkins jobs/pipelines based on request
  • Updating or rolling back the deployments
  • Configure monitoring on Kubernetes pods
  • Configure & schedule jobs on Kubernetes

Education

Master of Science - Computer Science

University of Central Missouri
Warrensburg, MO
05-2022

Skills

    Operating System: RHEL, Ubuntu & CentOS
    Scripting: Shell scripting, Python
    Automation Tools: Ansible
    Clouds: AWS, Azure
    Ticketing Tools: ServiceNow, Jira, Zendesk
    Monitoring Tools: Nimsoft
    Observability: Prometheus, Alertmanager
    Policy Management: OPA Gatekeeper
    DevOps Tools: Jenkins, GoCD, Git, Docker, Bitbucket
    Containerization and Orchestration Tools: Kubernetes, Docker, containerd

Timeline

Cloud Administrator

HireArt
10.2023 - Current

Site Reliability Engineer

Marriott
03.2022 - 10.2023

Associate Consultant

Wipro
06.2017 - 02.2022

Master of Science - Computer Science

University of Central Missouri