I am seeking a Senior or Principal DevOps Engineer / Site Reliability Engineer position in a reputable organization, with challenges that draw on my multi-platform experience and enrich my knowledge and skills.
Overview
12 years of professional experience
Work History
Principal Software Engineer
Palo Alto Networks
12.2023 - Current
Design, develop and implement highly scalable software features and infrastructure on our next-generation security platform ready for cloud native deployment from inception to completion
Work with different development and quality assurance groups to achieve the best quality by being hands-on and creating tools, processes, and systems that produce transparency, alignment, and direction
Profile, optimize and tune systems software (management/control/dataplane) for efficient cloud operation
Site Reliability Engineer Manager
Apple
03.2023 - 11.2023
Act as the Service Owner, designing and mapping key performance indicators to achieve the organization’s mission
Lead the definition of requirements, priorities and planning of engineering deliverables
Implement structured engineering and operations processes
Lead the team in daily agile SRE practices, ensuring proper team focus on priorities, achievements, and deliverables
Optimize velocity and efficiency of delivery, and drive continuous improvement
Site Reliability Engineer
Apple
03.2022 - 02.2023
Worked on creating a Terraform provider that gets infrastructure details from inventory to update the NetScaler LB.
Worked on a Kubernetes operator for managing CRUD operations on GSLB and DNS.
Setting up alerts and dashboards for applications.
Automation using Golang for failing over traffic from GSLB.
Onboarding new apps on Kubernetes.
Migrating existing microservices from bare metal to Kubernetes.
Onboarding PCI and non-PCI applications.
Managing API gateway (nginx) configs for applications.
On-call duties.
Senior Site Reliability Engineer
Palo Alto Networks
01.2019 - 03.2022
Worked on migrating microservices to a Kubernetes cluster.
Automated AWS infrastructure and EKS and GKE clusters using Terraform.
Designed an architecture replacing ELB with an HTTP API Gateway and a VPC link between the Istio ingress LB and the API Gateway.
Created a Kubernetes operator using Kubebuilder.
Istio mesh setup using the operator, adding virtual services, destination rules and an ingress gateway.
Automated bringing up the entire Kubernetes infrastructure and application services using Terraform.
Worked on integrating with GitLab, which supports Terraform IaC CI/CD.
Worked on setting up Cortex, Loki and Tempo for the observability stack.
Setting up GitLab CI/CD for Kubernetes deployment using Terraform.
Contributed to the Cortex open source project.
Worked on creating Consul Golang modules.
Major Consul, Vault and MongoDB upgrades.
Automated aggregation of application metrics using Golang and Python for applications that do not expose Prometheus endpoints; used Grafana Cloud Agent and vmagent to scrape metrics from Prometheus endpoints.
Implemented Kubernetes event-driven autoscaling for Kafka consumers.
Upgrading Ubuntu from 14.04 to 18.04.
Troubleshooting application issues.
Setting up Grafana with MySQL as backend storage to have the exact same configs, data sources and dashboards in the staging and production environments.
Helped in hiring and team building.
Worked on setting up Strimzi Kafka on a Kubernetes cluster.
Locust cluster for Kafka load testing.
Mentoring new hires.
Site Reliability Engineer
Cisco
06.2018 - 12.2018
Worked on big data analysis on client events.
Used Apache Spark (PySpark) for batch data processing, storing the output on HDFS as Parquet files.
Used Impala to query data from the Parquet files.
Used Qlik Sense for visualization.
DevOps Engineer
Netskope
06.2015 - 05.2018
Configuring, managing and troubleshooting issues.
Using Ansible to deploy new services to production machines.
Automated the complete AWS stack using Terraform and Ansible.
Planning and creating virtual machines, or provisioning physical machines, AWS instances (using Terraform) or Docker containers, as per the application requirements.
Deploying using Jenkins.
Configuring load balancers (F5, nginx and HAProxy).
Building Debian packages or Docker containers using Jenkins.
Troubleshooting issues and escalating to developers to fix bugs in the code.
Data Center management.
Infrastructure planning and deploying.
NOC Administrator
DreamWorks Animation Bengaluru
04.2015 - 05.2015
Quickly learned new skills and applied them to daily tasks, improving efficiency and productivity.
Carried out day-to-day duties accurately and efficiently.
Platform Operations Engineer
Akamai Technologies
03.2012 - 04.2015
Member of the network team within the Network Operations Centre co-ordinating with ISPs on networking issues.
Troubleshooting day-to-day networking issues, such as packet loss and connectivity issues, by telnetting into routers and switches.
Working knowledge of BGP. Good knowledge of routing protocols such as RIP, EIGRP and OSPF.
Automated manual tasks using Perl and bash scripts.
Also part of core group within the NOC for automating procedure guidelines for issue handling.
Handled server software releases and maintenances.
Good knowledge of Perforce.
Completed CCNA-level networking training.
Handled multiple high-priority incidents while rallying the resources needed to resolve them.
Worked with other stakeholders and pursued root cause determination.
Education
M. Tech - Computer And Information Sciences
Birla Institute of Technology
Pilani
01.2018
B. Tech - Telecommunication
AMC Engineering College
Bengaluru, India
06.2011
Skills
Container Orchestration (Kubernetes)
Cloud Computing (AWS and GCP)
IaC (Terraform)
DR Architecture for High Availability and Reliability of services
Configuration Management (Ansible and Puppet)
Programming Languages (Golang, Python and Bash scripting)
My objective was to create a general active monitoring system in which performance could be measured in real time and, based on that data, traffic could be rerouted to different servers. Netskope is a CASB (cloud security) company that monitors customer data, so all customer traffic needs to be sent through a Netskope proxy and analyzed in real time; performance therefore has to be monitored continuously to give the customer a better response. Passive monitoring takes place at the proxy server: all client traffic is directed to the proxy, and performance is analyzed from the proxy's current CPU, memory, iostat, number of connections, etc., together with the response time captured at each hop, i.e., client → proxy → server. If performance degrades, traffic should be diverted to another available proxy to get better performance. Data analysis reveals the thresholds for good performance in terms of CPU, memory, I/O, network stats, etc. If sufficient proxies are unavailable, the system should be able to create either a cloud (AWS) instance or a container proxy.
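The rerouting decision described above can be sketched roughly as follows. This is a minimal illustration, not the system as built: the names `ProxyStats`, `Thresholds`, `is_healthy` and `choose_proxy` are hypothetical, and the threshold values are placeholders rather than figures derived from the actual data analysis.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProxyStats:
    """Latest measurements for one proxy (hypothetical structure)."""
    name: str
    cpu_pct: float
    mem_pct: float
    connections: int
    response_ms: float  # response time measured across client -> proxy -> server


@dataclass(frozen=True)
class Thresholds:
    """Illustrative limits; real values would come from data analysis."""
    cpu_pct: float = 80.0
    mem_pct: float = 85.0
    connections: int = 10_000
    response_ms: float = 200.0


def is_healthy(p: ProxyStats, t: Thresholds) -> bool:
    """A proxy is healthy when every metric is within its threshold."""
    return (
        p.cpu_pct <= t.cpu_pct
        and p.mem_pct <= t.mem_pct
        and p.connections <= t.connections
        and p.response_ms <= t.response_ms
    )


def choose_proxy(
    current: str,
    proxies: Sequence[ProxyStats],
    t: Thresholds = Thresholds(),
) -> Optional[str]:
    """Return the proxy that should serve traffic.

    Keeps the current proxy while it is healthy, otherwise diverts to the
    healthy proxy with the lowest response time. Returns None when no
    healthy proxy exists, which signals that a new one must be created.
    """
    by_name = {p.name: p for p in proxies}
    cur = by_name.get(current)
    if cur is not None and is_healthy(cur, t):
        return current
    candidates = [p for p in proxies if p.name != current and is_healthy(p, t)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.response_ms).name
```

A `None` result corresponds to the case where sufficient proxies are unavailable and a cloud (AWS) instance or container proxy would have to be created; that provisioning step is outside this sketch.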
Head of Human Resources at Protect AI, Inc. (Acquired by Palo Alto Networks)