Self-motivated, reliable team member with over 8 years of experience streamlining deployment processes, enhancing system reliability, and implementing cloud solutions across various industry sectors. Aspiring senior-level cloud engineer keen to put knowledge of cloud infrastructure, observability, and cloud-based development into practice.
Overview
11 years of professional experience
Work History
Site Reliability Engineer
EQUIFAX
2022.11 - Current
Automated the release process and GKE cluster upgrades in the production environment with zero downtime using Kubernetes node affinity
Designed and deployed microservices using JavaScript, Docker, and REST APIs for backend systems running on GKE clusters that communicate with customers and generate correspondence for customer components
Wrote scripts and configuration in Bash, Python, Groovy, YAML, and PowerShell to automate tasks
Created Docker images for application components, optimizing image size and dependencies for efficient deployment and resource utilization
Maintained platform-wide observability process focused on identifying symptoms of performance issues and patterns for resolution
Created interactive dashboards for monitoring key performance metrics, system health, and application behavior utilizing Splunk visualization features and advanced search capabilities
Supported Refinitiv Eikon and the Refinitiv internal tool Workspace as a Level 2 application support engineer for the backend product Refinitiv Elektron
Worked on AWS cloud services (EC2, S3, ELB, CloudWatch, Elastic IP, RDS, SNS, SQS, Glacier, IAM, VPC, CloudFormation, Route53) and managed security services on AWS
Provided frontline support for Java-based applications, diagnosing and resolving technical issues reported by clients and internal stakeholders
Participated in on-call rotation scheduling and contributed to refining escalation procedures and best practices for efficient incident management
Established post-incident review (PIR) processes and dashboards in Splunk to capture lessons learned, track remediation actions, and prevent the recurrence of incidents
Developed custom Splunk dashboards to visualize performance data and track key performance indicators, facilitating data-driven decision-making and continuous improvement efforts
Created and deployed Docker containers to break up monolithic apps into microservices, enhancing the development workflow, improving scalability, and optimizing duration
Planned and implemented disaster recovery solutions, capacity planning, data archiving, backup/recovery strategies, and performance analysis and optimization
Set up non-production regions on Azure using Azure DevOps, PowerShell, Terraform, and Ansible
Automated the deployment process by writing shell (Bash) and Python scripts run in Jenkins
Created alerts and monitoring dashboards using Prometheus and Grafana for all microservices
Monitored Azure resources 24/7 using Azure Monitor, monitored web apps using Application Insights, and handled escalated support tickets
Provided 24x7 on-call support debugging and fixing issues related to Linux, Solaris, and HP-UX, including installation and maintenance of hardware and software in production, development, and test environments, as an integral part of the Unix/Linux (RHEL/SUSE/Solaris/HP-UX/AIX) support team
Resolved TCP/IP network access problems for clients
Developed, maintained, and updated various UNIX shell scripts for services (start, stop, restart, recycle, cron jobs)
Performed OS upgrades and installed third-party software, packages, and patches as required
Maintained the Linux firewall for the network and implemented network security