Environment: AWS, Terraform, Jenkins, GitLab CI, GitHub, Kubernetes, Docker, Ansible, SumoLogic, Datadog, ELK Stack, Python, Bash, CI/CD, Agile
- Split a monolithic Terraform repository into individual modules (EKS clusters, RDS databases, S3 buckets), each stored in a dedicated GitHub repository with its own customized pipeline, improving scalability and manageability and speeding up code deployment
- Implemented a phased Terraform upgrade to bring infrastructure closer to the latest standards, addressing compatibility and security issues step by step and introducing modular configurations to improve maintainability
- Led EKS upgrade from Kubernetes 1.23 to 1.24, coordinating Terraform and Jenkins for infrastructure updates, aligning Helm charts to new API requirements, and validating internal and third-party applications with ArgoCD and integration tests to ensure security and stability
- Optimized Hazelcast deployment across six AWS regions by transitioning from Community to Enterprise edition and leveraging Terraform and Ansible for infrastructure consistency, reducing the number of instances from 80 to 12 and achieving substantial cost savings
- Optimized AWS ElastiCache Redis infrastructure to reduce costs by analyzing CloudWatch metrics for CPU and memory utilization, proposing downsizing options, and consolidating pre-production clusters
- Led a seamless cutover using Terraform and a custom Bash script for data migration, achieving substantial savings on the AWS bill while maintaining performance
- Automated security rule updates with a GitHub-triggered pipeline, enabling on-demand adjustments and rule enforcement, improving compliance and reducing manual intervention
- Enhanced visibility into AWS Spot instance autoscaling activities by developing a Python-based Lambda function to retrieve scale-in and scale-out logs from Spotinst and forward them to SumoLogic for centralized monitoring
- Designed and built a SumoLogic dashboard to visualize scaling activity, enabling correlation with production incidents and improving scalability insights
- Developed a robust monitoring solution for GitHub Actions self-hosted runners by integrating GitHub's API with Datadog, using a Python Lambda function deployed via Terraform
- Implemented Slack and PagerDuty alerts for runner downtime or unavailability, ensuring high CI/CD reliability and immediate on-call response to disruptions
- Built a reusable Jenkins library of over 30 Groovy modules, eliminating redundant code and enabling consistent updates across 50+ pipelines, which improved scalability and reliability in a fast-paced deployment environment
- Led 'demonolithing' initiative for Java microservices by separating components into dedicated repositories, creating CI/CD pipelines per service, and enabling automated syncs via ArgoCD, resulting in faster, independent deployments and improved scalability
- Developed an automated Jenkins pipeline for DoorDash promotions that integrates AWS, Terraform, and EKS to deploy a full infrastructure stack for each campaign, letting non-technical users initiate promotions and reducing manual effort through automated logging and monitoring with the ELK stack