🚀 Senior Site Reliability Engineer passionate about building resilient, scalable cloud infrastructure
With 4+ years transforming enterprise infrastructure, I specialize in cloud-native solutions that drive business impact. Currently leading critical AWS ECS to EKS migrations at LexisNexis Risk Solutions, where I've:
✅ Architected production-ready EKS clusters across 50+ environments
✅ Reduced monitoring costs by $72K annually through strategic observability migration
✅ Built systems processing 15,000+ logs/second with zero-downtime operations
✅ Resolved critical incidents affecting 14M+ transactions in under 1 hour
My expertise spans the full DevOps ecosystem: Kubernetes orchestration, Terraform automation, GitOps workflows, and comprehensive monitoring solutions. I combine deep technical knowledge with business acumen to deliver infrastructure that scales.
🛠️ Core Technologies: AWS | Kubernetes | Terraform | Python | Docker | GitOps
🎯 Specializations: EKS Migrations | Infrastructure as Code | Observability | Incident Response
Always open to connecting with fellow engineers and discussing cloud infrastructure challenges!
• Led the complete infrastructure migration from AWS ECS to EKS as the sole engineer, taking full ownership of the transition while maintaining zero-downtime operations
• Architected production-ready Terraform modules for AWS EKS supporting multi-AZ high availability across Kubernetes versions 1.30-1.33, enabling standardized cluster provisioning with minimal configuration overhead
• Engineered modular Terraform-based solutions for deploying critical infrastructure components across 50+ development, testing, and production environments, using Helm charts to minimize code duplication: a Linkerd service mesh, GitOps workflows with ArgoCD, External Secrets Operator with HashiCorp Vault for secure management of sensitive configuration data, cost analysis through OpenCost, and Kyverno for Kubernetes admission control and webhooks
• Migrated observability stack from Datadog to Grafana Cloud, deploying Grafana Alloy and integrating OpenTelemetry (OTEL) into application Dockerfiles to enable distributed tracing, reducing monitoring costs by $72K/year
• Integrated comprehensive monitoring solutions using Prometheus, Thanos, Grafana, and Fluent Bit, processing 15,000+ logs per second through a robust aggregation architecture that is resilient to node and network outages, for enterprise-scale observability
• Engineered storage solutions supporting multiple EBS volume types, including gp3, and multiple file systems, with the AWS EFS CSI driver integrated for persistent storage requirements
• Designed node auto-scaling capabilities using Cluster Autoscaler and Karpenter, optimizing compute resources across AMD64/ARM64 architectures and reducing infrastructure costs through strategic spot instance utilization
• Established automated Terraform CI/CD pipelines using GitHub Actions with OIDC authentication, S3 backend, and DynamoDB state locking for secure infrastructure changes
• Created AWS security policies and IAM roles following least-privilege principles, and developed fine-grained RBAC with group associations for EKS clusters
• Automated multi-AZ Amazon Aurora PostgreSQL 16.8 clusters with auto-scaling reader instances, implementing enterprise-grade monitoring with CloudWatch alarms, Performance Insights, and AWS DevOps Guru, all managed via Infrastructure as Code (IaC) using Terraform
• Upgraded Elasticsearch 7 → 8 with zero data loss, migrating 44TB via Amazon Simple Storage Service (S3) buckets while maintaining high availability across 90+ data nodes
• Configured advanced ingress solutions with NGINX Ingress Controller, supporting internal/external traffic patterns with customized security rules and SSL termination
• Implemented tiered alerting systems with SNS topics for high/medium/low severity incidents, ensuring appropriate response times (15 minutes for critical, 2 hours for medium priority issues)
• Mitigated critical data privacy incident affecting 14 million transactions within 1 hour through systematic troubleshooting methodology
• Enforced zero-trust security using Web Application Firewall (WAF) client IP whitelisting to block malicious traffic
• Integrated Datadog with Amazon Web Services (AWS) using Infrastructure as Code, configured synthetic health checks, and reduced manual testing effort by 80%
• Actively participated in Agile ceremonies including sprint planning, daily standups, sprint retrospectives, and backlog refinement sessions, ensuring infrastructure deliverables aligned with development timelines
• Administered bare-metal and OpenStack Kubernetes clusters using Helm charts, automating deployment tasks through Continuous Integration/Continuous Deployment (CI/CD) pipelines and reducing manual effort by 60%
• Implemented Kubernetes objects including Pods, StatefulSets, CronJobs, Services, and ConfigMaps to create robust, scalable application architectures
• Established comprehensive monitoring solutions with Prometheus, Elasticsearch, and Grafana, configuring Remote Write to export metrics to a centralized monitoring system
• Developed specialized Docker containers for Prometheus metric receivers to diagnose production server load issues, enhancing system reliability
• Created custom dashboards using PromQL covering 80+ microservices, enabling cross-functional teams to analyze performance metrics and make data-driven optimization decisions
• Led successful database migration from MongoDB to PostgreSQL using Kubernetes pre- and post-install hooks, ensuring data integrity throughout the transition
• Implemented Kubernetes monitoring stack with kube-state-metrics, Node Exporter, and Alertmanager, maintaining code in Gerrit and automating through Jenkins
• Delivered technical product demonstrations to clients and stakeholders, establishing position as trusted technical expert for system architecture inquiries
• Gained expertise in Cloud Infrastructure focusing on Amazon Web Services (AWS) services, Amazon Elastic Kubernetes Service (EKS) cluster management, and Terraform module development for Infrastructure as Code (IaC) implementation
• Designed and implemented Continuous Integration/Continuous Deployment (CI/CD) pipelines integrating Terraform with GitHub Actions, automating infrastructure provisioning and container deployments to Amazon Elastic Kubernetes Service (EKS)
• Developed hands-on experience with containerization technologies and microservices architecture in cloud-native applications, along with the TCP/IP stack, gRPC communication between microservices, and SSH remote administration
AWS Compute Services (EC2, EKS, ECS, Auto Scaling)
AWS Networking (VPC, Security Groups, NACL, Route53, Global Accelerator, Transit Gateway, Load Balancers, DNS Resolvers)
AWS Storage & Database (S3, RDS, ECR)
AWS Security & Identity (WAF, ACM, SSM Parameter Store)
Kubernetes
Terraform
Docker
CI/CD pipelines
Infrastructure as Code
Prometheus
Grafana
Datadog
AWS CloudWatch
Elasticsearch
Helm
GitHub Actions
ArgoCD
Blue-Green Deployments
Incident Management
Production Support
Loki
Thanos
Cassandra
MySQL
PostgreSQL
Jenkins
GitHub
Python
Django
React.js
REST APIs
Data Structures & Algorithms
On-Prem environments
Canary Deployments
SDLC
Agile Methodology
System Design
AWS Cloud Practitioner
AWS Solutions Architect Associate
HackerRank Programming (Python, Java, SQL)
CLA - Programming Essentials in C
PCAP - Programming Essentials in Python
Kubernetes
Prometheus
Linux