

Managed large-scale Kubernetes (EKS) clusters with 500+ pods, implementing HPA/VPA autoscaling and resource optimization, reducing infrastructure costs by 35% ($1.2M annually)
Designed service mesh architecture using Istio with mTLS, traffic management, and distributed tracing across production environments
Architected multi-region disaster recovery with automated failover, achieving a 15-minute RTO, a 5-minute RPO, and 99.99% availability
Designed self-healing infrastructure with automated pod recovery, health checks, and circuit breakers, reducing downtime by 35%
Conducted chaos engineering experiments using Chaos Mesh and Gremlin to validate system resilience and identify failure modes
Built real-time observability stack with Grafana, Prometheus, Kafka, and distributed tracing, reducing MTTR by 50%
Led capacity planning with forecasting models, proactively scaling infrastructure to handle 300% traffic growth during peak seasons
Assisted in monitoring system performance and reliability metrics
Supported incident response efforts through troubleshooting and problem resolution
Deployed ML-based fraud detection on Azure processing 10M+ transactions daily with 99.9% accuracy using Azure ML and AKS
Implemented real-time fraud alerting pipeline with Azure Event Hubs, Stream Analytics, and Logic Apps, reducing response time by 70%
Built anomaly detection systems for network traffic analysis, identifying and blocking fraudulent patterns at scale
Automated multi-region CI/CD pipelines using Jenkins, Kubernetes, and Terraform for zero-downtime blue-green deployments
Provisioned Azure infrastructure with VNets, NSGs, Storage lifecycle policies, and Application Gateways using IaC
Developed custom Docker images with security scanning and vulnerability assessment, optimized for Azure Container Registry
Implemented Ansible Tower (AWX) managing 200+ playbooks and 500+ hosts with HIPAA-compliant automation and audit logging
Built self-service infrastructure portal integrated with Ansible Tower API, reducing provisioning time from days to minutes
Deployed microservices to AKS using Terraform modules with Istio service mesh for mTLS encryption and traffic management
Created RBAC workflows in Ansible Tower for multi-team collaboration, reducing deployment errors by 60%
Led CI/CD pipeline implementation with Jenkins, automated testing, and security scanning, reducing deployment time by 50%
Developed automated compliance scanning and remediation playbooks maintaining 100% HIPAA compliance
Led cloud migration from on-premises to AWS, migrating 100+ applications with zero data loss using AWS Migration Hub, DMS, and SMS
Re-architected monolithic applications into microservices during migration, improving scalability and reducing costs by 40%
Implemented hybrid cloud architecture with VPN and Direct Connect for seamless connectivity during phased migration
Migrated databases to AWS RDS and Aurora with automated backups and point-in-time recovery
Managed EC2 Auto Scaling Groups with predictive scaling and ELB for high availability
Automated infrastructure provisioning with Ansible and Kubernetes operators, and implemented Puppet for configuration management
Configured Apache Tomcat for Java application deployments
Managed Linux virtual machines and server migrations using VMware
Administered Linux environments, supporting upgrades, package management, and user account management
Microservices architecture
Infrastructure automation
Disaster recovery
Containerization technologies
System monitoring
Load balancing
Configuration management
Database administration
Network troubleshooting
Data center operations
Web server administration
Linux administration
Operations management
Cost estimation
Infrastructure design
Cloud security implementation
Cloud architecture design
Cost optimization
Kubernetes orchestration
Azure Kubernetes Service (AKS)
Docker and Kubernetes experience
Kubernetes administration
Amazon Elastic Kubernetes Service (EKS)
Docker and Kubernetes
Kubernetes management
Kubernetes deployment
Experienced Site Reliability Engineer and IEEE Senior Member with 11 years of experience in large-scale Kubernetes cluster management (500+ pods), automation, and system reliability. Reduced cloud costs by $1.2M annually, decreased downtime by 35%, and supported systems processing 10M+ daily transactions.
Cricket
Squash
Senior Member of IEEE
Java