Designed and operated Akkeris, an internal Platform-as-a-Service (PaaS) that provides developers with self-service infrastructure provisioning, application deployments, observability, and operational capabilities on AWS.
- Built and maintained a developer self-service platform that abstracted infrastructure complexity while enforcing security and compliance controls and providing built-in monitoring and automated failover.
- Designed reusable Terraform modules for S3, RDS, ElastiCache, Elasticsearch, networking resources, and platform services to standardize infrastructure provisioning.
- Migrated existing AWS resources into Terraform state without service disruption, establishing infrastructure-as-code governance.
- Managed production multi-cluster Kubernetes environments on Amazon EKS supporting highly available containerized applications.
- Migrated workloads from on-premises Kubernetes clusters to Amazon EKS, using Velero for workload backup and restore and Karpenter for node provisioning, improving scalability and resilience.
- Used Karpenter for just-in-time node provisioning and resource optimization.
- Standardized deployments using Helm charts and implemented GitOps workflows through ArgoCD to automate application synchronization and drift detection.
- Designed and maintained CI/CD pipelines using Jenkins and GitHub Actions, enabling reliable and repeatable deployments.
- Implemented service-to-service traffic management and observability using the Istio service mesh.
- Secured application secrets and encryption keys using AWS Secrets Manager and KMS while enforcing IAM least-privilege policies.
- Built observability platforms using Prometheus, Grafana, OpenTelemetry, Coralogix, InfluxDB, and CloudWatch for centralized metrics, logs, and distributed tracing.
- Defined SLIs, SLOs, and error budgets to set measurable service reliability targets and guide operational practices.
- Reduced downtime by approximately 70% through telemetry-driven alerting, automated recovery mechanisms, and resilient infrastructure design.
- Integrated monitoring systems with PagerDuty and participated in incident response, root cause analysis, and postmortem processes to reduce MTTR.
- Contributed to FinOps initiatives through rightsizing, autoscaling, tagging strategies, and infrastructure governance to optimize AWS spending.
- Built reusable infrastructure patterns suitable for AI-enabled applications, model-serving workloads, and emerging LLMOps requirements.
- Collaborated closely with engineering teams to improve developer experience and accelerate software delivery.
Technologies: AWS, Kubernetes, EKS, Multi-Cluster Kubernetes, Docker, Terraform, ArgoCD, GitOps, Helm, Karpenter, Velero, Istio, Jenkins, GitHub Actions, Prometheus, Grafana, OpenTelemetry, CloudWatch, Coralogix, InfluxDB, PagerDuty, Secrets Manager, KMS, IAM, Java, TypeScript.