VIGNESHWARAN LOGANATHAN

Staff Site Reliability Engineer
Bangalore, India

Summary

Seasoned Site Reliability Engineer with over 10 years of experience in foundational internet technologies, focused on the reliability, scalability, and performance of critical systems. Specializes in tackling intricate assignments, implementing sustainable solutions, and consistently delivering outstanding performance. Practices DevOps principles backed by strong technical, analytical, and functional skills. Thrives in high-pressure, fast-paced environments, bringing a strong sense of urgency to solving critical business challenges.

Overview

11 years of professional experience
6 years of post-secondary education
3 Certifications

Work History

Staff Site Reliability Engineer

Suki AI India Private Limited
11.2023 - Current
  • Configured robust observability solutions to enhance visibility across Suki's diverse workloads using Datadog and OpenTelemetry.
  • Led DevOps and NOC teams, responsible for hiring and coaching team members.
  • Led multiple projects on disaster recovery, cost optimization (saving $5,500/month), availability, observability, and SRE.
  • Strategically drove cost optimization initiatives for GCP workloads, ensuring efficient resource utilization.
  • Established CI/CD pipelines that improved deployment workflows for various workloads using Cloud Build, ArgoCD, and Kustomize.
  • Facilitated the FedRAMP interview process and provided comprehensive support for obtaining Authority to Operate.
  • Pioneered the adoption of Infrastructure as Code principles, leveraging Terraform for scalable and automated deployment practices (focused on provisioning and alert management).
  • Architected and implemented advanced monitoring systems (APM, Synthetics, Infra, and RUM) to proactively identify and resolve potential issues.
  • Configured end-to-end Akamai Web Application Firewall solutions.
  • Applied best practices in reliability engineering to enhance overall resilience of Suki's infrastructure.
  • Drove continuous improvement by implementing automation solutions to streamline operational processes.

Lead Site Reliability Engineer

Acko Technology & Services Private Limited
09.2022 - 10.2023
  • Collaborated closely with development teams to architect and implement scalable and dependable applications across all Lines of Business at Acko.
  • Significantly reduced operational silos by integrating Infrastructure as Code (IaC) practices, leveraging Terraform, and establishing stringent reliability standards.
  • Developed and maintained robust alerting and monitoring systems using tools including New Relic, Coralogix, Site24x7, Prometheus, Grafana, and PagerDuty.
  • Reduced AWS costs, saving approximately $38,500 in the first quarter of 2023 through diligent resource cleanup initiatives.
  • Demonstrated proficiency in defining and implementing OKRs and KRAs for the SRE team.
  • Participated actively in on-call rotations, ensuring 24/7 availability of web applications and critical infrastructure.
  • Presented SRE Golden Metrics and usage metrics to stakeholders, fostering greater awareness of their workloads among leadership.
  • Documented Root Cause Analysis (RCA) for incidents across all lines of business at Acko. Conducted thorough incident analyses and delivered monthly reports to leadership.
  • Expertly integrated data from diverse sources, including internal monitoring systems and third-party APIs, to provide comprehensive uptime metrics.
  • Spearheaded the establishment of observability metrics, enhancing visibility into business operations and expediting issue resolution, resulting in a 30% reduction in incident response times and a 20% improvement in system reliability.
  • Skillfully configured an event monitoring system using Segment and Sentry, integrating it with Google BigQuery (GBQ) for essential business indicator metrics.
  • Implemented Real User Monitoring (RUM) for frontend workloads, enabling comprehensive visibility into funnel performance, drop-offs, and gains.

Senior Site Reliability Engineer

Flipkart Internet (P) Ltd
04.2021 - 09.2022
  • Worked with cross-functional design teams to create software solutions that elevated client-side experience and significantly improved overall functionality and performance.
  • Developed and implemented performance improvement strategies in the security engineering space and created plans to promote continuous improvement.
  • Served as subject matter expert for observability and reliability, working to reduce live-site incidents.
  • Delivered Big Billion Days production readiness metrics for all security microservices.
  • Saved USD 100,000 by adopting ECC certificates for critical production-grade services.
  • Versed in the complete software life cycle for certificate lifecycle management and domain lifecycle management, from preliminary needs analysis to enterprise-wide deployment and support.
  • Served as primary point of contact (POC) for DNS and SSL for Flipkart, Myntra, and Cleartrip.
  • Managed DNS and SSL migration for acquired companies such as Cleartrip and Scapic.
  • Drove project lifespan from concept to final rollout in security services development, system deployment, testing and monitoring for Cloud and Infra technology.
  • Prepared HLD, LLD, BUYTECH and EULA documents for AppViewX (CLM & DLM) onboarding in Flipkart.
  • Wrote code and integrations to meet cross-platform user needs.
  • Worked closely with other business analysts, development teams and infrastructure specialists to deliver high availability solutions for mission-critical applications.
  • Maintained consistent security to effectively implement best practices and protect Flipkart security assets.
  • Collaborated with cross-functional development team members to analyze potential system solutions based on evolving client requirements.
  • Participated in chaos testing and NFR validation for all security services owned by Flipkart, and in planning a benchmarking framework.

Site Reliability Engineer & FIS Certified DevOps Trainer

FIS Global Business Solutions India (P) Ltd
11.2018 - 03.2021
  • Developed cost estimates for planned projects to aid in costing and budget planning efforts.
  • Developed SOPs and change controls pertaining to AWS project setup, maintenance and Infrastructure operations.
  • Automated on-call toil such as user management using Python in Rundeck, and wrote Ansible playbooks for infrastructure resource provisioning.
  • Set up OpenShift clusters, handled Kubernetes architecture and design, troubleshot issues with AWS platform components, and developed global and multi-regional deployment models and patterns for large-scale deployments on OpenShift and Kubernetes.
  • Supervised and migrated all middleware components, such as JBoss, Tomcat, and Apache services, to the CI/CD framework.
  • Built self-healing bots for node recovery in Jenkins, and created a team chatbot in Python trained to provide basic team information.
  • Migrated Atlassian products (Jira, Bitbucket, Crucible, Wiki) from Active Directory to OpenLDAP using Python scripts against the Atlassian APIs.
  • Automated Disaster Recovery (DR) for all public-facing applications using Python and boto3 (AWS SDK).
  • Designed and implemented a continuous build-test-deployment (CI/CD) system with multiple component pipelines using Jenkins.
  • Designed and implemented resource management on AWS using the AWS CLI. Improved data durability using S3 storage, S3 versioning, and lifecycle policies.
  • Used the AWS CLI to automate backups of ephemeral data stores to S3 buckets and EBS, and to create AMIs of mission-critical production servers as backups. Used Kubernetes to deploy and scale web applications and services.
  • Worked with WSO2-EI to manage RESTful APIs and containerized the setup on OpenShift.
  • Managed OpenLDAP (Symas) for the FIS Mobile department.
  • Instrumental in implementing production-ready, load-balanced, highly available, fault-tolerant, auto-scaling Kubernetes infrastructure on AWS and microservice container orchestration.
  • Worked extensively with Ansible: managed hosts files and authored various playbooks and custom modules.

Technical Lead

Cognizant Technology Solutions
09.2013 - 01.2017
  • Administered build automation tools including Jenkins and Maven scripts; built environments from scratch and was responsible for code promotions.
  • Led all middleware component installations, configurations and deployments like JBoss, Tomcat and Apache Web Servers - automated most projects using Ansible.
  • Successfully implemented fault-tolerant infrastructure for Production environment and DR site.
  • Installed and configured WebSphere Network Deployment Manager 8.0 and WebSphere base Application Server on UNIX platforms, and applied fix packs using Update Installer.
  • Automated on-call and fix pack upgrade tasks with shell scripts using silent response files.
  • Administered, configured, tuned, & troubleshot WebSphere in a clustered environment and Java Applications on a JBoss Clustered environment.
  • Tuned JVMs based on performance testing and monitoring results.
  • Created and federated profiles to Deployment Manager using both Profile Management tool and automated process using shell scripts.
  • Configured WebSphere resources such as JDBC providers, JDBC data sources, and connection pools, and tuned and monitored them using Tivoli Performance Viewer.
  • Opened and worked on PMRs with IBM to resolve various environment-related issues.
  • Configured the WebSphere plug-in for load balancing across cluster members and manually updated it for remote web servers in the DMZ.
  • Wrote shell scripts to automate WebSphere admin tasks, application-specific rsyncs and backups, and other scheduled jobs.
  • Gained experience with user authentication and authorization using LDAP and SiteMinder, and with single sign-on and cryptography for exchanging information with third-party applications.

Education

Master of Science - SAAS

University Paris Saclay
Paris, France
02.2017 - 09.2018

Bachelor of Engineering - Electrical And Electronics

Anna University
Chennai, India
04.2009 - 05.2013

Skills

Reliability: High Availability, Service Level Indicators/Objectives/Agreements, Disaster Recovery, Root Cause Analysis, Chaos Engineering

Certification

Certified Kubernetes Administrator - CKAD

Timeline

Staff Site Reliability Engineer

Suki AI India Private Limited
11.2023 - Current

AWS - Certified Solutions Architect Professional

02-2023

Lead Site Reliability Engineer

Acko Technology & Services Private Limited
09.2022 - 10.2023

Certified Kubernetes Administrator - CKAD

05-2022

Senior Site Reliability Engineer

Flipkart Internet (P) Ltd
04.2021 - 09.2022

AWS - Certified Solutions Architect Associate

04-2020

Site Reliability Engineer & FIS Certified DevOps Trainer

FIS Global Business Solutions India (P) Ltd
11.2018 - 03.2021

Master of Science - SAAS

University Paris Saclay
02.2017 - 09.2018

Technical Lead

Cognizant Technology Solutions
09.2013 - 01.2017

Bachelor of Engineering - Electrical And Electronics

Anna University
04.2009 - 05.2013