Jagdish Korade

Jersey City

Summary

Senior Linux Systems, DevOps and Site Reliability Engineer with over 15 years of experience in Unix system administration, cloud infrastructure (AWS), CI/CD automation, containerization and orchestration (Docker, Kubernetes), and infrastructure as code (Terraform). Proven success in optimizing system performance, deploying scalable cloud-native applications, and ensuring high availability of large-scale systems. Expertise in automation, incident management, and developing SRE best practices that enhance reliability and operational efficiency for enterprise-grade systems across financial services and enterprise domains.

Work History

Site Reliability Engineer - AI/ML Platform

JPMorgan Chase

Jersey City

03.2022 - Current

Collaborated with AI/ML, Data Engineering, and Platform teams to implement scalable and fault-tolerant ML services using EMR, Spark, and SageMaker.
Designed and maintained high-availability infrastructure patterns across AWS, Kubernetes (EKS), and internal platforms, minimizing toil and reducing operational incidents by 50%.
Led the definition and implementation of non-functional requirements (NFRs), service level indicators (SLIs), and service level objectives (SLOs) to improve reliability and performance across core enterprise platforms.
Implemented error budgets to balance reliability and feature velocity, helping engineering teams prioritize operational work effectively.
Developed and maintained monitoring and alerting systems using Datadog, CloudWatch, Prometheus, Grafana, and Dynatrace, improving observability and reducing mean time to resolution (MTTR) by 30%.
Automated infrastructure tasks and repetitive procedures using Terraform, Ansible, and Python, reducing manual toil by over 50%.
Participated in 24/7 on-call rotations for critical services and led blameless post-mortems to address root causes and drive systemic improvements.
Migrated data workloads to AWS using Pentaho and implemented robust security and access controls via IAM, Security Groups, and VPC configurations.
Contributed to the team's CI/CD modernization using Terraform, GitHub, and Jenkins, accelerating infrastructure provisioning and streamlining audit compliance.
Championed SRE culture by mentoring teams on reliability engineering principles and tooling adoption.
Contributed to the platform roadmap with a focus on automation, resiliency, and observability, aligning infrastructure goals with business priorities.

SRE - AWS, Kubernetes and Unix Infrastructure

Goldman Sachs

Salt Lake City

05.2019 - 03.2022

Designed and maintained large-scale observability platforms supporting metrics, logging, and distributed tracing across on-prem data centers and AWS cloud environments, used by 2,000+ engineers globally.
Built and maintained robust, scalable AWS-based infrastructure using Terraform and Bitbucket CI/CD, enabling infrastructure-as-code practices and reducing provisioning time by 60%.
Automated deployment and scaling of containerized applications using Docker and Amazon EKS, laying the groundwork for broader Kubernetes adoption.
Designed and managed Amazon RDS environments for high availability and performance, incorporating backup automation and monitoring via CloudWatch.
Championed site reliability engineering practices (SLIs/SLOs, error budgets, blameless postmortems, and toil reduction) across UAT and production environments.
Provided end-to-end support and lifecycle management for 2,000+ low-latency Linux (RHEL) servers across global colocation sites and core data centers.
Performed advanced Linux kernel tuning and BIOS optimization to minimize jitter and latency on HP ProLiant DL380/580 (Gen 8-11) and Synergy hardware, aligned to real-time trading SLAs.
Tuned NICs, CPU affinity, IRQ balancing, and TCP/IP stacks to boost throughput and deterministic performance.
Automated infrastructure tasks with Ansible and shell scripts, including OS patching, compliance reporting, and performance diagnostics, reducing manual effort and configuration drift.
Delivered secure and resilient systems via clustering (VCS) and SAN-based storage replication, supporting high-availability trading environments.
Administered VMware environments integrated with ServiceNow for Incident, Problem, and Change Management processes, ensuring uptime and rapid recovery across business-critical systems.
Collaborated with developers, network, and storage teams to test and deploy latency-sensitive applications across highly interconnected financial systems.
Built custom hardened Linux OS images with enforced guardrails for Docker and Kubernetes deployments, integrated into CI/CD pipelines.
Integrated Linux authentication systems with Active Directory and LDAP using tools such as SSSD and Centrify, enforcing secure SSO, sudoers policies, and role-based access.
Led enterprise-wide Linux vulnerability management using Foreman and custom patch automation, reducing critical unpatched exposure by 80% within six months.

Sr. Linux / DevOps Engineer

Informatica

Redwood City

07.2015 - 04.2019

Administered, maintained, and supported Linux servers (Red Hat 5.x/6.x/7.x and CentOS).
Installed and configured DevOps automation tools, including Chef and Terraform.
Worked with Docker components including Docker Engine, Docker Hub, Docker Machine, Docker Compose, and Docker Registry.
Managed AWS EC2 instances using Elastic Load Balancers (ELB) and Auto Scaling groups.
Applied quarterly patches to meet audit requirements using Oracle Ops Center, Red Hat Satellite Server, up2date, YUM, and RPM. Worked with Linux security frameworks.
Participated in storage and data center migrations.
Coordinated with vendors (Red Hat) on OS-related issues.
Coordinated with hardware vendors (Dell, IBM) for troubleshooting and replacements.
Supported Incident and Change Management processes, resolving incidents and implementing changes.

Sr. Linux System Administrator

Infosys

Pleasanton

02.2012 - 06.2015

Linux System Administrator

Capgemini

Mumbai

10.2010 - 01.2012

Education

Bachelor - Information Technology

Pune University

Pune, India

Skills

Operating Systems: Red Hat Enterprise Linux (RHEL), UNIX

Database: SQL (SQL Developer)

Infrastructure as Code: Terraform

CI/CD: Jenkins, Git, Bitbucket

Containerization and Orchestration: Docker, Kubernetes

Cloud Computing: AWS

Observability and Monitoring: Grafana, Dynatrace, Datadog, CloudWatch

SRE Practices: SLO, SLI, Error Budgets, Incident Response, Root Cause Analysis

Certification

AWS Solutions Architect Associate
Red Hat Certified Engineer (RHCE)