Naga Preethi

Des Moines

Summary

HPC and Linux Systems Engineer with 3+ years of experience supporting high-performance computing environments and enterprise Linux systems. Strong expertise in job scheduling, workload management, and cluster performance optimisation using Slurm. Hands-on experience with NVIDIA GPU platforms, parallel file systems, and automation tools. Proven ability to support production systems, collaborate with cross-functional teams, and deliver reliable infrastructure solutions.

Overview

5 years of professional experience

Work History

HPC Engineer

CVS Health
Des Moines, IA
11.2024 - Current
  • Designed and deployed GPU-accelerated HPC clusters supporting AI/ML and scientific workloads.
  • Administered and optimised Slurm workload manager with GPU scheduling, QoS, and fairshare configurations.
  • Installed and maintained NVIDIA drivers, CUDA Toolkit, NCCL, and GPU-aware MPI for distributed workloads.
  • Engineered high-performance storage using Lustre with NVMe tiers to optimise I/O throughput.
  • Automated cluster provisioning and configuration using Ansible and PXE boot, reducing node deployment time by 60% and ensuring consistent system configuration.
  • Implemented InfiniBand HDR/NDR networking with RDMA and GPUDirect RDMA, significantly lowering latency and accelerating inter-node GPU communication.
  • Deployed and managed large-scale HPC clusters using Bright Cluster Manager (BCM), administering head nodes, compute nodes, GPU nodes, and storage nodes in production research environments.
  • Designed and implemented high availability (HA) head node configurations in BCM, ensuring cluster failover resilience and minimising downtime for critical workloads.
  • Designed, built, deployed, and maintained HPC clusters, ensuring high availability and performance using InfiniBand and Lustre parallel file systems.

Freelance HPC Engineer

Tata Consultancy Services
USA
05.2023 - 12.2023
  • Supported enterprise Linux and HPC infrastructure across physical and virtual environments.
  • Automated system configuration and patching using Ansible to ensure consistency and reliability.
  • Supported GPU-enabled servers and shared storage systems for compute-intensive workloads.
  • Assisted users with HPC applications and large-scale simulation workloads.
  • Assisted in managing shared storage solutions (NFS, iSCSI, SAN/NAS), optimising performance for data-intensive applications.
  • Supported CFD applications such as ANSYS Fluent, C+, and COMSOL Multiphysics on HPC clusters, enabling large-scale simulation workloads.

Linux System Engineer

Tata Consultancy Services
India
01.2021 - 09.2022
  • Installed, configured, and maintained Linux servers (RHEL, CentOS, Ubuntu).
  • Managed users, permissions, patching, monitoring, and system hardening.
  • Configured networking, storage services, and systemd-managed services.
  • Monitored server health, including CPU, memory, disk usage, and network performance, and proactively troubleshot performance issues.
  • Configured and managed network settings, including IP addressing, DNS, gateways, bonding, and VLANs.
  • Supported NFS and SMB file shares, ensuring reliable storage access for users and applications.
  • Managed system services using systemd, troubleshooting failed services and boot issues.

Education

M.S. - Information Technology & Management

Belhaven University

Skills

  • HPC & Cluster Technologies: Cluster architecture, GPU-accelerated clusters, cluster deployment, administration, lifecycle management, capacity planning, performance optimisation
  • GPU & Accelerator Technologies: NVIDIA GPUs (A100, H100, V100, L40), CUDA Toolkit, CUDA drivers, NCCL, multi-GPU and multi-node scaling, DCGM, nvidia-smi
  • Schedulers & Workload Management: Slurm, PBS Pro, LSF (familiarity)
  • Parallel & Distributed Computing: MPI (OpenMPI, MPICH, Intel MPI), OpenMP, CUDA-aware MPI, GPUDirect RDMA
  • Storage & Filesystems: Lustre, IBM Spectrum Scale (GPFS), BeeGFS, NFS, NVMe, SSD tiers
  • Networking & Interconnects: InfiniBand, high-speed Ethernet, network bonding, VLANs
  • Automation & DevOps: Ansible, Terraform, PXE provisioning
