Versatile and visionary technical leader specializing in high-performance computing (HPC), AI/ML acceleration, embedded systems, and distributed infrastructure across networking, cloud, and enterprise platforms. Proven expertise in architecting low-latency, high-throughput systems integrating hardware and software (C, C++, Python) across GPUs, vGPUs, CPUs, NICs, SmartNICs, and DPUs, with deployments spanning AWS, Azure, edge, and on-prem environments. Skilled in GPU/FPGA/NPU/DPU acceleration integration, parallel programming, HPC orchestration, and performance optimization for large-scale, mission-critical workloads.
Overview
17 years of professional experience
Work History
Principal System Engineer (Research & Innovation)
Self Employed
06.2025 - Current
Conducting independent R&D in HPC, AI/ML acceleration, and networking, focused on distributed compute platforms, congestion control, and system scalability.
Designed and evaluated simulation clusters for AI workloads (object detection and perception algorithms), optimizing performance across GPUs, CPUs, SmartNICs, and NVSwitch interconnects.
Researched GPU–memory bottlenecks, congestion control, and NVSwitch-based scaling to address challenges in large-scale AI training and inference.
Investigating multi-cloud orchestration strategies (AWS, Azure) and interoperability bottlenecks to benchmark portability, resilience, and cost-efficiency of HPC/AI pipelines.
Explored next-gen NIC and DPU architectures for high-throughput data movement, applying OSI stack protocols (TCP/IP, HTTPS, SSL/TLS, RDMA) to low-latency workloads.
Publishing technical insights and white papers from ongoing research, contributing thought leadership on congestion control, GPU/CPU/NIC scaling, and distributed AI architectures.
Principal Platform Architect & Engineer
Parry Labs
03.2025 - 06.2025
Led development of a proprietary (classified) distributed edge-computing platform for defense, integrating HPC/AI workloads at the tactical edge.
Architected distributed compute and AI inference pipelines for object detection and perception, aligned with mission and ecosystem constraints.
Conducted industry and ecosystem analysis to identify opportunity theses, enablers, risks, and competitive advantages in edge AI adoption.
Partnered with internal and external stakeholders to co-develop platform strategy, informing investment recommendations and roadmap planning.
Software Development Manager (Hybrid Robot Fleet Server Platform)
Amazon
06.2021 - 09.2024
Led architecture and development of low-latency, high-throughput HPC platforms for robotic fleet systems, integrating GPU, FPGA, NPU, and DPU acceleration for AI inference across AWS cloud, edge, and embedded domains.
Architected mission-critical edge ML pipelines (C++) leveraging PCIe, RDMA (RoCE), and Kubernetes-based HPC clusters, reducing latency by 30% and increasing accuracy by 20%.
Enhanced inter-GPU communication efficiency by 30% via NCCL and Open MPI tuning for parallel AI workloads in secure environments.
Boosted throughput by 25% by optimizing PCIe memory coherence pathways for -grade data movement.
Implemented zero-copy data transfers using RoCE, minimizing CPU overhead and enabling real-time decision support.
Principal System Software Engineer (Medical Infection Detection Platform)
Bio-Rad Laboratories
01.2018 - 01.2021
Spearheaded secure AI-driven diagnostic platforms using NVIDIA GPUs (TensorFlow, CUDA, TensorRT) for real-time pathogen detection.
Optimized low-power CPU, GPU, and FPGA workflows under IEC 61508 safety standards, reducing detection latency by 30%.
Enhanced diagnostic throughput by 25% using DSP, ARM SoCs, and AI accelerators for transformer-based models.
Reduced network latency by 25% with DPDK-based kernel bypass for time-sensitive diagnostics.
Deployed mission-aligned bioinformatics pipelines on AWS, Azure, and GCP, achieving -level scalability.
Staff Software Engineer (Network and PCIe Switches Platform)
Broadcom
01.2016 - 01.2018
Designed PCIe-based embedded platforms integrating ML co-processors, SmartNICs, and accelerators for high-performance, low-latency applications.
Developed RDMA over PCIe fabric for ultra-low latency and zero-copy data movement in tactical HPC workloads.
Collaborated with industry leaders (Netflix, Google, NVIDIA) to optimize GPU-accelerated NVMe-oF workloads.
Managed BMC firmware development ensuring secure remote management capabilities for mission-critical platforms.
Developed FDA-approved IoT-based sterilization devices with ARM, FPGA, DSP, Secure Boot, TEE, ARM TrustZone, and TPM-based security, ensuring compliance with IEC 62304 and ISO 26262.
Reduced contamination detection time by 25% and accelerated sterilization cycle verification by 20% through DSP/FPGA signal optimization under safety-critical constraints.
Led cross-functional teams of engineers, biologists, and regulatory specialists to deliver prototype validation, advanced sensor calibration, and FDA clearance ahead of schedule.
Directed full device lifecycle management from embedded firmware to cloud integration, ensuring interoperability with hospital IT systems and improving detection accuracy by 15%.
Designed and optimized firmware for Hybrid Disk, NAND Flash, and HDD platforms (C/C++, Perl, ARM, QNX, DSP) to achieve -grade performance in data center and aerospace applications.
Architected NAND Flash Manager with advanced Error Recovery, Data Relocation, and Wear Leveling, extending drive lifespan and reliability under mission-critical workloads.
Implemented Dynamic Power Management, TLER, storage service optimizations, and robust HDD algorithms to reduce media-related failures, boosting manufacturing yield and operational readiness.
Developed drivers connecting storage devices to in-house tools and optimized test performance with Iometer and perf tools.
Led scrum meetings and championed the use of in-circuit emulators, oscilloscopes, and logic analyzers to resolve complex issues and ensure seamless integration.
Education
BS - Electrical Engineering
California State Polytechnic University
09.2007
Certification - Artificial Intelligence
Stanford University
11.2024
Certification - Design Controls
AAMI Foundation
04.2018
Skills
HPC Architecture, Optimization & Research
AI/ML Acceleration
Low-Latency Networking (PCIe, RDMA, RoCE)
Networking (TCP/IP, UDP, HTTPS)
Energy-Aware & Microgrid-Aware Computing
Embedded Systems (QNX, Linux, Zephyr)
Secure Boot, TPM, TEE
GPU/FPGA/NPU/DPU Integration
Parallel Programming (MPI, OpenMP, CUDA)
HPC Cluster Orchestration
Mission-Critical & Compliance
Cloud/Edge/Embedded Interoperability
Cyber Resilience & Secure Frameworks
Personal Information
Citizenship: US Citizen
Title: Principal HPC Architect – High-Performance Networking, Cloud, and AI/ML Acceleration
Nationality: US Citizen
Availability: Onsite Available
Visa Status: US Citizen
Timeline
Principal System Engineer (Research & Innovation)
Self Employed
06.2025 - Current
Principal Platform Architect & Engineer
Parry Labs
03.2025 - 06.2025
Software Development Manager (Hybrid Robot Fleet Server Platform)
Amazon
06.2021 - 09.2024
Principal System Software Engineer (Medical Infection Detection Platform)
Bio-Rad Laboratories
01.2018 - 01.2021
Staff Software Engineer (Network and PCIe Switches Platform)