About

HPC architecture for the AI era.

I help organizations transform their AI infrastructure from reactive debugging to predictive optimization, working at the intersection of high-performance computing and modern ML systems.

Background

With over a decade of experience in high-performance computing and AI infrastructure, I've helped organizations design, deploy, and optimize systems that scale from research prototypes to production workloads serving millions of requests.

My work spans the full stack of modern AI infrastructure: distributed training architectures, memory hierarchy optimization, vendor-agnostic performance engineering, and the emerging discipline of predictive performance modeling.

I founded PerformanceNexus to formalize what I've learned into systematic frameworks that organizations can apply regardless of their specific hardware, software stack, or scale.

Consulting Services

Hardware-Software Co-Design

My core practice is building and validating hardware performance models for novel CPU and GPU architectures. The work covers roofline and CPI-stack analysis to expose the true bottleneck (compute, bandwidth, latency, or interconnect); design-space exploration across cache, memory, and execution-width tradeoffs toward Pareto-optimal configurations; and modeling of chiplet, near-memory, and AI-accelerator designs. Every model is calibrated against hardware counters and refined in a compiler-architecture co-optimization loop, so the highest-leverage design decisions are grounded before silicon.

Performance Architecture Review

A workload-specific capability assessment built around the dimensions that actually govern your ML/AI, CFD, molecular dynamics, weather/climate, or engineering workloads, plus a cross-cutting Energy & TCO review. The results are delivered as a phased transformation roadmap.

Distributed Training Optimization

Multi-node training strategy design, parallelism configuration, and communication pattern optimization for large-scale models.

Vendor-Agnostic Migration

Hardware platform evaluation and migration planning across NVIDIA, AMD, Intel, and emerging accelerators.

Custom Tooling Development

Bespoke performance engineering tools tailored to your specific workload patterns and operational constraints.

Get in Touch

Let's discuss your performance engineering challenges.

Whether you're scaling a frontier model, optimizing inference costs, or rearchitecting your training pipeline, I can help.

Start a Conversation

Or start with the capability assessment