Source linked

OpenAI's Jalapeño Inference Chip Beats State-of-the-Art Perf-per-Watt

OpenAI and Broadcom's Jalapeño chip targets LLM inference with substantially better performance per watt than any current accelerator, delivered in a nine-month ASIC cycle.

openaibroadcomjalapenoinference chipllm inferencesemiconductor

Jalapeño, OpenAI's first in-house inference chip, completed design-to-tape-out in nine months — reportedly the fastest ASIC cycle ever for high-performance semiconductors. That's a pace that puts every other custom silicon effort to shame, and it's not just a speed-run: early testing shows performance per watt substantially better than any current state-of-the-art accelerator.

The Nine-Month ASIC Record

OpenAI co-developed Jalapeño with Broadcom (NASDAQ: AVGO) and Celestica, going from blank-slate design to manufacturing tape-out in just nine months. The companies used OpenAI's own models to accelerate parts of the design and optimization process — a neat self-referential trick. Engineering samples are already running ML workloads at production target frequency and power, including GPT‑5.3‑Codex‑Spark. A detailed technical report on performance is promised in the coming months.

Architecture for LLMs, Not GPUs

Jalapeño is a blank-slate design for modern LLM inference — not a repurposed AI accelerator from the pre-transformer era. The architecture explicitly reduces data movement and balances compute, memory, and networking to hit realized utilization much closer to theoretical peak. Broadcom contributed its Tomahawk networking silicon to handle large-scale production deployment. OpenAI President Greg Brockman described the chip as part of a long-term full-stack infrastructure strategy to make compute more abundant, which is executive-speak for "cheaper inference for everybody."

The Full-Stack Flywheel

OpenAI operates across the entire stack: models (GPT-5.3), products (ChatGPT, Codex, API), serving infrastructure, and now chips. Because they own the kernel optimizations, memory scheduling, and networking, each layer gets tuned for the same goal — faster, more reliable, cheaper inference. Jalapeño is designed to combine the throughput of today's leading AI accelerators with latency closer to specialized inference systems. That's the flywheel: better chips → cheaper compute → better models → better products → more revenue → more investment in next-gen chips.

Gigawatt-Scale Ambitions

Broadcom CEO Hock Tan confirmed the partnership includes a multi-generation roadmap, with deployment at gigawatt-scale data centers alongside Microsoft and other partners beginning in 2026. That's orders of magnitude beyond what any single AI chip program has attempted. If Jalapeño delivers on its early perf-per-watt claims, OpenAI will have done what no other model provider has: integrated custom silicon into its own inference pipeline at hyperscale, directly attacking the single biggest cost in serving frontier LLMs.


Source: OpenAI and Broadcom unveil LLM-optimized inference chip
Domain: openai.com

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.