OpenAI's Jalapeño Chip Cuts Inference Costs After $20.9B Loss

venturebeat.com · @eager_lynx · 3 hours ago · Artificial Intelligence

OpenAI and Broadcom built a custom LLM inference ASIC in nine months, using OpenAI's own models to speed up the chip design, as the company tries to fix unit economics that produced a $20.92 billion loss in 2025.

openai, broadcom, jalapeno, custom silicon, ai inference, large language models

OpenAI lost $20.92 billion in 2025 on $13.07 billion in revenue, with R&D eating 56% of spend - $19.18 billion - mostly on compute infrastructure. That's the financial crater Jalapeño was built to fill.
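The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not a reconstruction of OpenAI's accounts: it assumes total spend is simply revenue plus the reported loss (ignoring any non-operating items), and the function and constant names are made up for illustration.

```python
# Figures quoted in the article, in billions of US dollars.
REVENUE_BN = 13.07
LOSS_BN = 20.92
RND_BN = 19.18


def implied_total_spend(revenue_bn: float, loss_bn: float) -> float:
    """Assumption (not from the source): spend = revenue + loss."""
    return revenue_bn + loss_bn


def share_of_spend(part_bn: float, total_bn: float) -> float:
    """Return part_bn as a fraction of total_bn."""
    return part_bn / total_bn


total = implied_total_spend(REVENUE_BN, LOSS_BN)
rnd_share = share_of_spend(RND_BN, total)

print(f"Implied total spend: ${total:.2f}B")
print(f"R&D share of spend: {rnd_share:.1%}")
```

Under that assumption the implied total spend is about $33.99 billion and R&D comes to roughly 56.4% of it, which lines up with the 56% quoted.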

Nine Months from Schematics to Silicon Tapeout

Jalapeño is a purpose-built ASIC for LLM inference, co-developed with Broadcom in just nine months. New processor design cycles typically run for years; OpenAI used its own models to accelerate the chip design itself - a software-hardware feedback loop that is as novel as the silicon. Broadcom contributed Tomahawk networking silicon and core implementation, with Celestica handling board and system integration. OpenAI claims the architecture minimizes unnecessary data movement and better matches compute, memory, and networking for modern LLM serving, because it starts from a clean design rather than adapting a GPU.

Initial testing runs GPT-5.3-Codex-Spark on the chip under a production workload, though still in a test environment. OpenAI plans to begin rolling out Jalapeño across active data centers by the end of 2026. Notably, Broadcom's release positions Jalapeño as potentially available to external AI firms - "built from the ground up for current and future LLMs across the industry."

Why OpenAI Needed Its Own Inference Hardware

Every query on ChatGPT, Codex, or the API currently runs on Nvidia GPUs or other vendors' silicon. With a $10.59 billion payment to Microsoft alone for R&D and compute infrastructure in 2025, OpenAI has every incentive to bring inference costs under control. Jalapeño is an ASIC - narrower than a GPU, but intended to be cheaper and more efficient per token for the specific transformer workloads OpenAI runs. If it delivers that efficiency on real workloads, the unit economics shift from bleeding cash toward sustainable margins.

Greg Brockman, OpenAI's president and co-founder, put it directly: "By designing more of the stack ourselves, we can serve more intelligence with greater efficiency."

A Web of Vendors, One Custom Chip

OpenAI isn't replacing Nvidia overnight. In February 2026, Nvidia finalized a $30 billion direct investment, including a deal for 10 gigawatts of computing - 3 GW of dedicated inference, 2 GW of training - on Nvidia's Vera Rubin platform. Amazon put $50 billion into that same round, committing OpenAI to consume ~2 GW of AWS Trainium capacity over eight years. AMD, with its Instinct MI450 series, and Cerebras also have supply agreements. Jalapeño lives alongside all of them, targeting the specific inference workloads where custom silicon wins on cost and latency.

For Broadcom, the partnership is a reputational jackpot: shares are up 18% year-over-year in early 2026 and nearly 7x since the end of 2022. For OpenAI, Jalapeño is the first brick in building gigawatt-scale data centers - cities' worth of compute - with Microsoft and others starting in 2026.

OpenAI has moved beyond software: it now controls the physics of its inference pipeline. If Jalapeño delivers on cost, the $20.9 billion operating loss becomes a historical footnote rather than a terminal diagnosis.


Source: OpenAI unveils first custom AI inference chip, Jalapeño, with Broadcom - and its development was sped-up with OpenAI's own models
Domain: venturebeat.com
