OpenAI launches Jalapeño, its first custom AI chip built with Broadcom for LLM inference at production scale

OpenAI Just Built Its Own Chip. The Industry Should Pay Attention.

There’s a tweet from OpenAI that got 15,000+ likes and most people in my feed either cheered it or scrolled past. I think both reactions miss the weight of what actually happened. OpenAI announced Jalapeño, its first custom AI chip, designed in-house and brought to production with Broadcom. Purpose-built for LLM inference workloads running ChatGPT, Codex, the API, and future agentic products. This is not a research prototype. It is in production.

Let me explain why this is a structural shift, not a product announcement.

The Dependency Problem

For years, OpenAI ran everything on NVIDIA hardware. Every API call, every ChatGPT response, every Codex completion. That is a staggering amount of compute, and none of the underlying silicon was theirs. When you do not control your hardware, you do not fully control your costs, your supply chain, or your ceiling on scale. NVIDIA has been generous with its margins, and the H100 and A100 allocations during peak demand shortages were not always kind to anyone, OpenAI included.

Building your own inference chip breaks that dependency at the root. It does not eliminate NVIDIA from the picture overnight, but it changes the negotiation permanently.

The Playbook Already Exists

Google built TPUs starting around 2016. The result was a training and inference infrastructure that let them iterate on Gemini and its predecessors without being structurally capped by third-party supply. Apple moved to the M-series and now controls performance-per-watt in a way no Intel partnership ever could have delivered. Amazon built Trainium for training and Inferentia for inference, and AWS now offers both to customers as differentiated products.

In every one of those cases, the company that owned its silicon eventually owned its economics in ways competitors could not replicate by just buying more GPUs.

OpenAI is following the same playbook, roughly a decade after Google and several years after Amazon. Late, but not too late.

Why Inference Specifically

The chip is inference-focused, and that is the right call right now. Training runs are expensive but infrequent. Inference is the 24/7 tax. Every single user query to ChatGPT, every API call, every agentic workflow running in the background is an inference workload. At OpenAI’s current scale, shaving cost per token through optimized silicon has a compounding return that generic GPU clusters simply cannot match.

Purpose-built inference hardware can optimize memory bandwidth, attention computation, and token throughput in ways that general-purpose GPUs are not architected to prioritize. That gap widens as models get larger and context windows grow.

What the Broadcom Partnership Actually Means

OpenAI designed Jalapeño. Broadcom manufactured it at scale. This is the same model Google used with its TPUs, which are also produced with Broadcom’s ASIC expertise. It means OpenAI gets custom silicon without needing to become a fab or a chip company from scratch. The design IP stays with OpenAI. The manufacturing leverage comes from Broadcom.

That is a clean split, and it is the realistic path for any software company moving into silicon without a decade of hardware engineering headcount.

The Bigger Picture for AI Infrastructure

Custom silicon is how you survive at production scale when your core product is compute-intensive by definition. The companies that figure this out early will have margins and scale ceilings that pure GPU renters will not. OpenAI just crossed a line that most of its direct competitors, Anthropic, Mistral, xAI, have not crossed yet.

Whether Jalapeño is good enough on day one is almost beside the point. The organizational capability to design, tape out, and deploy custom inference silicon is now inside OpenAI. That capability compounds.

The real question is not whether this chip beats an H100 on a benchmark. It is whether OpenAI can iterate on Jalapeño the way Apple iterated from M1 to M4. If they can, the cost and performance curve bends in their direction permanently. I think they probably can.

Sources & Further Reading

#OpenAI #AIInfrastructure #CustomSilicon #LLM #MachineLearning #AIChips #Broadcom

OpenAI launches Jalapeño, its first custom AI chip built with Broadcom for LLM inference at production scale

Sources & Further Reading

Google Gemini 2.5 Pro tops coding benchmarks and delivers usable 1M token context window

Career-ops: open-source Claude Code system that filters job applications using engineering discipline instead of spray-and-pray

Prediction: AI citation visibility is replacing traditional SEO rankings as the new content distribution game

LLMs write plausible code not correct code, and what that distinction means for engineers in production

Hosting Your Own MCP Server Online: Python, Heroku, and OAuth

xAI teases Imagine image generation model via Elon Musk, with analysis of structural advantages from X platform integration

Leave a Reply Cancel reply

Sources & Further Reading

Similar Posts

Leave a Reply Cancel reply