NVIDIA Groq 3 LPU Unveiled at GTC 2026 — GPU+LPU Division and Samsung 4nm Foundry
The Groq 3 LPU (Language Processing Unit) is a dedicated AI inference chip — specialized hardware designed solely for the "thinking" stage where AI models generate responses. NVIDIA unveiled it at GTC 2026, formally splitting the AI chip world in two: GPUs train, and LPUs infer. This strategic shift is closely tied to the rise of agentic AI — systems that autonomously think, act, and iterate in real time.
Why Does Agentic AI Need a Dedicated Inference Chip?
Agentic AI refers to systems that don't just respond to a single prompt — they reason through multi-step tasks, call external tools, and evaluate results in continuous loops. This loop-driven workflow demands extremely fast token generation, which is the core job of inference. GPUs are powerful at parallel computation for training, but generating tokens one at a time is not their strongest suit. In December 2025, NVIDIA acquired inference startup Groq for $20 billion, securing the LPU technology and engineering talent needed to close this gap. At GTC 2026, the first fruit of that acquisition — the Groq 3 LPU — was officially unveiled alongside NemoClaw, an open-source enterprise AI agent platform designed to run on this infrastructure.
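To make the loop concrete, here is a minimal sketch of an agentic loop in Python. Every name here (generate, call_tool, agent_loop) and every timing number is a hypothetical stand-in, not an NVIDIA or NemoClaw API; the point is only that each iteration pays the sequential token-generation cost again, so decode speed sets the pace of the whole agent.

```python
import time

def generate(prompt: str, tokens_per_second: float, n_tokens: int = 100) -> str:
    """Stand-in for the model's decode step: n_tokens produced one at a time."""
    time.sleep(n_tokens / tokens_per_second)
    return f"plan based on: {prompt[:40]}"

def call_tool(plan: str) -> str:
    """Stand-in for an external tool call (search, code execution, an API)."""
    return f"result of: {plan[:40]}"

def agent_loop(task: str, tokens_per_second: float, max_steps: int = 3) -> None:
    context = task
    start = time.time()
    for _ in range(max_steps):
        plan = generate(context, tokens_per_second)   # decode-bound: dominates latency
        observation = call_tool(plan)                 # usually cheap by comparison
        context += "\n" + plan + "\n" + observation
    elapsed = time.time() - start
    print(f"{max_steps} steps at {tokens_per_second:.0f} tok/s took {elapsed:.1f}s")

agent_loop("plan a product launch and draft the announcement", tokens_per_second=50)
agent_loop("plan a product launch and draft the announcement", tokens_per_second=500)
```

Running the sketch with 50 versus 500 tokens per second shows the whole multi-step task finishing roughly ten times sooner, which is why a chip built purely for fast decode matters for agentic workloads.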
Groq 3 LPU Key Specs — What Does 150 TB/s SRAM Mean?
The standout feature of the Groq 3 LPU is its use of on-chip SRAM (static RAM) instead of the HBM (High Bandwidth Memory) found in conventional GPUs. Each chip packs 500MB of SRAM delivering 150 TB/s of bandwidth — roughly 7× faster than the Vera Rubin GPU's HBM4 at 22 TB/s. By stacking 256 LPUs in a single LPX rack and connecting it to a Vera Rubin NVL72 GPU rack via Spectrum-X interconnect, NVIDIA creates a system where the GPU processes the user's prompt (prefill) and the LPU generates the response tokens (decode). This combination achieves 35× more tokens per watt compared to GPU-only inference setups.
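To see where the "roughly 7×" figure comes from, here is a hedged back-of-envelope sketch. It assumes a hypothetical 70B-parameter model served in FP8 whose weights are streamed from memory once per generated token (the memory-bandwidth-bound decode regime), and it ignores KV-cache traffic and the sharding of weights across the rack's 256 LPUs; only the two bandwidth figures come from the article.

```python
# Hypothetical 70B-parameter model in FP8 (1 byte per weight); decoding one
# token requires streaming the full weight set from memory once (assumption).
WEIGHT_BYTES = 70e9

LPU_SRAM_BW = 150e12   # Groq 3 LPU on-chip SRAM, 150 TB/s (figure from the article)
GPU_HBM4_BW = 22e12    # Vera Rubin HBM4, 22 TB/s (figure from the article)

def decode_ceiling(bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s when decoding is memory-bandwidth-bound."""
    return bandwidth_bytes_per_s / WEIGHT_BYTES

print(f"GPU decode ceiling: {decode_ceiling(GPU_HBM4_BW):7.0f} tok/s")
print(f"LPU decode ceiling: {decode_ceiling(LPU_SRAM_BW):7.0f} tok/s")
print(f"Bandwidth ratio:    {LPU_SRAM_BW / GPU_HBM4_BW:.1f}x")  # about 6.8x, the "roughly 7x" above
```

The ratio of the two bandwidths (150/22 ≈ 6.8) is what caps the relative decode speed in this simplified model, independent of the exact model size chosen.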
Samsung 4nm Foundry — A First for NVIDIA Server Chips
The Groq 3 LPU is manufactured on Samsung's 4nm process — marking the first time an NVIDIA data-center server chip has been produced at Samsung's foundry rather than TSMC. According to TrendForce, Samsung's Groq wafer orders jumped from roughly 9,000 to 15,000 units — approximately a 70% increase — with volume production starting in 2026. This is a significant signal for the semiconductor supply chain: part of NVIDIA's AI chip manufacturing is now diversifying away from its traditional TSMC concentration toward Samsung.
Key Takeaways
① Groq 3 LPU performance — 500MB SRAM per chip with 150 TB/s bandwidth, roughly 7× the memory bandwidth of the Rubin GPU's HBM4
② GPU+LPU division — GPUs handle training and prompt processing; LPUs handle token generation, achieving 35× better efficiency per watt
③ Samsung 4nm production — First NVIDIA server chip built at Samsung's foundry, with wafer orders up 70% to 15,000 units
NVIDIA's GPU+LPU division strategy signals a fundamental restructuring of the AI chip ecosystem. As training and inference separate onto dedicated silicon, AI infrastructure is evolving toward greater specialization and efficiency. Samsung's entry as an NVIDIA server chip foundry partner shows this shift is already rippling through the semiconductor supply chain.
📌 Sources: The Register, Tom's Hardware, TrendForce, NVIDIA Blog (2026)


