NVIDIA Announces Inference-Optimized Chips
Narrative
New chip line designed specifically for inference. Claimed 5x performance per watt vs. Blackwell on inference workloads, at lower cost. Targeting reasoning-model deployment.
Reality
Specifications detailed, but not shipping until Q3 2025. Performance claims credible given the architecture. Pricing competitive with Google TPU and AWS Trainium. Strong pre-orders from hyperscalers.
Implication
NVIDIA acknowledged inference economics as distinct from training economics. Proliferation of reasoning models drove a surge in inference demand. Competition from cloud providers intensified. Inference became the largest AI workload.