Fifth-Generation Tensor Cores (Blackwell)
Narrative
FP4 precision support. 20 petaflops of FP4 compute per GPU. Second-gen Transformer Engine. 2.5x the performance of Hopper Tensor Cores.
Reality
FP4 performance delivered. Inference cost reductions verified. Precision scaling required software tuning: quantization calibration and scaling-factor management, not a drop-in switch. Sparse performance rarely achieved in practice.
Implication
Ultra-low precision validated for inference. Models requiring 8 H100s ran on 2 B200s. Inference economics fundamentally changed. Precision progression continues: FP32 to FP16 to FP8 to FP4.
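Why FP4 needs software tuning becomes clear from the format itself: E2M1 (1 sign, 2 exponent, 1 mantissa bit) can represent only eight non-negative magnitudes, so every tensor must be rescaled onto that tiny grid before rounding. The sketch below is an illustrative simplification, not NVIDIA's implementation; the function name and the simple per-tensor scale are assumptions (production stacks use finer-grained, e.g. per-block, scaling).

```python
# Illustrative FP4 (E2M1) quantization sketch -- not NVIDIA's actual kernel.
# The eight non-negative magnitudes representable in E2M1:
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(values):
    """Round each value to the nearest FP4-representable number,
    after scaling so the largest magnitude maps to 6.0 (FP4 max).
    Returns (dequantized values, scale). Hypothetical helper."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 6.0  # per-tensor scale; real systems use finer granularity
    out = []
    for v in values:
        sign = -1.0 if v < 0 else 1.0
        mag = abs(v) / scale
        q = min(FP4_GRID, key=lambda g: abs(g - mag))  # nearest grid point
        out.append(sign * q * scale)
    return out, scale

# Values near grid points survive; everything else rounds away.
print(quantize_fp4([0.0, 3.0, -6.0, 1.2]))
```

The coarse grid is why calibration matters: outliers stretch the scale and crush the remaining values toward zero, which is exactly the tuning burden noted above.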