III — The Neural Network, Assembled → Chapter 13
FROM SYSTEMS TO FRONTIER ML

Attention, fully assembled

QKᵀ, scaling, softmax, V; multi-head → GQA/MQA. FlashAttention as tiling + online softmax. The capstone of Part III.

§1 Attention as soft dictionary lookup §2 Multi-head, MQA, GQA — projection subspaces §3 FlashAttention — tile + online softmax + running output

← ALL CHAPTERS