III — The Neural Network, Assembled → Chapter 14
FROM SYSTEMS TO FRONTIER ML

Normalization & residuals

LayerNorm → RMSNorm with full derivations, the residual stream, and why deep nets train at all.

§1 LayerNorm: derivation and what it actually does
§2 RMSNorm and pre-norm placement
§3 The residual stream: why deep networks train
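Here is a preview of the three pieces these sections cover. It is a minimal plain-Python sketch, not the chapter's own code. It assumes a single activation vector stored as a list, per-feature gain `gamma` and bias `beta` lists, and `eps` placed inside the square root. The helper names `layer_norm`, `rms_norm` and `pre_norm_residual` are hypothetical.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm (assumed form): subtract the mean, divide by sqrt(variance + eps),
    # then apply a per-feature gain and bias.
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (xi - mean) * inv_std + b for xi, g, b in zip(x, gamma, beta)]

def rms_norm(x, gamma, eps=1e-5):
    # RMSNorm (assumed form): no mean subtraction and no bias; divide by
    # sqrt(mean of squares + eps), then apply a per-feature gain.
    n = len(x)
    rms = math.sqrt(sum(xi * xi for xi in x) / n + eps)
    return [g * xi / rms for xi, g in zip(x, gamma)]

def pre_norm_residual(x, sublayer, norm):
    # Pre-norm placement: only the sublayer's input is normalized; its output
    # is added back onto the un-normalized residual stream x.
    return [xi + yi for xi, yi in zip(x, sublayer(norm(x)))]

x = [1.0, 2.0, 3.0, 4.0]
ones, zeros = [1.0] * 4, [0.0] * 4
print(layer_norm(x, ones, zeros))
print(rms_norm(x, ones))
print(pre_norm_residual(x, lambda h: h, lambda h: rms_norm(h, ones)))
```

The two norms coincide for a zero-mean input with `beta` set to zero. In that case the root-mean-square equals the standard deviation, so RMSNorm drops exactly the mean-centering step and the bias. In the residual helper, a sublayer that outputs zeros leaves `x` unchanged. That identity path is what the residual stream preserves through depth.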
