Matrices as functions, not number grids. Matmul as composition. The three independent axes. Orthogonal/rotation matrices.
A matrix is a linear function, not a 2D array. Its columns are the images of the standard basis vectors. The 2D array is just one way to write the function down.
Matrix multiplication is function composition. The matmul has three independent loop dimensions — M, N, K — and each tiles separately. That structural fact is what makes FlashAttention possible.
Orthogonal matrices are the linear functions that preserve every length and every angle. Two structural invariants — ‖Qx‖ = ‖x‖ and ⟨Qx, Qy⟩ = ⟨x, y⟩ — make rotation-based quantization (TurboQuant) work.
Naive matmul is memory-bound, not compute-bound. The fix is tiling — keep a small working set in fast memory, reuse it before moving on. The microkernel pattern is the architectural ancestor of FlashAttention.
← ALL CHAPTERS