Deep linear networks are a surprisingly useful toy model of weight-space dynamics
Deep linear networks are simple enough to study analytically but rich enough to exhibit key phenomena of neural network training.
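A minimal sketch of what "simple but rich" means here (my illustration, assuming a standard setup with a random linear target and small random initialization, not code from the essay): a two-layer deep linear network can only represent a linear map, yet gradient descent on its two factors follows coupled, nonlinear dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_target = rng.normal(size=(d, d))      # linear map to learn (illustrative choice)
X = rng.normal(size=(d, 256))           # random inputs
Y = W_target @ X                        # targets from the linear teacher

# Two-layer deep linear network f(x) = W2 @ W1 @ x, small random init
W1 = 0.1 * rng.normal(size=(d, d))
W2 = 0.1 * rng.normal(size=(d, d))
lr = 1e-2

for step in range(2000):
    E = W2 @ W1 @ X - Y                 # residual of the end-to-end map
    gW2 = E @ (W1 @ X).T / X.shape[1]   # gradient of MSE loss wrt W2
    gW1 = W2.T @ E @ X.T / X.shape[1]   # gradient of MSE loss wrt W1
    W2 -= lr * gW2                      # note: each update depends on the
    W1 -= lr * gW1                      # other factor, so dynamics are coupled

print(np.linalg.norm(W2 @ W1 - W_target))  # end-to-end map approaches the target
```

The representable function class is just linear maps, so the model stays analytically tractable; the training trajectory of the factors, however, is already nonlinear in the parameters, which is the kind of weight-space phenomenon the essay studies.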
What is the origin of neural scaling laws? What do they tell us about the structure of data? What are the limits of interpretability?
Three essays on building theory that matters.