Can learning in realistic large models be decomposed into a sequence of fundamental "units"?


In several toy models of learning, we encounter the idea that the learning process naturally decomposes into units learned in sequence. Does a story of this sort describe learning in realistic large models?

The idea that learning decomposes into a sequence of atomic “units” appears consistently in exactly solvable models of learning. For example, in deep linear networks, these units are the singular vectors of the input-output correlation matrix of the data, and modes with larger singular values are learned earlier. In kernel regression, these units are the kernel eigenmodes.
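To make the deep linear case concrete, here is a minimal sketch of a two-layer linear network trained by full-batch gradient descent. It assumes whitened inputs (so the loss depends only on the input-output correlation matrix), a small random initialization, and a squared-error loss. The function names, the singular values 4, 2, 1, and the hyperparameters are all illustrative choices rather than anything fixed by the setting above. The sketch tracks the strength of each singular mode over training and records the step at which each one reaches half of its target value.

```python
import numpy as np


def make_correlation(singular_values, seed=0):
    """Build a square matrix U diag(s) V^T with random orthogonal U and V."""
    rng = np.random.default_rng(seed)
    n = len(singular_values)
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return U @ np.diag(singular_values) @ V.T


def train_two_layer_linear(sigma_yx, hidden=8, lr=0.02, steps=1500,
                           init_scale=1e-3, seed=1):
    """Gradient descent on 0.5 * ||sigma_yx - W2 @ W1||_F^2.

    Assumes whitened inputs (input covariance = identity), so the
    input-output correlation matrix sigma_yx is the only data statistic.
    Returns the per-step strength of each singular mode and the target
    singular values (in descending order).
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = sigma_yx.shape
    W1 = init_scale * rng.standard_normal((hidden, d_in))
    W2 = init_scale * rng.standard_normal((d_out, hidden))
    U, S, Vt = np.linalg.svd(sigma_yx)

    history = []
    for _ in range(steps):
        err = sigma_yx - W2 @ W1
        # Compute both gradients before updating either weight matrix.
        step_W1 = W2.T @ err
        step_W2 = err @ W1.T
        W1 += lr * step_W1
        W2 += lr * step_W2
        # Strength of mode i is u_i^T (W2 W1) v_i.
        history.append(np.diag(U.T @ (W2 @ W1) @ Vt.T))
    return np.array(history), S


def learning_times(history, S, fraction=0.5):
    """First step at which each mode exceeds `fraction` of its target."""
    return [int(np.argmax(history[:, i] > fraction * S[i]))
            for i in range(len(S))]


if __name__ == "__main__":
    sigma = make_correlation([4.0, 2.0, 1.0])
    history, S = train_two_layer_linear(sigma)
    for s, t in zip(S, learning_times(history, S)):
        print(f"singular value {s:.1f}: half-learned at step {t}")
```

Under these assumptions, each mode stays near zero for a while and then rises quickly toward its singular value, with the stronger modes rising first. Plotting the columns of `history` against the step index shows the staircase of sigmoidal curves that motivates treating each mode as a separate “unit.”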

Does this sort of story describe learning in deep nonlinear networks that exhibit feature learning? The quanta hypothesis conjectures that the answer is yes: that learning in realistic networks can be understood as the sequential acquisition of “units of knowledge.” If this is indeed the case, can we say, analytically, what these “quanta” are in realistic learning settings? Can we find phase transitions empirically in the learning dynamics of realistic networks? Can we develop a suite of theoretical and empirical tools for answering these questions?
