Pick up a neural network with the typical propagation

$$z_i^{l} = \sum_{j=1}^{N_{l-1}} W_{ij}^{l}\,\phi\big(z_j^{l-1}\big) + b_i^{l},$$

with depth $L$ and size $N_l$ at level $l$. If you want to compute the average squared norm of the values at level $l$,

$$q^{l} = \frac{1}{N_l}\sum_{i=1}^{N_l} \big(z_i^{l}\big)^2,$$

you end up in a typical recursive situation, like the one you have with spin models, for example.

Let's assume that the network is large and that the weights are normal,

$$W_{ij}^{l} \sim \mathcal{N}\!\Big(0, \tfrac{\sigma_w^2}{N_{l-1}}\Big), \qquad b_i^{l} \sim \mathcal{N}\big(0, \sigma_b^2\big),$$

then the expectation of this quantity is

$$\mathbb{E}\big[q^{l}\big] = \sigma_w^2\,\mathbb{E}\Big[\frac{1}{N_{l-1}}\sum_j \phi\big(z_j^{l-1}\big)^2\Big] + \sigma_b^2,$$

which, as mentioned before, is recursive but can be solved with a mean-field approach. Instead of using the precise values at level $l-1$, we assume that the network is large, that there is a kind of thermal equilibrium, and that the effect of any single node is averaged out; each $z_j^{l-1}$ is then replaced by a Gaussian variable of variance $q^{l-1}$, integrated against

$$\mathcal{D}z = \frac{dz}{\sqrt{2\pi}}\,e^{-z^2/2},$$

the standard Gaussian measure. So, taken together, this gives the recursion

$$q^{l} = \sigma_w^2 \int \mathcal{D}z\;\phi\big(\sqrt{q^{l-1}}\,z\big)^2 + \sigma_b^2,$$

with $\phi$ the non-linear activation function. As it stands, this can now be used and analyzed like any statistical field theory: what are the critical points, is there a phase transition, how does the activation function influence the results, and so on.
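To get a feel for how such a recursion behaves, here is a minimal numerical sketch of the length map, assuming a tanh activation; the particular values of $\sigma_w^2$ and $\sigma_b^2$ are illustrative choices, not from the text:

```python
import numpy as np

def length_map(q, sigma_w2, sigma_b2, phi=np.tanh, n_samples=100_000, seed=0):
    """One step of the mean-field recursion:
    q_next = sigma_w^2 * E[phi(sqrt(q) z)^2] + sigma_b^2, with z ~ N(0, 1).
    The Gaussian integral is estimated by Monte Carlo sampling."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    return sigma_w2 * np.mean(phi(np.sqrt(q) * z) ** 2) + sigma_b2

def fixed_point(sigma_w2, sigma_b2, q0=1.0, n_layers=50):
    """Iterate the recursion over many layers; for tanh with sigma_b^2 > 0
    the norm q^l settles at a stable fixed point q*."""
    q = q0
    for _ in range(n_layers):
        q = length_map(q, sigma_w2, sigma_b2)
    return q

# Example (hypothetical hyperparameters): the norm forgets its starting value.
q_star = fixed_point(sigma_w2=1.5, sigma_b2=0.1)
```

Because the map is a contraction near its fixed point, different initial norms $q^0$ end up at the same $q^*$ after enough layers, which is exactly the "equilibrium" picture the mean-field argument relies on.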
You can find cool info about this direction here:
- Exponential expressivity in deep neural networks through transient chaos
- Deep Information Propagation
Why does it matter?
All of this might seem too abstract, but this kind of research really does make a difference:
- when you initialize a neural network, which weights should you pick? Does it matter, and if not, why not?
- when you train a neural network, are you sure it will converge? In which regimes do the gradients vanish or blow up?
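These questions can be made concrete with a small simulation. The sketch below, assuming tanh activations and the Gaussian initialization discussed above, propagates a random input through a deep random network: with weights that are too small the signal (and, by the chain rule, the gradients along with it) dies out with depth, while a larger weight variance keeps it at a non-trivial level. The specific variances are example values.

```python
import numpy as np

def forward_norms(sigma_w, sigma_b, depth=50, width=500, seed=0):
    """Propagate a random input through a deep tanh network initialized with
    W_ij ~ N(0, sigma_w^2 / width) and b_i ~ N(0, sigma_b^2), recording the
    average squared norm q^l of the pre-activations at every layer."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(width)
    qs = [np.mean(z ** 2)]
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(0.0, sigma_b, size=width)
        z = W @ np.tanh(z) + b
        qs.append(np.mean(z ** 2))
    return qs

# Small weights, no bias: the signal decays layer by layer ("ordered" phase).
dying = forward_norms(sigma_w=0.5, sigma_b=0.0)
# Larger weights: the norm settles at a finite fixed point instead.
alive = forward_norms(sigma_w=1.5, sigma_b=0.3)
```

This is precisely why the initialization scale matters: the same architecture either transmits information through all its layers or washes it out, depending on where $(\sigma_w, \sigma_b)$ sits relative to the critical line.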
Furthermore, there are some interesting extrapolations to our own brain:
- information travels across gigantic numbers of neurons, but what makes it stop? Why don't we have deadlocks and persistent information flows?
- phase transitions and critical points occur in all large networks; what do they mean for our thinking? Why do they, or don't they, happen there?