Take a neural network with the usual propagation rule

y_k^l = \sum_i w_{ki}^l\,\phi(y_i^{l-1})+b_k^l

with  \phi  the nonlinear activation function, depth  L  and width  N_l  at layer  l. If you want to compute the average squared norm of the values at layer  l

q^l = \frac{1}{N_l}\sum_i (y_i^l)^2

you end up with a typical recursive situation, much like one has with spin models.
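Before going analytic, the recursion is easy to observe numerically. Below is a minimal sketch that propagates a random input through a random network and prints  q^l  layer by layer; the depth, width, tanh activation and the values of  \sigma_w, \sigma_b  are assumptions for illustration, not taken from the text.

```python
import numpy as np

depth = 50            # number of layers (illustrative)
width = 1000          # layer width N_l, kept constant for simplicity
sigma_w, sigma_b = 1.5, 0.1   # assumed weight/bias scales
phi = np.tanh         # assumed nonlinear activation

rng = np.random.default_rng(0)
y = rng.normal(size=width)    # random input activations

for l in range(1, depth + 1):
    # w_ki^l ~ N(0, sigma_w^2 / N_{l-1}),  b_k^l ~ N(0, sigma_b^2)
    w = rng.normal(scale=sigma_w / np.sqrt(width), size=(width, width))
    b = rng.normal(scale=sigma_b, size=width)
    y = w @ phi(y) + b        # y_k^l = sum_i w_ki^l phi(y_i^{l-1}) + b_k^l
    q = np.mean(y**2)         # q^l = (1/N_l) sum_i (y_i^l)^2
    print(f"layer {l:3d}: q^l = {q:.4f}")
```

Running this shows  q^l  settling to a constant value after a handful of layers, which is exactly the fixed-point behavior derived below.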

Let’s assume that the layers are wide,  1\ll N_l, and that the weights and biases are normally distributed,
w^l_{mn} \sim N(0,\frac{\sigma^2_w}{N_{l-1}})
b_m^l \sim N(0,\sigma^2_b)
then the expectation of  q^l  is

\mathbb{E}(q^l) = \frac{\sigma^2_w}{N_{l-1}}\sum_k\phi(y^{l-1}_k)^2 + \sigma^2_b
which, as mentioned, is recursive but can be solved with a mean-field approach. Instead of using the precise values at layer  l-1, we assume that the network is wide and in a kind of thermal equilibrium, so that the effect of any single node averages out and the empirical average can be replaced by a Gaussian integral:

\frac{1}{N_{l-1}}\sum_k\phi(y^{l-1}_k)^2 \approx \int \frac{dz}{\sqrt{2\pi q^{l-1}}}\;\exp\left(-\frac{z^2}{2q^{l-1}}\right)\phi(z)^2 = \int \;d\mu^{l-1}(z)\;\phi(z)^2

with  \mu^{l-1}  the Gaussian measure with mean zero and variance  q^{l-1}. Taken together, this gives the recursion
q^l = \sigma_w^2\; \int \;d\mu^{l-1}(z)\; \phi(z)^2 + \sigma^2_b

As it stands, this recursion can be analyzed like any statistical field theory: what are its fixed points, is there a phase transition, how does the activation function influence the results, and so on. For example, for a linear activation  \phi(z)=z  the recursion reduces to  q^l = \sigma_w^2 q^{l-1} + \sigma^2_b, which converges to the fixed point  q^* = \sigma^2_b/(1-\sigma^2_w)  when  \sigma_w<1  and blows up exponentially when  \sigma_w>1;  \sigma_w=1  is the critical point separating the two phases.
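For a nonlinear activation, the Gaussian integral can be evaluated numerically. Here is a minimal sketch that iterates the mean-field recursion using Gauss-Hermite quadrature; the tanh activation and the parameter values are again illustrative assumptions.

```python
import numpy as np

# Nodes and weights for Gauss-Hermite quadrature; 101 points is plenty here.
nodes, weights = np.polynomial.hermite.hermgauss(101)

def gauss_avg(f, q):
    """E[f(z)] for z ~ N(0, q), i.e. the integral against the measure mu."""
    return np.dot(weights, f(np.sqrt(2.0 * q) * nodes)) / np.sqrt(np.pi)

def iterate_q(sigma_w2, sigma_b2, phi=np.tanh, q0=1.0, n_layers=100):
    """Iterate q^l = sigma_w^2 * int dmu^{l-1}(z) phi(z)^2 + sigma_b^2."""
    q = q0
    for _ in range(n_layers):
        q = sigma_w2 * gauss_avg(lambda z: phi(z) ** 2, q) + sigma_b2
    return q

# The recursion settles quickly into a fixed point q* that depends only on
# (sigma_w^2, sigma_b^2) and the activation -- not on the input.
for sw2 in (0.5, 1.0, 2.0, 4.0):
    print(f"sigma_w^2 = {sw2:4.1f}  ->  q* ~ {iterate_q(sw2, 0.05):.4f}")
```

The fixed points  q^*  computed this way can be compared directly against the layer-wise  q^l  measured in the simulation sketch above.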

Why does it matter?

All of this might seem too abstract to you, but this kind of research really does make a difference:

  • when you initialize a neural network, what weights should you pick? Does it matter, and if not, why not?
  • if you train a deep network, are you sure it will converge? In which regimes do the gradients vanish or explode?

Furthermore, there are some interesting extrapolations to our own brain:

  • information travels across gigantic numbers of neurons, but what makes it stop? Why don’t we have deadlocks or runaway, persistent information flows?
  • phase transitions and critical points occur in all large networks; what would they mean for our thinking? Why do they, or don’t they, happen there?