Deep neural networks (DNNs) hold great promise for machine learning. Yet a key challenge in putting them to use is scalability, which becomes harder as networks grow larger and more complex. New research from University College London offers a new account of the learning patterns shared across different neural network architectures.
The researchers behind the paper examine four distinct approaches to understanding learning dynamics in DNNs. The first, ‘Exact Solutions in Simple Architectures’, has made considerable progress in the theoretical analysis of ‘deep linear’ neural networks. Work in this area has helped characterise the ‘loss landscape’ and derive exact solutions for particular initial conditions.
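As a minimal illustration of what an exact solution looks like in this setting, the sketch below trains a two-layer scalar ‘deep linear’ network, whose output is the product w2 * w1, on a single target. It then compares the learned product with the closed-form logistic trajectory that gradient flow follows when both weights start at the same small value. The target, initial weight, step size and function names are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def train_scalar_deep_linear(target, w_init, lr, steps):
    """Gradient descent on L = 0.5 * (target - w2 * w1) ** 2 with w1 = w2 = w_init."""
    w1 = w2 = w_init
    products = [w1 * w2]
    for _ in range(steps):
        err = target - w1 * w2
        # Simultaneous update: both gradients use the weights from before the step.
        w1, w2 = w1 + lr * w2 * err, w2 + lr * w1 * err
        products.append(w1 * w2)
    return products

def closed_form_product(target, w_init, t):
    """Gradient-flow solution of da/dt = 2 * a * (target - a) with a(0) = w_init ** 2."""
    a0 = w_init ** 2
    return target / (1.0 + (target / a0 - 1.0) * math.exp(-2.0 * target * t))

# Illustrative values (assumed, not from the paper).
TARGET, W_INIT, LR, STEPS = 2.0, 0.1, 1e-3, 5000

simulated = train_scalar_deep_linear(TARGET, W_INIT, LR, STEPS)
# Step k of gradient descent corresponds to continuous time t = LR * k.
exact = [closed_form_product(TARGET, W_INIT, LR * k) for k in range(STEPS + 1)]
max_gap = max(abs(s - e) for s, e in zip(simulated, exact))
print(f"final product: {simulated[-1]:.4f}, max gap to closed form: {max_gap:.4f}")
```

With a small initial weight, the product stays near zero for a while and then rises quickly towards the target, which is the sigmoidal trajectory the closed form describes.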
The second approach, the ‘Neural Tangent Kernel’, provides general solutions that apply to a wide range of models. The third approach, ‘Implicit Biases in the Gradient Descent Technique’, examines how gradient descent influences DNNs’ generalisation performance. Lastly, the ‘Local Elasticity’ approach describes a property in which updating one feature vector has minimal impact on dissimilar feature vectors.
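Both the tangent-kernel view and local elasticity concern how a gradient step on one input changes the network's output on another. To first order, that change equals the step size times the training error times the inner product of the two parameter gradients, a quantity often called the empirical tangent kernel. The sketch below checks this relation numerically on a tiny two-layer tanh network. The network shape, seed, inputs and function names are assumptions made for illustration and are not the paper's setup.

```python
import math
import random

def forward(params, x):
    """Two-layer network f(x) = sum_j a_j * tanh(w_j . x)."""
    return sum(a * math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
               for a, w in zip(params["a"], params["w"]))

def grad(params, x):
    """Gradient of f(x) with respect to every parameter, as (grad_a, grad_w)."""
    grad_a, grad_w = [], []
    for a, w in zip(params["a"], params["w"]):
        h = math.tanh(sum(wi * xi for wi, xi in zip(w, x)))
        grad_a.append(h)
        grad_w.append([a * (1.0 - h * h) * xi for xi in x])
    return grad_a, grad_w

def tangent_kernel(params, x1, x2):
    """Empirical tangent kernel: inner product of the two parameter gradients."""
    ga1, gw1 = grad(params, x1)
    ga2, gw2 = grad(params, x2)
    k = sum(p * q for p, q in zip(ga1, ga2))
    k += sum(p * q for r1, r2 in zip(gw1, gw2) for p, q in zip(r1, r2))
    return k

def sgd_step(params, x, y, lr):
    """One gradient step on the squared error 0.5 * (y - f(x)) ** 2."""
    err = y - forward(params, x)
    ga, gw = grad(params, x)
    return {
        "a": [a + lr * err * g for a, g in zip(params["a"], ga)],
        "w": [[wi + lr * err * gi for wi, gi in zip(w, g)]
              for w, g in zip(params["w"], gw)],
    }

rng = random.Random(0)  # fixed seed so the run is repeatable
HIDDEN, DIM, LR = 8, 3, 1e-4  # illustrative sizes and step size (assumed)
params = {
    "a": [rng.uniform(-1, 1) for _ in range(HIDDEN)],
    "w": [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(HIDDEN)],
}
x_train, y_train = [1.0, 0.0, 0.0], 1.0  # the input that receives the update
x_probe = [0.6, 0.8, 0.0]                # a different input whose output we watch

err = y_train - forward(params, x_train)
predicted_change = LR * err * tangent_kernel(params, x_probe, x_train)
new_params = sgd_step(params, x_train, y_train, LR)
actual_change = forward(new_params, x_probe) - forward(params, x_probe)
print(f"predicted {predicted_change:.3e}, actual {actual_change:.3e}")
```

Under this first-order picture, the kernel value between two inputs is a rough measure of how strongly training on one moves the prediction on the other, and a small value means little interaction.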
The team proposed a model of ‘universal representation learning’ that captures phenomena common to different learning systems. They showed that the derived theory explains the dynamics of a range of deep networks across different activation functions and architectures.
To understand how depth affects a network’s dynamics, the model places an intermediate layer within the DNN. The analysis showed that the initial representations must be close together for the linear approximation to be accurate. This rests on the assumption that the network’s initial weights are small, with a constant activation gain factor of less than one.
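One way to see why small initial weights keep representations close is to pass two different inputs through a stack of small-weight layers and track the distance between their hidden representations. The sketch below does this for tanh layers whose weights are bounded so that no layer can stretch a distance by more than a factor GAIN. Here GAIN is a hypothetical stand-in for a gain factor below one, and the width, depth, seed and function names are likewise assumptions, so this is an illustration rather than the paper's model.

```python
import math
import random

def make_layer(rng, width, gain):
    """Random square weight matrix with entries in [-gain/width, gain/width].

    This bound makes every absolute row and column sum at most `gain`, so the
    layer cannot stretch the distance between two inputs by more than `gain`.
    """
    bound = gain / width
    return [[rng.uniform(-bound, bound) for _ in range(width)] for _ in range(width)]

def apply_layer(weights, x):
    """One hidden layer: tanh(W x). tanh has slope <= 1, so it never increases distances."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(0)  # fixed seed so the run is repeatable
WIDTH, DEPTH, GAIN = 8, 5, 0.5  # illustrative values (assumed); GAIN < 1
layers = [make_layer(rng, WIDTH, GAIN) for _ in range(DEPTH)]
x_a = [rng.uniform(-1, 1) for _ in range(WIDTH)]
x_b = [rng.uniform(-1, 1) for _ in range(WIDTH)]

# Record the distance between the two representations before and after each layer.
distances = [distance(x_a, x_b)]
for weights in layers:
    x_a, x_b = apply_layer(weights, x_a), apply_layer(weights, x_b)
    distances.append(distance(x_a, x_b))

for depth, d in enumerate(distances):
    print(f"after {depth} layers: distance {d:.3e}")
```

In this toy setting the distance shrinks by at least a factor of GAIN at every layer, so deeper representations of different inputs start out closer together.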
The study found that the effective learning rate differs between the hidden layers of the network. The encoder’s rate increases with depth because parameters are added, while the decoder’s rate decreases because parameters are removed. This relationship holds for deeper layers, though the decoder’s effective learning rate was found to increase in shallower layers.
In conclusion, the team at University College London have presented a theory of how neural networks learn, focusing on those that begin with small weights. While the theory proved effective in certain settings, applying it to larger datasets presented challenges. Further research is needed to extend the theory to more complex data.