
Understanding Feature Representation: Examining Inductive Biases in Deep Learning

Research from DeepMind has shed new light on neural representation, offering insights into dissociations between representation and computation in deep networks. High-capacity deep networks often exhibit an implicit bias toward simplicity in their learning dynamics and structure: simpler functions, and therefore simpler features, are easier to learn. This bias can have far-reaching effects on internal representations, even when more complex features are computed equally well.

The DeepMind team investigated these dissociations by constructing datasets that match features' computational roles while systematically varying their properties (a sketch of such a construction follows this paragraph). They trained a range of deep learning architectures to compute multiple abstract features from their inputs. The results revealed systematic biases in feature representation, driven by properties such as feature complexity, learning order, and feature prevalence. Simpler or earlier-learned features were more strongly represented than complex or later-learned ones. Importantly, these biases also depended on architecture, optimizer, and training regime; Transformer models, in particular, more strongly represent features decoded earlier in the output sequence.
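As an illustration of how feature properties can be varied while keeping features statistically independent, the following is a minimal sketch. The function `make_dataset`, the choice of a single input bit as the "simple" feature, and a three-bit parity as the "complex" feature are hypothetical constructions for exposition, not the paper's actual datasets.

```python
import numpy as np

def make_dataset(n_samples: int = 10_000, n_bits: int = 8, seed: int = 0):
    """Generate binary inputs and two statistically independent target features
    of differing complexity (illustrative construction)."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_samples, n_bits)).astype(np.float32)

    # "Simple" feature: a single input bit, linearly decodable from the input.
    simple = x[:, 0]
    # "Complex" feature: parity (XOR) of three other bits, requiring nonlinearity.
    complex_ = np.mod(x[:, 1] + x[:, 2] + x[:, 3], 2).astype(np.float32)

    # Because the input bits are independent and uniform, the two features are
    # statistically independent of each other.
    labels = np.stack([simple, complex_], axis=1)
    return x, labels
```

Prevalence or learning order could be manipulated in the same spirit, e.g., by changing how often a feature is informative in the training set or by introducing it later in training.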

The team trained networks to classify multiple features, either via separate output units or as an output sequence. Key to this approach was the statistical independence of the features. Models achieved more than 95% accuracy on held-out test sets, confirming that all features were well learned. The researchers then probed how feature properties such as complexity, prevalence, and output position affected how strongly each feature was represented. Training datasets were designed to methodically manipulate these properties, with validation and test datasets confirming the intended generalization.
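The sketch below illustrates this kind of setup under the same assumptions as above (it reuses the hypothetical `make_dataset` from the previous snippet): a small multi-head classifier is trained on both features, then a least-squares readout is fit from each feature to the shared hidden layer to estimate what share of hidden-layer variance that feature explains. The architecture, training loop, and variance-explained probe are illustrative stand-ins, not the paper's exact methodology.

```python
import torch
import torch.nn as nn

class MultiHeadMLP(nn.Module):
    """Small MLP with one output logit per feature (illustrative architecture)."""
    def __init__(self, n_inputs: int, n_features: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.Linear(hidden, n_features)  # one logit per feature

    def forward(self, x):
        h = self.body(x)            # shared hidden representation
        return self.heads(h), h

# Reuses make_dataset from the earlier (hypothetical) sketch.
x, y = map(torch.as_tensor, make_dataset())
model = MultiHeadMLP(x.shape[1], y.shape[1])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    logits, _ = model(x)
    loss = loss_fn(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

# Probe representation strength: for each (equally well classified) feature,
# fit a least-squares readout from the feature to the hidden activations and
# report the fraction of hidden-layer variance it explains.
with torch.no_grad():
    _, h = model(x)
h = h - h.mean(0)
total_var = h.pow(2).sum()
for i in range(y.shape[1]):
    f = y[:, i:i + 1] - y[:, i:i + 1].mean()
    beta = torch.linalg.lstsq(f, h).solution      # 1 x hidden readout
    explained = (f @ beta).pow(2).sum() / total_var
    print(f"feature {i}: share of hidden variance explained ≈ {explained:.3f}")
```

Under the kind of bias the paper describes, one would expect the simple feature to account for a larger share of hidden-layer variance than the complex one, even though both are classified near-perfectly.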

In conclusion, DeepMind's research showed that even when simple and complex features are learned equally well, feature representations are systematically biased toward the simpler or earlier-learned features. A network's architecture, optimizer, and training regime also shape these tendencies. The findings highlight the need for further research to improve the interpretability of learned representations and to enable better comparisons between systems in machine learning, cognitive science, and neuroscience.
