Deep learning’s remarkable success can be partially attributed to its ability to extract useful representations of complex data, a process often achieved via Self-Supervised Learning (SSL). However, the core process by which SSL algorithms achieve this has largely remained a mystery. A recent paper to appear at ICML 2023 provides the first comprehensive mathematical model of the training dynamics of large-scale SSL methods. The model suggests that SSL learns aspects of the data incrementally, in a series of distinct, well-separated steps, a behavior observable across currently deployed, state-of-the-art systems.
The analysis focuses on joint-embedding SSL methods, which are designed to produce matching embeddings, i.e., useful patterns mined from the data, for semantically similar views of an image.
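As a rough illustration of the joint-embedding setup (a minimal sketch, not the authors' code; the `augment` and `encoder` functions below are hypothetical stand-ins for a real augmentation pipeline and network):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x):
    # Hypothetical augmentation: additive noise stands in for the crops,
    # color jitter, etc. used by real SSL pipelines.
    return x + 0.1 * rng.normal(size=x.shape)

def encoder(x, W):
    # Hypothetical linear encoder mapping inputs to d-dim embeddings.
    return x @ W

batch = rng.normal(size=(256, 32))          # a batch of "images"
W = rng.normal(size=(32, 8)) / np.sqrt(32)  # encoder weights
z1 = encoder(augment(batch), W)             # embedding of view 1
z2 = encoder(augment(batch), W)             # embedding of view 2
# A joint-embedding method trains W so that z1 and z2 match row-wise
# while the embeddings as a whole remain informative, not collapsed.
```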
The paper develops a theoretical model of this stepwise learning: a linear model of SSL in which both the learning trajectories and the final embeddings can be derived in closed form. Strikingly, the analysis shows that the model learns to represent data in a sequence of discrete, well-separated steps. Analyzed under the Barlow Twins loss function, these steps correspond to eigendirections of the feature cross-correlation matrix: each completed learning step adds one new eigendirection to the learned representation.
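For concreteness, here is a minimal NumPy sketch of the Barlow Twins loss (the trade-off weight `lam` is an illustrative default, not a value from the paper); the cross-correlation matrix `c` it builds is the object whose eigendirections the analysis tracks:

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=0.005):
    """Barlow Twins loss on two batches of embeddings, shape (batch, dim)."""
    # Standardize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(axis=0)) / z1.std(axis=0)
    z2 = (z2 - z2.mean(axis=0)) / z2.std(axis=0)
    n = z1.shape[0]
    c = z1.T @ z2 / n  # feature cross-correlation matrix, shape (dim, dim)
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)             # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)   # redundancy term
    return on_diag + lam * off_diag
```

Minimizing this loss pushes `c` toward the identity matrix, and the paper's claim is that in the linear model this optimum is approached one eigendirection at a time rather than all at once.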
This stepwise picture opens up possibilities for optimizing SSL methods and raises new scientific questions; answering them may yield deeper insight into the deep learning systems widely deployed today.
The findings connect to the broader concept of spectral bias, commonly seen in learning systems with approximately linear dynamics, in which eigendirections with larger eigenvalues are learned preferentially. By extending the closed-form solutions for the linear model, the results also shed light on the dynamics of wide neural networks.
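To make the spectral-bias claim concrete, here is a small, self-contained simulation (an illustrative sketch, not an experiment from the paper): under gradient descent on a linear least-squares problem, the error along each eigendirection of the input covariance decays roughly like (1 − lr·eigenvalue)^t, so high-eigenvalue directions are learned first.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 4
scales = np.array([4.0, 2.0, 1.0, 0.5])   # unequal feature variances
X = rng.normal(size=(n, d)) * scales
w_true = rng.normal(size=d)
y = X @ w_true

cov = X.T @ X / n
eigvals, V = np.linalg.eigh(cov)          # eigendirections of the covariance

w = np.zeros(d)
lr = 0.01
for t in range(1501):
    w -= lr * (X.T @ (X @ w - y)) / n     # plain gradient descent step
    if t % 500 == 0:
        # Error projected onto each eigendirection: the component with the
        # largest eigenvalue shrinks fastest and is learned first.
        err = np.abs(V.T @ (w - w_true))
        print(t, np.round(err, 4))
```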
The researchers conducted experiments using ResNet-50 encoders trained with several leading SSL methods and observed a clear stepwise learning pattern, suggesting that this behavior is central to how SSL learns.
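One way such a pattern can be surfaced in practice (a hedged sketch; `load_checkpoint_embeddings` below is a hypothetical helper, not the authors' tooling) is to compute the eigenvalues of the embedding cross-correlation matrix at successive training checkpoints and watch them rise from zero one at a time.

```python
import numpy as np

def embedding_eigenvalues(z1, z2):
    """Eigenvalues of the feature cross-correlation between two views.

    z1, z2: (batch, dim) embeddings of two augmented views of a fixed batch.
    """
    z1 = z1 - z1.mean(axis=0)
    z2 = z2 - z2.mean(axis=0)
    c = z1.T @ z2 / z1.shape[0]
    # Symmetrize before the eigendecomposition for numerical safety.
    return np.linalg.eigvalsh((c + c.T) / 2)

# Hypothetical usage: load_checkpoint_embeddings stands in for running a
# saved encoder over a fixed batch of augmented view pairs.
# for step in checkpoints:
#     z1, z2 = load_checkpoint_embeddings(step)
#     print(step, np.sort(embedding_eigenvalues(z1, z2))[::-1])
```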
The study paves the way for many experimental and practical directions in the SSL field, from probing the effectiveness of individual eigenmodes to optimizing SSL models for faster training.
The authors contend that their findings could extend beyond SSL to a broader range of deep learning paradigms, given that many forms of representation learning converge to similar representations.
The paper was co-authored by Maksis Knutins, Liu Ziyin, Daniel Geisz, and Joshua Albrecht. The research was conducted at Generally Intelligent, where Jamie Simon served as a Research Fellow.