Deep neural networks, particularly convolutional neural networks (CNNs), have significantly advanced computer vision. However, deploying them on devices with limited computing power remains challenging. Knowledge distillation offers one solution: training smaller "student" models to mimic larger "teacher" models. Yet for all its effectiveness, the method still requires a resource-intensive teacher model, which can itself be costly to train.
Researchers have explored various techniques to harness soft labels (probability distributions that capture inter-class similarities) for knowledge distillation. Some have examined the impact of large teacher models; others have tried crowd-sourced soft labels or decoupled knowledge transfer. Resource-efficient Autoencoder-based Knowledge Distillation (ReffAKD) takes a different route: it aims to produce high-quality soft labels without relying on a large teacher model or costly crowd-sourcing.
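To make the idea concrete, the toy example below contrasts a one-hot hard label with a soft label for a hypothetical four-class problem; the class names and probability values are purely illustrative and are not taken from the ReffAKD paper.

```python
import torch

# Hard (one-hot) label for a "cat" image in a hypothetical 4-class task
# with classes (cat, dog, car, truck):
hard_label = torch.tensor([1.0, 0.0, 0.0, 0.0])

# A soft label additionally encodes inter-class similarity, e.g. that "cat"
# is closer to "dog" than to "car" or "truck" (values are illustrative only):
soft_label = torch.tensor([0.80, 0.15, 0.03, 0.02])
```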
The researchers leveraged a key property of autoencoders: these neural networks learn compact representations of their input data. From those representations they could extract essential features and compute class-to-class similarities, effectively mimicking a teacher model's behavior. The Convolutional AutoEncoder at the heart of ReffAKD encodes input images into a hidden representation that captures each class's defining attributes.
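A minimal PyTorch sketch of such a convolutional autoencoder is shown below, assuming 32x32 RGB inputs such as CIFAR-100 images; the layer widths and latent dimension are illustrative assumptions, not the exact architecture described by the authors.

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    """Minimal convolutional autoencoder sketch (illustrative sizes only)."""

    def __init__(self, latent_dim: int = 128):
        super().__init__()
        # Encoder: compress a 3x32x32 image into a compact latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(inplace=True),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Decoder: reconstruct the image from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8),
            nn.ReLU(inplace=True),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)          # latent representation used for class similarities
        return self.decoder(z), z
```

The autoencoder itself can be trained with an ordinary reconstruction objective (for example, mean squared error between input and output); only the encoder is needed afterwards to compute class similarities.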
During training, samples are randomly drawn from each class, passed through the encoder, and the cosine similarity between their encoded representations is computed. These similarity scores populate a class-by-class matrix that serves as a set of soft probability distributions reflecting inter-class relationships. The student model is then trained with a tailored loss function that combines Cross-Entropy loss on the ground-truth labels with the Kullback-Leibler divergence between the student's outputs and the autoencoder-generated soft labels.
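The sketch below illustrates both steps under the same assumptions as the autoencoder above: a hypothetical helper that turns pairwise cosine similarities of encoded per-class samples into a matrix of soft labels, and a combined Cross-Entropy / KL-divergence loss with an assumed weighting alpha and temperature T. Neither the function names nor the hyperparameter values are taken from the paper.

```python
import torch
import torch.nn.functional as F

def class_similarity_soft_labels(encoder, class_samples, temperature=1.0):
    """Build a (num_classes x num_classes) soft-label matrix from cosine similarities.

    `class_samples` is assumed to hold one randomly drawn image per class,
    shaped (num_classes, C, H, W); `encoder` can be e.g. the `encoder` module
    of the ConvAutoEncoder sketched above.
    """
    with torch.no_grad():
        z = encoder(class_samples)            # (num_classes, latent_dim)
        z = F.normalize(z, dim=1)
        sim = z @ z.t()                       # pairwise cosine similarities
    # Each row becomes a soft probability distribution over the classes.
    return F.softmax(sim / temperature, dim=1)

def distillation_loss(student_logits, targets, soft_labels, alpha=0.5, T=4.0):
    """Cross-Entropy on hard labels plus KL divergence to the autoencoder soft labels.

    `alpha` (loss weighting) and `T` (softening temperature) are assumed
    hyperparameters, not values reported by the authors.
    """
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        soft_labels[targets],                 # soft-label row for each sample's class
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kl
```

In this sketch the soft-label matrix would be computed once (or periodically refreshed) from the trained encoder, and each training sample simply looks up the row corresponding to its ground-truth class.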
ReffAKD performed impressively across three benchmark datasets: CIFAR-100, Tiny ImageNet, and Fashion-MNIST. Its resource efficiency was particularly notable on the more complex datasets. The method was also found to be compatible with existing logit-based knowledge distillation techniques, indicating potential for further performance gains through hybridization.
Although ReffAKD has so far been demonstrated on computer vision tasks, the researchers envisage that it could also be applied in other areas such as natural language processing. For example, it could help distill compact models like TinyBERT or other BERT variants for text classification tasks.
ReffAKD makes a significant contribution to deep learning by democratizing knowledge distillation: it removes the need for a resource-intensive teacher model and makes the technique usable in resource-limited settings. Its potential applicability beyond computer vision, together with the possibility of hybrid approaches, points to further performance gains across different domains.