Multi-layer perceptrons (MLPs), also known as fully-connected feedforward neural networks, are foundational models in deep learning. They are used to approximate nonlinear functions, and despite their significance they have drawbacks. In applications such as transformers, MLPs account for a large share of the parameters, and they are less interpretable than attention layers. When searching for alternatives based on the Kolmogorov-Arnold representation, prior work has largely stuck to the original depth-2, width-(2n+1) architecture and has not leveraged modern training methods such as backpropagation. The ongoing exploration therefore aims at developing more effective nonlinear regressors for neural network designs, even as MLPs remain central.
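For reference, the depth-2, width-(2n+1) form comes from the Kolmogorov-Arnold representation theorem, which expresses any continuous multivariate function on a bounded domain using only univariate functions and addition:

```latex
f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

Here the inner functions \(\phi_{q,p}\) and outer functions \(\Phi_q\) are all one-dimensional, which is the structural idea KANs generalize to deeper and wider networks.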
Research teams from MIT, Caltech, Northeastern, and the NSF Institute for AI and Fundamental Interactions have developed a potential alternative to MLPs called Kolmogorov-Arnold Networks (KANs). Where MLPs apply fixed activation functions at the nodes, KANs place learnable activation functions on the edges and replace linear weight matrices with parametrized splines. This design lets KANs offer better accuracy and interpretability than MLPs. Mathematical and empirical analyses show that KANs perform especially well on high-dimensional data and scientific problems. The work introduces the KAN architecture, compares it with MLPs, and demonstrates KAN's interpretability and applicability to scientific discovery.
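As a rough illustration of the idea, the minimal sketch below is my own, not the authors' released implementation: it assumes PyTorch, puts one learnable one-dimensional function on every edge of a layer, and uses a fixed Gaussian radial-basis grid as a simple stand-in for the paper's B-spline parametrization, together with a residual SiLU "base" term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANLayer(nn.Module):
    """Each edge (input i -> output j) carries its own learnable 1-D function,
    parametrized as a weighted sum of basis functions on a shared fixed grid."""
    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        grid = torch.linspace(*grid_range, num_basis)
        self.register_buffer("grid", grid)             # basis centers (shared by all edges)
        self.register_buffer("h", grid[1] - grid[0])   # basis width
        # One coefficient vector per edge: shape (out_dim, in_dim, num_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))
        # Residual "base" term, analogous to w_b * silu(x) + w_s * spline(x)
        self.base_weight = nn.Parameter(torch.randn(out_dim, in_dim) / in_dim ** 0.5)

    def forward(self, x):                              # x: (batch, in_dim)
        # Evaluate every basis function at every input: (batch, in_dim, num_basis)
        basis = torch.exp(-(((x.unsqueeze(-1) - self.grid) / self.h) ** 2))
        # phi_ij(x_i) for every edge, summed over incoming edges: (batch, out_dim)
        spline = torch.einsum("bik,oik->bo", basis, self.coef)
        base = F.silu(x) @ self.base_weight.T
        return base + spline

# A deeper KAN is just a stack of such layers, e.g. shape [2, 5, 1]:
model = nn.Sequential(KANLayer(2, 5), KANLayer(5, 1))
y = model(torch.rand(4, 2))                            # (4, 1)
```

Because every edge function is a sum of differentiable basis terms, such a stack trains with ordinary backpropagation over the basis coefficients, which is the modern twist the paper adds to the classical two-layer representation.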
Previous literature has explored the relationship between the Kolmogorov-Arnold theorem (KAT) and neural networks, but this study generalizes the network to arbitrary widths and depths, making KAT relevant to modern deep learning. The researchers also draw on Neural Scaling Laws (NSLs) to argue that Kolmogorov-Arnold representations enable faster scaling. On the Mechanistic Interpretability (MI) side, the study designs an inherently interpretable architecture and connects learnable activations with symbolic regression methods. KANs are proposed as replacements for MLPs in Physics-Informed Neural Networks (PINNs) and in other AI applications in mathematics, particularly knot theory.
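For context on what "rapid scaling" means here (standard NSL notation, not taken from the summary above), a neural scaling law relates test loss to the number of model parameters through a power law:

```latex
\ell \propto N^{-\alpha}
```

A larger exponent \(\alpha\) means the loss falls faster as parameters are added; the paper's argument is that the smoothness of spline-based Kolmogorov-Arnold representations supports a larger exponent than typical MLP bounds.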
Based on the findings of the study, KANs outperformed MLPs at representing functions across tasks such as regression, solving partial differential equations, and continual learning. KANs showed superior accuracy and efficiency, especially in capturing the complex structure of special functions and the Feynman datasets. They also demonstrated potential for scientific discovery in fields like knot theory. It should be noted, however, that while KANs improve accuracy and interpretability, they train more slowly than MLPs. The study identifies training efficiency as an open challenge and invites further research to optimize training speed. In practice, this means MLPs remain more practical for tasks that demand speed, while KANs are better suited to tasks that prioritize accuracy and interpretability.