Kolmogorov-Arnold Networks (KANs) are a recent development that offers an alternative to Multi-Layer Perceptrons (MLPs) in machine learning. Grounded in the Kolmogorov-Arnold representation theorem, KANs place learnable univariate activation functions on their edges, while each neuron simply adds up the incoming signals. However, current KAN designs can pose challenges in real-world applications, prompting researchers to explore alternative multivariate functions for KAN neurons that could broaden their use across different machine-learning tasks.
KANs have demonstrated potential in various areas such as computer vision, time series analysis, and quantum architecture search. In some cases, they have outperformed MLPs on data-fitting and Partial Differential Equation tasks while using fewer parameters. However, concerns have been raised regarding KANs' robustness to noise and their performance relative to MLPs. As a result, variations of and extensions to the KAN architecture continue to be explored, including graph-based designs, convolutional KANs, and transformer-based KANs. Researchers have also investigated alternative activation functions, such as wavelets, radial basis functions, and sinusoidal functions, to improve KAN efficiency. Still, further enhancements are needed for better KAN performance.
A researcher from Sweden's Halmstad University Center for Applied Intelligent Systems Research has proposed a novel way of improving KANs. The goal is to find the best multivariate function for KAN neurons across various machine-learning classification tasks. The traditional sum-based node function can be ineffective, particularly on high-dimensional datasets with many features: summing a large number of edge activations can push a neuron's output beyond the effective range of the subsequent activation functions, resulting in training instability and reduced generalization performance. To address this, the researcher recommends using the mean rather than the sum as the node function, as sketched below.
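To make the proposed change concrete, the following is a minimal, hypothetical sketch of a KAN-style layer in PyTorch in which each neuron averages its edge activations instead of summing them. The Gaussian-basis edge functions, the class name, and the hyperparameters are illustrative assumptions, not the paper's actual implementation (standard KANs parameterize the edge functions with B-splines).

```python
import torch
import torch.nn as nn

class KANLayerMeanAgg(nn.Module):
    """Illustrative KAN-style layer (a sketch, not the paper's code).

    Each edge (i -> j) applies a learnable univariate function phi_ji to
    input feature x_i; neuron j then aggregates the edge outputs.
    With agg="mean", the neuron's output stays in roughly the same range
    as a single edge activation, regardless of the number of inputs.
    """

    def __init__(self, in_features, out_features, num_basis=8, agg="mean"):
        super().__init__()
        self.agg = agg
        # Fixed Gaussian basis centres spread over the expected input range [-1, 1];
        # each edge learns its own mixing weights over these bases (a simple
        # stand-in for the B-spline parameterization used in standard KANs).
        self.register_buffer("centres", torch.linspace(-1.0, 1.0, num_basis))
        self.coeffs = nn.Parameter(torch.randn(out_features, in_features, num_basis) * 0.1)

    def forward(self, x):
        # x: (batch, in_features)
        # Radial-basis responses for every input value: (batch, in_features, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centres) ** 2) / 0.1)
        # Edge activations phi_ji(x_i): (batch, out_features, in_features)
        edge_out = torch.einsum("bik,oik->boi", basis, self.coeffs)
        # Node aggregation: the mean stays bounded as in_features grows,
        # whereas the sum scales with the number of inputs.
        if self.agg == "mean":
            return edge_out.mean(dim=-1)
        return edge_out.sum(dim=-1)
```

Stacking such layers and switching `agg` between "sum" and "mean" is one way to compare the two node functions under otherwise identical conditions.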
Ten commonly used datasets from the UCI Machine Learning Database Repository were used to test the proposed KAN modification. The datasets cover multiple domains and sizes, and were divided into 60% training, 20% validation, and 20% testing partitions. For each dataset, a standard preprocessing pipeline was applied, including categorical feature encoding, missing-value imputation, and instance randomization. Each model was trained for 2,000 iterations using the Adam optimizer with a 0.01 learning rate and a batch size of 32. Accuracy on the test set served as the primary evaluation metric.
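The sketch below illustrates one way such a protocol could be reproduced, assuming preprocessed feature and label arrays X and y. The helper name run_experiment and the handling of the validation split are assumptions for illustration, not details taken from the paper.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split

def run_experiment(model, X, y, iterations=2000, lr=0.01, batch_size=32):
    # 60/20/20 train/validation/test split with instances shuffled.
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, shuffle=True)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5)
    # (X_val, y_val are reserved for model selection; not used in this sketch.)

    loader = DataLoader(
        TensorDataset(torch.as_tensor(X_train, dtype=torch.float32),
                      torch.as_tensor(y_train, dtype=torch.long)),
        batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    step = 0
    while step < iterations:  # 2,000 optimization steps in total
        for xb, yb in loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()
            optimizer.step()
            step += 1
            if step >= iterations:
                break

    # Primary metric: classification accuracy on the held-out test set.
    with torch.no_grad():
        preds = model(torch.as_tensor(X_test, dtype=torch.float32)).argmax(dim=1)
        return (preds == torch.as_tensor(y_test, dtype=torch.long)).float().mean().item()
```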
The study's findings support the idea that using the mean function in KAN neurons is more effective than the traditional sum. This is attributed to the mean's ability to keep node outputs within the effective range of the spline activation function, which is between -1.0 and +1.0. By averaging in the neurons, KAN models also handled input values more reliably on datasets with 20 or more features.
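A quick numerical check, not taken from the paper, illustrates the range argument: if each edge activation is bounded in [-1, 1], a sum over 20 inputs can reach roughly ±20, while the mean stays inside the spline's effective range.

```python
import torch

# Simulate 1,000 instances with 20 edge activations each, all bounded in [-1, 1].
torch.manual_seed(0)
edge_acts = torch.empty(1000, 20).uniform_(-1.0, 1.0)

# The sum spreads far outside [-1, 1]; the mean does not.
print("sum  range:", edge_acts.sum(dim=1).min().item(), edge_acts.sum(dim=1).max().item())
print("mean range:", edge_acts.mean(dim=1).min().item(), edge_acts.mean(dim=1).max().item())
```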
The researcher from Halmstad University has proposed an important modification to KANs: replacing the traditional sum function with an averaging function in KAN neurons, which leads to more stable training and keeps inputs within the effective range of the spline activations. This adjustment resolves previous challenges related to input range and training stability. Adopting this technique could significantly improve KAN performance and broaden the networks' applicability across diverse machine-learning tasks.