Deep neural networks (DNNs) have found widespread success across various fields. Much of this success rests on first-order optimizers such as stochastic gradient descent with momentum (SGDM) and AdamW. However, these methods struggle to train large-scale models efficiently. As an alternative, second-order optimizers like K-FAC, Shampoo, AdaBK, and Sophia have demonstrated superior convergence properties, but often at significant computational and memory cost.
Attempts to reduce memory consumption have centered on factorization and quantization. Factorization uses low-rank approximations to represent optimizer states compactly, while quantization compresses those states with low-bit representations. Notably, although quantization has proven effective for first-order optimizers, adapting it to second-order optimizers is harder because their states are involved in matrix operations rather than simple elementwise updates.
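As a rough illustration of what quantizing an optimizer state involves, the sketch below applies block-wise 4-bit rounding to a tensor and reconstructs it. The block size, the uniform 16-level codebook, and the function names are illustrative assumptions, not the scheme used in the paper.

```python
import torch

def quantize_4bit(x, block_size=64):
    """Illustrative block-wise 4-bit quantization of an optimizer state.

    Each block is scaled by its max absolute value, then each entry is
    mapped to the nearest of 16 signed levels. The block size and the
    uniform codebook are illustrative choices, not the paper's scheme.
    """
    levels = torch.linspace(-1.0, 1.0, 16)             # 16 = 2^4 codebook entries
    flat = x.reshape(-1, block_size)
    scales = flat.abs().amax(dim=1, keepdim=True)       # one FP32 scale per block
    normed = flat / scales.clamp(min=1e-12)
    codes = (normed.unsqueeze(-1) - levels).abs().argmin(dim=-1)  # 4-bit indices
    return codes.to(torch.uint8), scales, levels

def dequantize_4bit(codes, scales, levels):
    """Reconstruct an approximate FP32 state from codes and per-block scales."""
    return levels[codes.long()] * scales

state = torch.randn(4, 64)
codes, scales, levels = quantize_4bit(state)
approx = dequantize_4bit(codes, scales, levels).reshape_as(state)
print((state - approx).abs().max())  # small reconstruction error
```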
A team of researchers from Beijing Normal University and Singapore Management University has introduced the first 4-bit second-order optimizer, using Shampoo as the running example, while maintaining performance comparable to its 32-bit counterpart. Rather than quantizing the preconditioner directly, they quantize its eigenvector matrix, preserving the small singular values that are vital for accurately computing the inverse fourth root and thereby avoiding performance degradation.
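To make the idea concrete, here is a minimal sketch of that strategy under stated assumptions: store the preconditioner as its eigendecomposition, quantize only the eigenvector matrix, and keep the eigenvalues in full precision so that the inverse fourth root P^{-1/4} = U diag(lam^{-1/4}) U^T stays accurate. The toy rounding scheme and helper names are assumptions for illustration, not the authors' implementation.

```python
import torch

def compress_preconditioner(P):
    """Store P = U diag(lam) U^T with U quantized and lam kept in FP32.

    The per-tensor uniform rounding below stands in for a real 4-bit codec;
    the key point is that the small eigenvalues, which dominate P^{-1/4},
    are never quantized.
    """
    lam, U = torch.linalg.eigh(P)                  # FP32 eigenvalues, eigenvectors
    scale = U.abs().max()
    q = torch.round(U / scale * 7).clamp(-8, 7)    # toy signed 4-bit grid
    return q.to(torch.int8), scale, lam

def inverse_fourth_root(q, scale, lam, eps=1e-6):
    """Reconstruct P^{-1/4} = U diag(lam^{-1/4}) U^T from the compressed state."""
    U_hat = q.float() * scale / 7
    return U_hat @ torch.diag((lam + eps).pow(-0.25)) @ U_hat.T

A = torch.randn(128, 256)
P = A @ A.T / 256 + 1e-3 * torch.eye(128)          # a Shampoo-style statistic
q, scale, lam = compress_preconditioner(P)
P_inv4 = inverse_fourth_root(q, scale, lam)
```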
To further improve accuracy, two techniques were proposed. First, Björck orthonormalization restores the orthogonality of the dequantized eigenvector matrix. Second, linear square quantization outperforms dynamic tree quantization for second-order optimizer states. Because only the eigenvector matrix U of the preconditioner is quantized, the singular value matrix needed for precise computation of matrix powers via the decomposition remains in full precision.
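The first technique can be sketched as follows: a dequantized eigenvector matrix is only approximately orthogonal, and a few Björck orthonormalization steps, U ← U(3I − UᵀU)/2, push it back toward orthogonality without an explicit factorization. The iteration count and the synthetic test matrix below are illustrative assumptions.

```python
import torch

def bjorck_orthonormalize(U, iters=2):
    """Björck orthonormalization: for U close to orthogonal, iterate
    U <- U @ (3I - U^T U) / 2, which drives U^T U toward the identity."""
    n = U.shape[1]
    I = torch.eye(n, dtype=U.dtype, device=U.device)
    for _ in range(iters):
        U = U @ (3 * I - U.T @ U) / 2
    return U

# Perturb an orthogonal matrix the way low-bit rounding would.
Q, _ = torch.linalg.qr(torch.randn(64, 64))
U_noisy = Q + 0.01 * torch.randn(64, 64)
U_fixed = bjorck_orthonormalize(U_noisy)
print(torch.linalg.norm(U_noisy.T @ U_noisy - torch.eye(64)))  # larger deviation
print(torch.linalg.norm(U_fixed.T @ U_fixed - torch.eye(64)))  # smaller deviation
```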
Empirical testing showed that 4-bit Shampoo outperformed first-order optimizers such as AdamW: the first-order methods required 1.2 to 1.5 times more epochs, and hence longer training times, yet still reached lower test accuracies than the second-order optimizers. At the same time, 4-bit Shampoo achieved test accuracies comparable to 32-bit Shampoo while providing memory savings of 4.5% to 41%. Importantly, its memory costs were only slightly higher than those of first-order optimizers, marking a considerable step toward practical adoption of second-order methods.
In conclusion, 4-bit Shampoo enables memory-efficient training of DNNs while matching the performance of its 32-bit counterpart. This represents a significant advance in second-order optimization, potentially enabling its wider use in training large-scale DNNs. Researchers continue to explore ways to improve the performance and efficiency of optimizers, aiming for more cost-effective and streamlined DNN training.