Training deep neural networks with hundreds of layers can be a painstaking process, often taking weeks because of the sequential nature of backpropagation. While backpropagation works well on a single compute unit, its dependent forward and backward passes are difficult to parallelize across multiple systems, leading to long waiting times.
The problem grows worse for very large neural networks trained on massive datasets. Consequently, distributed deep learning has surged in popularity, with frameworks such as GPipe, PipeDream, and Flower making a mark. These frameworks are optimized for speed, usability, cost, and scale, and they support advanced training strategies for large neural networks, such as data, pipeline, and model parallelism.
Another approach to training neural networks is the Forward-Forward (FF) algorithm introduced by Hinton. In contrast to the widely used backpropagation, FF trains each layer with purely local computation, which removes inter-layer dependencies and reduces idle time, communication, and synchronization.
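To make the idea of layer-local training concrete, here is a minimal PyTorch sketch of a single FF layer in the spirit of Hinton's paper. The goodness measure (sum of squared activations), the threshold, and the optimizer settings are illustrative assumptions, not the exact configuration used in the PFF study.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One Forward-Forward layer trained with a purely local objective (sketch)."""

    def __init__(self, in_dim, out_dim, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.relu = nn.ReLU()
        self.threshold = threshold  # illustrative goodness threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize so only the direction of the activity vector reaches the next layer.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-4)
        return self.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activations; push it above the threshold
        # for positive data and below it for negative data.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = torch.log1p(torch.exp(torch.cat([
            self.threshold - g_pos,  # positive samples whose goodness is too low
            g_neg - self.threshold,  # negative samples whose goodness is too high
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()  # gradients stay inside this layer
        self.opt.step()
        # Detach outputs so no gradient ever flows between layers.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```

Because each layer optimizes its own loss and detaches its output, no gradients flow between layers, which is what makes a pipeline of such layers comparatively easy to distribute.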
Recently, a study from Sabanci University introduced the Pipeline Forward-Forward Algorithm (PFF), a new approach for training distributed neural networks with FF. PFF substantially reduces idle time and achieves higher utilization of computational units, improving on traditional implementations that combine backpropagation with pipeline parallelism. The study found that PFF reaches the same accuracy as FF roughly four times faster.
Furthermore, compared to the existing distributed FF (DFF), PFF achieves 5% higher accuracy in 10% fewer epochs. Because PFF transmits only layer information (weights and biases), it shares far less data than DFF, reducing communication overhead.
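As an illustration of parameter-only exchange, the sketch below shows how a node might send just a layer's weights and biases to a peer over a TCP socket. The framing (a 4-byte length prefix) and serialization via torch.save are assumptions for the example, not PFF's actual wire format.

```python
import io
import socket
import struct
import torch

def send_layer_params(sock: socket.socket, layer: torch.nn.Module) -> None:
    # Serialize only the layer's parameter tensors (weights and biases).
    buffer = io.BytesIO()
    torch.save({k: v.cpu() for k, v in layer.state_dict().items()}, buffer)
    payload = buffer.getvalue()
    # Assumed framing: 4-byte big-endian length prefix followed by the payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_layer_params(sock: socket.socket, layer: torch.nn.Module) -> None:
    # Read the length prefix, then the serialized state dict, and load it.
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    buffer = io.BytesIO(_recv_exact(sock, length))
    layer.load_state_dict(torch.load(buffer))

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    # Keep reading until exactly n bytes have arrived.
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        data += chunk
    return data
```

The point of the sketch is the size of what crosses the network: a layer's parameters are typically far smaller than the raw training data or per-batch activations, which is where the reduced communication overhead comes from.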
The research team suggested several ways to enhance PFF. One is to exchange parameters between different layers after each batch, which could refine the weights and yield more accurate results, although it would also increase communication overhead. Another is to apply PFF in a Federated Learning setting, since PFF does not share training data with other nodes during model training.
In the experiments, communication between nodes was implemented with sockets, which adds overhead; the researchers recommend a multi-GPU architecture to shorten training time. Finally, performance could be improved by finding better ways to generate negative samples, which strongly influence the FF algorithm.
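For context, one widely used way to produce negative samples for supervised FF, following Hinton's original recipe, is to embed a wrong label into the input. The sketch below assumes flattened, normalized MNIST-style images and uses the first ten pixels to encode the label; these details are illustrative, not the scheme evaluated in the PFF paper.

```python
import torch

def embed_label(images: torch.Tensor, labels: torch.Tensor, num_classes: int = 10) -> torch.Tensor:
    # Overwrite the first `num_classes` pixels of each flattened image
    # with a one-hot label (assumes pixel values are scaled to [0, 1]).
    x = images.clone()
    x[:, :num_classes] = 0.0
    x[torch.arange(x.size(0)), labels] = 1.0
    return x

def make_pos_neg(images: torch.Tensor, labels: torch.Tensor, num_classes: int = 10):
    # Positive data: each image paired with its correct label.
    x_pos = embed_label(images, labels, num_classes)
    # Negative data: the same image paired with a deliberately wrong label.
    offsets = torch.randint(1, num_classes, labels.shape)
    wrong_labels = (labels + offsets) % num_classes
    x_neg = embed_label(images, wrong_labels, num_classes)
    return x_pos, x_neg
```

Better negative samples make the positive/negative goodness contrast easier to learn at every layer, which is why the choice of generation scheme has such a strong effect on FF-based systems.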
The researchers hope that the successful implementation of PFF will open a new chapter in distributed neural network training.