Large Language Models (LLMs) such as GPT-3 and ChatGPT have been shown to exhibit advanced capabilities on complex reasoning tasks, outperforming standard supervised machine learning techniques. The key to unlocking these abilities is the ‘chain of thought’ (CoT), a prompting method that mimics human step-by-step reasoning. Notably, CoT improves a model’s reasoning performance regardless of whether the intermediate steps it produces are correct.
This study aimed to explain why the CoT method is so effective at enhancing the capabilities of transformer models such as GPT-3. The analysis drew on circuit complexity theory and the associated computational complexity classes, including AC, TC, and NC, to characterise what transformers can and cannot compute.
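As background (this hierarchy is standard textbook material rather than a result of the study), the classes involved are related as

$$\mathsf{AC}^0 \subsetneq \mathsf{TC}^0 \subseteq \mathsf{NC}^1 \subseteq \mathsf{NC} \subseteq \mathsf{P},$$

where AC^0 and TC^0 describe constant-depth circuits (without and with threshold/majority gates, respectively), NC captures problems solvable efficiently in parallel, and P contains everything solvable in polynomial time. Under standard assumptions, the computation performed by a constant-depth transformer in a single forward pass is usually placed at or below TC^0.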
Findings revealed that transformer models without CoT are essentially confined to efficient parallel computation: they can solve problems that decompose into independent sub-tasks computed simultaneously. Many complex reasoning tasks, however, require inherently sequential computation, in which each step depends on the result of the preceding one. In these situations CoT significantly strengthens transformer models, allowing them to carry out sequential computations that their fixed-depth architecture could not otherwise express.
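To make the parallel-versus-sequential distinction concrete, here is a small Python sketch. It is purely illustrative (a toy example, not code or tasks from the study): an associative reduction such as summation can be split into independent halves, whereas an iterated state update must be applied one step at a time, and a chain-of-thought-style trace simply writes out those intermediate states so that each generation step only has to perform one cheap update.

```python
def parallel_sum(xs):
    # Associative reduction: the two halves are independent, so they could be
    # computed simultaneously, giving logarithmic rather than linear depth.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return parallel_sum(xs[:mid]) + parallel_sum(xs[mid:])

def step(state, token):
    # Toy state-update rule; each application needs the previous state.
    return (state * state + token) % 97

def answer_only(tokens, state=2):
    # "No CoT": only the final state is produced.
    for t in tokens:
        state = step(state, t)
    return state

def chain_of_thought(tokens, state=2):
    # "With CoT": every intermediate state is written out as an explicit step,
    # so each step of generation performs just one application of `step`.
    trace = []
    for t in tokens:
        state = step(state, t)
        trace.append(state)
    return trace  # the final answer is trace[-1]

tokens = [4, 1, 6, 2, 5]
print(parallel_sum(tokens))      # 18
print(answer_only(tokens))       # 77
print(chain_of_thought(tokens))  # [8, 65, 60, 13, 77]
```

The point of the sketch is that `chain_of_thought` produces the same final answer as `answer_only`, but spreads the serial work across many emitted steps rather than demanding it all in one shot.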
The researchers demonstrated theoretically that a basic transformer model without CoT can only solve problems up to a specific complexity level. Allowing a polynomial number of CoT steps, however, vastly expands the model’s expressive power, enabling it, at least in theory, to solve essentially any problem that is efficiently computable.
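Read in terms of the classes above, the shape of such a result (a paraphrase, assuming the ‘specific complexity level’ refers to constant-depth threshold circuits; the paper’s exact theorem statements may differ) is roughly

$$\text{no CoT:}\ \subseteq \mathsf{TC}^0 \qquad\text{vs.}\qquad \mathrm{poly}(n)\ \text{CoT steps:}\ \supseteq \mathsf{P},$$

that is, the autoregressively generated intermediate tokens supply the serial depth that a fixed stack of layers cannot provide on its own.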
The theoretical results were further supported by experiments. These used arithmetic tasks spanning both parallelizable and inherently sequential computations. Without CoT, transformer models struggled with the sequential tasks, while applying CoT dramatically improved their performance on them; the improvement was most pronounced for smaller and shallower transformer models.
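The kind of experimental setup described can be sketched as follows. The concrete tasks and formats below are illustrative assumptions (a parallelizable modular-sum task and an inherently sequential permutation-composition task), not necessarily the exact datasets used in the study; the point is only to show how the same problem can be supervised either with the final answer alone or with the full chain of intermediate results.

```python
import random

MOD = 23        # modulus for the parallelizable task (illustrative choice)
PERM_N = 5      # permutation size for the sequential task (illustrative choice)

def compose(p, q):
    """Compose two permutations given as tuples: (p o q)[i] = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

def make_example(kind, n_terms=4, with_cot=False):
    """Build one (prompt, target) training pair."""
    if kind == "parallel":
        # Modular addition is associative, so it can be reduced in a shallow
        # tree of independent partial sums.
        items = [random.randrange(MOD) for _ in range(n_terms)]
        state, steps = 0, []
        for x in items:
            state = (state + x) % MOD
            steps.append(str(state))
        prompt = " + ".join(map(str, items)) + f" (mod {MOD}) ="
    else:
        # Composing permutations of 5 elements is NC^1-complete, so it is
        # believed to lie outside the constant-depth class available to a
        # transformer that must answer in a single forward pass.
        items = [tuple(random.sample(range(PERM_N), PERM_N))
                 for _ in range(n_terms)]
        state, steps = tuple(range(PERM_N)), []
        for p in items:
            state = compose(p, state)
            steps.append(str(state))
        prompt = " . ".join(map(str, items)) + " ="

    # With CoT the target spells out every partial result; without it only
    # the final answer is supervised.
    target = " -> ".join(steps) if with_cot else steps[-1]
    return prompt, target

random.seed(0)
print(make_example("parallel", with_cot=False))
print(make_example("sequential", with_cot=True))
```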
In summary, the study showed that the CoT approach greatly enhances the reasoning capabilities of transformer models such as GPT-3: it equips them to handle inherently sequential tasks, which are difficult for purely parallel computation. CoT is therefore a powerful tool for extending transformers’ problem-solving capacity.