Large Language Models (LLMs) are widely used in complex reasoning tasks across various fields. But, their construction and optimization demand considerable computational power, particularly when pretraining on large datasets. To mitigate this, researchers have proposed scaling equations showing the relationship between pretraining loss and computational effort.
However, new findings suggest these rules may not thoroughly represent…
