Recent research has highlighted the success of Large Language Models (LLMs) trained on code, which excel at diverse software engineering tasks. These models fall into three main paradigms: (i) Code LLMs specialized in code completion, (ii) task-specific Code LLMs fine-tuned for individual tasks, and (iii) instruction-tuned Code LLMs that follow human instructions and handle unseen tasks robustly. Instruction-tuned models such as WizardCoder and OctoCoder have achieved remarkable performance across various tasks without requiring task-specific fine-tuning.
To further explore the potential of these models, researchers from Monash University and ServiceNow Research have developed ASTRAIOS, a suite of 28 instruction-tuned Code LLMs. The models are fine-tuned with seven tuning methods on StarCoder base models at four sizes: 1B, 3B, 7B, and 16B. They are tuned on the CommitPackFT dataset from OctoPack, chosen to ensure a balanced improvement of their downstream capabilities.
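Several of the tuning methods studied are parameter-efficient fine-tuning (PEFT) techniques. A minimal sketch of the idea behind one widely used PEFT method, LoRA (used here purely for illustration, not as the paper's actual training code), is shown below: instead of updating a full weight matrix, a small low-rank update is trained while the pretrained weight stays frozen.

```python
import numpy as np

# Illustrative sketch of a LoRA-style low-rank adapter (not the ASTRAIOS
# training code). The frozen weight W is augmented with a trainable
# update B @ A of rank r, so only r*(d_in + d_out) parameters train
# instead of d_in * d_out.

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4               # hypothetical layer sizes and rank

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                 # LoRA init: B = 0, so the adapter starts as a no-op

def lora_forward(x, scale=1.0):
    """Forward pass: frozen weight plus the low-rank adapter."""
    return W @ x + scale * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialized to zero, the adapted layer matches the frozen one.
assert np.allclose(lora_forward(x), W @ x)

full_params = W.size                 # parameters a full fine-tune would update
adapter_params = A.size + B.size     # parameters the adapter actually trains
print(f"trainable fraction: {adapter_params / full_params:.2%}")  # 512 / 4096 = 12.50%
```

At larger, realistic layer sizes the trainable fraction shrinks further, which is what makes such methods attractive for tuning multi-billion-parameter models.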
To evaluate how the different tuning methods scale, the researchers assessed cross-entropy loss during instruction tuning, analyzing how it varies with model size and training time. They then evaluated the models on five representative code-related tasks: clone detection, defect detection, code synthesis, code repair, and code explanation. Finally, they analyzed the tuning methods' effects on model robustness and code security, assessing the models' ability to generate code from perturbed examples and checking the generated code for vulnerabilities.
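The cross-entropy loss tracked here is the standard language-modeling objective: the average negative log-probability the model assigns to each reference next token. A toy computation (with made-up probabilities, purely for illustration) looks like this:

```python
import math

def cross_entropy(token_probs):
    """Mean negative log-likelihood over the reference next tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Probabilities a hypothetical model assigns to the correct next tokens.
probs = [0.9, 0.6, 0.4, 0.8]
loss = cross_entropy(probs)
print(f"loss = {loss:.4f} nats")  # → loss = 0.4389 nats
```

Lower loss means the model is, on average, more confident in the reference tokens, which is why the loss curve is a natural lens for comparing tuning methods as model size and training time grow.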
The results are striking: larger PEFT Code LLMs excel at code generation tasks but show no comparable advantage on code comprehension tasks such as clone detection and defect detection. As model size increases, generation performance improves, but so do susceptibility to adversarial examples and the tendency to generate insecure code.
The researchers have also studied the relationship between the number of updated parameters, cross-entropy loss, and task performance. They found that the final loss of smaller PEFT models predicts that of larger ones, and that final loss correlates strongly with overall downstream task performance. Additionally, the relative loss of the different tuning methods stays consistent across model sizes, implying that the gains from each tuning method are comparable regardless of the model's scale.
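The idea that small-model loss extrapolates to larger models can be captured with a simple power-law fit, loss ≈ c · N^(−α), fit in log-log space. The sketch below uses synthetic data points chosen only to illustrate the procedure, not the paper's measurements.

```python
import numpy as np

# Synthetic final losses for three toy model sizes, generated from an
# exact power law so the fit is easy to check. Real data would be noisy.
sizes = np.array([1e9, 3e9, 7e9])        # model parameters (illustrative)
losses = 2.5 * sizes ** -0.05            # loss = c * N**(-alpha), c=2.5, alpha=0.05

# Fit log(loss) = log(c) - alpha * log(N) by least squares.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
alpha, c = -slope, np.exp(intercept)

# Extrapolate the fitted curve to a hypothetical 16B model.
pred_16b = c * (16e9) ** -alpha
print(f"alpha = {alpha:.3f}, predicted 16B loss = {pred_16b:.4f}")
```

Because the synthetic points lie exactly on the power law, the fit recovers α = 0.05 and c = 2.5; with real measurements the same fit gives an approximate, but often useful, prediction for larger scales.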
This research presents a valuable opportunity to explore the design space of instruction-tuned Code LLMs. Check out the Paper and Github for more information about ASTRAIOS.