Growing class sizes in computing education are making automation increasingly important for supporting student success. Automated feedback generation tools are popular for their ability to analyze and test student code rapidly. Among these, large language models (LLMs) such as GPT-3 show promise, though concerns remain about their accuracy, reliability, and ethical implications.
Historically, LLMs in computing education have focused primarily on identifying mistakes rather than providing constructive feedback. While some studies show that LLMs can identify issues in student code, their output has been found inconsistent and inaccurate, and current models struggle to provide feedback on par with human instructors on programming exercises. Against this backdrop, the idea of using one LLM to judge the output of another, known as LLMs-as-judges, has gained popularity and shown promising results.
A recent study by researchers from Aalto University, the University of Jyväskylä, and The University of Auckland evaluates how effectively LLMs, including open-source models, provide feedback on student-written programs. The study first establishes a baseline by comparing GPT-4's feedback with human expert ratings, then examines how open-source LLMs fare against proprietary models like GPT-4.
The study draws on data from an introductory programming course at Aalto University, consisting of student help requests and feedback generated by GPT-3.5. The feedback was assessed for qualities including completeness and perceptivity, both qualitatively by human annotators and automatically by GPT-4, which graded each feedback message against a rubric.
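The paper's exact rubric is not reproduced here, but an LLM-as-judge setup of this kind can be sketched as follows: a grading prompt asks the judge model (e.g., GPT-4) to answer yes or no for each rubric criterion, and the reply is parsed into structured labels. The criterion names, prompt wording, and function names below are illustrative assumptions, not the study's actual materials.

```python
# Sketch of an LLM-as-judge rubric for grading programming feedback.
# Criterion names and prompt wording are illustrative assumptions,
# not the study's actual rubric.

RUBRIC = {
    "complete": "Does the feedback mention every actual issue in the code?",
    "perceptive": "Does the feedback correctly identify at least one actual issue?",
    "selective": "Does the feedback avoid mentioning non-existent issues?",
}

def build_judge_prompt(student_code: str, feedback: str) -> str:
    """Assemble a grading prompt to send to a judge model."""
    criteria = "\n".join(f"- {name}: {q}" for name, q in RUBRIC.items())
    return (
        "You are grading automatically generated programming feedback.\n\n"
        f"Student code:\n{student_code}\n\n"
        f"Feedback to grade:\n{feedback}\n\n"
        "Answer yes or no for each criterion:\n"
        f"{criteria}\n"
    )

def parse_judgement(reply: str) -> dict:
    """Parse a 'criterion: yes/no' style reply into booleans."""
    result = {}
    for line in reply.splitlines():
        if ":" in line:
            name, verdict = line.split(":", 1)
            name = name.strip().lstrip("- ").lower()
            if name in RUBRIC:
                result[name] = verdict.strip().lower().startswith("yes")
    return result
```

The prompt-building and reply-parsing steps are deterministic, which makes the judging pipeline easy to audit even when the judge model itself is not.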
The study found that while most of the feedback was perceptive, only a little over half was complete, and much of it contained misleading content. Moreover, GPT-4 graded feedback more positively than human annotators, suggesting a possible positive bias. In terms of classification performance, GPT-4 did well on completeness, worse on selectivity, and its high perceptivity score appears inflated by skew in the data, since most feedback was perceptive to begin with.
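The point about data skew is easy to demonstrate: when one label dominates, plain accuracy rewards a judge that simply predicts the majority class, while a chance-corrected metric such as Cohen's kappa exposes the lack of real skill. The 90/10 label split below is an illustrative assumption, not the study's actual distribution.

```python
# Why accuracy is misleading on skewed labels: a judge that always
# predicts the majority class looks accurate but shows zero agreement
# beyond chance (Cohen's kappa = 0). The 90/10 split is illustrative.

def accuracy(truth, pred):
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def cohens_kappa(truth, pred):
    """Agreement corrected for chance, for binary 0/1 labels."""
    n = len(truth)
    observed = accuracy(truth, pred)
    # Chance agreement from the marginal label frequencies.
    p_truth = sum(truth) / n
    p_pred = sum(pred) / n
    expected = p_truth * p_pred + (1 - p_truth) * (1 - p_pred)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1 - expected)

# 90% of feedback messages are labeled perceptive (1) -- illustrative skew.
truth = [1] * 90 + [0] * 10
always_yes = [1] * 100  # a judge that answers "perceptive" every time

print(accuracy(truth, always_yes))      # 0.9 -- looks strong
print(cohens_kappa(truth, always_yes))  # 0.0 -- no skill beyond chance
```

This is why a high score on a heavily skewed dimension, like perceptivity here, should be read with caution.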
In summary, the study suggests that open-source LLMs have potential for generating programming feedback, and that GPT-4 shows promise as a tool for evaluating automatically generated feedback. LLM-generated feedback could therefore become a cost-effective and accessible resource for educators. That said, LLMs have limitations and may still require human oversight, especially in more complex cases.