Machine learning is a crucial domain where differential privacy (DP) and selective classification (SC) play pivotal roles in safeguarding sensitive data. DP adds random noise to protect individual privacy while retaining the overall utility of the data, whereas SC refrains from making predictions in cases of uncertainty to enhance model reliability. These components…
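To make the two mechanisms concrete, here is a minimal sketch of both ideas: a Laplace-noise release (a standard way to achieve epsilon-DP for a numeric query) and a confidence-threshold abstention rule for selective classification. The function names, the threshold value, and the example inputs are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(value, sensitivity, epsilon):
    """Release a numeric query result with Laplace noise scaled to
    sensitivity / epsilon, the calibration used for epsilon-DP."""
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

def selective_predict(probs, threshold=0.9):
    """Return the predicted class index, or None (abstain) when the
    model's top-class confidence falls below the threshold."""
    top = int(np.argmax(probs))
    if probs[top] < threshold:
        return None  # abstain rather than risk an unreliable prediction
    return top

# A private count release and two selective predictions.
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
print(selective_predict(np.array([0.55, 0.30, 0.15])))  # low confidence: None
print(selective_predict(np.array([0.95, 0.03, 0.02])))  # high confidence: 0
```

Lowering the threshold trades coverage for reliability: more inputs get answered, but more of those answers may be wrong.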
A key problem with Large Language Models (LLMs) is their inability to accurately represent uncertainty about the reliability of their own output. This uncertainty can have serious consequences in areas such as healthcare, where stakeholder confidence in the system's predictions is critical. Variations in freeform language generation further complicate the issue, as these cannot be…
With their capacity to process and generate human-like text, Large Language Models (LLMs) have become critical tools powering a variety of applications, from chatbots to data analysis and other advanced AI systems. The success of LLMs relies heavily on the diversity and quality of the instructional data used for training.
One of the central challenges in…
Artificial Intelligence (AI) aims to create systems that can execute tasks normally requiring human intelligence. These tasks include learning, reasoning, problem-solving, perception, and language understanding. Such technologies are highly beneficial in various industries such as healthcare, finance, transportation, and entertainment. Consequently, optimizing AI models to efficiently and precisely perform these tasks is a significant challenge…
Neural networks trained with gradient descent often perform well even when overparameterized and initialized randomly. They frequently find globally optimal solutions, achieving zero training error without overfitting, a phenomenon referred to as "benign overfitting." However, in the case of Rectified Linear Unit (ReLU) networks, solutions that interpolate the data can still lead to overfitting. Particularly in…
Large language models (LLMs), such as those used in AI, can creatively solve complex tasks in ever-changing environments without the need for task-specific training. However, achieving broad, high-level goals with these models remains a challenge due to the objectives' ambiguous nature and delayed rewards. Frequently retraining models to fit new goals and tasks is also…
Large Language Models (LLMs) like Mistral, Gemma, and Llama have significantly contributed to advancements in Natural Language Processing (NLP), but their dense architectures make them computationally heavy and expensive. Because they use every parameter during inference, this intensity makes building affordable, widely deployable AI challenging.
Conditional computation is seen as an efficiency-enhancing solution, activating specific model parameters…
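The idea behind conditional computation can be sketched as a simple top-k routed mixture-of-experts layer: a gate picks a few experts per token, so compute scales with the number of experts activated rather than the total parameter count. All dimensions, names, and the routing scheme below are simplified illustrative assumptions, not the architecture of any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

D, H, N_EXPERTS, TOP_K = 8, 16, 4, 2

# Each "expert" is a small two-layer MLP; only TOP_K of the
# N_EXPERTS run per token, so inference cost tracks TOP_K.
experts = [(rng.normal(size=(D, H)) * 0.1, rng.normal(size=(H, D)) * 0.1)
           for _ in range(N_EXPERTS)]
router = rng.normal(size=(D, N_EXPERTS)) * 0.1  # gating weights

def moe_forward(x):
    """Route one token x (shape [D]) to its top-k experts and combine
    their outputs weighted by softmax gate probabilities."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]               # indices of chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over chosen experts
    out = np.zeros(D)
    for g, i in zip(gates, top):
        w1, w2 = experts[i]
        out += g * (np.maximum(x @ w1, 0.0) @ w2)   # ReLU expert MLP
    return out

y = moe_forward(rng.normal(size=D))
print(y.shape)  # (8,)
```

Here only 2 of the 4 expert MLPs execute for each token, which is the efficiency argument: total parameters grow with N_EXPERTS while per-token FLOPs grow only with TOP_K.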
A team from Stanford and Duolingo has proposed a new way to manage the proficiency level of texts generated by large language models (LLMs), overcoming limitations of current methods. The Common European Framework of Reference for Languages (CEFR)-aligned language model (CALM) combines finetuning with proximal policy optimization (PPO) for aligning the proficiency levels…