
Editor's Pick

Matrices of Quantized Eigenvectors for Second-Order Optimization of 4-bit Deep Learning Networks

Deep neural networks (DNNs) have found widespread success across various fields. This success can be attributed to first-order optimizers such as stochastic gradient descent with momentum (SGDM) and AdamW. However, these methods encounter challenges in efficiently training large-scale models. As an alternative, second-order optimizers like K-FAC, Shampoo, AdaBK, and Sophia have demonstrated superior convergence properties,…


Introducing Tsinghua University’s GLM-4-9B-Chat-1M: A Remarkable Language Model Competing Against GPT-4V, Gemini Pro (focused on vision), Mistral, and Llama 3 8B

Tsinghua University's Knowledge Engineering Group (KEG) has introduced GLM-4 9B, an open-source language model that surpasses models such as GPT-4 and Gemini on several benchmark tests. Developed by the Tsinghua Deep Model (THUDM) team, GLM-4 9B marks an important development in natural language processing. At its core, GLM-4 9B is a colossal…


Introducing Mesop: A Python-Based UI Framework for Building Web Applications Such as Demos and Internal AI/Machine Learning Applications

Building web applications can be a daunting task, especially for those who are not well versed in JavaScript, CSS, or HTML. Creating visually appealing and functional web applications can take a lot of time, and delays in the development process can hurt productivity and innovation. Traditionally, frameworks like Django and Flask have been used to…


The Skywork Team Unveils Skywork-MoE, a Highly Efficient Mixture-of-Experts (MoE) Model with 146 Billion Parameters, 16 Experts, and 22 Billion Activated Parameters

Advances in natural language processing (NLP) have depended, to a large extent, on the development of large language models (LLMs). Although these models deliver high performance, they require immense computational resources, which makes them hard to scale up without incurring substantial costs. These challenges, therefore, create…
