Visually rich documents (VRDs) such as invoices, utility bills, and insurance quotes pose distinct challenges for information extraction (IE). Their varied layouts and formats, combined with both textual and visual properties, call for complex, resource-intensive solutions. Many existing strategies rely on supervised learning, which requires a large pool of human-labeled training samples. This not…
Vector databases have emerged as a significant shift in data management and retrieval, changing how businesses and tech enthusiasts handle complex, high-dimensional data. Unlike traditional databases, which deal with simple values such as integers or strings, vector databases can perform operations on complex, unstructured data points in a multi-dimensional…
ST-LLM: An Efficient Video-LLM Framework Incorporating Spatial-Temporal Sequence Modeling within LLM
Artificial general intelligence has advanced significantly, thanks in part to the capabilities of Large Language Models (LLMs) such as GPT, PaLM, and LLaMA. These models have shown impressive understanding and generation of natural language, pointing to the direction of future AI. However, while LLMs excel at text processing, video processing with complex temporal information remains a…
Researchers at MIT and the MIT-IBM Watson AI Lab have developed an AI system designed to teach users when to trust an AI's decision-making process, for instance when a radiologist is determining whether a patient's X-ray shows signs of pneumonia. The training system identifies scenarios where the human should not trust the AI model, automatically…
MIT has released a set of policy briefs offering lawmakers guidance on the governance of artificial intelligence (AI). The goal of these documents is to strengthen U.S. leadership in AI, minimize potential harm from misapplication, and promote beneficial uses of AI in society.
The primary policy paper, titled “A Framework for U.S.…
A team from the Massachusetts Institute of Technology (MIT) has found that machine learning (ML) models can effectively mimic and understand the human auditory system, potentially helping to improve technologies such as cochlear implants, hearing aids, and brain-machine interfaces.
These findings are based on the largest-ever study of deep neural networks used to perform auditory…
Researchers from the Shanghai AI Laboratory and TapTap have developed a Linear Attention Sequence Parallel (LASP) technique that optimizes sequence parallelism on linear transformers, sidestepping the limitations imposed by the memory capacity of a single GPU.
Large language models, due to their significant size and long sequences, can place a considerable strain on graphical unit…
Large language models and multimodal foundation models like GPT-4V, Claude, and Gemini, which blend visual encoders and language models, have made significant strides in Natural Language Processing (NLP) and Natural Language Generation (NLG). They show impressive performance when working with text-only inputs or a combination of image and text inputs. Nonetheless, queries…
Artificial intelligence (AI) continues to advance with the development of Viking, a cutting-edge language model designed to cater to Nordic languages alongside English and a range of programming languages. Developed by Silo AI, Europe's largest private AI lab, in partnership with the TurkuNLP research group at the University of Turku and HPLT,…