Language modeling, a key aspect of machine learning, aims to predict the likelihood of a sequence of words. Used in applications such as text summarization, translation, and auto-completion systems, it greatly improves the ability of machines to understand and generate human language. However, processing and storing large data sequences can present significant computational and memory…
Graph Neural Networks (GNNs) are essential for processing complex data structures in domains such as e-commerce and social networks. However, as graph data volume increases, existing systems struggle to efficiently handle data that exceed memory capacity. This warrants out-of-core solutions where data resides on disk. Yet, such systems have faced challenges balancing speed of data…
Language models (LMs) are becoming increasingly important in the field of software engineering. They serve as a bridge between users and computers, improving code generated by LMs based on feedback from the machines. LMs have made significant strides in functioning independently in computer environments, which could potentially fast-track the software development process. However, the practical…
Large language models (LLMs) have introduced ground-breaking advancements to the field of natural language processing, such as improved machine translation, question-answering, and text generation. Yet, training these complex models poses significant challenges, including high resource requirements and lengthy training times.
Former methods addressing these concerns involved loss-scaling and mixed-precision strategies, which aimed to further training efficiency…
The standard Transformer models in machine learning have encountered significant challenges when applied to graph data due to their quadratic computational complexity, which scales with the number of nodes in the graph. Past efforts to navigate these obstacles have tended to diminish the key advantage of self-attention, which is a global receptive field, or have…
Training large-scale Generative AI models can be challenging due to the immense computational resources and time they require. This complexity gives rise to frequent instabilities, manifested as disruptive loss spikes during prolonged training periods. These instabilities can result in costly interruptions, requiring the training process to be paused and restarted. For example, the LLaMA2's 70-billion…