Artificial Intelligence (AI) has demonstrated transformative potential in scientific research, particularly when scalable AI systems are deployed on high-performance computing (HPC) platforms. Realizing that potential requires integrating large-scale computational resources with expansive datasets to tackle complex scientific problems.
AI models like ChatGPT exemplify this transformative potential. Their success can be attributed to two key advances: the development of the transformer architecture and the capacity to train on internet-scale data. Applying these technologies has produced significant scientific breakthroughs in areas such as black hole modeling, fluid dynamics, and protein structure prediction.
Scalable AI has significantly impacted drug discovery, where transformer-based large language models (LLMs) have revolutionized the exploration of chemical space. LLMs use vast datasets and task-specific fine-tuning to autonomously learn and predict molecular structures, expediting the discovery process. They rely on tokenization and masked-token prediction, combining models pre-trained on molecules and protein sequences with small labeled datasets to enhance performance.
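To make the training objective concrete, the following is a minimal sketch of masked-token prediction on SMILES strings. The character-level tokenizer, toy molecules, and tiny encoder are illustrative assumptions, not the pipeline of any particular chemistry model.

```python
# Minimal sketch of masked-token prediction on SMILES strings (assumptions:
# a toy character-level tokenizer and a tiny transformer encoder; not the
# tokenizer or architecture of any specific chemistry LLM).
import random
import torch
import torch.nn as nn

SMILES = ["CCO", "c1ccccc1", "CC(=O)O"]            # ethanol, benzene, acetic acid
vocab = sorted({ch for s in SMILES for ch in s}) + ["[MASK]"]
stoi = {ch: i for i, ch in enumerate(vocab)}

def mask_tokens(smiles, p=0.15):
    """Randomly hide ~15% of tokens; the model must recover the originals."""
    ids = [stoi[ch] for ch in smiles]
    labels = [-100] * len(ids)                     # -100 is ignored by CrossEntropyLoss
    for i in range(len(ids)):
        if random.random() < p:
            labels[i] = ids[i]
            ids[i] = stoi["[MASK]"]
    if all(l == -100 for l in labels):             # guarantee at least one masked position
        j = random.randrange(len(ids))
        labels[j], ids[j] = ids[j], stoi["[MASK]"]
    return torch.tensor(ids), torch.tensor(labels)

embed = nn.Embedding(len(vocab), 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(64, len(vocab))

x, y = mask_tokens(SMILES[1])
logits = head(encoder(embed(x.unsqueeze(0))))      # (1, seq_len, vocab_size)
loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.view(-1, len(vocab)), y.view(-1))
print(float(loss))
```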
HPC is essential to these scientific advances. Different scientific problems require different levels of computational scale, and HPC provides the infrastructure to meet those diverse needs. AI for Science (AI4S) also differs from consumer-centric AI in that it often deals with sparse, high-precision data produced by costly experiments or simulations.
Scientific applications additionally impose stricter data-handling and precision requirements than consumer AI: models frequently need high-precision floating-point arithmetic and strict adherence to physical laws. This calls for integrating machine learning with traditional physics-based approaches, particularly when surrogate models replace parts of larger simulations.
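One common way to encode such physical constraints, sketched below under assumed settings, is to add a physics-residual penalty to the surrogate's training loss (a PINN-style objective); the decay ODE, network size, and hyperparameters are illustrative, not drawn from the text.

```python
# Sketch: a surrogate network trained with a physics-residual penalty
# (PINN-style). The decay ODE du/dt = -k*u and all hyperparameters are
# illustrative assumptions, not taken from any specific scientific workflow.
import torch
import torch.nn as nn

torch.set_default_dtype(torch.float64)             # scientific surrogates often need double precision

k = 2.0                                            # assumed decay constant
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

t_data = torch.tensor([[0.0]])                     # sparse "experimental" data: u(0) = 1
u_data = torch.tensor([[1.0]])

for step in range(2000):
    opt.zero_grad()
    # Data term: match the few expensive observations we have.
    data_loss = (net(t_data) - u_data).pow(2).mean()
    # Physics term: enforce du/dt + k*u = 0 at random collocation points.
    t_col = torch.rand(128, 1, requires_grad=True)
    u = net(t_col)
    du_dt = torch.autograd.grad(u.sum(), t_col, create_graph=True)[0]
    physics_loss = (du_dt + k * u).pow(2).mean()
    loss = data_loss + physics_loss
    loss.backward()
    opt.step()
```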
Scalability in AI systems is achieved through parallelism: data-parallel approaches replicate the model and split each batch across devices, while model-parallel approaches partition the model itself across devices. For instance, training a large model like GPT-3 on a single NVIDIA V100 GPU could take centuries, but parallel scaling across thousands of GPUs can reduce this to just over a month.
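As a concrete illustration of the data-parallel side, here is a minimal sketch using PyTorch's DistributedDataParallel; the launch method, toy model, and synthetic batches are assumptions, and production-scale training typically combines this with tensor and pipeline model parallelism.

```python
# Minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# Assumes launch via `torchrun --nproc_per_node=<num_gpus> train.py`; the toy
# model and random batches are placeholders for a real workload.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")            # torchrun supplies rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(1024, 1024).cuda(local_rank)      # each rank holds a full model replica
    ddp_model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.randn(32, 1024, device=f"cuda:{local_rank}")  # each rank sees a different mini-batch
        loss = ddp_model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                                  # gradients are all-reduced across ranks here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```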
The evolution of AI for science also includes the development of hybrid AI-simulation workflows. These workflows combine traditional simulations with AI models to enhance prediction accuracy and decision-making processes.
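The sketch below illustrates one such hybrid loop under assumed interfaces: a cheap surrogate screens candidate inputs, only the highest-scoring ones are handed to the expensive simulation, and the simulation's outputs then refine the surrogate. The `run_simulation` placeholder and the candidate encoding are hypothetical.

```python
# Sketch of a hybrid AI-simulation loop: a cheap surrogate screens candidates,
# the expensive simulation runs only on the most promising ones, and its results
# retrain the surrogate. `run_simulation` and the 8-dimensional candidate
# encoding are hypothetical placeholders.
import torch
import torch.nn as nn

surrogate = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

def run_simulation(x: torch.Tensor) -> torch.Tensor:
    """Placeholder for an expensive physics simulation (assumed interface)."""
    return (x ** 2).sum(dim=-1, keepdim=True)

for round_ in range(5):
    candidates = torch.randn(256, 8)                   # proposed design points
    with torch.no_grad():
        scores = surrogate(candidates).squeeze(-1)     # cheap surrogate screening
    top = candidates[scores.topk(16).indices]          # simulate only the top 16
    labels = run_simulation(top)
    for _ in range(50):                                # retrain surrogate on the new results
        opt.zero_grad()
        loss = nn.functional.mse_loss(surrogate(top), labels)
        loss.backward()
        opt.step()
```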
Several trends are shaping scalable AI for science. The shift toward mixture-of-experts (MoE) models, which are sparsely activated and therefore more cost-effective than dense monolithic models, is gaining traction. The concept of an AI-driven autonomous laboratory that conducts experiments and analyses in real time is another exciting development.
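To show what sparse activation means in practice, here is a minimal mixture-of-experts layer with top-1 gating; the layer sizes are illustrative assumptions, and production MoE systems add load balancing, capacity limits, and expert parallelism.

```python
# Minimal mixture-of-experts layer with top-1 gating, illustrating sparse
# activation: each token is processed by only one of the experts. Sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)
        weight, idx = probs.max(dim=-1)        # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                    # only tokens routed to expert e are computed
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

moe = TinyMoE()
y = moe(torch.randn(10, 64))                   # each of the 10 tokens activates one expert
```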
Despite this progress, there are limitations. For example, the quadratic cost of attention in transformer-based models has renewed interest in linear recurrent neural networks (RNNs), which scale more efficiently to long token sequences. Furthermore, because scientists remain cautious about AI methods, developing tools that elucidate the rationale behind AI predictions is crucial. Techniques such as Class Activation Mapping (CAM) and attention map visualization can offer insights into how AI models make decisions, fostering trust and adoption in the scientific community.
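As a small example of the latter, the sketch below extracts and plots the averaged attention weights of a single multi-head attention layer; the toy inputs and the matplotlib rendering are assumptions rather than a prescribed interpretability pipeline.

```python
# Sketch: extracting an attention map from one multi-head attention layer for
# visualization. The random inputs and plotting choices are illustrative
# assumptions, not a specific interpretability pipeline.
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(1, 16, 32)                                # one sequence of 16 tokens
_, weights = attn(x, x, x, need_weights=True,
                  average_attn_weights=True)              # (1, 16, 16), averaged over heads

plt.imshow(weights[0].detach().numpy(), cmap="viridis")
plt.xlabel("key position")
plt.ylabel("query position")
plt.title("Attention map")
plt.colorbar()
plt.savefig("attention_map.png")
```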