
Progress in Chemical Illustrations and AI: Revolutionizing the Drug Discovery Process

Advances in technology over the past century, particularly the proliferation of computers, have enabled the development of molecular representations that machines can process, which in turn support the drug discovery process. Early computer representations of molecules were simple, encoding only atoms and bonds. As computational tasks grew more complex, more sophisticated representations were needed.

Chemical notations were developed to encode molecular structures. Early forms such as the empirical formula conveyed atomic composition but gave no information on connectivity or geometry. The advent of computers enabled rapid digital storage and modification of chemical data, leading to more sophisticated machine-readable notations and to algorithms capable of visualising molecular structures in 2D and 3D.
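To make the limitation of formula-based notations concrete, the short Python sketch below (an illustration, not taken from any particular toolkit) represents ethanol and dimethyl ether as hydrogen-suppressed lists of atoms and bonds. Both give the same molecular formula, C2H6O (here identical to the empirical formula), yet their connectivity differs. The implicit hydrogen counts are supplied by hand, and the helper names `molecular_formula` and `bond_types` are hypothetical.

```python
from collections import Counter

# Illustrative helpers (hypothetical names), not a library API.


def molecular_formula(atoms, implicit_h):
    """Build a Hill-order formula from heavy atoms plus implicit hydrogen counts."""
    counts = Counter(atoms)
    counts["H"] += sum(implicit_h)
    if "C" in counts:
        # Hill order: C, then H, then the remaining elements alphabetically
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order if counts[e])


def bond_types(atoms, bonds):
    """Count bonds by (element, element, order): a crude view of connectivity."""
    return Counter(tuple(sorted((atoms[a], atoms[b]))) + (order,) for a, b, order in bonds)


# Hydrogen-suppressed graphs (0-based atom indices); hydrogen counts supplied by hand
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)], [3, 2, 1])          # CH3-CH2-OH
dimethyl_ether = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)], [3, 0, 3])   # CH3-O-CH3

for name, (atoms, bonds, hs) in [("ethanol", ethanol), ("dimethyl ether", dimethyl_ether)]:
    print(name, molecular_formula(atoms, hs), dict(bond_types(atoms, bonds)))
```

Both molecules print the formula C2H6O, but only ethanol contains a C-C bond, which is exactly the information a formula cannot carry.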

Chemical representations are central to AI-driven drug discovery because they encode structural information in a form suitable for computational analysis. Molecular graphs, in which atoms are nodes and bonds are edges, are the most common machine-readable representation. Other notations, such as the line notations and fingerprints described below, are also used as inputs to AI applications, including machine learning (ML) models.
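ML models such as graph neural networks typically consume a molecular graph as numeric arrays. As a minimal sketch under assumed conventions (a three-element one-hot vocabulary and bond orders used as adjacency weights, both chosen purely for illustration), a hydrogen-suppressed graph can be converted into an adjacency matrix and a node-feature matrix. The function name `graph_to_matrices` is hypothetical.

```python
ELEMENT_VOCAB = ["C", "N", "O"]  # hypothetical, deliberately tiny feature vocabulary


def graph_to_matrices(atoms, bonds, vocab=ELEMENT_VOCAB):
    """Return (adjacency, features) for a hydrogen-suppressed molecular graph.

    adjacency[i][j] holds the bond order between atoms i and j (0 = no bond);
    features[i] is a one-hot encoding of atom i's element over `vocab`
    (elements missing from `vocab` get an all-zero row in this sketch).
    """
    n = len(atoms)
    adjacency = [[0] * n for _ in range(n)]
    for a, b, order in bonds:
        adjacency[a][b] = adjacency[b][a] = order  # undirected graph: keep it symmetric
    features = [[1 if atom == element else 0 for element in vocab] for atom in atoms]
    return adjacency, features


# Ethanol without explicit hydrogens: C0-C1-O2
adjacency, features = graph_to_matrices(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
for row in adjacency:
    print(row)
print(features)
```

Real featurisers usually add more atom descriptors (charge, aromaticity, hydrogen count), but the two-matrix shape is the same.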

Connection tables (CTabs) are one of the key types of molecular graph representation. A CTab is organised into blocks: atom and bond blocks, plus optional atom list, Stext and properties blocks. Together these give an efficient description of a molecular structure, specifying the details of each atom and bond. File size is kept small because hydrogen atoms can be left implicit rather than listed explicitly.
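A V2000 connection table uses fixed-width columns: the counts line gives the number of atoms and bonds in columns 1 to 3 and 4 to 6, each atom line holds x, y and z in three 10-character fields followed by the element symbol in columns 32 to 34, and each bond line starts with two 3-character atom numbers and a 3-character bond type. The sketch below writes and re-reads a hydrogen-suppressed ethanol CTab, then infers the implicit hydrogens from default valences. It is a simplified illustration that assumes neutral atoms, ordinary valences and bond types 1 to 3 only; the helper names are hypothetical, and real files are better read with an established toolkit.

```python
# Illustrative helpers (hypothetical names), not a library API.
# Default valences for inferring implicit hydrogens (simplified: neutral atoms,
# lowest common valence only; charges, radicals and hypervalent S/P are ignored).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def counts_line(n_atoms, n_bonds):
    # aaa bbb, eight further 3-character fields (chiral flag etc.) set to 0, mmm = 999, version
    return f"{n_atoms:3d}{n_bonds:3d}" + "  0" * 8 + "999 V2000"


def atom_line(symbol, x, y, z):
    # x, y, z in 10.4f fields, a blank, the symbol in columns 32-34, then zeroed fields
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0" + "  0" * 11


def bond_line(first, second, bond_type):
    # first atom, second atom, bond type (1 single, 2 double, 3 triple), stereo 0
    return f"{first:3d}{second:3d}{bond_type:3d}  0"


def parse_v2000(molblock):
    """Read atoms and bonds from a V2000 molblock using its fixed column positions."""
    lines = molblock.splitlines()
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])  # line 4 is the counts line
    atoms = []
    for row in lines[4:4 + n_atoms]:
        xyz = (float(row[0:10]), float(row[10:20]), float(row[20:30]))
        atoms.append((row[31:34].strip(), xyz))
    bonds = [(int(row[0:3]), int(row[3:6]), int(row[6:9]))
             for row in lines[4 + n_atoms:4 + n_atoms + n_bonds]]
    return atoms, bonds


def implicit_hydrogens(atoms, bonds):
    """Hydrogens each atom needs to reach its default valence (bond types 1-3 only)."""
    used = [0] * len(atoms)
    for first, second, bond_type in bonds:
        used[first - 1] += bond_type  # CTab atom numbers are 1-based
        used[second - 1] += bond_type
    return [max(DEFAULT_VALENCE[symbol] - u, 0) for (symbol, _), u in zip(atoms, used)]


ethanol_molblock = "\n".join([
    "ethanol",            # header line 1: molecule name
    "  illustration",     # header line 2: program/timestamp line (free text here)
    "",                   # header line 3: comment
    counts_line(3, 2),
    atom_line("C", 0.0, 0.0, 0.0),
    atom_line("C", 1.2990, 0.7500, 0.0),
    atom_line("O", 2.5981, 0.0, 0.0),
    bond_line(1, 2, 1),
    bond_line(2, 3, 1),
    "M  END",
])

atoms, bonds = parse_v2000(ethanol_molblock)
print(atoms)
print(bonds)
print(implicit_hydrogens(atoms, bonds))  # [3, 2, 1] -> CH3, CH2, OH
```

Only three heavy atoms are stored, yet the full C2H6O composition is recoverable, which is why implicit hydrogens keep CTab files compact.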

MDL (now BIOVIA) file formats are built on CTabs. Molfiles describe single molecules, while related formats extend them: SDfiles store multiple structures with associated data, RXNfiles describe reactions, and RDfiles and RGfiles handle reaction or molecule data sets and Rgroup queries, respectively. These compact, systematic formats for storing and transferring chemical information support a wide range of cheminformatics applications.
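An SDfile concatenates molfile records, each optionally followed by data items (a header line such as `> <ID>`, one or more value lines, then a blank line) and terminated by a `$$$$` line. A minimal standard-library sketch of reading those data items might look like the following; it assumes well-formed input, ignores the finer points of the specification, and uses hypothetical helper names, field names and values.

```python
import re

# Illustrative helpers (hypothetical names), not a library API.
# Data-item header: starts with ">" and names the field in angle brackets, e.g. "> <ID>"
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def _parse_record(lines):
    """Split one SD record into its molblock and a {field: value} dict."""
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
    if end is None:
        raise ValueError("record has no 'M  END' line")
    molblock = "\n".join(lines[:end + 1])
    fields, current = {}, None
    for line in lines[end + 1:]:
        header = FIELD_HEADER.match(line)
        if header:
            current = header.group(1)
            fields[current] = []
        elif current is not None and line.strip():
            fields[current].append(line)
        else:
            current = None  # a blank line ends the current data item
    return molblock, {name: "\n".join(values) for name, values in fields.items()}


def read_sd_records(sd_text):
    """Yield (molblock, fields) for every record terminated by a '$$$$' line."""
    record = []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            yield _parse_record(record)
            record = []
        else:
            record.append(line)


def single_carbon_molblock(name):
    # Hypothetical one-atom record: just enough to form a V2000 molblock
    return "\n".join([
        name, "", "",
        f"{1:3d}{0:3d}" + "  0" * 8 + "999 V2000",
        f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} C   0" + "  0" * 11,
        "M  END",
    ])


sd_text = "\n".join([
    single_carbon_molblock("mol-1"), "> <ID>", "CPD-001", "", "$$$$",
    single_carbon_molblock("mol-2"), "> <ID>", "CPD-002", "",
    "> <NOTE>", "first line", "second line", "", "$$$$",
])

for molblock, fields in read_sd_records(sd_text):
    print(molblock.splitlines()[0], fields)
```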

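Introduced in 1988, SMILES (Simplified Molecular Input Line Entry System) is a popular line notation for encoding molecular structures. Because a molecule can usually be written as more than one valid SMILES string, canonicalisation algorithms are used when a single, unique form is needed. In 2006, the IUPAC International Chemical Identifier (InChI) was launched, offering a standard, open-source notation organised into layers that together provide a detailed molecular description.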
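Both properties can be illustrated in a few lines. The sketch below parses a deliberately small subset of SMILES (acyclic, non-aromatic, organic-subset atoms, branches and explicit `-`, `=`, `#` bonds; no ring closures or bracket atoms) and uses a brute-force graph-isomorphism check to confirm that `CCO`, `OCC` and `C(C)O` all describe ethanol while `COC` (dimethyl ether) does not. It then splits the standard InChI for ethanol into its layers. This is a teaching sketch with hypothetical function names, not a substitute for a full SMILES or InChI implementation.

```python
from itertools import permutations

# Illustrative helpers (hypothetical names), not a library API.
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}


def parse_simple_smiles(smiles):
    """Parse acyclic, non-aromatic SMILES into (atoms, bonds) with 0-based indices."""
    atoms, bonds, branch_stack = [], [], []
    prev, order, i = None, 1, 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            branch_stack.append(prev)        # remember the branch point
        elif ch == ")":
            prev = branch_stack.pop()        # return to the branch point
        elif ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]          # applies to the next bond only
        else:
            two = smiles[i:i + 2]
            symbol = two if two in ORGANIC_SUBSET else ch   # Cl and Br are two letters
            if symbol not in ORGANIC_SUBSET:
                raise ValueError(f"unsupported SMILES feature {ch!r} at position {i}")
            atoms.append(symbol)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
            i += len(symbol) - 1
        i += 1
    return atoms, bonds


def same_molecular_graph(g1, g2):
    """Brute-force isomorphism test; fine for toy molecules, factorial in general."""
    (atoms1, bonds1), (atoms2, bonds2) = g1, g2
    if sorted(atoms1) != sorted(atoms2) or len(bonds1) != len(bonds2):
        return False
    edges2 = {frozenset((a, b)): order for a, b, order in bonds2}
    for perm in permutations(range(len(atoms2))):   # try every atom mapping i -> perm[i]
        if all(atoms1[i] == atoms2[perm[i]] for i in range(len(atoms1))) and all(
            edges2.get(frozenset((perm[a], perm[b]))) == order for a, b, order in bonds1
        ):
            return True
    return False


def inchi_layers(inchi):
    """Split a standard InChI into version, formula and prefixed layers.

    Simplification: a later layer with the same prefix (as in fixed-H sublayers of
    non-standard InChIs) overwrites an earlier one.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        layers[layer[0]] = layer[1:]   # e.g. "c" connections, "h" hydrogens
    return layers


ethanol = parse_simple_smiles("CCO")
for smiles in ["OCC", "C(C)O", "COC"]:
    print(smiles, same_molecular_graph(ethanol, parse_simple_smiles(smiles)))
print(inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

Canonical SMILES generators avoid the need for such a comparison by always emitting the same string for the same graph; InChI achieves uniqueness by design.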

Chemical representations cover molecules, reactions and macromolecules. Structural keys record the presence or absence of predefined chemical groups, one bit per group; hashed fingerprints map molecular patterns, such as paths through the molecular graph, onto a fixed-length bit string; and reactions are described using dedicated formats such as RXNfiles. These methods support accurate analysis and prediction in cheminformatics and drug discovery.
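The difference between structural keys and hashed fingerprints can be sketched as follows. The key dictionary here is a hypothetical five-key example (real key sets such as MACCS are far larger), and the hashed fingerprint enumerates linear atom paths of up to three bonds and folds each path into a 1024-bit vector using CRC32; the path length, bit count and hash function are illustrative assumptions. Tanimoto similarity (shared on-bits divided by total distinct on-bits) is then used to compare molecules. All function names are hypothetical.

```python
import zlib

# Illustrative helpers (hypothetical names), not a library API.
# Hypothetical toy molecules (hydrogen-suppressed, 0-based indices)
ETHANOL = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
DIMETHYL_ETHER = (["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])


def neighbours(n_atoms, bonds):
    adj = [[] for _ in range(n_atoms)]
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))
    return adj


def _is_oh_like(atoms, adj):
    # O whose only bond is a single bond (approximates -OH; ignores charges)
    return any(s == "O" and [o for _, o in adj[i]] == [1] for i, s in enumerate(atoms))


def _is_ether_like(atoms, adj):
    # O single-bonded to exactly two carbons
    return any(s == "O" and sorted((atoms[j], o) for j, o in adj[i]) == [("C", 1), ("C", 1)]
               for i, s in enumerate(atoms))


# Hypothetical five-key dictionary: one bit per predefined substructure test
STRUCTURAL_KEYS = [
    ("contains N", lambda atoms, adj: "N" in atoms),
    ("contains O", lambda atoms, adj: "O" in atoms),
    ("has double bond", lambda atoms, adj: any(o == 2 for nbrs in adj for _, o in nbrs)),
    ("OH-like oxygen", _is_oh_like),
    ("C-O-C ether-like oxygen", _is_ether_like),
]


def structural_keys(atoms, bonds):
    adj = neighbours(len(atoms), bonds)
    return tuple(int(test(atoms, adj)) for _, test in STRUCTURAL_KEYS)


def linear_paths(atoms, bonds, max_bonds):
    """All simple atom paths of 0..max_bonds bonds, written direction-independently."""
    adj = neighbours(len(atoms), bonds)
    found = set()

    def extend(path, tokens):
        found.add(min("-".join(tokens), "-".join(reversed(tokens))))
        if len(path) - 1 == max_bonds:
            return
        for nbr, order in adj[path[-1]]:
            if nbr not in path:
                extend(path + [nbr], tokens + [str(order), atoms[nbr]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return found


def hashed_fingerprint(atoms, bonds, n_bits=1024, max_bonds=3):
    # Fold each path into one of n_bits positions; the set holds the "on" bits
    return {zlib.crc32(p.encode()) % n_bits for p in linear_paths(atoms, bonds, max_bonds)}


def tanimoto(fp1, fp2):
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 1.0


print(structural_keys(*ETHANOL), structural_keys(*DIMETHYL_ETHER))
fp_ethanol = hashed_fingerprint(*ETHANOL)
fp_ether = hashed_fingerprint(*DIMETHYL_ETHER)
print(round(tanimoto(fp_ethanol, fp_ether), 2))
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomised between interpreter runs, which would make the fingerprint irreproducible. The design trade-off is visible here: keys are interpretable but limited to what the dictionary anticipates, while hashed paths capture any pattern at the cost of possible bit collisions.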

Graphically, molecules are represented as both 2D and 3D models, each of which plays an important role in visualisation and analysis. 2D models depict skeletal structures, although generating clear layouts and renderings automatically remains challenging. 3D representations, often rendered using software such as Avogadro and PyMOL, are particularly useful for docking, protein-ligand interaction and mechanism-based studies. Together, these representations underpin much of the work in cheminformatics and drug discovery.
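One basic operation enabled by 3D coordinates is finding close contacts between a ligand and a protein, a common first step when inspecting a docked pose. The sketch below lists ligand-protein atom pairs within a distance cutoff; the function name, atom names and coordinates are hypothetical toy values, and the 4.0 Å cutoff is an assumed, adjustable choice rather than a standard. It ignores atom types and hydrogen-bond geometry.

```python
import math

# Illustrative helper (hypothetical name), not a library API.


def close_contacts(ligand, protein, cutoff=4.0):
    """List (ligand atom, protein atom, distance in Å) pairs within `cutoff`, nearest first."""
    contacts = []
    for lig_name, lig_xyz in ligand:
        for prot_name, prot_xyz in protein:
            distance = math.dist(lig_xyz, prot_xyz)  # Euclidean distance (Python 3.8+)
            if distance <= cutoff:
                contacts.append((lig_name, prot_name, round(distance, 2)))
    return sorted(contacts, key=lambda contact: contact[2])


# Hypothetical toy coordinates in Å; a real analysis would read them from PDB/mmCIF files
ligand = [("O1", (0.0, 0.0, 0.0)), ("C2", (1.4, 0.0, 0.0))]
protein = [("ASP-OD1", (-2.8, 0.0, 0.0)), ("LEU-CD1", (6.0, 0.0, 0.0))]

for lig_atom, prot_atom, distance in close_contacts(ligand, protein):
    print(f"{lig_atom} ... {prot_atom}: {distance} Å")
```

The double loop is quadratic in atom count, which is acceptable for a single ligand but is usually replaced by a spatial grid or k-d tree for whole-protein analyses.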
