Machine learning (ML) now plays a significant role in bioprocess development, alongside wider advances in analytics, digitalisation, and automation. These technologies generate large experimental datasets that are crucial for optimising bioprocesses. ML makes it possible to analyse these datasets efficiently and to explore bioprocess design spaces. In particular, ML techniques are applied to strain engineering, bioprocess optimisation, scale-up, and real-time monitoring and control.
Traditional sensors used in bioprocessing and chemical processing measure basic variables such as pressure, temperature, and pH. They often cannot measure the concentrations of other chemical species, which instead require slower or invasive at-line or off-line methods. By contrast, Raman spectroscopy enables real-time sensing and differentiation of chemical species by exploiting the interaction of monochromatic light with molecules. Applying ML and deep learning (DL) techniques to Raman spectral data can improve the accuracy and robustness of predicted analyte concentrations in complex mixtures.
In bioprocess development, ML tools such as the Automated Recommendation Tool for Synthetic Biology have been crucial. Such tools leverage large, complex datasets to support tasks like biocatalyst design and metabolic pathway prediction, which leads to improved productivity and efficiency. ML techniques, including support vector machines and Gaussian process regression, predict optimal conditions for enzymatic activity and media composition, aiding bioprocess optimisation. ML is also used to analyse microscopy images in support of microfluidic-based high-throughput bioprocess development.
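To make the Gaussian process regression idea concrete, the following is a minimal sketch written directly in NumPy. It assumes a single process variable (pH) and synthetic, noise-free activity values drawn from a bell-shaped curve. The data, the kernel length scale, and the function names `rbf_kernel` and `gp_posterior` are illustrative assumptions, not values or methods taken from any study discussed here.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, length_scale=0.8, noise=1e-2):
    """Posterior mean and std of a zero-mean GP with unit prior variance."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    # Prior variance is 1; subtract the part explained by the training data.
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Synthetic example (assumption): relative enzyme activity peaking at pH 7.
ph_train = np.array([5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5])
activity = np.exp(-0.5 * ((ph_train - 7.0) / 0.8) ** 2)

# Centre the targets because the GP above has a zero prior mean.
offset = activity.mean()
ph_grid = np.linspace(5.0, 8.5, 71)
mean, std = gp_posterior(ph_train, activity - offset, ph_grid)
mean = mean + offset

# Candidate condition with the highest predicted activity.
best_ph = ph_grid[np.argmax(mean)]
print(f"Predicted optimum near pH {best_ph:.2f}")
```

The posterior standard deviation `std` indicates where the model is least certain, which is what makes Gaussian processes attractive for deciding which experiment to run next. In practice a tested library implementation would normally be preferred over hand-written linear algebra.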
In commercial bioprocess development, Process Analytical Technology (PAT) helps ensure compliance with regulatory standards. ML techniques underpin the monitoring of critical process parameters and help maintain the critical quality attributes of biopharmaceutical products. ML models integrated with digital twins also enable real-time prediction of process variables that are difficult to measure directly, supporting the prediction and optimisation of process behaviour.
ML and DL methods extend what online sensing can deliver by improving the predictive accuracy and robustness of Raman-based measurements. This helps in monitoring multiple variables that are crucial to bioprocess control. Successful applications include predicting the concentrations of biomolecules such as glucose and lactate.
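As a rough illustration of how concentrations can be predicted from spectra, the sketch below fits a ridge regression that maps simulated Raman spectra to two analyte concentrations. Everything in it is an assumption made for demonstration: the spectra are synthetic, the band positions are placeholders rather than real glucose or lactate assignments, and ridge regression stands in for whichever ML or DL model is used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
shift = np.linspace(400.0, 1800.0, 200)  # Raman shift axis in cm^-1

def band(centre, width):
    """Gaussian-shaped band on the shift axis."""
    return np.exp(-0.5 * ((shift - centre) / width) ** 2)

# Placeholder pure-component spectra (NOT real band assignments).
pure = np.vstack([
    band(520.0, 15.0) + 0.6 * band(1120.0, 20.0),  # stands in for glucose
    band(860.0, 15.0) + 0.5 * band(1450.0, 25.0),  # stands in for lactate
])

def simulate(n_samples):
    """Mixture spectra = concentrations x pure spectra + measurement noise."""
    conc = rng.uniform(0.5, 10.0, size=(n_samples, 2))  # g/L, hypothetical range
    noise = rng.normal(0.0, 0.05, size=(n_samples, shift.size))
    return conc @ pure + noise, conc

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression on mean-centred data."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, x_mean, y_mean

def predict(model, X):
    W, x_mean, y_mean = model
    return (X - x_mean) @ W + y_mean

X_train, Y_train = simulate(60)
X_test, Y_test = simulate(20)

model = fit_ridge(X_train, Y_train)
rmse = np.sqrt(np.mean((predict(model, X_test) - Y_test) ** 2, axis=0))
print("Held-out RMSE per analyte (g/L):", rmse)
```

Real spectra also contain fluorescence baselines, overlapping bands, and matrix effects that this simulation leaves out. Those effects are where preprocessing and more flexible models are expected to matter.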
Embracing open-source methodologies and databases is fundamental to the rapid advancement of ML in bioprocess development, because it fosters collaboration and data accessibility. Challenges such as data scarcity, overfitting, and underfitting can be addressed through techniques like transfer learning and ensemble methods. As ML methods, including deep learning and reinforcement learning, continue to improve with growing computational capabilities, they offer transformative potential for optimising bioprocesses and shaping a data-driven future in biotechnology.
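One way to see why ensemble methods help with small, noisy datasets is bagging: the same model is fitted on many bootstrap resamples and the predictions are averaged. The sketch below assumes a synthetic one-dimensional response and cubic polynomial base models. The response function, noise level, and model choice are all hypothetical and serve only to illustrate the averaging step.

```python
import numpy as np

rng = np.random.default_rng(1)

def true_response(x):
    """Hypothetical smooth response, e.g. normalised titre vs. scaled feed rate."""
    return np.sin(2.5 * x) + 0.5 * x

# Small noisy training set (assumption: 20 experiments) and a clean test grid.
x_train = rng.uniform(0.0, 1.0, 20)
y_train = true_response(x_train) + rng.normal(0.0, 0.2, 20)
x_test = np.linspace(0.05, 0.95, 50)
y_test = true_response(x_test)

def fit_bagged_polynomials(x, y, degree=3, n_models=50):
    """Fit one polynomial per bootstrap resample of the training data."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))  # sample with replacement
        models.append(np.polyfit(x[idx], y[idx], degree))
    return models

models = fit_bagged_polynomials(x_train, y_train)
member_preds = np.array([np.polyval(m, x_test) for m in models])
ensemble_pred = member_preds.mean(axis=0)  # bagged prediction

member_mse = np.mean((member_preds - y_test) ** 2, axis=1)
ensemble_mse = np.mean((ensemble_pred - y_test) ** 2)
print(f"Mean member MSE: {member_mse.mean():.4f}")
print(f"Ensemble MSE:    {ensemble_mse:.4f}")
```

Because squared error is convex, the error of the averaged prediction can never exceed the average error of the individual members. This is the basic reason averaging reduces the variance component of overfitting. It does not fix underfitting, which calls for a more expressive base model.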