A recent study by researchers at Imperial College Business School, Samsung AI, and IBM proposes a framework for scientific discovery that they call AI-Hilbert. The system discovers natural laws by modeling both the axioms of a background theory and the candidate laws as polynomials. It uses binary variables and logical constraints to formulate polynomial optimization problems, which are then solved with mixed-integer linear or semidefinite optimization. Its derivations are backed by Positivstellensatz certificates, which are formal proofs that a law follows from the background theory. The framework has been used to derive laws such as Kepler's third law of planetary motion and the radiated gravitational wave power equation from hypotheses and data, keeping the results consistent with both the background theory and the experimental data. Unlike deep learning methods, which can return results that are hard to verify, AI-Hilbert is designed to produce laws whose derivations can be checked, through a process the authors describe as scalable and reliable.
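To make the idea of a certificate concrete, the Python sketch below checks a derivation for a simplified version of Kepler's third law: a small body of mass m on a circular orbit of radius d around a fixed central mass M, with angular frequency w and period p. The three axioms and the multipliers are our own illustrative choices, not the paper's two-body formulation, and `pi` is treated as a symbol. Because every axiom here is an equality, the certificate reduces to showing that the target law (multiplied by the nonzero mass m) is a polynomial combination of the axioms; inequality axioms would additionally require sum-of-squares multipliers. The small `Poly` class is a hypothetical helper written only for this example.

```python
from collections import defaultdict

# Variable order for exponent tuples (hypothetical, simplified circular-orbit model).
VARS = ("F", "d", "G", "M", "m", "w", "p", "pi")


class Poly:
    """Sparse polynomial: maps exponent tuples (one entry per VARS) to integer coefficients."""

    def __init__(self, terms=None):
        # Drop zero coefficients so that equality checks are exact.
        self.terms = {k: c for k, c in (terms or {}).items() if c != 0}

    @staticmethod
    def var(name):
        return Poly({tuple(1 if v == name else 0 for v in VARS): 1})

    @staticmethod
    def const(c):
        return Poly({(0,) * len(VARS): c})

    def __add__(self, other):
        out = defaultdict(int, self.terms)
        for k, c in other.terms.items():
            out[k] += c
        return Poly(out)

    def __neg__(self):
        return Poly({k: -c for k, c in self.terms.items()})

    def __sub__(self, other):
        return self + (-other)

    def __mul__(self, other):
        out = defaultdict(int)
        for k1, c1 in self.terms.items():
            for k2, c2 in other.terms.items():
                out[tuple(a + b for a, b in zip(k1, k2))] += c1 * c2
        return Poly(out)

    def __eq__(self, other):
        return self.terms == other.terms


F, d, G, M, m, w, p, pi = (Poly.var(v) for v in VARS)
two, four = Poly.const(2), Poly.const(4)

# Axioms g_i = 0 (hypothetical, simplified background theory).
a1 = F * d * d - G * M * m   # Newtonian gravity: F d^2 = G M m
a2 = F - m * d * w * w       # centripetal force: F = m d w^2
a3 = w * p - two * pi        # angular frequency and period: w p = 2 pi

# Target law G M p^2 = 4 pi^2 d^3, multiplied by the nonzero orbiting mass m.
target = m * (G * M * p * p - four * pi * pi * d * d * d)

# Certificate: target = p^2 d^2 * a2 - p^2 * a1 + m d^3 (w p + 2 pi) * a3
certificate = p * p * d * d * a2 - p * p * a1 + m * d * d * d * (w * p + two * pi) * a3

print(target == certificate)  # True
```

Since each axiom equals zero whenever the background theory holds, the identity implies m*(G*M*p^2 - 4*pi^2*d^3) = 0 and hence, for m != 0, G*M*p^2 = 4*pi^2*d^3.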
The AI-Hilbert methodology combines theory and data to generate hypotheses. The background theory narrows the search space and compensates for noisy or sparse data, while the data addresses inconsistencies or gaps in the theory. Concretely, the system constructs a polynomial optimization problem from the background knowledge and the data, reduces it to a semidefinite optimization problem, and solves that problem to obtain a candidate formula together with its formal derivation. Hyperparameters control the complexity of the model and define a distance metric that quantifies how closely the discovered law agrees with the background theory.
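The reduction to semidefinite optimization rests on a standard fact: a polynomial p(x) is a sum of squares if and only if p(x) = z(x)^T Q z(x) for some positive semidefinite Gram matrix Q, where z(x) is a vector of monomials. The minimal Python sketch below checks this for univariate quadratics with z = [1, x]. The function names are hypothetical, and a real system would hand the search over Q to an SDP solver rather than supplying Q by hand.

```python
from fractions import Fraction


def gram_matches(coeffs, Q):
    """Check that p(x) = z^T Q z for z = [1, x, ..., x^k].

    coeffs[i] is the coefficient of x^i in p; Q is a symmetric (k+1) x (k+1) matrix.
    Entry Q[i][j] contributes to the coefficient of x^(i+j).
    """
    k = len(Q) - 1
    recon = [Fraction(0)] * (2 * k + 1)
    for i in range(k + 1):
        for j in range(k + 1):
            recon[i + j] += Q[i][j]
    padded = list(coeffs) + [0] * (2 * k + 1 - len(coeffs))
    return recon == [Fraction(c) for c in padded]


def is_psd_2x2(Q):
    """A symmetric 2x2 matrix is PSD iff both diagonal entries and the determinant are >= 0."""
    (a, b), (c, d) = Q
    return b == c and a >= 0 and d >= 0 and a * d - b * c >= 0


Q1 = [[2, -1], [-1, 1]]                                # x^2 - 2x + 2 = (x - 1)^2 + 1
print(gram_matches([2, -2, 1], Q1), is_psd_2x2(Q1))    # True True

Q2 = [[2, Fraction(-3, 2)], [Fraction(-3, 2), 1]]      # x^2 - 3x + 2
print(gram_matches([2, -3, 1], Q2), is_psd_2x2(Q2))    # True False
```

For a quadratic in this basis the Gram matrix is unique, so the PSD test is conclusive: x^2 - 2x + 2 passes, while x^2 - 3x + 2, which is negative at x = 1.5, fails. For higher degrees many matrices Q reproduce the same coefficients, which is why finding a PSD one becomes a semidefinite feasibility problem.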
AI-Hilbert is designed to discover polynomial laws that are consistent both with experimental data and with a background knowledge base of polynomial equalities and inequalities. The discovered laws are required to be derivable from the axioms of the background theory. If the theory is inconsistent, the system can identify the source of the inconsistency and select the hypotheses that best explain the data.
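Hypothesis selection with binary variables can be illustrated by brute force. In the hypothetical Python sketch below, each candidate axiom gets an indicator z_i (1 = keep), at most `max_dropped` axioms may be excluded, and every kept axiom must fit the data to within a tolerance; among the feasible choices, the largest set of kept axioms wins. Enumerating all binary vectors stands in for a mixed-integer solver, and the axioms, data, tolerance, and objective are illustrative assumptions rather than the paper's formulation.

```python
from itertools import product

# Hypothetical observations (d, m, w, F) of circular orbits with G*M = 1.
GM = 1.0
DATA = [(d, m, d ** -1.5, m / d ** 2) for d, m in [(2.0, 1.0), (3.0, 2.0), (4.0, 0.5)]]

# Candidate axioms written as residual functions g(x) that should equal 0.
AXIOMS = {
    "gravity_inverse_square": lambda d, m, w, F: F * d ** 2 - GM * m,
    "gravity_inverse_cube": lambda d, m, w, F: F * d ** 3 - GM * m,  # deliberately wrong
    "centripetal": lambda d, m, w, F: F - m * d * w ** 2,
}


def misfit(name):
    """Total absolute residual of one axiom over the data set."""
    return sum(abs(AXIOMS[name](*x)) for x in DATA)


def select_axioms(max_dropped=1, tol=1e-9):
    """Enumerate binary vectors z (1 = keep axiom) and return the largest feasible kept set."""
    names = list(AXIOMS)
    best = None
    for z in product((0, 1), repeat=len(names)):
        if z.count(0) > max_dropped:
            continue  # budget on how many hypotheses may be excluded
        kept = [n for n, zi in zip(names, z) if zi]
        if all(misfit(n) <= tol for n in kept):
            if best is None or len(kept) > len(best):
                best = kept
    return best


print(select_axioms())  # ['gravity_inverse_square', 'centripetal']
```

With the deliberately wrong inverse-cube law in the knowledge base, the only feasible way to spend the budget of one excluded axiom is to drop that law. The number of binary vectors grows as 2^n for n axioms, which is why a mixed-integer solver is needed at realistic scale.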
Experimental validation shows that AI-Hilbert can derive correct symbolic expressions from complete and consistent background theories, even when no numerical data is available. It can also handle inconsistent axioms, and in the reported experiments it outperformed the other methods it was compared against. Future directions for improving the framework include extending it beyond polynomial settings, automating hyperparameter tuning, and improving scalability by optimizing the underlying computational techniques. By uniting real algebraic geometry and mixed-integer optimization, the approach could change how scientific laws are derived from incomplete theories and noisy data.