In the rapidly advancing world of generative AI, the term “open source” has become widely used. Traditional open-source software refers to code freely available for anyone to view, modify, and distribute, fostering knowledge-sharing and collaborative innovation. In the sphere of AI, however, this definition can become blurred and problematic: the complexity, size, and expense of AI models make transparent, fully accessible open-sourcing difficult.
Such a trend can be observed with OpenAI, an organization that initially advocated open research before shifting toward opacity in order to attract investment and protect its valuable models. In contrast, other players, such as Mistral, Meta, xAI, and the BigScience collaboration behind BLOOM, have released models billed as open source to promote research and create balance within the field.
To explore the true extent of open sourcing in AI models, researchers at Radboud University in the Netherlands assessed critical features of various AI models, such as the accessibility of their source code, training data, model weights, research papers, and APIs. They found that many models labeled as “open source” showed varying degrees of access and transparency across these components, with models such as Meta’s LLaMA and Google’s Gemma being merely “open weight”: their weights are publicly released for use, but other essential components remain closed.
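To make this kind of audit concrete, the sketch below shows one way such an openness checklist could be encoded. The dimensions follow those named above, but the three-level scoring scheme and the example entry are illustrative assumptions, not the Radboud researchers’ actual methodology or findings.

```python
from dataclasses import dataclass, fields

# Openness dimensions named in the audit described above.
# The three-level scoring (closed / partial / open) is an illustrative
# assumption, not the researchers' actual rubric.
LEVELS = {"closed": 0, "partial": 1, "open": 2}

@dataclass
class OpennessReport:
    model_name: str
    source_code: str
    training_data: str
    model_weights: str
    research_paper: str
    api_access: str

    def score(self) -> float:
        """Average openness across all dimensions (0 = fully closed, 1 = fully open)."""
        dims = [f.name for f in fields(self) if f.name != "model_name"]
        total = sum(LEVELS[getattr(self, d)] for d in dims)
        return total / (len(dims) * max(LEVELS.values()))

# Hypothetical "open-weight" release: weights are public, most else is not.
example = OpennessReport(
    model_name="example-open-weight-model",
    source_code="closed",
    training_data="closed",
    model_weights="open",
    research_paper="partial",
    api_access="open",
)
print(f"{example.model_name}: openness score {example.score():.2f}")
```

Even a crude rubric like this makes the distinction visible: a release that publishes only its weights scores well short of fully open.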
The issue of transparency in AI models carries immense implications for accountability and oversight. Without full access to the code, data, and weights, understanding how a model reaches its decisions can be extremely difficult, allowing biases, errors, or misuse to go unchecked. A recent lawsuit brought by The New York Times against OpenAI exemplifies this problem: the news organization claims that OpenAI used its copyrighted material as training data.
To address the “black box” problem of AI models, where understanding a model’s decision-making process is challenging, approaches collectively known as “explainable AI,” or XAI, are being developed. XAI aims to build tools that make models more transparent by breaking down their decision-making processes and identifying the data that most influences their outputs. This research, however, requires access to a model’s source code, training data, and other key components, further highlighting the need for truly open-sourced AI.
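As one concrete illustration, the sketch below applies permutation feature importance, a common model-agnostic XAI technique (one of many; the dataset, model, and hyperparameters here are arbitrary choices for illustration). It ranks which inputs most influence a model’s predictions, and it only works because the model, code, and evaluation data are all accessible, which is exactly the access the paragraph above argues for.

```python
# A minimal sketch of one explainable-AI technique: permutation feature importance.
# Model, dataset, and hyperparameters are arbitrary choices for illustration only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train an ordinary "black box" classifier.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much held-out accuracy drops:
# the bigger the drop, the more the model relies on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Report the five most influential features.
for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X.columns[idx]:<25} importance {result.importances_mean[idx]:.3f}")
```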
The recently passed AI Act in the European Union further complicates the situation. The Act exempts certain open-source models from transparency requirements but leaves the definition of “open-source AI” vague. This ambiguity could create loopholes and incentives for “open-washing”, where firms claim to be open source while keeping key components proprietary in order to reap legal and reputational benefits.
This confusion around open-source AI models is amplified by the global nature of AI development and the differing regulatory approaches countries may adopt. The result could be a fragmented ecosystem in which a model’s degree of openness depends largely on where it was developed. Ongoing collaboration between regulators and the academic community is therefore vital to ensure an accurate, shared understanding of AI technology and of the principles of open source. As the AI industry advances, the practical and legal implications of the term “open source” will undoubtedly carry more weight, influencing the future direction of AI research and deployment.