The official definition of open source AI is finally out, but Meta strongly disagrees.
Having a vibrant open source AI ecosystem is considered to be of crucial importance. As Mark Zuckerberg put it, “Open source will ensure that more people around the world have access to the benefits and opportunities of AI, that power isn’t concentrated in the hands of a small number of companies, and that the technology can be deployed more evenly and safely across society.”
But, despite its importance, we didn’t have a definition of open source AI…until now.
After two years of deliberation with academics, philosophers, and content creators, that definition finally comes from the Open Source Initiative. For those who don’t know, the Open Source Initiative is the undisputed authority on open source software, as it was the organization that originally defined open source software all the way back in 1998.
So, it shouldn’t be a surprise that people are taking this definition very seriously.
According to the Open Source Initiative, for an AI to be considered open source, it must grant the freedoms to:
Use the system for any purpose and without having to ask for permission.
Study how the system works and its components.
Modify the system for any purpose, including to change its outputs.
Share the system for others to use with or without modifications, for any purpose.
In practical terms, this means that open source AI models must provide enough details about how the AI was trained so that others can re-create it, including the complete code, settings, and weights.
The definition fits well with the existing definition for general open source software, but, interestingly, it doesn’t fit well with Meta’s Llama models, the current largest “open source” models.
As it stands, Llama is publicly available for download and use, but it restricts commercial use and blocks access to its training data. By definition, that disqualifies it from being considered an open source AI model.
Naturally, Meta doesn’t buy it, telling The Verge, “There is no single open source AI definition, and defining it is a challenge because previous open source definitions do not encompass the complexities of today’s rapidly advancing AI models.”
Why doesn’t Meta just open up the model for commercial use and release the training data? After all, Zuckerberg has already proclaimed that open source is good for Meta, arguing that open-sourcing Llama would ensure it develops into a complete ecosystem, keep it on the cutting edge, and save AI developers from the headaches that Zuckerberg himself faced building on Apple’s platforms.
The simple answer is that it’s not in Meta’s best interest to do so: fully open-sourcing its models would expose the company to copyright lawsuits (it has internally admitted that there is copyrighted material in its training data) and erode its competitive advantage.
So, we are at an impasse. The Open Source Initiative is just a nonprofit. They can’t force Meta to stop marketing their models as open source. All they can do is criticize the company for “polluting” the term open source.
But, among those in the know, it is now clear that Meta is not a true steward of open source AI.
This should be clear by now; otherwise we are going to see another fight like the one between WordPress and WP Engine.