Wikipedia Deep Dive

Open-source artificial intelligence

Based on Wikipedia: Open-source artificial intelligence

The Battle Over What "Open" Really Means

In January 2025, a Chinese company called DeepSeek released an artificial intelligence model that sent shockwaves through Silicon Valley. The model was powerful, it was cheap to run, and most provocatively, it was released under an open license. This wasn't just a technical achievement—it was a statement. China had decided that the path to AI dominance ran through openness, not secrecy.

But here's the twist: what counts as "open" in artificial intelligence has become one of the most contentious debates in technology. And the answer matters enormously—for national security, for scientific progress, and for whether a handful of corporations will control the most transformative technology of our era.

The Promise and the Problem

Open-source software changed the world. Linux runs most of the internet's servers. Firefox democratized web browsing. Android put smartphones in billions of pockets. The recipe was simple: release your code, let anyone use it, let anyone improve it, and watch as a global community of developers builds something greater than any single company could create alone.

Artificial intelligence should work the same way, in theory. Release the code, release the model, let researchers around the world tinker and improve and innovate. The rising tide lifts all boats.

Except AI isn't quite like traditional software.

A conventional program is essentially a recipe: here are the instructions, follow them, and you'll get the same result every time. An AI model is more like a brain that's been trained through experience. The "recipe" has three parts: the code that defines the architecture, the data the model learned from, and the resulting neural network weights—billions of numbers that encode everything the model knows.
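To make the three parts concrete, here is a minimal PyTorch sketch. Everything in it is illustrative: the tiny model, the file name, and the sizes are stand-ins rather than any real system. The point is that the architecture is ordinary source code, the weights are a separate file of learned numbers, and the training data is a third artifact that the code and weights alone tell you nothing about.

```python
import torch
import torch.nn as nn

# Part 1: the architecture, expressed as ordinary source code.
class TinyLanguageModel(nn.Module):
    def __init__(self, vocab_size=32000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, token_ids):
        return self.lm_head(self.blocks(self.embed(token_ids)))

model = TinyLanguageModel()

# Part 2: the weights live in a separate file of numbers.
# Releasing only this file (plus inference code) is "open weights".
torch.save(model.state_dict(), "tiny_lm_weights.pt")
restored = TinyLanguageModel()
restored.load_state_dict(torch.load("tiny_lm_weights.pt"))

# Part 3: the training data is a third, independent artifact.
# Nothing in the code above or in the weight file reveals what it was.
```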

You can release all three. Or you can release just some of them. And this is where the fighting starts.

The Openwashing Problem

Meta, the company formerly known as Facebook, released a series of AI models called Llama. The company loudly proclaimed these models as "open source." The press repeated the claim. Developers downloaded the models by the millions.

But there was a catch. Several catches, actually.

Meta released the model weights—the trained brain—but not the training data. Without the data, you can use the model, but you can't really understand why it behaves the way it does. You can't verify whether it was trained on copyrighted material, or biased datasets, or information that might compromise privacy.

More significantly, Meta's license included restrictions. You couldn't use Llama for certain purposes. If your company had more than 700 million monthly active users, you needed special permission. And critically for international relations, the license prohibited certain military uses.

The Open Source Initiative, which has been the official arbiter of what counts as "open source" since 1998, was not amused. They spent two years consulting with experts, trying to figure out how to adapt their definition of open source—originally designed for traditional software—to the strange new world of AI.

In 2024, they published their answer: the Open Source AI Definition, version 1.0. To qualify as truly open source, an AI system must release the code for processing data, training the model, and running inference. The model weights must be available. And while you don't necessarily have to release the exact training data—which might include private information that can't be shared—you do have to provide enough detail about that data for someone else to recreate something substantially similar.

By this definition, Llama isn't open source. It's something else—open weights, perhaps, or source available. The Open Source Initiative and others have accused Meta of "openwashing," using the prestige of open source to gain goodwill while keeping meaningful control.

Why Companies Play This Game

The cynical answer is obvious: marketing. The open source label attracts developers. It attracts talent—engineers want to work on projects they can show off, projects with communities, projects that feel like they're advancing human knowledge rather than just corporate profits.

But there's a more interesting strategic calculation happening.

If you're Meta, you're not primarily an AI company. You're a social media company that needs AI to power your products. You don't necessarily need to have the best AI model in the world. What you need is for AI to be a commodity—something cheap and widely available—so you can focus on your actual business of connecting people and selling ads.

Releasing Llama, even with restrictions, floods the market with capable AI. It makes it harder for OpenAI or Anthropic or Google to charge premium prices. It undermines their business model while costing Meta relatively little.

This is the same strategy IBM used with Linux decades ago. IBM wasn't a software company—it sold hardware and services. By supporting Linux, IBM commoditized operating systems, which hurt Microsoft and helped IBM's actual business.

The Geopolitical Dimension

China's embrace of open AI isn't just about technology. It's about reducing dependence on American companies and American export controls.

When DeepSeek released their R1 reasoning model in January 2025, it demonstrated something important: you didn't need access to the most advanced American chips to build competitive AI. The model was trained on older hardware that wasn't subject to export restrictions. And by releasing it openly, China was making a statement about which side of the open-versus-closed divide it stood on.

The message to the rest of the world was clear: if you want AI and you don't want to depend on Silicon Valley, we've got you covered.

This creates fascinating dynamics. Countries and companies that don't have their own leading AI models—which is to say, almost everyone—suddenly have options. They can use American proprietary models and pay American prices and accept American terms of service. Or they can use open models, whether from China or from European efforts or from academic projects, and maintain more independence.

Open source AI, in other words, has become a tool of geopolitical competition. Meta's Llama, despite its restrictions, was adopted by American defense contractors like Lockheed Martin and Oracle. This happened partly as a reaction to Chinese researchers using an earlier Llama version to develop military AI tools, a use that technically violated Meta's license but that Meta had no practical way to prevent.

The cat was out of the bag. Once you release model weights, you can't take them back.

The Security Paradox

This irreversibility is one of the core tensions in open AI. With traditional software, if someone finds a security flaw, you can patch it. You release an update, users install it, problem solved.

With open AI models, there are no updates. Once the weights are public, they're public forever. If researchers discover that a model can be manipulated into producing dangerous content, or that it has unexpected biases, or that it leaks private information from its training data—there's no fix. The old version will always exist on someone's hard drive somewhere.

Critics worry about bioterrorism. An open model could potentially help bad actors design pathogens or synthesize dangerous chemicals. With proprietary models, companies can implement safeguards—refusing certain requests, filtering certain outputs. With open models, anyone can remove those safeguards.

A White House report in July 2024 examined this question and concluded that the evidence didn't yet justify restricting the release of model weights. The report acknowledged concerns but noted that the main barriers to actual terrorism remain physical—getting the materials and equipment—rather than informational.

There's also a counterargument: security through transparency. Open models can be audited. Researchers can examine them for flaws. An analysis of over 100,000 open models on platforms like Hugging Face found that more than 30 percent had high-severity security vulnerabilities—but at least someone was able to check. With closed models, you're trusting the company that built them.

A Brief History of Opening Up

The story of open AI really begins with the open source software movement itself. Richard Stallman, a programmer at the Massachusetts Institute of Technology, became frustrated in the early 1980s with proprietary software that he couldn't modify or share. In 1985, he founded the Free Software Foundation, advocating for software that users could freely run, study, modify, and distribute.

Stallman's motivations were philosophical—he believed proprietary software was fundamentally unethical. But the practical benefits of open source became increasingly obvious. Linux, created by Linus Torvalds in 1991, eventually became the dominant operating system for servers and supercomputers. Companies realized they could get better software by collaborating than by competing.

AI developed in parallel but remained largely academic until the 2000s. The field had its own sharing culture—researchers published papers, shared datasets, released code. But AI was still mostly a research curiosity, not a commercial product.

The key frameworks emerged gradually. OpenCV, a computer vision library, was released in 2000. Torch, a deep learning framework, came out in 2002 and was made fully open source in 2011. These tools lowered the barrier to entry, letting anyone with a computer experiment with AI.

Then came the deep learning revolution. In 2012, a neural network called AlexNet won an image recognition competition by a shocking margin, demonstrating that deep learning could outperform traditional approaches. Suddenly everyone wanted to do AI, and the open frameworks—TensorFlow from Google in 2015, PyTorch from Facebook in 2016—became essential infrastructure.

OpenAI was founded in 2015 with an explicit mission to create open source AI that benefited humanity. The name was not subtle. But as the company's models grew more powerful, the commitment to openness wavered.

When OpenAI announced GPT-2 in 2019, they initially withheld the full model, citing concerns about misuse. The backlash was immediate and intense. Who were they to decide what the world could and couldn't have? OpenAI relented, releasing progressively larger versions over the following months until the full model was public by the end of the year.

By GPT-3 in 2020, the openness was gone entirely. The model was only available through an API, with OpenAI controlling access and charging for usage. By ChatGPT in 2022, OpenAI had completed its transformation from idealistic nonprofit to aggressive commercial enterprise. The name had become ironic.

The Fully Open Alternatives

The retreat of OpenAI created a vacuum. Various groups rushed to fill it.

EleutherAI, a grassroots collective of researchers, began training and releasing fully open language models. Their GPT-NeoX and Pythia models weren't as capable as the commercial alternatives, but they were genuinely open—code, data, weights, everything.

Academic institutions continued their tradition of sharing. The BigScience project, a collaboration of over 1,000 researchers from 60 countries, created BLOOM, a multilingual language model trained on data from 46 languages.

But truly open large language models remained rare. The cost of training a competitive model runs into the tens or hundreds of millions of dollars. Most organizations simply couldn't afford it.

In September 2025, a Swiss consortium added to the short list by releasing Apertus, a fully open model. And in December 2025, the Linux Foundation—the organization that coordinates Linux development—created the Agentic AI Foundation to steward open source AI projects, taking over some protocols originally created by OpenAI, Anthropic, and Block.

The infrastructure for open AI is being built. Whether it can compete with the billions being poured into proprietary development remains an open question.

The Practical Implications

For individual developers and researchers, open AI, even the partially open kind, has been transformative. You can download Llama and run it on your laptop. You can fine-tune it for specific tasks. You can build applications without paying per-query API fees or worrying about rate limits.
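As a rough sketch of what that looks like in practice, the snippet below uses the Hugging Face transformers library to load an open-weights model and generate text locally. The model identifier is only an example; Meta's Llama repositories are gated, so you have to accept the license and authenticate with the Hub before the download will succeed, and a smaller open model can be substituted on modest hardware.

```python
# Requires: pip install torch transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example identifier; any open-weights causal language model works here.
model_id = "meta-llama/Llama-3.1-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain the difference between open-source and open-weights AI."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation happens on your own machine: no per-query fees, no rate limits.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Fine-tuning is the same idea taken one step further: because the weights sit on your own disk, libraries such as Hugging Face's PEFT can adapt them to a specific task without asking anyone's permission.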

For companies, the calculus is more complex. Some businesses have been scared off by legal uncertainty. If Meta changes Llama's license terms, what happens to products built on it? If a partially open model was trained on copyrighted data, are you liable for using it?

For healthcare, the stakes are particularly high. A Nature editorial warned that medical institutions might become dependent on AI models that could be taken down at any time, that are difficult to evaluate, and that might threaten patient privacy. The authors argued for collaborative development of truly open medical AI—models whose code and training data are transparent and auditable.

This transparency argument extends beyond healthcare. In criminal justice, finance, and other high-stakes domains, people increasingly demand "explainable AI"—systems that can justify their decisions in human-readable terms. Open models make this easier, or at least possible. With closed models, you're trusting a black box.

The Cost Barrier

Even truly open AI isn't free in practice. Downloading PyTorch costs nothing. Downloading a model's weights costs nothing. But training your own model from scratch? That costs millions.

This is different from traditional open source software. If you want to modify Linux, you download the code and start hacking. Your laptop is sufficient. If you want to train a competitive language model, you need thousands of specialized processors running for months. You need vast datasets. You need expertise that commands high salaries.
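A back-of-the-envelope estimate shows why. A common rule of thumb puts training compute at roughly six floating-point operations per model parameter per training token; the numbers below for model size, dataset size, hardware throughput, and rental price are illustrative assumptions, not figures from any real project.

```python
# Rough, illustrative training-cost estimate. All inputs are assumptions.
params = 70e9                 # 70 billion parameters
tokens = 15e12                # 15 trillion training tokens
total_flops = 6 * params * tokens   # ~6 FLOPs per parameter per token

peak_flops_per_gpu = 1e15     # ~1 petaFLOP/s peak for a modern accelerator
utilization = 0.4             # realistic fraction of peak actually achieved
gpu_seconds = total_flops / (peak_flops_per_gpu * utilization)
gpu_hours = gpu_seconds / 3600

price_per_gpu_hour = 2.50     # assumed cloud rental price in dollars
print(f"GPU-hours needed: {gpu_hours:,.0f}")                    # roughly 4.4 million
print(f"Compute cost: ${gpu_hours * price_per_gpu_hour:,.0f}")  # roughly $11 million
```

Even with these generous assumptions, the compute bill alone lands around ten million dollars, before counting data collection, salaries, and the many experimental runs that never ship.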

Open AI democratizes use but not necessarily development. You can run the models that others have built. Meaningfully improving them or building alternatives remains the province of well-funded organizations.

This may change as hardware becomes cheaper and techniques become more efficient. DeepSeek's success suggested that competitive models might be trainable for less than previously thought. But for now, the barrier remains high.

Where This Leaves Us

The debate over open source AI is really several debates at once.

There's the definitional debate: what should count as "open"? The Open Source Initiative has provided one answer, but the industry hasn't universally adopted it. Marketing departments continue to use "open" however they like.

There's the strategic debate: should AI be open at all? The potential benefits—transparency, collaboration, democratization—compete with potential risks—misuse, irreversibility, concentration of power in whoever trains the models.

There's the geopolitical debate: openness has become a competitive strategy, with different nations and companies using it to undercut rivals, build ecosystems, and attract allies.

And there's the practical debate: for any given application, does it make sense to use open models or proprietary ones? The answer depends on your resources, your risk tolerance, your values, and your specific needs.

What seems clear is that the future won't be entirely open or entirely closed. We're heading toward a mixed ecosystem, with fully open academic models, partially open corporate models, and fully proprietary commercial offerings. Different niches for different needs.

The question is who gets to set the terms. And that question—like so much in technology—will be answered by whoever builds the most compelling systems and convinces the most people to use them.

The battle over what "open" means is really a battle over the future of AI itself.

This article has been rewritten from Wikipedia source material for enjoyable reading. Content may have been condensed, restructured, or simplified.