New open-weight LLMs from OpenAI are already available

For the first time since GPT-2 in 2019, OpenAI is releasing new large open-weight language models. This is an important milestone for the company, which has been increasingly accused of abandoning its original mission to ensure that artificial general intelligence “benefits all of humanity.” Now, after numerous delays for additional safety testing and refinement, gpt-oss-120b and gpt-oss-20b are available for download from Hugging Face.

Before going any further, it is worth explaining what exactly OpenAI is releasing. The company is not publishing new open-source models, which would include the underlying code and data used to train them. Instead, it is sharing the weights, that is, the numerical values the models learned during training that determine how they turn inputs into outputs. According to Benjamin C. Lee, a professor of engineering and computer science at the University of Pennsylvania, open-weight models and open-source models serve two very different purposes.

“An open-weight model provides the values that have been learned during the training of a large language model, and they essentially allow you to use the model and build on it. You can use the model out of the box, or you can adapt it for a specific application by adjusting the weights the way you like,” he said. If commercial models are an absolute black box and an open-source system allows full customization and modification, then open-weight AI sits somewhere in between.
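
To make that concrete, here is a minimal Python sketch, using the Hugging Face transformers library, of what “the weights” actually are: learned numerical tensors that you can download and inspect. The repository id is an assumption, and the snippet stops short of any real fine-tuning.

# Minimal sketch: open weights are just learned numerical tensors.
# The repository id below is an assumption, not taken from the article.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b")  # assumed repo id

# Every named parameter is a tensor of learned values ("the weights").
for name, tensor in list(model.named_parameters())[:5]:
    print(name, tuple(tensor.shape))

“Adjusting the weights the way you like” typically means fine-tuning: continuing training on your own data so those tensors shift toward your task.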

OpenAI does not release open-source models, likely because a competitor could use the training data and code to reverse engineer its technology. “An open-source model is more than just the weights. It can also potentially include the code used to run the training process,” said Lee. And in practice, the average person won’t get much use out of an open-source model unless they have a farm of high-end NVIDIA GPUs driving up their electricity bill. (However, such models can be useful for researchers who want to learn more about the data a company used to train its systems, and there are several open models available, such as Mistral NeMo and Mistral Small 3.)

Aside from that, the main difference between gpt-oss-120b and gpt-oss-20b is how many parameters each offers. If you’re not familiar with the term, parameters are the numerical settings a large language model adjusts during training and then uses to produce a response. The names are a bit confusing, but gpt-oss-120b is a model with 117 billion parameters, while its smaller sibling has 21 billion.
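
Parameter counts translate fairly directly into memory requirements. Here is a back-of-the-envelope Python sketch; the roughly 4-bit (half a byte) per-parameter figure is an assumption about how aggressively the released weights are quantized, not an official OpenAI number.

# Rough memory estimate: parameters x bytes per parameter, ignoring overhead
# for activations and context. The 0.5 bytes/parameter value is an assumption.
def approx_weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9

print(approx_weight_memory_gb(21e9, 0.5))   # gpt-oss-20b: about 10.5 GB of weights
print(approx_weight_memory_gb(117e9, 0.5))  # gpt-oss-120b: about 58.5 GB of weights

Under that assumption, the figures are at least consistent with the hardware guidance that follows.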

In practice, this means that more powerful hardware is required to run gpt-oss-120b: OpenAI recommends a single GPU with 80 GB of memory for efficient use. The good news is that the company says any modern computer with 16 GB of RAM can run gpt-oss-20b. As a result, you can use the smaller model for something like vibe coding on your own machine without an internet connection. What’s more, OpenAI is releasing the models under the Apache 2.0 license, which gives people a lot of flexibility to modify the systems to suit their needs.
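
If you want to try the smaller model locally, a minimal sketch using the Hugging Face transformers library might look like the following. The repository id is an assumption, and you would need a recent transformers release (plus the accelerate package for automatic device placement) and enough memory on your machine.

# Hedged local-inference sketch for gpt-oss-20b via Hugging Face transformers.
# "openai/gpt-oss-20b" is an assumed repository id; adjust it to the actual one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain open-weight models in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))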

Although this is not a new commercial release, OpenAI claims that the new models are largely comparable to its proprietary systems. The main limitation of the gpt-oss models is that they don’t offer multimodal input, meaning they can’t handle images, video, or voice. For those capabilities, you will still have to turn to OpenAI’s cloud-based commercial models, which both new open-weight systems can be configured to call. Beyond that, however, they offer many of the same capabilities, including chain-of-thought reasoning and tool use. This means the models can solve more complex problems by breaking them down into smaller steps, and if they need additional help, they know how to use the web and programming languages such as Python.
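
As a loose illustration of what tool use looks like, the sketch below declares a hypothetical Python-execution tool and passes it to a locally served gpt-oss model through an OpenAI-compatible endpoint. The base URL, the model name, and the assumption that your local inference server supports tool calls are all illustrative, not details from this announcement.

# Hedged tool-calling sketch against a locally hosted gpt-oss model exposed
# through an OpenAI-compatible API. URL, model name, and tool are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "run_python",  # hypothetical tool your application would implement
        "description": "Run a short Python snippet and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "What is 2**32? Use the Python tool."}],
    tools=tools,
)
# If the model decides it needs the tool, it returns a structured call rather than text.
print(response.choices[0].message.tool_calls)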

Additionally, OpenAI trained the models using methods the company has previously used in the development of o3 and its other recent advanced systems. In competition-level coding, gpt-oss-120b scored only slightly worse than o3, OpenAI’s current state-of-the-art reasoning model, while gpt-oss-20b fell between o3-mini and o4-mini. Of course, we’ll have to wait for real-world testing to see how the two new models compare to commercial offerings from OpenAI and its competitors.

The release of gpt-oss-120b and gpt-oss-20b, and OpenAI’s apparent decision to double down on open models, comes after Mark Zuckerberg announced that Meta would be releasing fewer such systems to the public. Previously, open source was central to Zuckerberg’s messaging about his company’s AI efforts, and the CEO once said of closed-source systems: “To hell with that.” At least among the set of tech enthusiasts eager to tinker with LLMs, the timing, coincidental or not, is somewhat embarrassing for Meta.

“You could argue that open-weight models democratize access to the largest, most powerful models for people who don’t have these massive, hyperscale data centers with lots of GPUs,” said Lee. “It allows people to use the products of a months-long training process in a huge data center without having to invest in that infrastructure themselves. From the perspective of someone who just wants a really capable model to start with and then to build on it for some application, I think open-weight models can be very useful.”

OpenAI is already working with several organizations to deploy their own versions of these models, including AI Sweden, Sweden’s national center for applied artificial intelligence. At OpenAI’s press briefing ahead of today’s announcement, the team behind gpt-oss-120b and gpt-oss-20b said they view the two models as an experiment: the more people use them, the more likely it is that OpenAI will release additional open-weight models in the future.
