DeepSeek’s new AI model seems to be one of the best open-source challengers yet

The model, DeepSeek V3, was released on Wednesday under a license that allows developers to download and modify it for most applications, including commercial ones.

DeepSeek V3 can handle a range of text-based workloads and tasks, such as coding, translation, and writing essays and emails from descriptive prompts.
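To give a concrete sense of that prompt-driven workflow, here is a minimal sketch that calls DeepSeek V3 through an OpenAI-compatible chat API. The base URL, the model name (“deepseek-chat”), and the placeholder API key are assumptions drawn from DeepSeek’s public documentation and may differ in practice.

```python
# Minimal sketch: prompting DeepSeek V3 through an OpenAI-compatible chat API.
# Assumes the `openai` Python client and a DeepSeek API key; the base URL and
# model name below are assumptions based on DeepSeek's public docs and may change.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder, not a real key
    base_url="https://api.deepseek.com",   # assumption: DeepSeek's hosted endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # assumption: the hosted alias for DeepSeek V3
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a short Python function that reverses a string."},
    ],
)

print(response.choices[0].message.content)
```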

According to DeepSeek’s internal benchmarking, DeepSeek V3 outperforms both downloadable, “openly” available models and “closed” AI models that can only be accessed through an API. On a subset of programming competitions hosted on the Codeforces platform, DeepSeek V3 outperforms other models, including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B.

DeepSeek V3 also outperforms competitors in the Aider Polyglot benchmark, designed to measure, among other things, whether a model can successfully write new code that integrates into existing code.

DeepSeek claims that DeepSeek V3 was trained on a dataset of 14.8 trillion tokens. Tokens are the chunks of raw text a model processes; 1 million tokens equals about 750,000 words.
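For intuition, the sketch below counts tokens in a short string with a Hugging Face tokenizer. The checkpoint id (“deepseek-ai/DeepSeek-V3”) is an assumption about where the weights are hosted; any modern tokenizer would illustrate the same word-to-token relationship.

```python
# Rough illustration of the token/word relationship described above.
# The repo id is an assumption; swap in any tokenizer you have locally.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "deepseek-ai/DeepSeek-V3", trust_remote_code=True
)

text = "DeepSeek V3 was trained on a dataset of 14.8 trillion tokens."
tokens = tokenizer.encode(text)

print(f"{len(text.split())} words -> {len(tokens)} tokens")
# The ~0.75 words-per-token figure is an average over large English corpora;
# short snippets like this one can deviate noticeably.
```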

It’s not just the training set that’s massive. DeepSeek V3 is huge: 671 billion parameters, or 685 billion as listed on the Hugging Face AI development platform. (Parameters are the internal variables that models use to make predictions or decisions.) That is roughly 1.6 times the size of Llama 3.1 405B, which has 405 billion parameters.
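As a toy illustration of what “parameters” means, the sketch below builds a tiny PyTorch network and counts its learnable values. It is purely for intuition and has nothing to do with DeepSeek V3’s actual architecture.

```python
# Toy illustration of "parameters": the learned weights and biases a model
# adjusts during training. DeepSeek V3 has roughly 671 billion such values.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 4096),  # weight: 1024*4096 values, bias: 4096 values
    nn.ReLU(),
    nn.Linear(4096, 1024),  # weight: 4096*1024 values, bias: 1024 values
)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # ~8.4 million, versus ~671 billion for DeepSeek V3
```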

The number of parameters often (but not always) correlates with skill; models with more parameters tend to outperform models with fewer parameters. But large models also require more powerful hardware to run. An unoptimized version of DeepSeek V3 would need a bank of high-performance GPUs to answer questions at a reasonable speed.
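A rough back-of-the-envelope estimate makes this concrete. The precisions and the 80 GB per-GPU figure below are assumptions about common serving setups, not DeepSeek’s actual deployment configuration.

```python
# Back-of-the-envelope memory estimate for holding DeepSeek V3's weights,
# ignoring activations, KV cache, and runtime overhead. Bytes-per-parameter
# values are assumptions about typical serving precisions.
PARAMS = 671e9
GPU_MEMORY_GB = 80  # e.g., one 80 GB data-center accelerator

for precision, bytes_per_param in [("FP16/BF16", 2), ("FP8/INT8", 1)]:
    total_gb = PARAMS * bytes_per_param / 1e9
    gpus = -(-total_gb // GPU_MEMORY_GB)  # ceiling division
    print(f"{precision}: ~{total_gb:,.0f} GB of weights -> at least {gpus:.0f} x 80 GB GPUs")
```

Even with aggressive quantization, the weights alone span well over a dozen high-memory GPUs, which is why the article calls an unoptimized deployment impractical for most users.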

While not the most practical model, DeepSeek V3 is an achievement in some respects. DeepSeek says it trained the model in just two months on a data center of Nvidia H800 GPUs, chips that the U.S. Department of Commerce recently banned Chinese companies from purchasing. The company also claims to have spent only $5.5 million to train DeepSeek V3, a fraction of the cost of developing models like OpenAI’s GPT-4.

The downside is that the model’s political views are a bit… limited. Ask DeepSeek V3 about Tiananmen Square, for example, and it won’t answer.

Because DeepSeek is a Chinese company, China’s internet regulator benchmarks its models to ensure that their answers “embody core socialist values.” Many Chinese artificial intelligence systems decline to respond on topics that might draw the ire of regulators, such as speculation about the Xi Jinping government.

DeepSeek, which in late November unveiled DeepSeek-R1, a response to OpenAI’s o1 “reasoning” model, is an interesting entity. It is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses artificial intelligence to make trading decisions.

High-Flyer builds its own server clusters to train models, one of the latest of which reportedly has 10,000 Nvidia A100 GPUs and cost 1 billion yuan (~$138 million). Founded by Liang Wenfeng, a computer science graduate, High-Flyer aims to achieve “super-intelligent” AI through its DeepSeek organization.

In an interview earlier this year, Liang characterized closed-source AI such as OpenAI’s as a “temporary” moat. “That hasn’t stopped others from catching up,” he said.
