How DeepSeek Drove Down the Price of Building A.I.

CADE METZ

February 13, 2025

DeepSeek used several technological tricks to greatly reduce the cost of building its system. Caroline Brehman/EPA, via Shutterstock

Last month, U.S. financial markets tumbled after a Chinese start-up called DeepSeek said it had built one of the world’s most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.

A.I. companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.

As DeepSeek engineers detailed in a research paper published just after Christmas, the start-up used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about $6 million in raw computing power, roughly one-tenth of what Meta spent in building its latest A.I. technology.

What exactly did DeepSeek do? Here is a guide.

How are A.I. technologies built?

The leading A.I. technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analyzing enormous amounts of data.

The most powerful systems spend months analyzing just about all the English text on the internet as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.

About 15 years ago, A.I. researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.

As companies packed more GPUs into their computer data centers, their A.I. systems could analyze more data.

But the best GPUs cost around $40,000, and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.

How was DeepSeek able to reduce costs?

It did many things. Most notably, it embraced a method called “mixture of experts.”

Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.

If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.

With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could concentrate on its particular field.

Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.

The experts still needed to trade some information with one another, and the generalist — which had a decent but not detailed understanding of each subject — could help coordinate interactions between the experts.

It is a bit like an editor’s overseeing a newsroom filled with specialist reporters.

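The sketch below shows the routing idea in miniature: a router sends each input to only a couple of small "expert" networks, while an always-on "generalist" sees everything, so most of the system stays idle for any given input. Every name and size here is hypothetical; this is a toy illustration of the general technique, not DeepSeek's actual architecture or code.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM, N_EXPERTS, TOP_K = 16, 8, 2  # toy sizes; real models are far larger

    # Each "expert" and the "generalist" is reduced here to one weight matrix.
    experts = [rng.standard_normal((DIM, DIM)) for _ in range(N_EXPERTS)]
    generalist = rng.standard_normal((DIM, DIM))
    router = rng.standard_normal((DIM, N_EXPERTS))

    def moe_layer(x):
        # Score every expert, but run only the top-k; the idle experts
        # are where the savings in computation come from.
        scores = x @ router
        top = np.argsort(scores)[-TOP_K:]
        weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
        expert_out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
        # The always-on generalist sees every input and smooths over
        # whatever falls between the experts' specialties.
        return expert_out + x @ generalist

    out = moe_layer(rng.standard_normal(DIM))
    print(out.shape)  # (16,)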

And that is more efficient?

Much more. But that is not the only thing DeepSeek did. It also mastered a simple trick involving decimals that anyone who remembers his or her elementary school math class can understand.

There is math involved in this?

Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number whose decimals never end: 3.14159265358979 …

You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimation of a circle’s circumference.

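As a quick check, here is that shortcut in a few lines of Python, using a circle of radius 1 (the radius is an arbitrary choice for illustration):

    import math

    radius = 1.0
    exact = 2 * math.pi * radius  # 6.283185307179586
    rounded = 2 * 3.14 * radius   # 6.28
    print(exact - rounded)        # ~0.0032, an error of about 0.05 percent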

DeepSeek did something similar — but on a much larger scale — in training its A.I. technology.

The math that allows a neural network to identify patterns in text is really just multiplication — lots and lots and lots of multiplication. We’re talking months of multiplication across thousands of computer chips.

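Concretely, most of that multiplication is matrix multiplication. A minimal sketch, with made-up sizes (real models use far larger matrices and repeat this for every token and every layer):

    import numpy as np

    rng = np.random.default_rng(0)
    inputs = rng.standard_normal((1, 1024))      # one token's representation
    weights = rng.standard_normal((1024, 1024))  # one layer's parameters
    outputs = inputs @ weights                   # ~a million multiply-adds

    # Training a large model repeats operations like this trillions of
    # times, which is why the precision of each multiply matters so much.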

Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory — half the space. In essence, it lopped several decimals from each number.

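A rough sketch of the squeezing step, using scaled 8-bit integers for clarity. The low-precision floating-point formats used in real training differ in detail, and to_8bit is a hypothetical helper, not DeepSeek's actual scheme:

    import numpy as np

    def to_8bit(x):
        # Fit each value into one signed byte: scale so the largest
        # magnitude maps to 127, then round the extra decimals away.
        scale = np.abs(x).max() / 127.0
        return np.round(x / scale).astype(np.int8), scale

    x = np.array([0.12345678, -1.5, 0.007], dtype=np.float32)
    q, scale = to_8bit(x)
    print(q)          # [  10 -127    1] -- half the memory of 16-bit values
    print(q * scale)  # roughly [0.118, -1.5, 0.012] -- close to x, but lossy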

This meant that each calculation was less accurate. But that didn’t matter. The calculations were accurate enough to produce a really powerful neural network.

That’s it?

Well, they added another trick.

After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem — making a key calculation that would help decide how the neural network would operate — it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.

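Here is a sketch of that second step, again with integers for clarity. DeepSeek's training used low-precision floating-point numbers, and on real GPUs the accumulation happens inside the chip's matrix hardware rather than in a Python loop, but the multiply-narrow, accumulate-wide pattern is the same:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.integers(-127, 128, size=4096, dtype=np.int8)
    b = rng.integers(-127, 128, size=4096, dtype=np.int8)

    # Each individual product of two 8-bit numbers is small, but summing
    # thousands of them would overflow an 8- or 16-bit result; a 32-bit
    # accumulator keeps the running total precise.
    acc = np.int32(0)
    for x, y in zip(a, b):
        acc += np.int32(x) * np.int32(y)
    print(acc)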

So any high school student could have done this?

Well, no. The DeepSeek engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.

Few people have that kind of skill. But serious A.I. labs have the talented engineers needed to match what DeepSeek has done.

Then why didn’t they do this already?

Some A.I. labs may be using at least some of the same tricks already. Companies like OpenAI do not always reveal what they are doing behind closed doors.

But others were clearly surprised by DeepSeek’s work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars — if not billions — in electrical power.

In other words, it requires enormous amounts of risk.

“You have to put a lot of money on the line to try new things — and often, they fail,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specializes in building efficient A.I. systems and previously worked as an A.I. researcher at Meta.

“That is why we don’t see much innovation: People are afraid to lose many millions just to try something that doesn’t work,” he added.

Many pundits pointed out that DeepSeek’s $6 million covered only what the start-up spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge A.I. project.

DeepSeek experimented, and it paid off. Now, because the Chinese start-up has shared its methods with other A.I. researchers, its technological tricks are poised to significantly reduce the cost of building A.I.
