人工智能时代的算力挑战
<img src="https://mmbiz.qpic.cn/mmbiz_jpg/FOHTfbK2JUh6iacTCJhtlwhxSicE5EPL4y1uicy0Qv1YC3o5HgMLlctniaZgMgJqJqluzUzAMCXmBqyWa6THI0D1WQ/640?wx_fmt=jpeg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"><p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">照片</span><span style="color: black;">源自</span>:图虫创意</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">近期</span>,</span>OpenAI<span style="color: black;">推出的聊天<span style="color: black;">设备</span>人</span>ChatGPT<span style="color: black;">真可谓是红到发紫。无论是做技术的、做投资的,还是普通网友,<span style="color: black;">好似</span>不聊几句</span>ChatGPT<span style="color: black;">就<span style="color: black;">显出</span>落伍了。当然,在一片对</span>ChatGPT<span style="color: black;">的追捧<span style="color: black;">其中</span>,<span style="color: black;">亦</span>有<span style="color: black;">有些</span><span style="color: black;">区别</span>的意见。<span style="color: black;">例如</span>,图灵奖得主、</span>Meta<span style="color: black;">的首席</span>AI<span style="color: black;"><span style="color: black;">专家</span>杨立昆(</span>Yann LeCun<span style="color: black;">)就在社交<span style="color: black;">媒介</span>上发帖说:从底层技术看,</span>ChatGPT<span style="color: black;">并<span style="color: black;">无</span>什么创新。与其说它是一次巨大的技术革新,倒不如说它是一个工程上的杰作。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">杨立昆的这番言论一出,就遭遇到了网友的一片嘲讽,<span style="color: black;">非常多</span>人<span style="color: black;">乃至</span>毫不客气地说,<span style="color: black;">做为</span></span>Meta<span style="color: black;">的</span>AI<span style="color: black;">掌门人,这完全<span style="color: black;">便是</span>一种“吃不到葡萄说葡萄酸”的狡辩。<span style="color: black;">因为</span></span>Meta<span style="color: black;">先前在同类<span style="color: black;">制品</span>上的失败经历,<span style="color: black;">因此</span>面对如此汹汹的舆论,杨立昆<span style="color: black;">亦</span>是百口莫辩,只能就此噤声,<span style="color: black;">再也不</span>对</span>ChatGPT<span style="color: black;">进一步<span style="color: black;">发布</span>评论。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">不外</span>,<span style="color: black;">倘若</span><span style="color: black;">咱们</span>认真回味一下杨立昆的话,就会<span style="color: black;">发掘</span>他的话其实是非常有道理的:虽然从表现上看,<span style="color: black;">此刻</span>的</span>ChatGPT<span style="color: black;">确实非常惊艳,但从<span style="color: black;">基本</span>上讲,它依然是深度学习技术的一个小拓展。事实上,与之类似的<span style="color: black;">制品</span>在几年前已经<span style="color: black;">显现</span>过,所<span style="color: black;">区别</span>的是,</span>ChatGPT<span style="color: black;">在参数数量上要远远多于之前的<span style="color: black;">制品</span>,其<span style="color: black;">运用</span>的训练样本<span style="color: black;">亦</span>要大得多。而它卓越的性能,其实在很大程度上只是这些数量<span style="color: black;">优良</span><span style="color: black;">累积</span>到了<span style="color: black;">必定</span>程度之后产生的质变。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">有意思的是,<span style="color: black;">倘若</span><span style="color: black;">咱们</span>回顾一下深度学习的历史,就会<span style="color: black;">发掘</span>这种利用神经网络进行<span style="color: black;">设备</span>学习的思路其实在上世纪</span>50<span style="color: black;">年代就有了,<span style="color: black;">能够</span><span style="color: black;">叫作</span>得上是人工智能<span style="color: black;">行业</span>最古老的理论之一。早在</span>1958<span style="color: black;">年,罗森布拉特就曾经用这个原理制造了一台<span style="color: black;">设备</span>来识别字符。然而,在很长的一段时间内,这个理论却<span style="color: black;">始终</span>无人问津,即使<span style="color: black;">此刻</span>被尊为“深度学习之父”的杰弗里·辛顿(</span>Geoffrey Hinton<span style="color: black;">)<span style="color: black;">亦</span><span style="color: black;">长时间</span>遭受孤立和排挤。究其<span style="color: black;">原由</span>,固然有来自当时在人工智能<span style="color: black;">行业</span>占主导地位的“符号主义”的打压,但更为重要的是,当时的深度学习模型确实表现<span style="color: black;">不良</span>。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">直到本世纪初,这一切才<span style="color: black;">出现</span>了改变。<span style="color: black;">长时间</span>蛰伏的深度学习理论<span style="color: black;">最终</span>翻身<span style="color: black;">作为</span>了人工智能的主流,一个个基于这一理论<span style="color: black;">研发</span>的模型如雨后春笋<span style="color: black;">通常</span><span style="color: black;">显现</span>。从打败围棋<span style="color: black;">能手</span>的</span>AlphaGo<span style="color: black;">到识别出几亿种蛋白质结构的</span>AlphaFold<span style="color: black;">,从<span style="color: black;">能够</span>瞬间生成大师画作的</span>Dall-E<span style="color: black;">、</span>Stable Diffusion<span style="color: black;">到当今如日中天的</span>ChatGPT<span style="color: black;">,所有的这些在短短的几年之间涌现了。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;">那样</span>,到底是什么<span style="color: black;">原由</span>让深度学习在过去的几年中扭转了<span style="color: black;">长时间</span>的颓势,让它得以完<span style="color: black;">成为了</span>从异端到主流的转换?我想,最为关键的一点<span style="color: black;">便是</span>算力的突破。</span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">算力及其经济效应</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">所说</span>算力,<span style="color: black;">便是</span>设备处理数据、输出结果的能力,<span style="color: black;">或</span>简而言之,<span style="color: black;">便是</span>计算的能力。它的基本单位是用“每秒完成的标准操作数量”(</span>standardized operations per second<span style="color: black;">,简<span style="color: black;">叫作</span></span>SOPS<span style="color: black;">)来进行衡量。<span style="color: black;">不外</span>,<span style="color: black;">因为</span><span style="color: black;">此刻</span>的设备性能都非常高,因而在实践中用</span>SOPS<span style="color: black;">来衡量算力<span style="color: black;">已然</span>变得不<span style="color: black;">那样</span>方便。相比之下,“每秒完成的百万次操作数”(</span>million operations per second<span style="color: black;">,简<span style="color: black;">叫作</span></span>MOPS<span style="color: black;">)、“每秒完成的十亿次操作数”(</span>giga operations per second<span style="color: black;">,简<span style="color: black;">叫作</span></span>GOPS<span style="color: black;">),以及“每秒完成的万亿次操作数”(</span>tera operations per second<span style="color: black;">,简<span style="color: black;">叫作</span></span>TOPS<span style="color: black;">)等单位变得更为常用。当然,在<span style="color: black;">有些</span>文献中,<span style="color: black;">亦</span>会<span style="color: black;">运用</span>某些特定性能的设备在某一时间段内完成的计算量来<span style="color: black;">做为</span>算力的单位——其<span style="color: black;">规律</span>有点类似于<span style="color: black;">理学</span>学中用到的“马力”。<span style="color: black;">例如</span>,一个比较常用的单位叫做“算力当量”,它就被定义为一台每秒运算千万亿次的计算机完整运行一天所实现的算力总量。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">那样</span>,算力的<span style="color: black;">道理</span><span style="color: black;">到底</span><span style="color: black;">安在</span>呢?关于这个问题,阿格拉沃尔(</span>Ajay Agrawal<span style="color: black;">)、甘斯(</span>Joshua Gans<span style="color: black;">)和戈德法布(</span>Avi Goldfarb<span style="color: black;">)在<span style="color: black;">她们</span>合著的《预测<span style="color: black;">设备</span>》(</span>Prediction Machines<span style="color: black;">,中文译名为《</span>AI<span style="color: black;">极简经济学》)中,曾经提出过一个有启发的观点</span><strong style="color: blue;"><span style="color: black;">:算力的成本将关系到</span>AI<span style="color: black;">模型的“价格”</span></strong><span style="color: black;">。经济学的原理告诉<span style="color: black;">咱们</span>,在给定其他<span style="color: black;">要求</span>的前提下,人们对一种商品的<span style="color: black;">需要</span>量取决于该商品的价格。而<span style="color: black;">针对</span>两种性能相近,<span style="color: black;">拥有</span>替代关系的商品<span style="color: black;">来讲</span>,<span style="color: black;">拥有</span>更低价格的那种商品会在市场上<span style="color: black;">拥有</span>更高的竞争力。将这一点应用到人工智能<span style="color: black;">行业</span>,<span style="color: black;">咱们</span>就<span style="color: black;">能够</span>找到深度学习理论<span style="color: black;">为何</span>在几十年中都不被待见,却在<span style="color: black;">近期</span>几年中实现爆发的<span style="color: black;">原由</span>。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">虽然深度学习的理论并不算困难,<span style="color: black;">然则</span>为了实现它,要投入的运算量是<span style="color: black;">非常</span>巨大的。在算力低下的时代,算力的单位成本非常高。在罗森布拉特提出深度学习思想雏形的那个年代,一台计算机的体积几乎和一间房子<span style="color: black;">那样</span>大,但即便如此,让它运算一个大一点的矩阵都还需要很<span style="color: black;">长期</span>。虽然理论上<span style="color: black;">咱们</span><span style="color: black;">亦</span><span style="color: black;">能够</span>用深度学习来训练大模型并达到比较好的效果,但<span style="color: black;">这般</span>的成本显然是<span style="color: black;">无</span>人能够承受的。而相比之下,符号学派的模型<span style="color: black;">针对</span>计算量的<span style="color: black;">需求</span>要小得多,<span style="color: black;">因此呢</span>这些模型的相对价格<span style="color: black;">亦</span>要比深度学习模型来得低。在这种<span style="color: black;">状况</span>下,深度学习理论当然不会在市场上有竞争力。<span style="color: black;">然则</span>,当算力成本大幅度降低之后,深度学习模型的相对价格就降了下来,它的竞争力<span style="color: black;">亦</span>就<span style="color: black;">提高</span>了。<strong style="color: blue;">从这个<span style="color: black;">方向</span>看,深度学习在现<span style="color: black;">周期</span>的胜利其实并不是一个纯粹的技术事件,在很大程度上,它还是一个经济事件。</strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">改进算力的<span style="color: black;">办法</span></span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">那样</span>,决定算力的<span style="color: black;">原因</span>有<span style="color: black;">那些</span>呢?</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">为了直观起见,<span style="color: black;">咱们</span>不妨以计算数学题来对此进行说明:<span style="color: black;">倘若</span><span style="color: black;">咱们</span>要<span style="color: black;">加强</span>在单位时间内计算数学题的效率,有<span style="color: black;">那些</span><span style="color: black;">办法</span><span style="color: black;">能够</span>达到这一目的呢?我想,可能有以下几种<span style="color: black;">办法</span>是可行的:一是找<span style="color: black;">更加多</span>人<span style="color: black;">一块</span>来计算。<span style="color: black;">倘若</span>一个人一分钟<span style="color: black;">能够</span>算一个题,<span style="color: black;">那样</span>十个人一分钟就<span style="color: black;">能够</span>算十个题。<span style="color: black;">这般</span>,即使<span style="color: black;">每一个</span>人的效率<span style="color: black;">无</span><span style="color: black;">提高</span>,随着人数的<span style="color: black;">增多</span>,单位时间内<span style="color: black;">能够</span>计算的数学题数量<span style="color: black;">亦</span><span style="color: black;">能够</span>成倍<span style="color: black;">增多</span>。二是改进设备。<span style="color: black;">例如</span>,最早时,<span style="color: black;">咱们</span>完全是依靠手算的,效率就很低。<span style="color: black;">倘若</span>改用计算器,效率会高一点。<span style="color: black;">倘若</span><span style="color: black;">运用</span>了</span>Excel<span style="color: black;">,效率就可能更高。三是将问题转化,用更好的<span style="color: black;">办法</span>来计算。<span style="color: black;">例如</span>,计算从</span>1<span style="color: black;">加到</span>100<span style="color: black;">,<span style="color: black;">倘若</span><span style="color: black;">根据</span><span style="color: black;">次序</span>一个个把数字加上去,<span style="color: black;">那样</span>可能要算很久。<span style="color: black;">然则</span>,<span style="color: black;">倘若</span><span style="color: black;">咱们</span>像高斯那样用等差数列求和公式来解这个问题,<span style="color: black;">那样</span><span style="color: black;">火速</span>就<span style="color: black;">能够</span>计算出结果。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">将以上三个<span style="color: black;">方法</span>对应到<span style="color: black;">提高</span>算力问题,<span style="color: black;">咱们</span><span style="color: black;">亦</span><span style="color: black;">能够</span>找到类似的三种<span style="color: black;">办法</span>:一是借助高性能计算和分布式计算;二是实现计算模式上的突破;三是改进算法——尽管严格地说这并不能让算力本身得到<span style="color: black;">提高</span>,<span style="color: black;">然则</span>它却能让<span style="color: black;">一样</span>的算力完成<span style="color: black;">更加多</span>的计算,从某个<span style="color: black;">方向</span>看,这就<span style="color: black;">好似</span>让算力<span style="color: black;">增多</span>了<span style="color: black;">同样</span>。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">1<span style="color: black;">、高性能计算和分布式计算</span></span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">从<span style="color: black;">基本</span>上讲,高性能计算和分布式计算都是<span style="color: black;">经过</span><span style="color: black;">同期</span>动用<span style="color: black;">更加多</span>的计算资源去完成计算任务,就<span style="color: black;">好似</span><span style="color: black;">咱们</span>前面讲的,用<span style="color: black;">更加多</span>的人手去算数学题<span style="color: black;">同样</span>。所<span style="color: black;">区别</span>的是,<strong style="color: blue;">前者聚集的计算资源<span style="color: black;">通常</span>是聚集在本地的,而后者动用的计算资源则可能是分散在网上的。</strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(</span>1<span style="color: black;">)高性能计算</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">先看高性能计算。高性能计算中,最为重要的核心技术是并行计算(</span>Parallel Computing<span style="color: black;">)。<span style="color: black;">所说</span>并行计算,是相<span style="color: black;">针对</span>串行计算而言的。在串行计算<span style="color: black;">其中</span>,计算任务不会被拆分,一个任务的执行会固定占有<span style="color: black;">一起</span>计算资源。而在并行计算中,任务则会被分解并交给多个计算资源进行处理。打个比方,串行计算过程就像是让一个人独立<span style="color: black;">根据</span><span style="color: black;">次序</span>完成一张试卷,而并行计算则像是把试卷上的题分配给<span style="color: black;">非常多</span>人<span style="color: black;">同期</span>作答。显然,这种任务的分解和分配<span style="color: black;">能够</span>是多样的:既<span style="color: black;">能够</span>是把计算任务分给多个设备,让它们协同求解,<span style="color: black;">亦</span><span style="color: black;">能够</span>是把被求解的问题分解成若干个部分,各部分均由一个独立的设备来并行计算。并行计算系统既<span style="color: black;">能够</span>是含有多个处理器的超级计算机,<span style="color: black;">亦</span><span style="color: black;">能够</span>是以某种方式互连的若干台独立计算<span style="color: black;">公司</span>成的集群。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">从架构上看,并行计算<span style="color: black;">能够</span>分为同构并行计算(</span>homogeneous parallel computing<span style="color: black;">)和异构并行计算(</span>heterogeneous parallel computing<span style="color: black;">)。顾名思义,同构并行计算是把计算任务分配给一系列相同的计算单元;异构并行计算则是把计算任务分配给<span style="color: black;">区别</span>制程架构、<span style="color: black;">区别</span>指令集、<span style="color: black;">区别</span>功能的计算单元。<span style="color: black;">例如</span>,多核</span>CPU<span style="color: black;">的并行运算就属于同构并行,而</span>CPU+GPU<span style="color: black;">的架构就属于异构并行。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">对比于同构并行,异构并行<span style="color: black;">拥有</span><span style="color: black;">非常多</span>的<span style="color: black;">优良</span>。用通俗的语言解释,这种<span style="color: black;">优良</span>来自于<span style="color: black;">各样</span>计算单元之间的“术业专攻”,在异构架构之下,<span style="color: black;">区别</span>计算单元之间的<span style="color: black;">优良</span><span style="color: black;">能够</span>得到更好的互补。正是<span style="color: black;">因为</span>这个<span style="color: black;">原由</span>,异构并行计算正得到越来越多的<span style="color: black;">注重</span>。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">例如</span>,<span style="color: black;">此刻</span>越来越多的设备<span style="color: black;">其中</span>,都采用了将</span>GPU<span style="color: black;">和</span>CPU<span style="color: black;">混搭的架构。<span style="color: black;">为何</span>要这么做呢?为了说明白这一点,<span style="color: black;">咱们</span>需要略微介绍一下</span>CPU<span style="color: black;">和</span>GPU<span style="color: black;">的结构:从总体上看,无论是</span>CPU<span style="color: black;">还是</span>GPU<span style="color: black;">,都<span style="color: black;">包含</span>运算器(</span>Arithmetic and Logic Unit<span style="color: black;">,简<span style="color: black;">叫作</span></span>ALU<span style="color: black;">)、<span style="color: black;">掌控</span>单元(</span>Control Unit<span style="color: black;">,简<span style="color: black;">叫作</span></span>CL<span style="color: black;">)、高速缓存器(</span>Cache<span style="color: black;">)和动态随机存取存储器(</span>DRAM<span style="color: black;">)。<span style="color: black;">然则</span>,这些<span style="color: black;">成份</span>在两者中的<span style="color: black;">形成</span>比例却是<span style="color: black;">区别</span>的。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">在</span>CPU<span style="color: black;"><span style="color: black;">其中</span>,<span style="color: black;">掌控</span>单元和存储单元占的比例很大,而<span style="color: black;">做为</span>计算单位的</span>ALU<span style="color: black;">比例则很小,数量<span style="color: black;">亦</span>不多;而在</span>GPU<span style="color: black;"><span style="color: black;">其中</span>则正好相反,它的</span>ALU<span style="color: black;">比例很大,而<span style="color: black;">掌控</span>单元和存储单元则只占很小的一个比例。这种结构上的差异就决定了</span>CPU<span style="color: black;">和</span>GPU<span style="color: black;">功能上的区别。</span>CPU<span style="color: black;">在<span style="color: black;">掌控</span>和存储的能力上比较强,就能进行比较<span style="color: black;">繁杂</span>的计算,<span style="color: black;">不外</span>它<span style="color: black;">能够</span><span style="color: black;">同期</span>执行的线程很少。而</span>GPU<span style="color: black;">则相反,<span style="color: black;">海量</span>的计算单位让它<span style="color: black;">能够</span>同时执行多线程的任务,但每一个任务都比较简单。打个比喻,</span>CPU<span style="color: black;">是一个精通数学的博士,微积分、线性代数样样都会,但尽管如此,让他做一万道四则运算<span style="color: black;">亦</span>很难;而</span>GPU<span style="color: black;">呢,则是一群只会四则运算的小学生,虽然<span style="color: black;">她们</span>不会微积分和线性代数,但人多力量大,<span style="color: black;">倘若</span><span style="color: black;">一块</span>开干,一万道四则运算分分钟就能搞定。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">因为</span></span>GPU<span style="color: black;">的以上性质,它最初的用途是<span style="color: black;">做为</span>显卡,<span style="color: black;">由于</span>显卡负责图形和色彩的变换,需要的计算量很大,但每一个计算的<span style="color: black;">繁杂</span>性都不高。当深度学习兴起之后,人工智能专家们<span style="color: black;">发掘</span></span>GPU<span style="color: black;">其实<span style="color: black;">亦</span>很适合用来训练神经网络模型。<span style="color: black;">由于</span>在深度学习模型中,最<span style="color: black;">重点</span>的运算<span style="color: black;">便是</span>矩阵运算和卷积,而这些运算从<span style="color: black;">基本</span>上都<span style="color: black;">能够</span>分解为简单的加法和乘法。<span style="color: black;">这般</span>一来,</span>GPU<span style="color: black;">就找到了新的“就业”空间,<span style="color: black;">起始</span>被广泛地应用于人工智能。<span style="color: black;">然则</span>,</span>GPU<span style="color: black;">并<span style="color: black;">不可</span>单独执行任务,<span style="color: black;">因此</span>它必须搭配上一个</span>CPU<span style="color: black;">,<span style="color: black;">这般</span>的组合就<span style="color: black;">能够</span>完成<span style="color: black;">非常多</span><span style="color: black;">繁杂</span>的任务。这就<span style="color: black;">好似</span>让一个能把握方向的导师带着<span style="color: black;">非常多</span>肯卖力的学生,<span style="color: black;">能够</span>干出<span style="color: black;">非常多</span><span style="color: black;">研究</span>成果<span style="color: black;">同样</span>。正是在这种<span style="color: black;">状况</span>下,异构并行<span style="color: black;">起始</span><span style="color: black;">作为</span>了高性能计算的流行架构模式。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">不外</span>,异构架构<span style="color: black;">亦</span>是有代价的。相<span style="color: black;">针对</span>同构架构,它<span style="color: black;">针对</span>应用者的编程<span style="color: black;">需求</span>更高。换言之,<span style="color: black;">仅有</span>当<span style="color: black;">运用</span>者<span style="color: black;">能够</span>更好地把握好<span style="color: black;">区别</span>计算单元之间的属性,并进行有针对性的编程,才可能更好地利用好它们。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">除此之外,<span style="color: black;">咱们</span>还必须认识到,哪怕是借助异构架构,<span style="color: black;">经过</span>并行运算来<span style="color: black;">提高</span>运算效率的可能<span style="color: black;">亦</span>是有限的。<span style="color: black;">按照</span>阿姆达尔定律(</span>Amdahl<span style="color: black;">’</span>s Law<span style="color: black;">),<span style="color: black;">针对</span>给定的运算量,当并行计算的线程趋向于无穷时,系统的加速比会趋向于一个上限,这个上限将是串行运算在总运算中所占比例的倒数。举例<span style="color: black;">来讲</span>,<span style="color: black;">倘若</span>在一个运算中,串行运算的比例是</span>20%<span style="color: black;">,<span style="color: black;">那样</span>无论<span style="color: black;">咱们</span>在并行运算部分投入多少处理器,引入多少线程,其加速比<span style="color: black;">亦</span>不会突破</span>5<span style="color: black;">。这就<span style="color: black;">好似</span>,<span style="color: black;">倘若</span>我要写一本关于生成式</span>AI<span style="color: black;">的书,<span style="color: black;">能够</span>将<span style="color: black;">有些</span>资料<span style="color: black;">查询</span>的工作交给<span style="color: black;">科研</span>助手。显然,<span style="color: black;">倘若</span>我有<span style="color: black;">更加多</span>的<span style="color: black;">科研</span>助手,写书的进度<span style="color: black;">亦</span>会加快。但这种加快不是无限的,<span style="color: black;">由于</span><span style="color: black;">最后</span>这本书什么时候写完,还要看我自己“码字”的速度。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(</span>2<span style="color: black;">)分布式计算</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">采用聚集资源的方式来<span style="color: black;">加强</span>算力的另一种思路<span style="color: black;">便是</span>分布式计算。和高性能计算<span style="color: black;">重点</span>聚集本地计算单位<span style="color: black;">区别</span>,分布式计算则是将分散在不同<span style="color: black;">理学</span>区域的计算单位聚集起来,去<span style="color: black;">一起</span>完成某一计算任务。<span style="color: black;">例如</span>,刘慈欣在他的小说《球状闪电》中就<span style="color: black;">说到</span>过一个叫做</span>SETI@home<span style="color: black;">的<span style="color: black;">研究</span>计划(注:这个项目是真实存在的),这个计划试图将互联网上闲置的个人计算机算力集中起来处理天文数据,这<span style="color: black;">便是</span>一个典型的分布式计算用例。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">分布式计算的一个典型<span style="color: black;">表率</span><span style="color: black;">便是</span><span style="color: black;">咱们</span><span style="color: black;">此刻</span>经常听说的云计算。关于云计算的定义,<span style="color: black;">日前</span>的说法并不统一。一个比较有<span style="color: black;">表率</span>性的观点来自于美国国家标准和技术<span style="color: black;">科研</span>所(</span>NIST<span style="color: black;">),<span style="color: black;">按照</span>这种观点,“云计算是一种按<span style="color: black;">运用</span>量付费的模式。这种模式对可配置的</span>IT<span style="color: black;">资源(<span style="color: black;">包含</span>网络、服务器、存储、应用软件、服务)共享池<span style="color: black;">供给</span>了可用的、<span style="color: black;">方便</span>的、按需供应的网络<span style="color: black;">拜访</span>。在这些</span>IT<span style="color: black;">资源被提供的过程中,只需要投入很少的管理和交流工作”。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">这个定义很抽象、很学院派,<span style="color: black;">咱们</span><span style="color: black;">能够</span>用一个通俗的比喻来对其进行理解。在传统上,用户<span style="color: black;">重点</span>是<span style="color: black;">经过</span>调用自有的单一</span>IT<span style="color: black;">资源,这就好比每家每户自己发电供自己用;而云计算则<span style="color: black;">好似</span>是(用<span style="color: black;">海量</span>算力设备)建了一个大型的“发电站”,<span style="color: black;">而后</span>将“电力”(</span>IT<span style="color: black;">资源)输出给所有用户来用。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">按照</span>云服务<span style="color: black;">供给</span>者所<span style="color: black;">供给</span>的</span>IT<span style="color: black;">资源的<span style="color: black;">区别</span>,<span style="color: black;">能够</span>产生<span style="color: black;">区别</span>的“云交付模式”(</span>Cloud Delivery Model<span style="color: black;">)。<span style="color: black;">因为</span></span>IT<span style="color: black;">资源的种类<span style="color: black;">非常多</span>,<span style="color: black;">因此呢</span>对应的“云交付模式”<span style="color: black;">亦</span>就<span style="color: black;">非常多</span>。在各类<span style="color: black;">资讯</span><span style="color: black;">报告</span>中,最<span style="color: black;">平常</span>的“云交付模式”有三种:</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">第1</span>种是</span>IaaS<span style="color: black;">,它的全<span style="color: black;">叫作</span>是“<span style="color: black;">基本</span><span style="color: black;">设备</span><span style="color: black;">做为</span>服务”(</span>Infrastructure-as-a-Service<span style="color: black;">)。在这种交付模式下,云服务的<span style="color: black;">供给</span>者供给的<span style="color: black;">重点</span>是存储、硬件、服务器和网络等<span style="color: black;">基本</span><span style="color: black;">设备</span>。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">第二种是</span>PaaS<span style="color: black;">,它的全<span style="color: black;">叫作</span>是“平台<span style="color: black;">做为</span>服务”(</span>Platform-as-a-Service<span style="color: black;">)。在这种交付模式下,云服务的<span style="color: black;">供给</span>者需要供应的资源<span style="color: black;">更加多</span>,以便为<span style="color: black;">运用</span>者<span style="color: black;">供给</span>一个“就绪可用”(</span>ready-to-use<span style="color: black;">)的计算平台,以满足<span style="color: black;">她们</span>设计、<span style="color: black;">研发</span>、测试和<span style="color: black;">安排</span>应用程序的需要。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">第三种是</span>SaaS<span style="color: black;">,<span style="color: black;">亦</span><span style="color: black;">便是</span>“软件<span style="color: black;">做为</span>服务”(</span>Software-as-a-Service<span style="color: black;">)。在这种交付模式下,云服务<span style="color: black;">供给</span>者将成品的软件<span style="color: black;">做为</span><span style="color: black;">制品</span>来<span style="color: black;">供给</span>给用户,供其<span style="color: black;">运用</span>。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">有了以上<span style="color: black;">区别</span>的云交付模式,用户就<span style="color: black;">能够</span><span style="color: black;">按照</span>自己的需要来<span style="color: black;">选取</span>相应的</span>IT<span style="color: black;">资源。<span style="color: black;">例如</span>,<span style="color: black;">倘若</span>元宇宙的用户需要<span style="color: black;">更加多</span>的算力或存储,而本地的<span style="color: black;">设备</span><span style="color: black;">没法</span>满足,<span style="color: black;">那样</span>就<span style="color: black;">能够</span><span style="color: black;">经过</span>从云端来获取“外援”。一个云端</span>GPU<span style="color: black;"><span style="color: black;">不足</span>,那就再来几个,按需取用,丰俭由人,既方便,又不至于产生浪费。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">需要指出的是,尽管从理论上看云计算<span style="color: black;">能够</span>很好地承担巨大运算和存储<span style="color: black;">需要</span>,但其缺陷<span style="color: black;">亦</span>是很<span style="color: black;">显著</span>的。比较重要的一点是,在执行云计算时,有<span style="color: black;">海量</span>的数据要在本地和云端之间进行交换,这可能会<span style="color: black;">导致</span><span style="color: black;">显著</span>的延迟。尤其是数据吞吐量过大时,这种延迟就更加严重。<span style="color: black;">针对</span>用户<span style="color: black;">来讲</span>,这可能会对其<span style="color: black;">运用</span>体验产生非常<span style="color: black;">消极</span>的效果。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">那样</span>怎么<span style="color: black;">才可</span>克服这个问题呢?一个直观的思路<span style="color: black;">便是</span>,在靠近用户或设备<span style="color: black;">一边</span><span style="color: black;">安顿</span>一个能够进行计算、存储和传输的平台。这个平台一方面<span style="color: black;">能够</span>在终端和云端之间承担起一个<span style="color: black;">中间商</span>的<span style="color: black;">功效</span>,另一方面则<span style="color: black;">能够</span>对终端的<span style="color: black;">各样</span><span style="color: black;">需求</span>作出实时的<span style="color: black;">回复</span>。这个思想,<span style="color: black;">便是</span><span style="color: black;">所说</span>的边缘计算。<span style="color: black;">因为</span>边缘平台靠近用户,因而其与用户的数据交换会更加<span style="color: black;">即时</span>,延迟问题就<span style="color: black;">能够</span>得到比较好的破解。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">2<span style="color: black;">、超越经典计算——以量子计算为例</span></span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">无论是高性能计算还是分布式计算,其本质都是在运算资源的分配上下功夫。但正如<span style="color: black;">咱们</span>前面看到的,<span style="color: black;">经过</span>这种思路来<span style="color: black;">提高</span>算力是有<span style="color: black;">非常多</span><span style="color: black;">阻碍</span>的。<span style="color: black;">因此呢</span>,<span style="color: black;">此刻</span><span style="color: black;">非常多</span>人<span style="color: black;">期盼</span><strong style="color: blue;">从计算方式本身来进行突破,从而实现更高的计算效率</strong>。其中,量子计算<span style="color: black;">便是</span>最有<span style="color: black;">表率</span>性的例子。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">咱们</span><span style="color: black;">晓得</span>,经典计算的基本单位是比特,比特的状态要么是</span>0<span style="color: black;">,要么是</span>1<span style="color: black;">,<span style="color: black;">因此呢</span>经典计算机中的所有问题都<span style="color: black;">能够</span>分解为对</span>0<span style="color: black;">和</span>1<span style="color: black;">的操作。一个比特的存储单元只能存储一个</span>0<span style="color: black;"><span style="color: black;">或</span>一个</span>1<span style="color: black;">。而量子计算的基本单位则是量子比特,它的状态则<span style="color: black;">能够</span>是一个多维的向量,向量的每一个维度都<span style="color: black;">能够</span><span style="color: black;">暗示</span>一个状态。<span style="color: black;">这般</span>一来,量子存储器就比经典的存储器有很大的<span style="color: black;">优良</span>。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">思虑</span>一个有</span> N<span style="color: black;"><span style="color: black;">理学</span>比特的存储器,<span style="color: black;">倘若</span>它是经典存储器,<span style="color: black;">那样</span>它只能存储</span>2<span style="color: black;">的</span>N<span style="color: black;">次方个可能数据<span style="color: black;">其中</span>的任一个;而<span style="color: black;">倘若</span>它是量子存储器,<span style="color: black;">那样</span>它就<span style="color: black;">能够</span><span style="color: black;">同期</span>存储</span>2<span style="color: black;">的</span>N<span style="color: black;">次方个数据。随着</span> N<span style="color: black;">的<span style="color: black;">增多</span>,量子存储器相<span style="color: black;">针对</span>经典存储器的存储能力就会<span style="color: black;">显现</span>指数级增长。例如,一个</span>250<span style="color: black;">量子比特的存储器可能存储的数就<span style="color: black;">能够</span>达到</span>2<span style="color: black;">的</span>250<span style="color: black;">次方个</span>,<span style="color: black;">比现有已知的宇宙中<span style="color: black;">所有</span>原子数目还要多。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">在进行量子计算时,数学操作<span style="color: black;">能够</span><span style="color: black;">同期</span>对存储器中<span style="color: black;">所有</span>的数据进行。<span style="color: black;">这般</span>一来,量子计算机在实施一次的运算中<span style="color: black;">能够</span><span style="color: black;">同期</span>对</span>2<span style="color: black;">的</span>N<span style="color: black;">次方个输入数进行数学运算。其效果相当于经典计算机要重复实施</span>2<span style="color: black;">的</span>N<span style="color: black;">次方次操作,<span style="color: black;">或</span>采用</span>2<span style="color: black;">的</span>N<span style="color: black;">次方个<span style="color: black;">区别</span>处理器实行并行操作。依靠<span style="color: black;">这般</span>的设定,就<span style="color: black;">能够</span>大幅度节省计算次数。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">为了<span style="color: black;">帮忙</span><span style="color: black;">大众</span>理解,<span style="color: black;">咱们</span><span style="color: black;">能够</span>打一个并不是太恰当的比方:玩过动作游戏的<span style="color: black;">伴侣</span>大多<span style="color: black;">晓得</span>,在游戏中,<span style="color: black;">咱们</span><span style="color: black;">装扮</span>的英雄经常<span style="color: black;">能够</span><span style="color: black;">运用</span><span style="color: black;">非常多</span>招数,有些招数只能是针对单一对象输出的;而另<span style="color: black;">有些</span>招数则<span style="color: black;">能够</span>针对全体敌人输出。<span style="color: black;">这儿</span>,前一类的单体输出招数就相当于经典计算,而后一类的群体输出招数就相当于量子计算。<span style="color: black;">咱们</span><span style="color: black;">晓得</span>,在面对<span style="color: black;">海量</span>小怪围攻的时候,一次群体输出产生的效果<span style="color: black;">能够</span>顶得上<span style="color: black;">非常多</span>次单体输出的招数。<span style="color: black;">一样</span>的道理,在<span style="color: black;">有些</span>特定<span style="color: black;">状况</span>下,量子计算<span style="color: black;">能够</span>比经典计算实现非常大的效率<span style="color: black;">提高</span>。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">举例<span style="color: black;">来讲</span>,大数因式分解在破解公开密钥加密的过程中有<span style="color: black;">非常</span>重要的价值。<span style="color: black;">倘若</span>用计算机,采用<span style="color: black;">此刻</span>常用的</span>Shor<span style="color: black;">算法来对数</span>N<span style="color: black;">进行因式分解,其运算的时间将会随着</span>N<span style="color: black;">对应的二进制数的长度呈现指数级增长。</span>1994<span style="color: black;">年时,曾有人组织<span style="color: black;">全世界</span>的</span>1600<span style="color: black;">个工作站对一个二进制长度为</span>129<span style="color: black;">的数字进行了因式分解。这项工作足足用了</span>8<span style="color: black;">个月才完成。然而,<span style="color: black;">倘若</span><span style="color: black;">一样</span>的问题换成用量子计算来<span style="color: black;">处理</span>,<span style="color: black;">那样</span><span style="color: black;">全部</span>问题就<span style="color: black;">能够</span>在</span>1<span style="color: black;">秒之内<span style="color: black;">处理</span>。量子计算的威力由此可见一斑。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">然则</span>,在看到量子计算威力的<span style="color: black;">同期</span>,<span style="color: black;">咱们</span><span style="color: black;">亦</span>必须认识到,<span style="color: black;">最少</span>到<span style="color: black;">日前</span>为止,量子计算的威力还只能<span style="color: black;">表现</span>对<span style="color: black;">少许</span>几种特殊问题的处理上,其通用性还比较弱。事实上,<span style="color: black;">此刻</span>见诸<span style="color: black;">报告</span>的<span style="color: black;">各样</span>量子计算机<span style="color: black;">亦</span>都只能执行专门算法,而<span style="color: black;">不可</span>执行通用计算。<span style="color: black;">例如</span>,谷歌和</span>NASA<span style="color: black;">联合<span style="color: black;">研发</span>的</span>D-Wave<span style="color: black;">就只能执行量子退火(</span>Quantum Annealing<span style="color: black;">)算法,而我国<span style="color: black;">开发</span>的光量子计算机“九章”则是专门被用来<span style="color: black;">科研</span>“高斯玻色取样”问题的。尽管它们在各自的专业<span style="color: black;">行业</span>表现<span style="color: black;">非常</span>优异,但都还<span style="color: black;">不可</span>用来<span style="color: black;">处理</span>通用问题。这就<span style="color: black;">好似</span>游戏中的群体攻击大招,虽然攻击范围广,<span style="color: black;">然则</span>对<span style="color: black;">每一个</span>个体的杀伤力都比较弱。<span style="color: black;">因此呢</span>,<span style="color: black;">倘若</span>遇上大群的小怪,群体攻击固然厉害,但<span style="color: black;">倘若</span>遇上防御高、血条厚的</span>Boss<span style="color: black;">,这种攻击就派不上用处了。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">从这个<span style="color: black;">方向</span>看,<span style="color: black;">倘若</span><span style="color: black;">咱们</span><span style="color: black;">期盼</span>让量子计算大发神威,就必须先找出适合量子计算应用的问题和场景,<span style="color: black;">而后</span>再找到相应的算法。与此<span style="color: black;">同期</span>,<span style="color: black;">咱们</span><span style="color: black;">亦</span>必须认识到,虽然量子计算的<span style="color: black;">开发</span>和探索<span style="color: black;">非常</span>重要,<span style="color: black;">然则</span>它和对其他技术路径的探索之间更应该是互补,而不是替代的关系。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">3<span style="color: black;">、<span style="color: black;">经过</span>改进算法节约算力</span></span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;"><span style="color: black;">倘若</span>说,<span style="color: black;">经过</span>高性能计算、分布式计算,以及量子计算等手段来<span style="color: black;">提高</span>算力是“开源”,<span style="color: black;">那样</span><span style="color: black;">经过</span>改进算法来节约算力<span style="color: black;">便是</span>“节流”。</span></strong><span style="color: black;">从<span style="color: black;">提高</span>计算效率、减少因计算而产生的经济、环境成本而言,开源和节流在某种程度上<span style="color: black;">拥有</span>同等重要的价值。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">在</span>ChatGPT<span style="color: black;">爆火之后,大模型<span style="color: black;">起始</span>越来越受到人们的<span style="color: black;">喜爱</span>。<span style="color: black;">因为</span>在同等<span style="color: black;">要求</span>下,模型的参数越多、训练的数据越大,它的表现就越好,<span style="color: black;">因此呢</span>为了追求模型的更好表现,<span style="color: black;">此刻</span>的模型正在变得越来越大。<span style="color: black;">咱们</span><span style="color: black;">晓得</span>,<span style="color: black;">此刻</span>的</span>ChatGPT<span style="color: black;"><span style="color: black;">重点</span>是在</span>GPT-3.5<span style="color: black;">的<span style="color: black;">基本</span>上训练的。在它<span style="color: black;">显现</span>之前,</span>GPT<span style="color: black;">共经历了三代。</span>GPT-1<span style="color: black;">的参数大约为</span>1.17<span style="color: black;">亿个,预训练数据为</span>5GB<span style="color: black;">,从<span style="color: black;">此刻</span>看来并不算多;到了</span>GPT-2<span style="color: black;">,参数量就<span style="color: black;">增多</span>到了</span>15<span style="color: black;">亿个,预训练数据<span style="color: black;">亦</span>达到了</span>40GB<span style="color: black;">;而到了</span>GPT-3<span style="color: black;">,参数量则<span style="color: black;">已然</span><span style="color: black;">快速</span>膨胀到了骇人的</span>1750<span style="color: black;">亿个,预训练数据<span style="color: black;">亦</span>达到了</span>45TB<span style="color: black;">。为了训练</span>GPT-3<span style="color: black;">,单次成本就需要</span>140<span style="color: black;">万美元。尽管</span>OpenAI<span style="color: black;">并<span style="color: black;">无</span><span style="color: black;">颁布</span></span>GPT-3.5<span style="color: black;">的<span style="color: black;">详细</span><span style="color: black;">状况</span>,但<span style="color: black;">能够</span>想象,它的参数量和预训练数据上都会比</span>GPT-3<span style="color: black;">更高。为了训练这个模型,微软专门组建了一个由</span>1<span style="color: black;">万个</span>V100GPU<span style="color: black;"><span style="color: black;">构成</span>的高性能网络集群,总算力消耗达到了</span>3640<span style="color: black;">“算力当量”——<span style="color: black;">亦</span><span style="color: black;">便是</span>说,<span style="color: black;">倘若</span>用一台每秒计算一千万亿次的计算机来训练这个模型,<span style="color: black;">那样</span>大约需要近十年<span style="color: black;">才可</span>完成这个任务。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">倘若</span>任由这种“一代更比一代大”的趋势<span style="color: black;">连续</span>下去,<span style="color: black;">那样</span>在<span style="color: black;">将来</span>几年,对算力的<span style="color: black;">需要</span>将会<span style="color: black;">显现</span>爆炸性的增长。一项最新的<span style="color: black;">科研</span>估计,在</span>5<span style="color: black;">年之后,</span>AI<span style="color: black;">模型需要的算力可能会是<span style="color: black;">此刻</span>的</span>100<span style="color: black;">万倍。很显然,由此产生的经济和环境成本将会是<span style="color: black;">非常</span>惊人的。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">令人欣慰的是,目前<span style="color: black;">已然</span>有不少<span style="color: black;">科研</span>者<span style="color: black;">期盼</span>改进算法、优化模型来减少对算力的<span style="color: black;">需要</span>,并且<span style="color: black;">已然</span>取得了<span style="color: black;">必定</span>的成就。<span style="color: black;">例如</span>,就在今年</span>1<span style="color: black;">月</span>3<span style="color: black;">日,来自奥地利科学技术<span style="color: black;">科研</span>所</span> (ISTA)<span style="color: black;">的<span style="color: black;">科研</span>人员埃利亚斯·弗朗塔(</span>Elias Frantar<span style="color: black;">)和丹·阿里斯特尔(</span>Dan Alistarh<span style="color: black;">)合作进行了一项<span style="color: black;">科研</span>,首次针对</span> 100<span style="color: black;">至</span> 1000<span style="color: black;">亿参数的模型规模,提出了精确的单次剪枝<span style="color: black;">办法</span></span>SparseGPT<span style="color: black;">。</span>SparseGPT<span style="color: black;"><span style="color: black;">能够</span>将</span>GPT<span style="color: black;">系列模型单次剪枝到</span> 50%<span style="color: black;">的稀疏性,而无需任何重新训练。以<span style="color: black;">日前</span>最大的公开可用的</span>GPT-175B<span style="color: black;">模型为例,只需要<span style="color: black;">运用</span>单个</span>GPU<span style="color: black;">在几个小时内就能实现这种剪枝。不仅如此,</span>SparseGPT<span style="color: black;">还很准确,能将精度损失降到最小。在进行了类似的修剪之后,这些大模型在训练时所需要的计算量就会大幅减少,其对算力的<span style="color: black;">需要</span><span style="color: black;">亦</span>就会相应下降。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">关于<span style="color: black;">提高</span>算力、</span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">支持人工智能发展的政策思考</span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">随着</span>ChatGPT<span style="color: black;">引领了新一轮的人工智能热潮,市场上对算力的<span style="color: black;">需要</span><span style="color: black;">亦</span>会<span style="color: black;">显现</span>爆炸性的增长。在这种<span style="color: black;">状况</span>下,为了有力支撑人工智能的发展,就必须要<span style="color: black;">经过</span>政策的手段引导算力供给的大幅度<span style="color: black;">增多</span>。而要实现这一点,以下几方面的工作可能是最为值得<span style="color: black;">注重</span>的。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">第1</span>,应当加快对算力<span style="color: black;">基本</span><span style="color: black;">设备</span>的建设和布局,<span style="color: black;">提高</span>对全社会算力<span style="color: black;">需要</span>的支持。如前所述,从<span style="color: black;">日前</span>看,分布式计算,尤其是其中的云计算是<span style="color: black;">提高</span>算力的一个有效之举。而要让云计算的效应充分发挥,就需要大力建设各类算力<span style="color: black;">基本</span><span style="color: black;">设备</span>。唯有如此,才<span style="color: black;">能够</span>让人们随时随地都<span style="color: black;">能够</span>直接<span style="color: black;">经过</span>网络<span style="color: black;">得到</span>所需的算力资源。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">这儿</span>需要指出的是,在布局算力<span style="color: black;">基本</span><span style="color: black;">设备</span>的时候,应当<span style="color: black;">谨慎</span><span style="color: black;">思虑</span>它们的地域和空间分布,尽可能降低算力的成本。<span style="color: black;">咱们</span><span style="color: black;">晓得</span>,<span style="color: black;">区别</span>的地区的土地、水、电力等要素的价格是<span style="color: black;">区别</span>的,这决定了在<span style="color: black;">区别</span>地区生产相同的算力所需要的成本<span style="color: black;">亦</span>不尽相同。<span style="color: black;">因此呢</span>,在建设算力<span style="color: black;">基本</span><span style="color: black;">设备</span>时,必须<span style="color: black;">统一</span>全局,尽可能优化成本。需要指出的是,我国正在推进的“东数西算”工程<span style="color: black;">便是</span>这个思路的一个<span style="color: black;">表现</span>。<span style="color: black;">因为</span>我国东部<span style="color: black;">各样</span>资源的<span style="color: black;">运用</span>成本都要高于西部,<span style="color: black;">因此呢</span>在西部地区<span style="color: black;">创立</span>算力<span style="color: black;">设备</span>,就会大幅降低算力的供给成本,从而在全国范围内达到更优的配置效率。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">第二,应当加强与算力<span style="color: black;">关联</span>的硬件技术及其应用的<span style="color: black;">开发</span>,为<span style="color: black;">增多</span>算力供应<span style="color: black;">供给</span>支持。与算力<span style="color: black;">关联</span>的硬件技术既<span style="color: black;">包含</span>基于经典计算的<span style="color: black;">各样</span>硬件,如芯片、高性能计算机等,<span style="color: black;">亦</span><span style="color: black;">包含</span>超越经典计算理论,<span style="color: black;">按照</span>新计算理论<span style="color: black;">研发</span>的硬件,如量子计算机等。从供给的<span style="color: black;">方向</span>看,这些硬件是<span style="color: black;">基本</span>,它们的性能直接关系到算力<span style="color: black;">供给</span>的可能性界限。<span style="color: black;">因此呢</span>,必须用政策积极促进这些硬件的攻关和<span style="color: black;">开发</span>。尤其是<span style="color: black;">针对</span><span style="color: black;">有些</span>“卡脖子”的项目,应当<span style="color: black;">首要</span>加以突破。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">这里需要指出的是,在进行技术<span style="color: black;">开发</span>的<span style="color: black;">同期</span>,<span style="color: black;">亦</span>应该积极探索技术的应用。例如,<span style="color: black;">咱们</span><span style="color: black;">此刻</span><span style="color: black;">已然</span>在量子计算<span style="color: black;">行业</span>取得了<span style="color: black;">有些</span>成果,<span style="color: black;">然则</span><span style="color: black;">因为</span>用例的缺乏,这些成果并<span style="color: black;">无</span>能够转化为现实的应用。从这个<span style="color: black;">道理</span>上讲,<span style="color: black;">咱们</span><span style="color: black;">亦</span>需要加强对技术应用的<span style="color: black;">科研</span>。<span style="color: black;">倘若</span><span style="color: black;">能够</span>把<span style="color: black;">有些</span>计算问题转化成量子计算问题,就<span style="color: black;">能够</span>充分发挥量子计算机的<span style="color: black;">优良</span>,实现计算效率的大幅<span style="color: black;">提高</span>。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">第三,应当对算法、架构等软件层面的要素进行优化,在<span style="color: black;">保准</span></span>AI<span style="color: black;"><span style="color: black;">制品</span>性能的<span style="color: black;">同期</span>,尽可能减少对算力的依赖。从降低</span>AI<span style="color: black;">计算成本的<span style="color: black;">方向</span>看,降低模型的算力<span style="color: black;">需要</span>和<span style="color: black;">提高</span>算力<span style="color: black;">拥有</span>同等重要的<span style="color: black;">道理</span>。<span style="color: black;">因此呢</span>,在用政策的手段促进算力供给的<span style="color: black;">同期</span>,<span style="color: black;">亦</span>应当以<span style="color: black;">一样</span>的力度对算法、架构和模型的优化予以同等的激励。</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">思虑</span>到类似的成果<span style="color: black;">拥有</span><span style="color: black;">非常</span>巨大的社会正<span style="color: black;">外边</span>性,<span style="color: black;">因此呢</span>用专利来<span style="color: black;">守护</span>它们并不是最合适的。<span style="color: black;">因此呢</span>,<span style="color: black;">能够</span>积极鼓励对取得类似成功的人员和单位给予直接的奖励,并<span style="color: black;">同期</span>鼓励<span style="color: black;">她们</span>将这些成果向全社会开源;<span style="color: black;">亦</span><span style="color: black;">能够</span><span style="color: black;">思虑</span>由政府出面,对类似的模型<span style="color: black;">制品</span>进行招标采购。<span style="color: black;">倘若</span>有个人和单位<span style="color: black;">能够</span><span style="color: black;">根据</span><span style="color: black;">需求</span><span style="color: black;">供给</span>相应的成果,政府就支付相应的<span style="color: black;">花费</span>,并对成果进行开源。<span style="color: black;">经过</span>这些<span style="color: black;">措施</span>,就<span style="color: black;">能够</span>很好地激励人们积极投身到改进模型、节约算力的事业中,<span style="color: black;">亦</span><span style="color: black;">能够</span>在有成果产出时,让全社会<span style="color: black;">即时</span>享受到这些成果。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">总而言之,在人工智能突飞猛进的时代,算力可能是决定人工智能发展上限的一个关键<span style="color: black;">原因</span>。唯有在算力问题上实现突破,人工智能的发展才可能有<span style="color: black;">基本</span><span style="color: black;">保证</span>。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">阅读作者<span style="color: black;">更加多</span><span style="color: black;">文案</span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><a style="color: black;">AI来袭:这一次,<span style="color: black;">咱们</span></a>还能保住自己的饭碗吗</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><a style="color: black;">生成式AI:缘起、机遇和挑战</a></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><a style="color: black;">2023:平台经济再出发</a></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_jpg/FOHTfbK2JUjzzNvlEGIzFVn2dicKrxwhCtpDEF52ahQVSRdrGy5JPNOgqLTz0mv2M4hJX6XvRWJbVoF22pbRibww/640?wx_fmt=jpeg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
感谢你的精彩评论,为我的思绪打开了新的窗口。 我完全赞同你的观点,思考很有深度。
页:
[1]