6hz7vif posted on 2024-8-31 14:55:09

Core Algorithms of Neural Network Optimizers and Why They Are Needed


    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Hands-on Tutorials, Intuitive Deep Learning: a gentle guide to the techniques used by gradient descent optimizers such as SGD, Momentum, RMSProp, Adam and others, in plain English</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Ketan Doshi</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">10 min read</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/891e1e5a6a8744b890fb7463182577d8~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=iPmjvOLPEgISGL%2FltB9TrcceqyU%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Photo by George Stackpole on Unsplash</p>
    </div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Optimizers are a critical component of a neural network architecture. During training, they play a key role in helping the network learn to make better predictions.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">They do this by finding the best set of model parameters, such as weights and biases, so that the model produces the best output for the problem it is solving.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The most common optimization technique used by most neural networks is gradient descent.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The most popular deep learning libraries, such as PyTorch and Keras, ship a wide range of built-in optimizers based on gradient descent, e.g. SGD, Adadelta, Adagrad, RMSProp, Adam and so on.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Why are there so many different optimization algorithms? How do we decide which one to choose?</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If you read the documentation for each of them, you will find a formula describing how it updates the model parameters. What does each formula mean, and why does it matter?</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Before we are ready to dive into the math, my goal with this article is to provide some overall context and build an intuition for where each algorithm fits. In fact, I will not discuss the formulas themselves here and will leave them for another article.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Let's start with a typical 3D picture of the gradient descent algorithm at work.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/084126c876584b17afdff441b71350dc~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=CmNaHSkDqQV0RsPCnrhTob%2BDfJQ%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Loss curve for Gradient Descent (Source)</p>
    </div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This picture shows a network with two weight parameters:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The horizontal plane has two axes, one each for the weights w1 and w2.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The vertical axis shows the value of the loss for each combination of the weights.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In other words, the shape of the curve shows the "loss landscape" of the neural network. It plots the loss for different values of the weights while the input dataset is kept fixed.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The blue line traces the trajectory of the gradient descent algorithm during optimization:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">It starts by picking some random values for the two weights and computes the loss.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">At each iteration, it updates the weights so that the loss (hopefully) decreases, moving to a lower point on the curve.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Finally, it arrives at its goal: the bottom of the curve, where the loss is lowest.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Computing the gradient</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The algorithm updates the weights based on the gradient of the loss curve at that point, scaled by a learning rate factor.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/21645adf25c64fe1a23b4f4793cc227a~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=0qyzjDRN6ZcJKDGMO9hH4dL88G0%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Gradient Descent parameter update (Image by Author)</p>
    </div>
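    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a minimal sketch of that update rule, assuming a made-up one-parameter loss (the names <code>loss</code>, <code>gradient</code> and <code>gradient_descent</code>, and the values chosen, are illustrative and not taken from the article), each iteration subtracts the gradient multiplied by the learning rate:</p>

```python
def loss(w):
    # Hypothetical one-parameter loss: a smooth bowl with its minimum at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Slope of the loss at w (derivative of (w - 3)^2).
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=50):
    # Repeatedly step against the gradient, scaled by the learning rate.
    trajectory = [w]
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
        trajectory.append(w)
    return trajectory


trajectory = gradient_descent(w=-4.0)
print(f"start: w={trajectory[0]:.3f}, loss={loss(trajectory[0]):.3f}")
print(f"end:   w={trajectory[-1]:.3f}, loss={loss(trajectory[-1]):.6f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">On this toy bowl every step lowers the loss, and the weight settles close to the minimum. Real loss landscapes are rarely this well behaved.</p>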
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The gradient measures the slope: it is the change in the vertical direction (dL) divided by the change in the horizontal direction (dW). This means the gradient is large for steep slopes and small for gentle slopes.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/68d6ca0e6fb549ee9ed4086593fa4370~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=wGwaca6MKMlGdj6NwkYTsYSxTCg%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Computing the Gradient (Image by Author)</p>
    </div>
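    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The "dL divided by dW" idea can be approximated numerically. The sketch below is only an illustration: the helper <code>numerical_gradient</code> and the toy loss are assumptions, and real frameworks compute gradients by automatic differentiation rather than finite differences.</p>

```python
def numerical_gradient(loss_fn, w, dw=1e-6):
    # Central difference: change in loss (dL) divided by change in weight (dW).
    dl = loss_fn(w + dw) - loss_fn(w - dw)
    return dl / (2.0 * dw)


def loss(w):
    # Hypothetical loss: steep far from w = 0, gentle near it.
    return w ** 2


steep = numerical_gradient(loss, 5.0)   # far up the side of the bowl
gentle = numerical_gradient(loss, 0.1)  # near the bottom
print(f"steep slope: {steep:.3f}, gentle slope: {gentle:.3f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The value measured on the steep part of the curve is much larger than the one measured near the bottom.</p>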
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Gradient descent in practice</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">These loss curves are a useful visualization for understanding the concept of gradient descent. However, we should realize that this is an idealized scenario, not a realistic one:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">First, the picture above shows a smooth convex curve. In reality, the curve is very bumpy.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; A neural network loss landscape (Source, by permission of Hao Li)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Second, we will not have just 2 parameters. There are often tens or hundreds of millions, and it is impossible to visualize or even imagine that in your head.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">At each iteration, gradient descent works by "looking in all directions to find the best slope it can go down". So what happens when the best slope is not the best direction to go?</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">What if the landscape slopes steeply in one direction, but the lowest point lies in the direction of a gentler slope?</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Or what if the landscape all around is fairly flat?</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Or what if it goes down a deep ditch: how does it climb out?</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">These are some examples of curves that cause it difficulty. Let's look at them next.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Challenges of gradient descent optimization</h1>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Local minima</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In a typical loss curve, you may have many local minima in addition to the global minimum. Because gradient descent is designed to keep going downhill, once it descends into a local minimum it finds it hard to climb back up the slope. So it can get stuck there without ever reaching the global minimum.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/2822af819a3b4264aab33f268b6825ce~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=eOJ%2F%2FvmaK9tFcDxE19JfCHMI1qs%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Local minima and Global minimum (Source)</p>
    </div>
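    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A small sketch can make this concrete. It assumes a made-up one-parameter loss with two valleys (the polynomial and the names <code>loss</code>, <code>gradient</code> and <code>descend</code> are illustrative only): the valley that plain gradient descent ends up in depends entirely on where it starts.</p>

```python
def loss(w):
    # Hypothetical non-convex loss with a shallow valley near w = 1.13
    # and a deeper (global) valley near w = -1.30.
    return w ** 4 - 3.0 * w ** 2 + w


def gradient(w):
    return 4.0 * w ** 3 - 6.0 * w + 1.0


def descend(w, learning_rate=0.01, steps=500):
    # Plain gradient descent: always move downhill from the current point.
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


w_from_right = descend(2.0)   # rolls into the shallow local minimum
w_from_left = descend(-2.0)   # rolls into the deeper global minimum
print(f"from the right: w={w_from_right:.3f}, loss={loss(w_from_right):.3f}")
print(f"from the left:  w={w_from_left:.3f}, loss={loss(w_from_left):.3f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Starting on the right, the run stops in the shallower valley even though a lower loss exists on the other side of the hump.</p>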
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Saddle points</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Another key challenge is the occurrence of "saddle points". A saddle point is a point where, in one direction corresponding to one parameter, the curve is at a local minimum, while in a second direction corresponding to another parameter, the curve is at a local maximum.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/349c2379f9084ea99362ed3de1a67fa6~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=am3JNZjFo%2BxixvERSdvuHHv9ukw%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Saddle Point (Source)</p>
    </div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">What makes saddle points tricky is that the area immediately around them is usually fairly flat, like a plateau. This means the gradients are close to zero. The optimizer therefore oscillates around the saddle point in the direction of the first parameter, without being able to descend the slope in the direction of the second parameter.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a result, gradient descent incorrectly assumes that it has found the minimum.</p>
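    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The sketch below illustrates the vanishing gradient near a saddle, assuming the textbook surface f(x, y) = x**2 - y**2 (this surface and the helper names are assumptions, not from the article). It is a simplified picture: here the optimizer simply settles on the saddle instead of oscillating.</p>

```python
def gradient(x, y):
    # Gradient of the hypothetical saddle surface f(x, y) = x**2 - y**2:
    # a minimum along x, a maximum along y, both meeting at (0, 0).
    return 2.0 * x, -2.0 * y


def descend(x, y, learning_rate=0.1, steps=100):
    for _ in range(steps):
        gx, gy = gradient(x, y)
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


# Starting exactly on the ridge (y = 0), the run slides straight into the saddle.
x_end, y_end = descend(1.0, 0.0)
gx, gy = gradient(x_end, y_end)
print(f"stopped at ({x_end:.2e}, {y_end:.2e}) with gradient ({gx:.2e}, {gy:.2e})")

# With a tiny offset in y, escape is possible but very slow at first.
x_near, y_near = descend(1.0, 1e-9, steps=20)
print(f"after 20 steps with a tiny offset: y={y_near:.2e}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The gradient is essentially zero at the end of the first run, so nothing tells the optimizer that the loss could still be lowered by moving along y.</p>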
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Ravines</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Gradient descent also finds it hard to traverse ravines. A ravine is a long narrow valley that slopes steeply in one direction (i.e. the sides of the valley) and gently in the second direction (i.e. along the valley). Quite often such a ravine leads down to a minimum. Because it is so hard to navigate, this shape is also called pathological curvature.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/19c2da52a51249c6a24a3bac0736f65d~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=nmN6uJscWanzVAzi7v7NR4X95xE%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Ravines (Modified from Source, by permission of James Martens)</p>
    </div>
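    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Think of it as a narrow river valley that slopes gently down from the hills until it ends in a lake. What you want to do is move quickly downriver, in the direction of the valley. However, gradient descent very easily bounces back and forth between the sides of the valley and moves only very slowly in the direction of the river.</p>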
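    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This bouncing can be reproduced with a rough sketch. It assumes a hypothetical quadratic ravine that is 50 times steeper across the valley (w1) than along it (w2); the surface, the learning rate and the names are illustrative choices only.</p>

```python
def gradient(w1, w2):
    # Gradient of the hypothetical ravine f = 0.5 * (50 * w1**2 + w2**2):
    # steep across the valley (w1), gentle along it (w2).
    return 50.0 * w1, w2


def descend(w1, w2, learning_rate=0.035, steps=30):
    path = [(w1, w2)]
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
        path.append((w1, w2))
    return path


path = descend(1.0, 10.0)
# Count how often w1 jumps to the opposite side of the valley floor (w1 = 0).
sign_flips = sum(1 for (a, _), (b, _) in zip(path, path[1:]) if a * b < 0)
print(f"w1 crossed the valley floor {sign_flips} times in {len(path) - 1} steps")
print(f"w2 moved from {path[0][1]:.2f} to only {path[-1][1]:.2f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With these values, w1 lands on the opposite wall at every single step, while w2 is still far from the minimum after 30 steps.</p>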
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Although they continue to use gradient descent at their core, optimization algorithms have developed a series of improvements on vanilla gradient descent to tackle these challenges.</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">First improvement to gradient descent - Stochastic Gradient Descent (SGD)</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Gradient descent usually means "full-batch gradient descent", in which the loss and gradient are calculated using all the items in the dataset.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Mini-batch stochastic gradient descent, by contrast, uses a randomly selected subset of the dataset for each training iteration.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">That randomness helps us explore the loss landscape.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Earlier we mentioned that the loss curve is obtained by varying the model parameters while keeping the input dataset fixed. However, if you vary the input by selecting different data samples in each mini-batch, the loss values and gradients also vary. In other words, by varying the input dataset, you get a slightly different loss curve with each mini-batch.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">So even if you get stuck at some spot in the landscape with one mini-batch, you might see a different landscape with the next mini-batch, which lets you keep moving. This prevents the algorithm from getting stuck in a particular part of the landscape, especially in the early stages of training.</p>
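    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Here is a minimal sketch of mini-batch SGD, assuming a synthetic dataset and a one-weight linear model (the data, the model and every name below are made up for illustration). It also prints the gradient seen by a few random mini-batches at the same weight, to show that each batch sees a slightly different slope.</p>

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic data: y is roughly 2 * x plus a little noise.
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
data = list(zip(xs, ys))


def gradient(w, batch):
    # d/dw of the mean squared error for the model y_hat = w * x.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def minibatch_sgd(w, data, learning_rate=0.1, batch_size=16, epochs=20):
    for _ in range(epochs):
        rng.shuffle(data)  # a new random split into mini-batches every epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            w -= learning_rate * gradient(w, batch)
    return w


full_grad = gradient(0.0, data)
batch_grads = [gradient(0.0, rng.sample(data, 16)) for _ in range(3)]
print(f"full-batch gradient at w=0: {full_grad:.3f}")
print("mini-batch gradients at w=0:", [round(g, 3) for g in batch_grads])

w_fit = minibatch_sgd(0.0, data)
print(f"weight after mini-batch SGD: {w_fit:.3f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Despite the noisy per-batch gradients, the fitted weight ends up near the value used to generate the data.</p>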
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Second improvement to gradient descent - Momentum</h1>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Dynamically adjusting the update amount</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">One of the tricky aspects of gradient descent is dealing with steep slopes. Because the gradient is large there, you could take a large step when you actually want to go slowly and cautiously. This can result in bouncing back and forth, which slows down training.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/627f6120b0d6463690d4336df757a318~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=xy0y9YEZc6ImBeIWzwosOE%2FHKiA%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; (Image by Author)</p>
    </div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Ideally, you want to vary the size of the update dynamically so that you can respond to changes in the landscape around you. If the slope is very steep, you want to slow down. If the slope is very flat, you might want to speed up, and so on.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With gradient descent, you make an update at each step based on the gradient and the learning rate. So, to modify the size of the update, you can do two things:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Adjust the gradient</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Adjust the learning rate</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Momentum vs SGD</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Momentum is a way to do the former of the two, i.e. adjust the gradient.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With SGD, we look only at the current gradient and ignore all the past gradients. This means that if there is a sudden anomaly in the loss curve, your trajectory may get thrown off course.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With momentum, on the other hand, you let the past gradients guide your overall direction so that you stay on course. This lets you use your knowledge of the surrounding landscape that you have seen up to that point, and helps dampen the effect of outliers in the loss curve.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The first question is how far into the past you should look. The further back you go, the less you are affected by anomalies.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Second, does every gradient from the past count equally? It makes sense for things from the recent past to count more than things from the distant past. That way, if a change in the landscape is not an anomaly but a genuine structural change, you do react to it and gradually change your course.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The momentum algorithm uses the exponential moving average of the gradient, instead of the current gradient value.</p>
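    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To make the averaging concrete, here is a small sketch. It assumes a made-up sequence of gradients with a single anomaly and a decay factor <code>beta</code> of 0.9 (both are illustrative choices, and library implementations of momentum differ in the exact formula they use).</p>

```python
def exponential_moving_average(gradients, beta=0.9):
    # Each new value keeps a fraction beta of the running average and mixes in
    # (1 - beta) of the newest gradient, so older gradients fade out gradually.
    smoothed, velocity = [], 0.0
    for g in gradients:
        velocity = beta * velocity + (1.0 - beta) * g
        smoothed.append(velocity)
    return smoothed


# A steady downhill signal with one sudden anomaly pointing the other way.
gradients = [1.0] * 10 + [-5.0] + [1.0] * 5
smoothed = exponential_moving_average(gradients)

for step, (raw, avg) in enumerate(zip(gradients, smoothed), start=1):
    print(f"step {step:2d}: raw gradient {raw:5.1f}, moving average {avg:6.3f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">At the anomaly the raw gradient flips sign, which would send plain SGD backwards. The moving average dips but stays positive, so the overall direction is kept.</p>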
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Traversing ravines using momentum</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Momentum helps you tackle the narrow ravine problem of pathological curvature, where the gradient is very high for one weight parameter but very low for another.</p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/9908851ddf1645339f26634afb5c2a84~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1725649150&amp;x-signature=L7zxHJX6fIgjtnxQmKZuClhquUg%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&gt; Momentum helps you traverse ravines (Modified from Source, by permission of James Martens)</p>
    </div>
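    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The following rough comparison illustrates the effect on a ravine. It assumes a hypothetical quadratic ravine that is 50 times steeper in w1 than in w2, and an exponential-moving-average form of momentum with <code>beta = 0.9</code>; the surface, the hyperparameters and the helper <code>run</code> are illustrative assumptions, and the outcome depends on these particular values.</p>

```python
def gradient(w1, w2):
    # Hypothetical ravine f = 0.5 * (50 * w1**2 + w2**2).
    return 50.0 * w1, w2


def loss(w1, w2):
    return 0.5 * (50.0 * w1 ** 2 + w2 ** 2)


def run(beta, learning_rate=0.035, steps=200):
    # beta = 0.0 reduces to plain gradient descent; beta > 0 replaces the raw
    # gradient with its exponential moving average.
    w1, w2, v1, v2 = 1.0, 10.0, 0.0, 0.0
    flips = 0
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        v1 = beta * v1 + (1.0 - beta) * g1
        v2 = beta * v2 + (1.0 - beta) * g2
        new_w1 = w1 - learning_rate * v1
        if new_w1 * w1 < 0:
            flips += 1  # w1 jumped to the opposite wall of the ravine
        w1, w2 = new_w1, w2 - learning_rate * v2
    return loss(w1, w2), flips


plain_loss, plain_flips = run(beta=0.0)
momentum_loss, momentum_flips = run(beta=0.9)
print(f"plain:    loss={plain_loss:.2e}, wall-to-wall jumps={plain_flips}")
print(f"momentum: loss={momentum_loss:.2e}, wall-to-wall jumps={momentum_flips}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In this setup the momentum run crosses the valley floor far less often and reaches a lower loss in the same number of steps, because opposite-sign gradients across the valley partly cancel in the average while the small gradients along the valley add up.</p>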
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">By using momentum, you dampen the zig-zag oscillations that would happen with SGD.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For the first parameter, which has the steep slope, the large gradient causes a "zig" from one side of the valley to the other. In the next step, however, this gets cancelled out by the "zag" in the reverse direction.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For the second parameter, on the other hand, the small update from the first step is reinforced by the small update of the second step because they point in the same direction. This is the direction along the valley, which is where you want to go.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Some examples of optimizer algorithms that use momentum, with different formulas, are:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">SGD with Momentum</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Nesterov Accelerated Gradient</p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Third improvement to gradient descent - Modify the learning rate (based on the gradient)</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As mentioned above, the second way to modify the amount of the parameter update is to adjust the learning rate.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">So far we have kept the learning rate constant from one iteration to the next. In addition, the gradient updates have used the same learning rate for all parameters.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">However, as we have seen, there can be large variations between the gradients of different parameters. One parameter might have a steep slope while another has a gentle slope.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We can make use of this to adapt the learning rate to each parameter. To choose the learning rate for a parameter, we can look at its past gradients (for each parameter separately).</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A few optimizer algorithms do this, using slightly different techniques, e.g. Adagrad, Adadelta, RMSProp.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For instance, Adagrad squares the past gradients and adds them up, weighting all of them equally. RMSProp also squares the past gradients but uses their exponential moving average, thus giving more importance to recent gradients.</p>
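    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The difference between the two ways of accumulating can be sketched as follows. The helper names, the decay of 0.9 and the gradient sequence are illustrative assumptions and only model the accumulation step, not the full optimizers.</p>

```python
def adagrad_accumulator(grads):
    # Sum of squared gradients: every past gradient counts equally, forever.
    total = 0.0
    for g in grads:
        total += g * g
    return total


def rmsprop_accumulator(grads, decay=0.9):
    # Exponential moving average of squared gradients: recent ones count more.
    avg = 0.0
    for g in grads:
        avg = decay * avg + (1.0 - decay) * g * g
    return avg


# Hypothetical history: a steep stretch early on, then a long gentle stretch.
grads = [10.0] * 20 + [0.1] * 50
adagrad_total = adagrad_accumulator(grads)
rmsprop_avg = rmsprop_accumulator(grads)
print(f"Adagrad accumulator: {adagrad_total:.2f}")
print(f"RMSProp accumulator: {rmsprop_avg:.4f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The equally weighted sum still remembers the steep stretch in full, while the moving average has mostly forgotten it and reflects the recent gentle gradients.</p>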
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Now, by squaring the gradients, they all become positive, i.e. they have the same direction. This removes the cancelling-out effect that we talked about for momentum, where gradients point in opposite directions.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This means that for a parameter with a steep slope, the gradients are large, and the squares of the gradients are very large and always positive, so they accumulate fast. To dampen this, the algorithm computes the learning rate by dividing it by a large factor derived from the accumulated squared gradients (their square root, in Adagrad and RMSProp). This allows it to slow down on steep slopes.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Similarly, for shallow slopes the accumulation is small, so the algorithm divides by a smaller factor to compute the learning rate. This boosts the learning rate on gentle slopes.</p>
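    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a sketch of this per-parameter scaling, assume an RMSProp-style rule in which the base learning rate is divided by the square root of the moving average of squared gradients. The function name, the hyperparameters and the two-parameter gradient history below are made up for illustration.</p>

```python
import math


def effective_learning_rates(grad_history, base_lr=0.01, decay=0.9, eps=1e-8):
    # Track one moving average of squared gradients per parameter, then divide
    # the base learning rate by its square root (eps avoids division by zero).
    avg_sq = [0.0] * len(grad_history[0])
    for grads in grad_history:
        avg_sq = [decay * a + (1.0 - decay) * g * g for a, g in zip(avg_sq, grads)]
    return [base_lr / (math.sqrt(a) + eps) for a in avg_sq]


# Hypothetical history: parameter 0 sits on a steep slope, parameter 1 on a gentle one.
history = [(50.0, 0.5)] * 30
steep_lr, gentle_lr = effective_learning_rates(history)
print(f"effective learning rate on the steep slope:  {steep_lr:.6f}")
print(f"effective learning rate on the gentle slope: {gentle_lr:.6f}")
print(f"resulting step sizes: {steep_lr * 50.0:.5f} vs {gentle_lr * 0.5:.5f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The steep parameter gets a much smaller learning rate than the gentle one, so the actual step taken ends up about the same size for both.</p>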
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Some optimizer algorithms combine both approaches: they modify the learning rate as described above and also use momentum to modify the gradient. Examples are Adam and its many variants, and LAMB.</p>
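    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A compact sketch of how the two ideas fit together, following the commonly published form of the Adam update with bias correction. The function <code>adam_minimize</code>, the toy gradient and the hyperparameter values are illustrative assumptions, not a reference implementation of any library.</p>

```python
import math


def adam_minimize(gradient_fn, w, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0
    history = [w]
    for t in range(1, steps + 1):
        g = gradient_fn(w)
        m = beta1 * m + (1.0 - beta1) * g        # momentum: moving average of gradients
        v = beta2 * v + (1.0 - beta2) * g * g    # moving average of squared gradients
        m_hat = m / (1.0 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
        history.append(w)
    return history


# Hypothetical loss (w - 3)**2, whose gradient is 2 * (w - 3).
history = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(f"first step size: {abs(history[1] - history[0]):.4f}")
print(f"final weight:    {history[-1]:.4f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note that the very first step has a size close to the learning rate regardless of how large the gradient is, which is the scaling effect described above.</p>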
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Fourth improvement to gradient descent - Modify the learning rate (based on your training progress)</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the previous section, the learning rate was modified based on the gradients of the parameters. In addition, we can adjust the learning rate based on the progress of the training process. The learning rate is then set according to the training epoch and is independent of the model's parameters at that point.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This is actually not done by optimizers. It is handled by a separate component of the neural network called a scheduler. I mention it for completeness and to show how it relates to the optimization techniques we have discussed, but I will not cover schedulers further here.</p>
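    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For illustration only, a schedule can be as simple as a function of the epoch number. The step-decay rule, the name <code>step_decay</code> and its default values below are assumptions and do not correspond to any particular library's scheduler.</p>

```python
def step_decay(epoch, base_lr=0.1, drop=0.5, epochs_per_drop=10):
    # Halve the learning rate every `epochs_per_drop` epochs. The result depends
    # only on the epoch number, not on the model's parameters or gradients.
    return base_lr * drop ** (epoch // epochs_per_drop)


for epoch in (0, 9, 10, 25):
    print(f"epoch {epoch:2d}: learning rate {step_decay(epoch):.4f}")
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The optimizer would then use whatever learning rate the schedule returns for the current epoch.</p>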
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Conclusion</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We now understand the basic techniques used by optimizers based on gradient descent, why they are used, and how they relate to each other.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This puts us in a better position to go deeper into the many specific optimization algorithms and understand how they work in detail. I hope to cover that soon in another article...</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Let's keep learning!</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">(This article was translated by 闻数起舞 from Federico Mannucci's article "Neural Network Optimizers Made Simple: Core algorithms and why they are needed". Please credit the source when reposting. Original link:</p>https://towardsdatascience.com/neural-network-optimizers-made-simple-core-algorithms-and-why-they-are-needed-7fd072cd2788)



