Building a neural network: which optimization algorithm is better? 35,000 test runs have the answer
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Xiao Xiao, reporting from Aofeisi</p>QbitAI <span style="color: black;">report</span> | WeChat official account QbitAI
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Want to optimize your neural network but not sure which optimizer suits it best?</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Or do you want to know which gradient descent algorithms exist in deep learning?</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/pgc-image/7116aedf6bbc4d47b65a1b219409f5a0~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=djtd6M04PlPSFHDU%2FUizPJFQEnA%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The most comprehensive analysis of optimization algorithms so far has arrived.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">It collects <strong style="color: blue;">nearly every optimization method</strong> <span style="color: black;">(about 130)</span> proposed since 1964 and sorts them into categories.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">It also proposes several benchmark tests and uses them to analyze <strong style="color: blue;">1,344</strong> possible configurations.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">After running <strong style="color: blue;">35,000 tests</strong>, it delivers a very thorough analysis of optimizer algorithms and shows how to use these benchmarks to choose the best optimization method for your own deep learning model.</span></p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Which optimization methods are there?</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Judging from the densely packed table below, roughly 130 optimization algorithms have been proposed to date.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/37d5c7ae822d41f4b4508f24e0b4cf41~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=IPC3a1mxwQfCsIpYS%2FxBB2AIE6I%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">At this stage the differences between them are not apparent, but the test results show that these optimizers clearly fall into two groups: those that suit VAEs <span style="color: black;">(variational autoencoders)</span> and those that do not.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As for the parameters commonly used with these optimizers, α0 denotes the initial learning rate, αlo and αup denote the lower and upper bounds, ∆t denotes the period after which the decay style switches, and k denotes the decay factor.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">These learning rate schedules fall mainly into a few types: constant, step decay, smooth decay, cyclical, warm-up, and super-convergence.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/ff55c900ce6f4d61a344e338b418dca4~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=ENocnnh5Y%2FQ5zfeT71p2e0aIxLM%3D" style="width: 50%; margin-bottom: 20px;"></div>
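<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">To make the schedule types above concrete, here is a minimal Python sketch with one illustrative formula per type, written in terms of α0, αlo, αup, ∆t and k. The function names and the specific formulas (for example, cosine for smooth decay and a triangular wave for the cyclical type) are assumptions chosen for illustration. They are not taken from the paper's table, which lists many more variants.</span></p>

```python
import math

def constant(t, alpha0):
    """Constant schedule: the learning rate never changes."""
    return alpha0

def step_decay(t, alpha0, k, delta_t):
    """Step decay: multiply the rate by the factor k once every delta_t epochs."""
    return alpha0 * k ** (t // delta_t)

def cosine_decay(t, alpha_lo, alpha_up, total_epochs):
    """Smooth decay: follow half a cosine wave from alpha_up down to alpha_lo."""
    progress = t / total_epochs
    return alpha_lo + 0.5 * (alpha_up - alpha_lo) * (1 + math.cos(math.pi * progress))

def triangular_cycle(t, alpha_lo, alpha_up, delta_t):
    """Cyclical: rise linearly for delta_t epochs, fall for delta_t epochs, repeat."""
    position = (t % (2 * delta_t)) / delta_t  # position within the cycle, in [0, 2)
    return alpha_lo + (alpha_up - alpha_lo) * (1 - abs(position - 1))

def linear_warmup(t, alpha0, delta_t):
    """Warm-up: ramp linearly up to alpha0 over the first delta_t epochs, then hold."""
    return alpha0 * min(1.0, (t + 1) / delta_t)

if __name__ == "__main__":
    # Print a few epochs of each schedule side by side (illustrative values only).
    for epoch in (0, 5, 10, 20):
        print(
            epoch,
            constant(epoch, 0.1),
            step_decay(epoch, 0.1, 0.5, 10),
            round(cosine_decay(epoch, 0.0, 0.1, 20), 4),
            round(triangular_cycle(epoch, 0.01, 0.1, 5), 4),
            round(linear_warmup(epoch, 0.1, 5), 4),
        )
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Each function maps an epoch index to a learning rate, so any of them can be swapped into a training loop without touching the optimizer itself.</span></p>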
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">So, among 130-plus optimizers, which one is the most suitable? And how much difference does tuning these parameters actually make to an optimizer?</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Running benchmarks is the way to find out.</span></p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">8 benchmark tests</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As shown below, the authors propose 8 optimization tasks and run the tests on them to obtain comparable results.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/58c44e1c871f44fe826ec6ee854c2936~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=6VTacoU4RJ4Pnqz13gwx%2BhnHyAE%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As the figure shows, the tasks differ in dataset <span style="color: black;">(MNIST, CIFAR-10, etc.)</span>, model <span style="color: black;">(VAE, CNN, RNN, etc.)</span>, task type <span style="color: black;">(classification, NLP, etc.)</span>, and metric <span style="color: black;">(loss, accuracy)</span>.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Batch size is taken into account as well <span style="color: black;">(the experimental hardware appears to be quite capable)</span>. The point of these tests is to assess the suitability of the optimization methods from multiple angles.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Testing follows the workflow shown below. In total there are 1,344 configurations and close to 35,000 runs.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/4d60d7c6edcf4769b63f3c7d711af373~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=07Ff1S%2FE7fjT0b5oL9ZHaN21RDI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">That is a great deal of effort just to find out which optimization method fits best.</span></p>
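<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As a rough sanity check on these numbers, the Python sketch below enumerates one grid that is consistent with 1,344 configurations. The article gives 14 optimizers and 8 benchmark problems, which leaves 12 settings per optimizer-problem pair. Splitting those 12 into 4 tuning budgets and 3 learning rate schedules is an assumption made here for illustration; the article does not state it.</span></p>

```python
from itertools import product

NUM_OPTIMIZERS = 14  # stated in the article
NUM_PROBLEMS = 8     # stated in the article
NUM_BUDGETS = 4      # assumption: hypothetical number of tuning budgets
NUM_SCHEDULES = 3    # assumption: hypothetical number of learning rate schedules

# Every configuration is one (optimizer, problem, budget, schedule) combination.
configurations = list(
    product(
        range(NUM_OPTIMIZERS),
        range(NUM_PROBLEMS),
        range(NUM_BUDGETS),
        range(NUM_SCHEDULES),
    )
)

# The article reports close to 35,000 runs in total.
runs_per_configuration = 35000 / len(configurations)

print(len(configurations))            # 1344
print(round(runs_per_configuration))  # 26
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Under these assumptions, each configuration accounts for about 26 runs on average, which would cover tuning trials and repeated runs combined.</span></p>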
<h1 style="color: black; text-align: left; margin-bottom: 10px;">How do you choose the optimization method that suits you?</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">So how exactly do you choose a suitable optimization method?</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The figure below shows the 14 optimizers the authors selected for testing.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/97162e0b96ff4ed8aae016c546263dae~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=%2BsjtP8SQ0fdPbk4li%2FWyHgCeDAA%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The next figure shows how these optimizers performed on the 8 benchmarks above.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/d2d48949d1e546db9dac737077d26fa9~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=7%2FCgkPNm1l6nwLObQ85FAqOlREw%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The red I-shaped bars indicate the error range. Within that margin of error, one group of optimization methods performs almost identically: they all do well across the various benchmarks.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">To check how stable these results are, the authors also tuned the parameters of some of the algorithms. The figure below shows the tuning results for the classic algorithm RMSProp and for RMSProp(2).</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/6f0ccbef1b234bf790ac6a4dc8be0f7f~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=3vsqJ2z%2BbwGUON6sV%2B7OuZAzJ4M%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Clearly, different parameter settings can cause sizable swings in an optimizer's performance.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">More directly, the figure below shows that increasing the (tuning) budget also increases the performance gain. <span style="color: black;">(The orange line is the median of all the gray lines.)</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/65eaa97ba42f4188ba1befed94675883~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=3bGQHGcodHC0yLXe4iPmoQaAEnU%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">In other words, even when an optimization algorithm performs well, proper tuning remains indispensable.</span></p>
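<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The effect of a larger budget can be illustrated with a toy Python sketch. It assumes the tuning procedure is random search over the learning rate, which the article does not specify, and it uses a one-dimensional quadratic in place of a real network. The function names, the search range, and the budget values 1, 10 and 50 are all hypothetical.</span></p>

```python
import random

def final_loss(learning_rate, steps=50):
    """Run plain gradient descent on f(x) = x^2 from x = 5 and return the final loss."""
    x = 5.0
    for _ in range(steps):
        x -= learning_rate * 2 * x  # the gradient of x^2 is 2x
    return x * x

def best_loss_for_budgets(budgets, seed=0):
    """Random search: return the best loss found within each budget of trials."""
    rng = random.Random(seed)
    # Sample learning rates log-uniformly from [1e-4, 1] (assumed search range).
    trials = [final_loss(10 ** rng.uniform(-4, 0)) for _ in range(max(budgets))]
    # A budget of b trials sees only the first b samples, so a larger budget
    # can never end up with a worse best result than a smaller one.
    return {budget: min(trials[:budget]) for budget in budgets}

if __name__ == "__main__":
    for budget, loss in best_loss_for_budgets([1, 10, 50]).items():
        print(f"budget={budget:3d}  best final loss={loss:.3e}")
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Because each larger budget reuses the trials of the smaller ones, the best loss found can only stay the same or improve as the budget grows. How much it improves depends on how sensitive the optimizer is to its parameters.</span></p>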
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">So how many optimizers turn out to improve substantially once their parameters are tuned?</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Quite a few.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">In the figure below, green indicates that the optimization algorithm runs better after tuning.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/02b4c217b2854fee9216e427462f7183~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=u9ZPLFd%2BGqWL8%2FldSKC1T18qsss%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Put another way, if an optimization algorithm's results are a sea of green, its default parameters really are poor...</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example, the default parameters of AMSGrad, Mom, and NAG all leave a lot of room for improvement. By contrast, AMSBound, being adaptive, has quite good defaults that need little further tuning.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">After evaluating these optimizers, the researchers reached the following conclusions:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">1. Optimizer performance varies greatly from task to task;</span></span></p><span style="color: black;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">2. In fact, most optimizers perform surprisingly similarly, and there is currently no "most general-purpose" optimization method;</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">3. Tuning an optimizer's parameters is just as important as choosing the optimizer, if not more so.</span></span></p>
</span>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">That said, although the table is already very detailed, some sharp-eyed readers spotted a blind spot: a simple and effective method like SWA was left out of the analysis.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/dfb4a05d3b864dabb39f97057630d83a~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=cUAO2G3oLhZgfTkjPUCYx%2FAqUao%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Still, the proposed benchmarks are already suitable for analyzing how to choose among most optimizers.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The authors have open-sourced the benchmark code via the arXiv paper page. Interested readers can follow the paper link to take a look.</span></p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">About the authors</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The authors are all from the University of Tübingen in Germany.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/4582b040bb364ac6a35b222594126094~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=MZ4PIDu2s%2Beo%2BWdpQ%2BnwVCJUvqQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Robin M. Schmidt is a graduate student in computer science. His main research area is artificial intelligence, with interests in deep learning, reinforcement learning, and optimization.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/295dcf648df64c21bb51b1bc54237736~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=vad08f%2FT5rGg9w4VCBNBFQUalA0%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Philipp Hennig is a professor of machine learning and also a scientist at the Max Planck Institute. He studied physics at Heidelberg University and Imperial College London and received his PhD in machine learning from the University of Cambridge.</span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/242aff7ec7bb494e83a8332444d79a96~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725649160&x-signature=rpbF5nJIG433ui2ehnflp0gvy%2BQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Frank Schneider is a PhD student in machine learning whose research field is optimization methods for machine learning. He is currently studying hyperparameters in deep learning, with the aim of automating the training of deep neural networks.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Paper: </span><span style="color: black;">https://arxiv.org/abs/2007.01547</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">QbitAI · Toutiao signed account</span></span></p>