分布式图网络怎么样优化网易云音乐的举荐系统?
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">在“<span style="color: black;">精细</span><span style="color: black;">举荐</span>者得民心”的今天,<span style="color: black;">举荐</span>系统已<span style="color: black;">作为</span>各大互联网<span style="color: black;">机构</span>的标配。但<span style="color: black;">因为</span>现实中<span style="color: black;">非常多</span>数据是非欧氏空间生成的(例如,社交网络、信息网络等),一</span><span style="color: black;">些<span style="color: black;">繁杂</span>场景下的业务<span style="color: black;">需要</span>很难<span style="color: black;">经过</span>协同过滤等基于历史<span style="color: black;">行径</span>挖掘用户或<span style="color: black;">制品</span><span style="color: black;">类似</span>性的传统算法来满足。图神经网络<span style="color: black;">做为</span>一种约束性较少、极其灵活的数据表征方式,在深度学习各<span style="color: black;">重点</span>领域中崭露头角,一系列图学习模型涌现并得到越来越多的应用。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;">网易云音乐在<span style="color: black;">举荐</span><span style="color: black;">行业</span>的探索</strong></span></p><img src="https://mmbiz.qpic.cn/mmbiz_png/5ddyukqqNUs0h3gBR1gexR1OpzkqMBAVBskYM0B8HOG9Y5SChfsQya1icmaGY0q9fPm4AOj6eejlibrcBJbroDuA/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"><span style="color: black;"><span style="color: black;">做为</span><span style="color: black;">百姓</span>级的音乐App,网易云音乐很久之前就将定位从传统的音乐工具软件转移到音乐内容社区,致力于联结泛音乐<span style="color: black;">制品</span>与用户,打造最懂用户的音乐 App。在音乐内容社区中,直播<span style="color: black;">能够</span>说是用户参与度极高的场景了,云音乐内部投入了大量的人力物力以求将匹配度更高的主播<span style="color: black;">举荐</span>给用户,但仍然面临多重严峻的挑战。</span>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">怎样</span>破解历史<span style="color: black;">行径</span></strong></span><strong style="color: blue;"><span style="color: black;">稀疏</span>的用户冷<span style="color: black;">起步</span>问题</strong></p><span style="color: black;">众所周知,<span style="color: black;">举荐</span>系统的整体框架<span style="color: black;">重点</span><span style="color: black;">包含</span>召回、粗排和精排3个部分。其中,最底层的召回模型<span style="color: black;">拥有</span>举足轻重的<span style="color: black;">功效</span>,而成功的召回推理需要依赖充足的历史数据。但在云音乐的业务场景中,<span style="color: black;">经过</span>站内<span style="color: black;">宣传</span>看到直播<span style="color: black;">举荐</span>的用户很大比例是直播功能的新用户,即<span style="color: black;">无</span>产生过观看直播<span style="color: black;">行径</span>数据的用户。<span style="color: black;">怎样</span>向这类数据稀疏的用户<span style="color: black;">举荐</span>合适的内容<span style="color: black;">成为了</span>亟待<span style="color: black;">处理</span>的<span style="color: black;">困难</span>,这类问题<span style="color: black;">亦</span><span style="color: black;">一般</span>被<span style="color: black;">叫作</span>为冷<span style="color: black;">起步</span>。</span><img src="https://mmbiz.qpic.cn/mmbiz_png/5ddyukqqNUs0h3gBR1gexR1OpzkqMBAVMN7oiaHzEyr9K2wJ0NIJicz2bI4LWIuIw4iabBcBzJUaxF44kWEPIR7Cg/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;">大规模图模型<span style="color: black;">怎样</span>训练?</strong></span></p><span style="color: black;">云音乐现有计算资源已全面实现容器化<span style="color: black;">安排</span>,<span style="color: black;">针对</span>各个业务团队<span style="color: black;">来讲</span>,计算资源都是有限的,需要以最<span style="color: black;">有效</span><span style="color: black;">恰当</span>的方式利用有限的资源。如<span style="color: black;">安在</span>有限的分布式资源调控策略下低本<span style="color: black;">有效</span>地完成大规模图神经网络的模型训练,<span style="color: black;">作为</span>必须<span style="color: black;">解决</span>的<span style="color: black;">困难</span>。</span>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;">PGL图神经网络助力</strong><strong style="color: blue;"><span style="color: black;">举荐</span>场景落地</strong></span></p><span style="color: black;">为<span style="color: black;">认识</span>决以上问题,网易云音乐的<span style="color: black;">开发</span>团队调研了<span style="color: black;">海量</span>开源<span style="color: black;">方法</span>,<span style="color: black;">最后</span><span style="color: black;">选取</span>了对大规模图训练更加友好的百度飞桨分布式图学习框架PGL,<span style="color: black;">做为</span>云音乐的<span style="color: black;">基本</span>框架。</span>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;">基于PGL的<span style="color: black;">行径</span>域知识迁移<span style="color: black;">处理</span>冷<span style="color: black;">起步</span>问题</strong></span></p><span style="color: black;">云音乐直播场景的新用户中,有<span style="color: black;">非常多</span>在音乐、歌单、Mlog 等业务中产生过较丰富的历史<span style="color: black;">行径</span>,能否<span style="color: black;">经过</span>将这部分历史<span style="color: black;">行径</span>知识映射到直播<span style="color: black;">行业</span>,来<span style="color: black;">处理</span>“<span style="color: black;">行径</span>”数据不足的问题呢?</span><span style="color: black;">带着疑问,云音乐引入了图模型结构,以多种<span style="color: black;">区别</span>类型的实体(如歌曲、DJ、Query、RadioID 等)为节点,<span style="color: black;">经过</span>用户与主播、用户与歌曲、Query与主播等历史<span style="color: black;">行径</span>关系,构建了一张统一的图关系网络。</span><span style="color: black;"><span style="color: black;">而后</span>,基于飞桨图学习框架 PGL对图模型进行训练。先采用 DeepWalk、Metapath2Vec、GraphSage等模型学习出足够强大的Graph Embedding<span style="color: black;">暗示</span>来建模实体ID;再<span style="color: black;">经过</span>向量召回,将用户在歌曲、Query等处的<span style="color: black;">行径</span>迁移到主播<span style="color: black;">行业</span>,达到召回合适主播的目的。</span><img src="https://mmbiz.qpic.cn/mmbiz_png/5ddyukqqNUs0h3gBR1gexR1OpzkqMBAVy7jQ3YSkdic583EpmCJB97uXYGRV08pWAZDEoUrplpgv4jicdE1aztaQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;">基于PGL通用的分布式能力进行训练 </strong></span></p><span style="color: black;">云音乐的数据规模非常庞大,数据关系即使经过裁剪<span style="color: black;">亦</span>高达亿级别以上。在常用的硬件资源配备<span style="color: black;">状况</span>下,此等量级规模的数据早已<span style="color: black;">作为</span>某些开源的图神经网络框架的瓶颈,需要<span style="color: black;">运用</span>极其昂贵的计算资源<span style="color: black;">才可</span><span style="color: black;">处理</span>。<span style="color: black;">针对</span>数据规模必将<span style="color: black;">连续</span>增大的云音乐<span style="color: black;">来讲</span>,相较于<span style="color: black;">运用</span>什么类型的模型,能否在这种数据规模下训练出模型才是优先要<span style="color: black;">思虑</span>的关键问题,<span style="color: black;">亦</span>是网易云音乐与PGL成功牵手的关键<span style="color: black;">原因</span>!</span><span style="color: black;">百度飞桨深度学习平台PaddlePaddle 2019年开源的分布式图学习框架PGL,原生支持图学习中较为独特的分布式图存储(Distributed Graph Storage)和分布式采样(Distributed Sampling),<span style="color: black;">能够</span>方便地<span style="color: black;">经过</span>上层Python接口,将 图的特征(如Side Feature等)存储在<span style="color: black;">区别</span>的Server上,<span style="color: black;">亦</span>支持通用的分布式采样接口,将<span style="color: black;">区别</span>子图的采样分布式处理,并基于PaddlePaddle Fleet API来完<span style="color: black;">成份</span>布式训练(Distributed Training),实<span style="color: black;">此刻</span>分布式的“瘦计算节点”上加速计算。这些能力对云音乐内容社区直播<span style="color: black;">举荐</span>遇到的训练问题<span style="color: black;">来讲</span>,极具魅力!</span><span style="color: black;">实验对比<span style="color: black;">表示</span>,在主播<span style="color: black;">举荐</span>场景采用图计算带来有效观看大幅<span style="color: black;">提高</span>,尤其在新用户和新主播冷<span style="color: black;">起步</span>上引入其它域数据后有了<span style="color: black;">显著</span><span style="color: black;">提高</span>。</span><img src="https://mmbiz.qpic.cn/mmbiz_png/5ddyukqqNUs0h3gBR1gexR1OpzkqMBAVWEntdiaicf3cAibPibZmibiaLqiaj16bwxa3LWZ5sUby79ydrzOZ30PfP0cCg/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;">想<span style="color: black;">认识</span><span style="color: black;">更加多</span>落地细节和实战经验?</strong></span></p><span style="color: black;">3月16日,网易云音乐<span style="color: black;">设备</span>学习平台与框架负责人段石石,将在飞桨B站直播间分享深度学习实战进阶课程《图神经网络在云音乐业务落地》。除了上面<span style="color: black;">说到</span>的数据稀疏性、冷<span style="color: black;">起步</span>召回和大规模分布式训练等业务<span style="color: black;">困难</span>的<span style="color: black;">处理</span><span style="color: black;">方法</span>,段老师还将分享云音乐<span style="color: black;">怎样</span>应对训练数据质量、瘦计算节点等技术挑战。</span><span style="color: black;">3月17日,百度高级算法工程师苏炜跃将分享《分布式图学习框架PGL及其<span style="color: black;">举荐</span>应用》,重点介绍图学习算法的理论<span style="color: black;">基本</span>、图学习框架PGL的特点和<span style="color: black;">优良</span>;<span style="color: black;">同期</span>将<span style="color: black;">经过</span>演示经典大规模<span style="color: black;">举荐</span>场景的图学习训练过程,<span style="color: black;">帮忙</span><span style="color: black;">大众</span>快速学习和实现产业级的图模型实践。</span>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">扫描下方二维码,加入技术交流群</span></strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/NvqaDFQAo1ibNb7MjH2FvIicwUhQ2thtMaO0Hv15JqKHw8micTNsYgH4Mias0ic2BKd3sibpVic1tRbJHNSfib10aHtTBA/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">想<span style="color: black;">认识</span><span style="color: black;">更加多</span>落地细节和实战经验,3月16、17日20:10-21:30锁定AI快车道x网易云音乐直播课,<span style="color: black;">咱们</span>不见不散!</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/NvqaDFQAo1ibNb7MjH2FvIicwUhQ2thtMawxrGdpR0ee83N9cSJActDSNwLC2ASfvicIGegHlzJARJyLcrPuUBD3Q/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;">飞桨图学习框架PGL</strong></span></p><span style="color: black;">PGL是业界首个提出通用<span style="color: black;">信息</span>并行传递机制,支持百亿规模巨图的工业级图学习框架。PGL基于飞桨动态图全新升级,<span style="color: black;">极重</span><span style="color: black;">提高</span>了易用性,原生支持异构图,覆盖30+图学习模型,<span style="color: black;">包含</span>图语义理解模型ERNIESage等,历经<span style="color: black;">海量</span>真实工业应用验证。<span style="color: black;">另一</span>,基于飞桨深度学习框架的分布式Fleet API,<span style="color: black;">创立</span>分布式图存储及分布式学习算法,实现灵活、<span style="color: black;">有效</span>地搭建前沿的大规模图学习算法。</span>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">更加多</span>资料请关注</strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">PGL图学习框架Github代码仓库:https://github.com/PaddlePaddle/PGL</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">飞桨<span style="color: black;">举荐</span>系统:https://github.com/PaddlePaddle/paddlerec</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">飞桨分布式:</span><span style="color: black;">https://fleet-x.readthedocs.io/en/latest/</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">飞桨深度学习框架Github代码仓库:https://github.com/PaddlePaddle/Paddle</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">针对</span>想要<span style="color: black;">认识</span>图神经网络及其分布式应用的小伙伴,<span style="color: black;">能够</span>围观PGL团队倾力<span style="color: black;">研发</span>的图神经网络课程,带你七天<span style="color: black;">有效</span>入门:https://github.com/PaddlePaddle/PGL/tree/main/course</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">如感觉不错,欢迎“Star”;如需交流,欢迎“Issue”,<span style="color: black;">咱们</span>将<span style="color: black;">即时</span>反馈;如您有基于飞桨的产业落地案例,欢迎发送至邮件paddle-up@baidu.com。</span></p>
大势所趋,用于讽刺一些制作目的就是为了跟风玩梗,博取眼球的作品。 我深感你的理解与共鸣,愿对话长流。 楼主的文章非常有意义,提升了我的知识水平。
页:
[1]