u1jodi1q posted on 2024-10-29 18:35:56

How to check whether Baidu Netdisk links are valid when using Bee Collector (蜜蜂采集器)


    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Bee Collector (蜜蜂采集器) Tutorial - Checking Whether Baidu Netdisk Links Are Valid</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">When building a website, you may need to use third-party cloud storage such as Baidu Netdisk. In many cases, however, a Netdisk share link has already expired. If a site carries a large number of dead links over a long period, user retention suffers badly. A good way to deal with this is to check whether the links are still valid.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">This article uses Bee Collector as an example and calls the Baidu Netdisk link validity verification plugin to add link validity checking.</span></p>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Plugin Overview</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Bee Collector's plugins fall into six types: list page URL plugins, data processing plugins, tag data processing plugins, file upload plugins, content publishing plugins, and message notification plugins. Every type supports four programming languages: PHP, Python, Node.js, and Go.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Of these, the tag data processing plugin is called when a single tag field undergoes secondary processing during data collection; each call works on one tag field of one data record.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Link validity checking is usually done during the content collection stage by processing a tag's content, so a tag data processing plugin is used here.</span></p>
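    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The article does not show the plugin's source. Purely to illustrate what a tag data processing step of this kind might do, here is a hypothetical Python sketch: it takes one tag's content, extracts Baidu Netdisk share links, and returns one "url : result" line per link. The link pattern, the names process_tag and check_link, and the wording of the no-link message are assumptions. The real network check is replaced by an injected check_link callable, so this is neither the plugin's actual implementation nor Bee Collector's plugin interface.</p>

```python
import re
from typing import Callable, Optional

# Assumed share-link shape (not documented by the plugin):
# https://pan.baidu.com/s/<share id>, optionally followed by ?pwd=<code>
LINK_RE = re.compile(r"https?://pan\.baidu\.com/s/[\w-]+(?:\?pwd=\w+)?")

OK_MSG = "检测成功,链接状态正常"  # success string shown in the article
# Hypothetical wording; copy the exact "no link found" string from the real plugin.
NO_LINK_MSG = "检测失败,未找到百度网盘链接"


def process_tag(content: str, check_link: Callable[[str], Optional[str]]) -> str:
    """Turn one tag's content into "url : result" lines.

    check_link(url) stands in for the real network check: it returns None
    if the link is alive, otherwise a short failure reason.
    """
    links = LINK_RE.findall(content)
    if not links:
        return NO_LINK_MSG
    lines = []
    for url in links:
        reason = check_link(url)
        result = OK_MSG if reason is None else f"检测失败,{reason}"
        lines.append(f"{url} : {result}")
    return "\n".join(lines)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For example, with a check_link that always returns None, the input "下载: https://pan.baidu.com/s/abcd?pwd=1234" yields "https://pan.baidu.com/s/abcd?pwd=1234 : 检测成功,链接状态正常". Injecting the checker keeps the extraction and formatting logic testable without network access.</p>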
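    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Plugin Usage</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Usage</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">1. Add an external program (Python).</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">2. Add or import the tag data processing plugin, then add a tag data processing configuration.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Note: the Python plugin requires the urllib3 package (pip install urllib3). If the collector still reports that the package cannot be found after installation, restart the collector process.</span></p>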
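    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If urllib3 is still reported missing, one possible cause (an assumption, not something the article states) is that pip installed it into a different Python than the one registered as the external program. The short script below, run with that same interpreter, prints the interpreter path and whether urllib3 can be imported; module_available is an illustrative helper name.</p>

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # This path should be the same Python configured as the external program.
    print("interpreter:", sys.executable)
    if module_available("urllib3"):
        print("urllib3 is installed")
    else:
        # Installing via "<interpreter> -m pip" targets exactly this interpreter.
        print(f"urllib3 is missing; run: {sys.executable} -m pip install urllib3")
```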
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Return value</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The plugin returns each link URL together with its check result, one check result per line.</span></p>
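    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because each line pairs a URL with its result, the output is also easy to post-process outside the collector. A minimal sketch, assuming the " : " separator used in the plugin's example output and that URLs contain no spaces; parse_results is a hypothetical helper, not part of the plugin.</p>

```python
from typing import List, Tuple


def parse_results(output: str) -> List[Tuple[str, str]]:
    """Split plugin output into (url, result) pairs, one per non-empty line."""
    pairs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # URLs contain no spaces, so the first " : " is the separator.
        url, sep, result = line.partition(" : ")
        if not sep:
            # No "url : result" separator, e.g. a bare failure message.
            url, result = "", line
        pairs.append((url.strip(), result.strip()))
    return pairs
```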
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The returned result may contain the following strings: 检测失败 (check failed), no Baidu Netdisk link found; 检测失败 (check failed), xxx; and 检测成功 (check succeeded), link status normal. So: if every link must be valid, add the content filter "must not contain 检测失败"; if at least one link must be valid, add the content filter "must contain 检测成功"; if content with no Netdisk link at all should also count as a success, first add a string replacement that replaces the "check failed, no Baidu Netdisk link found" message with 检测成功, and then add the content filter "must contain 检测成功".</span></p>
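    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The three filtering options above reduce to substring tests on the plugin's returned text. The sketch below models them in Python to make the logic explicit; it is not Bee Collector's own filter implementation, the function names are illustrative, and the exact wording of the no-link message is an assumption.</p>

```python
FAIL = "检测失败"  # substring for the "must not contain" filter
OK = "检测成功"    # substring for the "must contain" filter
# Hypothetical wording; copy the exact "no link found" string from the real plugin.
NO_LINK_MSG = "检测失败,未找到百度网盘链接"


def all_valid(result: str) -> bool:
    """Every link valid: must not contain 检测失败."""
    return FAIL not in result


def at_least_one_valid(result: str) -> bool:
    """At least one link valid: must contain 检测成功."""
    return OK in result


def valid_or_no_links(result: str) -> bool:
    """Replace the no-link message with 检测成功, then require 检测成功."""
    return OK in result.replace(NO_LINK_MSG, OK)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note that content without any Netdisk link fails the first two filters, because the no-link message itself contains 检测失败 and lacks 检测成功; that is why the third option needs the replacement step first.</p>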
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Example of plugin output (检测成功,链接状态正常 means "check succeeded, link status normal"):</span></p>
    <pre>.../s/abcdabcdabcdabcdabcdabcdabcd?pwd=1234 : 检测成功,链接状态正常
.../s/abcdabcdabcdabcdabcdabcdabcd?pwd=1234 : 检测成功,链接状态正常
.../s/abcdabcdabcdabcdabcdabcdabcd?pwd=1234 : 检测成功,链接状态正常</pre>
    <h1 style="color: black; text-align: left; margin-bottom: 10px;">Implementation</h1>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Writing the collection rules themselves is skipped here; this section focuses on the link validity check.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Open the main menu "Help" > "App Market" and set the type to "Tag data processing plugin". Search for "百度网盘链接有效性" (Baidu Netdisk link validity) and you will see "百度网盘链接有效性验证公共版" (Baidu Netdisk link validity verification, public edition). The public edition does not require you to apply for Baidu Netdisk Open Platform API permissions, but its usage frequency is limited. Select the plugin and click "Download" to import it.</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/93949107d9874dce84788bdcf8cf4828~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1729513586&amp;x-signature=m7VLQXzv2VogbcnFz7TdqtP65eI%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Importing the plugin</p>
    </div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As shown above, you must specify an external program when importing. The plugin is written in Python; if Python is not installed yet, first download and install it from the external program manager, then add Python as an external program. When importing, it is recommended to select "Also create a tag data processing configuration automatically". Once created, the configuration appears in the "Tag data processing configuration management" list.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Go to the collection rule's editing window. For the relevant tag, enable "Secondary tag data processing", add a "Call plugin" processing item, and select the tag data processing configuration you just created. Then click "Test" at the bottom to check that it runs correctly, as shown in the figure:</span></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/4ebdf8a1e5b64b759b2c00c28248a509~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1729513586&amp;x-signature=rVKSQr0BqlRidEY3sQaEln%2F9xUU%3D" style="width: 50%; margin-bottom: 20px;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Calling the plugin</p>
    </div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Calling the plugin here replaces the tag's content, so the tag used for link checking should be a separate "check" tag that serves only to test validity and is not used for the collected content output. You can also add a content filter to this check tag; for example, "must not contain 检测失败" requires every extracted Netdisk link to be valid.</span></p>
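    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The separate check tag can be pictured as an extra field on each collected record: the plugin overwrites only that field, the filter reads it, and the field is dropped before output, so the real content tags are never touched. A minimal sketch under those assumptions; the record layout and the "check" field name are hypothetical, not Bee Collector's data model.</p>

```python
from typing import Dict, List

FAIL = "检测失败"  # failure marker in the plugin's output


def filter_records(records: List[Dict[str, str]], check_field: str = "check") -> List[Dict[str, str]]:
    """Keep records whose check tag has no failure, then drop the check tag."""
    kept = []
    for record in records:
        # A missing check tag is treated as a failure (default value is FAIL).
        if FAIL in record.get(check_field, FAIL):
            continue
        # Output only the content tags; the check tag is not published.
        kept.append({k: v for k, v in record.items() if k != check_field})
    return kept
```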
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">With that, you can check whether Baidu Netdisk links are valid in your own collection rules.</span></p>



