Several Ways to Determine in PHP Whether a Visitor Is the Baidu Spider
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">How can PHP determine whether a visitor is the genuine Baidu spider, and is there a reasonably reliable way to do it? This article looks at three methods for checking whether a visitor really is the Baidu spider.</p>
<h2 style="color: black; text-align: left; margin-bottom: 10px;">1. User-Agent method</h2>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the HTTP request headers, every browser or crawler identifies itself with a User-Agent string. The Baidu spider's User-Agent normally contains the keyword "baidu", so you can read the visitor's User-Agent and check whether it contains "baidu" to decide whether the visitor is the Baidu spider.</p>
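<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A slightly more defensive sketch of this check is given here. It takes the User-Agent string as an argument, tolerates a missing header, and matches case-insensitively. The function name looks_like_baidu_user_agent is a hypothetical helper, not part of PHP, and a match only shows what the client claims to be, since the header is easy to forge.</p>

```php
<?php
// Hypothetical helper: returns true when the User-Agent string mentions "baidu".
// This only reflects what the client claims; the header is easy to forge.
function looks_like_baidu_user_agent(?string $user_agent): bool
{
    if ($user_agent === null || $user_agent === '') {
        return false; // header missing or empty
    }
    // stripos() is case-insensitive, so "Baiduspider" and "baidu" both match.
    return stripos($user_agent, 'baidu') !== false;
}

// Usage: the header may be absent, so fall back to null.
$claims_baidu = looks_like_baidu_user_agent($_SERVER['HTTP_USER_AGENT'] ?? null);
```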
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Sample code:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">if (strpos($_SERVER['HTTP_USER_AGENT'] ?? '', 'baidu') !== false) {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">} else {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is not the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">}</p>
<h2 style="color: black; text-align: left; margin-bottom: 10px;">2. IP address method</h2>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When the Baidu spider visits a site, it uses IP addresses from fixed ranges (for example 180.76.15.0/24). You can therefore take the visitor's IP address and compare it with the Baidu spider's IP addresses; a match indicates that the visitor is the Baidu spider. For this step you need to collect all the IP addresses Baidu has published in order to identify the spider accurately. If your list of Baidu spider IP addresses is accurate and complete, this is a fairly accurate method.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Sample code:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$visitor_ip = $_SERVER['REMOTE_ADDR'];</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">if (preg_match('/^180\.76\.15\.\d{1,3}$/', $visitor_ip)) {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">} else {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is not the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">}</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/fm7RnX9X5QSNKbaN7Qg380uNbhKqibD0c13moQcAxv8Jj51Lzia0hHZpUrJJZ65Tq0OrPVowgUxYIFtY72mKicBgQ/640?wx_fmt=jpeg&from=appmsg&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
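<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A regular expression works for a single /24 range, but a collected list usually contains ranges of different sizes. The sketch below shows one way to test an IPv4 address against a list of CIDR ranges. The helper names ipv4_in_cidr and ip_in_any_range are hypothetical, the list contains only the example range from the text, and the bit arithmetic assumes a 64-bit PHP build.</p>

```php
<?php
// Hypothetical helper: true when $ip (IPv4) falls inside $cidr, e.g. "180.76.15.0/24".
// Assumes a 64-bit PHP build, where ip2long() returns a non-negative integer.
function ipv4_in_cidr(string $ip, string $cidr): bool
{
    $parts = explode('/', $cidr, 2);
    if (count($parts) !== 2 || preg_match('/^\d{1,2}$/', $parts[1]) !== 1) {
        return false; // not in "network/prefix" form
    }
    $prefix = (int) $parts[1];
    $ip_long = ip2long($ip);
    $net_long = ip2long($parts[0]);
    if ($ip_long === false || $net_long === false || $prefix > 32) {
        return false; // malformed address or prefix
    }
    // Build the network mask; a /0 prefix matches every address.
    $mask = $prefix === 0 ? 0 : (~0 << (32 - $prefix)) & 0xFFFFFFFF;
    return ($ip_long & $mask) === ($net_long & $mask);
}

// Hypothetical helper: checks $ip against every CIDR range in $ranges.
function ip_in_any_range(string $ip, array $ranges): bool
{
    foreach ($ranges as $cidr) {
        if (ipv4_in_cidr($ip, $cidr)) {
            return true;
        }
    }
    return false;
}

// Only the example range from the text; a real list must be collected and kept current.
$baidu_ranges = ['180.76.15.0/24'];
$is_baidu_ip = ip_in_any_range($_SERVER['REMOTE_ADDR'] ?? '', $baidu_ranges);
```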
<h2 style="color: black; text-align: left; margin-bottom: 10px;">3. Reverse DNS method</h2>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When the Baidu spider visits a site, a reverse DNS lookup of its IP address normally returns a hostname ending in ".baidu.com". You can therefore take the visitor's IP address, resolve it to a hostname with a reverse lookup, and check whether that hostname ends in ".baidu.com" to decide whether the visitor is the Baidu spider.</p>
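<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A reverse DNS record is set by whoever controls the IP block, so a reverse lookup on its own can be made to return a name ending in ".baidu.com". A common hardening step is to resolve the returned hostname forward again and confirm that it maps back to the same IP. The sketch below illustrates that idea under some assumptions: the function name is_verified_baidu_ip and its injectable resolver parameters are hypothetical and exist only so the logic can be tried without network access, and only IPv4 is handled.</p>

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check for an IPv4 address.
// $reverse and $forward default to PHP's gethostbyaddr() and gethostbynamel();
// they are injectable only so the logic can be exercised without network access.
function is_verified_baidu_ip(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false; // gethostbynamel() only returns IPv4 addresses
    }

    // gethostbyaddr() returns the unmodified IP on failure and false on malformed input.
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // The hostname must end in ".baidu.com" (10 characters), compared case-insensitively.
    if (substr(strtolower($host), -10) !== '.baidu.com') {
        return false;
    }

    // Forward lookup: the hostname must resolve back to the original IP.
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Called as is_verified_baidu_ip($_SERVER['REMOTE_ADDR']), it performs two DNS lookups per request, so in practice the result would probably be cached per IP.</p>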
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Sample code:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$visitor_ip = $_SERVER['REMOTE_ADDR'];</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$visitor_domain = gethostbyaddr($visitor_ip);</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">if (substr($visitor_domain, -10) === '.baidu.com') {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">} else {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is not the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">}</p>
<h2 style="color: black; text-align: left; margin-bottom: 10px;">4. Combining the methods</h2>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">None of the methods above is guaranteed to be accurate on its own, because a visitor can forge the User-Agent and, in some setups, the apparent IP address or the DNS resolution result. Used together, however, they filter out most fake Baidu spider visits. When checking a visitor, combining the User-Agent and reverse DNS methods improves accuracy. The following sample code combines these two methods:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$visitor_user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$visitor_ip = $_SERVER['REMOTE_ADDR'];</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">$is_baidu_spider = false;</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">// Check whether the User-Agent contains the keyword "baidu"</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">if (strpos($visitor_user_agent, 'baidu') !== false) {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> $is_baidu_spider = true;</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">}</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">// Check whether the reverse-resolved hostname ends in ".baidu.com"</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">if (!$is_baidu_spider) {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> $visitor_domain = gethostbyaddr($visitor_ip);</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> if (substr($visitor_domain, -10) === '.baidu.com') {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> $is_baidu_spider = true;</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> }</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">}</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">if ($is_baidu_spider) {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">} else {</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"> // the visitor is not the Baidu spider</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">}</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the code above, the User-Agent is checked first for the keyword "baidu"; if it is present, $is_baidu_spider is set to true. If it is not, the visitor's hostname is obtained through a reverse DNS lookup and checked for the ".baidu.com" suffix; if it matches, $is_baidu_spider is set to true. Finally, the value of $is_baidu_spider decides whether the visitor is treated as the Baidu spider.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Combining the User-Agent and reverse DNS methods improves accuracy by checking two separate signals. It still cannot guarantee 100% accuracy, because a visitor can forge the User-Agent and the DNS resolution result. To improve accuracy further, you can bring in other methods, such as the IP address method, for a more thorough combined check. The IP address is not used in the sample above because, if your collected list of Baidu spider IP addresses is complete and accurate, the second method on its own is already more accurate than the others. To improve accuracy you can also use a third-party library, or combine several checks to strengthen detection.</p>
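<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">One possible way to layer all three checks is sketched below. It is stricter than the sample code in this section, which accepts a visitor when either check passes: here the User-Agent only acts as a cheap first filter, and the visitor counts as verified only if the IP list or the DNS check confirms it. The function name classify_visitor, its return values, and the two callables passed in are all hypothetical; the callables shown in the usage lines are toy stand-ins for the IP address and reverse DNS methods.</p>

```php
<?php
// Hypothetical decision function combining the three checks.
// $ip_in_known_range and $dns_verified are callables that take an IP and return bool;
// they stand in for the IP address and reverse DNS methods described in this article.
function classify_visitor(string $user_agent, string $ip, callable $ip_in_known_range, callable $dns_verified): string
{
    // Step 1: cheap filter. A visitor that does not claim to be Baidu is not checked further.
    if (stripos($user_agent, 'baidu') === false) {
        return 'not_baidu';
    }
    // Step 2: a match against a maintained IP list avoids a DNS lookup.
    if ($ip_in_known_range($ip)) {
        return 'verified';
    }
    // Step 3: fall back to the slower DNS check.
    if ($dns_verified($ip)) {
        return 'verified';
    }
    // The User-Agent claims Baidu, but nothing confirms it.
    return 'claimed_only';
}

// Usage with toy stand-ins for the two checks.
$result = classify_visitor(
    $_SERVER['HTTP_USER_AGENT'] ?? '',
    $_SERVER['REMOTE_ADDR'] ?? '',
    function (string $ip): bool {
        return strpos($ip, '180.76.15.') === 0; // example range from the text only
    },
    function (string $ip): bool {
        return substr((string) gethostbyaddr($ip), -10) === '.baidu.com';
    }
);
```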