Website Log Data Analysis Tutorial
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Website log analysis is mainly done with dedicated tools, and there are many of them to choose from.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For a web-based tool you can use Loghao (拉格好, www.loghao.com). For desktop tools you can use Aizhan (爱站) or Guangnian (光年). You can also analyze logs with shell commands.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Log analysis serves many purposes. The main ones are:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">1. Understand how spiders crawl your pages, so you can allocate internal links sensibly and optimize crawl paths;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">2. Track traffic data for each section page and adjust your strategy accordingly (for example, if traffic drops, analyze the cause, run an A/B test on another section page and observe the results, and so on);</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">3. Extract the 404 pages and submit them to Baidu for processing;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">4. If the site has been hacked, analyze the logs to review what operations were performed, identify fake Baidu spider IPs, and so on.</span></p>
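<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">On point 4: Baidu's guidance for spotting fake spiders is to reverse-resolve the visiting IP (for example with the host command). A genuine Baiduspider resolves to a hostname ending in .baidu.com or .baidu.jp. The sketch below assumes that rule and the standard host tool; is_baidu_hostname and check_spider_ip are hypothetical helper names, not part of any tool mentioned in this article.</span></p>

```shell
# Hypothetical helper: succeeds if NAME ends in .baidu.com or .baidu.jp,
# the suffixes Baidu gives for genuine Baiduspider reverse-DNS names.
is_baidu_hostname() {
  local name="${1%.}"   # strip the trailing dot that `host` prints
  case "$name" in
    *.baidu.com|*.baidu.jp) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: check_spider_ip IP -> prints "IP real" or "IP unverified".
# Needs the `host` utility and network access for the PTR (reverse) lookup.
check_spider_ip() {
  local ip="$1" name
  # A successful lookup prints "... domain name pointer NAME."
  name=$(host "$ip" 2>/dev/null | awk '/domain name pointer/ { print $NF; exit }')
  if is_baidu_hostname "$name"; then
    echo "$ip real"
  else
    echo "$ip unverified"
  fi
}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example, check_spider_ip 123.125.71.95 prints the IP followed by real or unverified. A stricter check would also resolve the returned hostname forward and confirm it maps back to the same IP.</span></p>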
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Download the log files to your local machine. I use the BaoTa (宝塔) panel, where the log files can usually be found under the www root directory.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The web-based tool is limited: it only shows the spider crawl counts and the returned status codes, as shown below:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VL8tOUZiaiaVicibkscERZFH7zHb2RnWUFgFpCmiaZ8sAoe44fPp4fh2Yp6Q/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The spider counts should be self-explanatory. While we're at it, let me explain the "low-weight IPs" and "high-weight IPs" shown above (experts can skip this; it is hearsay, shared for general interest).</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Reportedly, ever since Baidu was founded it has classified its spiders: some crawl only images, some only videos, some only text content, and so on.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The IPs below are collectively called "low-weight IPs" (I honestly don't remember where I heard that). They mainly crawl new sites or low-quality pages. During a new site's early period you should see IPs of the 123.125.71.* type, and they will visit very frequently.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">123.125.71.95</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">123.125.71.97</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">123.125.71.117</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">123.125.71.71</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">123.125.71.106</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">If it is an established site and visits from these IPs suddenly increase, pay attention: the site may well be on the verge of being deindexed or penalized in the rankings.</span></p>
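<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Noticing such a sudden increase requires per-day counts for one IP segment. The sketch below assumes the common Nginx/Apache combined log format (timestamp in the fourth whitespace-separated field) and a log written in chronological order; daily_hits is a hypothetical helper name.</span></p>

```shell
# Hypothetical helper: daily_hits PREFIX [LOG...]
# Prints "DATE COUNT" per day for requests whose client IP starts with PREFIX.
# Assumes combined log format: $1 = client IP, $4 = "[07/May/2019:11:21:56".
# Reads standard input when no log file is given.
daily_hits() {
  local prefix="$1"
  shift
  awk -v p="$prefix" '
    index($1, p) == 1 {
      d = substr($4, 2, 11)            # "[07/May/2019:11:21:56" -> "07/May/2019"
      if (!(d in n)) order[++k] = d    # remember first-seen (chronological) order
      n[d]++
    }
    END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }' "$@"
}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example: grep Baiduspider your.log | daily_hits 123.125.71. (keep the trailing dot in the prefix so that a shorter prefix such as 123.125.7 does not accidentally match 123.125.71.x as well).</span></p>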
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The tool above labels the following IP as a "next-day snapshot" IP: pages it crawls will, barring surprises, be indexed by the next day, or have their snapshots updated.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.95</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The following IPs are the so-called "high-weight IPs", i.e. 220.181.108.*. Pages they crawl get indexed and updated very quickly.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.75</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.92</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.91</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.86</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.89</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.94</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.97</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.80</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.77</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">220.181.108.83</span></p>
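<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Whatever the truth of the low/high-weight labels, it is easy to check which segments actually visit your site. A minimal sketch, assuming the combined log format (client IP in the first field) and that Baidu's requests can be recognized by the Baiduspider user-agent substring; segment_counts is a hypothetical helper name.</span></p>

```shell
# Hypothetical helper: segment_counts [LOG...]
# Prints "COUNT SEGMENT" for Baiduspider requests, grouped by the first
# three octets of the client IP, most frequent first.
# Reads standard input when no log file is given.
segment_counts() {
  awk '/Baiduspider/ {
         split($1, o, ".")                   # "123.125.71.95" -> o[1..4]
         n[o[1] "." o[2] "." o[3] ".*"]++    # group by the first three octets
       }
       END { for (s in n) print n[s], s }' "$@" | sort -nr
}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example, segment_counts your.log shows at a glance how the 123.125.71.* and 220.181.108.* segments compare.</span></p>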
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">OK, that's it for the IP segments.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Now look at the left side, where you can see a long run of log entries.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VeyUwv9uqRxQMRic86BDoJtHy3Tn8aZmUEeLOk9iaubKD9tqqUzvKicEtg/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Taken in full, a single entry looks like this:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">123.125.71.12 - - [07/May/2019:11:21:56 +0800] "GET /gzjysc/83.html HTTP/1.1" 200 8274 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Let's go through it field by field:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">123.125.71.12: the visitor's IP;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">07/May/2019:11:21:56 +0800: the time of the visit;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">GET /gzjysc/83.html: the request method and the URL visited;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">HTTP/1.1: the protocol of the request;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">200: the HTTP status code returned by the site;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">8274: the size of the response body in bytes;</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">-: the Referer header ("-" means none was sent);</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Baiduspider/2.0; +http://www.baidu.com/search/spider.html: part of the user agent, which declares the visitor to be Baidu's spider. A user agent is easy to forge, though, so on its own it does not prove the request really came from Baidu (see point 4 above about fake Baidu spider IPs).</span></p>
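<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">These fields can also be pulled apart mechanically. This sketch assumes the combined log format, in which the request line, referrer and user agent are wrapped in double quotes, so splitting each line on the quote character isolates them; parse_line is a hypothetical helper name.</span></p>

```shell
# Hypothetical helper: parse_line
# Reads combined-format log lines on stdin and prints one "name: value" per field.
# Splitting on '"' gives: $1 = IP/ident/user/time, $2 = request line,
# $3 = " status bytes ", $4 = referrer, $6 = user agent.
parse_line() {
  awk -F'"' '{
    split($1, pre, " ")    # pre[1]=IP, pre[4]="[07/May/2019:11:21:56", pre[5]="+0800]"
    split($3, st, " ")     # st[1]=status, st[2]=bytes
    time = substr(pre[4], 2) " " substr(pre[5], 1, length(pre[5]) - 1)
    print "ip: " pre[1]
    print "time: " time
    print "request: " $2
    print "status: " st[1]
    print "bytes: " st[2]
    print "referer: " $4
    print "user-agent: " $6
  }'
}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Given a combined-format line for the request discussed above, it prints lines such as status: 200 and bytes: 8274. A user agent that itself contains a double quote would break this simple split.</span></p>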
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">OK, that covers the basic concepts. Next, let's see what information we can get out of the log file.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The Guangnian (光年) log analysis tool gives us the following:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Spider crawl volume: Baidu crawls the most, followed by Sogou. Surprisingly there was no 360 spider in the list; just add 360 Spider in the settings and re-run the analysis.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VssDHdoAiaC9VluOIb2X42aVoqfgW1sQPbC5SibibbK3YtUPOvXf62RrAw/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
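<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Without a desktop tool, a similar per-spider tally can be approximated by matching user-agent substrings. The patterns below (Baiduspider, Sogou, 360Spider, Googlebot, bingbot) are assumptions based on commonly seen crawler user agents, so adjust them to what actually appears in your logs; spider_counts is a hypothetical helper name.</span></p>

```shell
# Hypothetical helper: spider_counts [LOG...]
# Prints "COUNT NAME" per crawler, most frequent first.
# The user-agent substrings are assumptions; edit them to match your logs.
# Each line is counted once, under the first pattern it matches.
spider_counts() {
  awk '
    /Baiduspider/ { n["Baiduspider"]++; next }
    /Sogou/       { n["Sogou"]++; next }
    /360Spider/   { n["360Spider"]++; next }
    /Googlebot/   { n["Googlebot"]++; next }
    /bingbot/     { n["bingbot"]++; next }
    END { for (s in n) print n[s], s }' "$@" | sort -nr
}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example: spider_counts your.log. Adding a crawler is one more pattern line, which is the shell counterpart of adding a spider in the tool's settings.</span></p>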
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Crawl counts per directory: each spider crawls each directory a different number of times. Baidu crawls the /spmn/ directory the most, which is no surprise, because the pages in that directory rank best for their keywords!</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VpxoAuTdyKA0icPZAUhXj5RUtzA2atHJQo9aJBA4cPgFw3sY7rhedqhQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VLlYfYoZQ29mzNBicia6TsicWgtHzcqdZwcL72K4290ibH03hqX4BOxgG9g/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
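<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">A rough shell equivalent of the per-directory view, assuming the combined log format (request path in field 7) and counting only Baiduspider requests. Query strings are dropped and the directory is taken to be the first path segment; dir_counts is a hypothetical helper name.</span></p>

```shell
# Hypothetical helper: dir_counts [LOG...]
# Prints "COUNT /dir/" for Baiduspider requests, most frequent first.
# Requests for files directly under the site root are grouped as "/".
dir_counts() {
  awk '/Baiduspider/ {
      path = $7
      sub(/[?].*/, "", path)                 # drop the query string
      if (match(path, "^/[^/]+/"))           # "/spmn/1.html" -> "/spmn/"
        d = substr(path, 1, RLENGTH)
      else
        d = "/"
      n[d]++
    }
    END { for (d in n) print n[d], d }' "$@" | sort -nr
}
```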
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">You can also look at the 404 pages: put their URLs in a txt file named silian (死链, "dead links"), upload it to the site's root directory, and submit it on the Baidu Webmaster Platform (百度站长平台).</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VtGFvSWbfUGbzyzHbj5guTcELSh4Biap4ETSLD2K0zZZlFRSwChhvuMg/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VIMrOLSmVibNrmEQPicXicufs6bstxEntfmBVDmt5b76Hyiboy53IUqcKQQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
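<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The dead-link file can be generated straight from the log. Log entries record only the path, so the sketch takes your site's scheme and host as an argument; https://www.example.com below is a placeholder, and make_silian is a hypothetical helper name. It assumes the combined log format (path in field 7, status in field 9).</span></p>

```shell
# Hypothetical helper: make_silian SITE [LOG...]
# Prints the unique absolute URLs of requests answered with 404, one per line.
# SITE is your own scheme + host, e.g. https://www.example.com (placeholder).
make_silian() {
  local site="$1"
  shift
  awk -v site="$site" '$9 == 404 { print site $7 }' "$@" | sort -u
}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For example: make_silian https://www.example.com your.log &gt; silian.txt. It is worth re-checking that each URL still returns 404 before submitting, since some may have been fixed after the log was written.</span></p>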
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Feel free to explore the rest of the data yourself!</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For analyzing the logs of a typical small business site, the methods above are enough; the Jinhua (金花) log tool covers most needs.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">For log files that are too large for these tools, you can analyze the logs with shell commands instead (what follows is pure showing off; feel free to skip it).</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">First, open the log file.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8V4q83gs00SGTYO49FJfCeMibTaDffI8RdtgWpFFaVickh87qU3Fc4XIUQ/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Find the 10 pages Baidu's spider crawls most:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">grep 'Baiduspider/2.0' your.log | awk '{print $7}' | sort | uniq -c | sort -nr | head -10</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8V78zfhsN9d5icp3EH4FjXCMtoFkqjaCNRO79GTMYsvMzzvS5X9dWrVuA/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As you can see, /spmn is still the most crawled, followed by the homepage.</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">List the URLs of pages that returned a non-200 status code:</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">grep 'Baiduspider/2.0' your.log | awk '$9 != 200 {print $7, $9}' | sort | uniq -c | sort -nr</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><img src="https://mmbiz.qpic.cn/mmbiz_png/cHKWVdxERFxbLRjdhzj7CoypUOtY6l8VvtmAexhpRXgVJ21WBj4kLBiaO6PEJT4NpovQon78iagmpwS5ic88eRxTA/640?wx_fmt=png&tp=webp&wxfrom=5&wx_lazy=1&wx_co=1" style="width: 50%; margin-bottom: 20px;"></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">You can see pages with 404, 304 and other statuses. Above all, find the 404 pages and handle them with the dead-link method described above.</span></p>
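<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Not every non-200 code is a problem: 304 (Not Modified) only means the page had not changed since the spider's previous copy, while 404 is the one to act on. A quick per-status tally of Baidu's requests helps put the list in proportion; it assumes the combined log format (status in field 9), and status_summary is a hypothetical helper name.</span></p>

```shell
# Hypothetical helper: status_summary [LOG...]
# Prints "COUNT STATUS" for Baiduspider requests, most frequent first.
status_summary() {
  awk '/Baiduspider/ { n[$9]++ }
       END { for (s in n) print n[s], s }' "$@" | sort -nr
}
```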
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Which reminds me of what Lu Xun supposedly said: data by itself is damn useless; the value lies in analyzing it.</span></p>