Just 125 Videos to Become an Influencer with Tens of Millions of Fans? Python: What Are Her Videos About?
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/fdea463408634ca99f24176e6993612f~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=6C%2FeNQ3m5klTBQhOAFEZdn%2BRsbo%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Produced by CDA Data Analyst </strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Author: Mika</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Data: Zhenda (真达) </strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Post-production: Mika, Zelong (泽龙)</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Show me data, let the data speak</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Today, let's talk about Li Ziqi.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">“The Li family has a daughter; people call her Ziqi.” Ask people to name today's hottest internet celebrity, and many of them will think of Li Ziqi.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">She rises with the sun and rests when it sets. In her hands, days that look perfectly ordinary become a poem or a painting.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/dfic-imagehandler/0474ebbf-48af-457f-9b19-ad581773bc27~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=HoAlrx0KVMeIPXfixECH34hkUy0%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In March the peach blossoms open, and she picks them to brew peach blossom wine. In April the loquats ripen, and she makes loquat wine... She cooks different dishes as the seasons change. Almost everyone who has watched her videos comes away longing for the rustic, old-style country life they portray, and the videos have brought comfort and healing to countless viewers.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/64c15990dedc4d4096efd53124debabc~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=HF1z31k0qFChoFBztOIP4R2I1aI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">At the time of writing, Li Ziqi had 5.79 million followers on Bilibili (B站). Since joining, she has published only 125 videos. Yet a quick scroll through her video list shows that almost every one of them is a hit.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/d6f06da563e24d1a956b8360b30fd89d~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=jZrIGiVWX279cvSLyKXvzD4eQVo%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/7d11f9e175a94879868a4f9110a31bf1~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=C%2BAdMPBXtfOem2JLnuzFWSP%2BYSc%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">So what characterizes these videos, and which one has the most plays? Today we use data to take a closer look at Li Ziqi.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">01</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">“Turning life into poetry” </strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Why are Li Ziqi's videos so captivating?</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We used Python to analyze the 125 videos Li Ziqi has published on Bilibili. The analysis has three steps:</p><ol style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><li>Data loading</li><li>Data cleaning</li><li>Data visualization</li></ol><p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Let's look at the results first:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Videos published per year</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">First, the number of videos Li Ziqi published on Bilibili each year.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/937f79fb051e4fb5a50fabee1f9c340c~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=Qdm%2ByYR9kafHgDOQTIHUym4RHLg%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Li Ziqi published her first Bilibili video in July 2016 and posted 14 videos in total that year. Her output in 2017, 2018 and 2019 was similar, at around 34 videos a year. That works out to roughly 2.8 videos a month. As of this writing, she has published 8 videos in 2020.</p>
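<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Below is a minimal sketch of how the yearly counts can be derived, assuming a pandas datetime column like the article's <code>publish_time</code>. The timestamps are hypothetical placeholders, not the real upload log. The 2.8 figure in the text is simply about 34 videos divided by 12 months.</p>

```python
import pandas as pd

# Hypothetical publish timestamps (placeholders, not the real dataset)
publish_time = pd.to_datetime(pd.Series([
    "2016-07-14 21:00", "2016-09-02 18:00",
    "2017-01-05 12:00", "2017-03-20 16:00", "2017-08-11 21:00",
    "2018-08-30 21:00", "2018-10-02 18:00",
]))

# Videos per calendar year, ordered by year (same idea as pub_year in the article's code)
per_year = publish_time.dt.year.value_counts().sort_index()

# Average uploads per month for a full year of about 34 videos, as quoted in the text
avg_per_month = round(34 / 12, 1)

print(per_year)
print(avg_per_month)  # 2.8
```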
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Videos published per month</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Viewers often say they can feel the four seasons change in Li Ziqi's videos. So in which months does she publish the most?</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/0ce9d9d6ae1344e39386e696c813cfee~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=JbxEeYlr4SboNPuTsXtu8%2BDPR5k%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Summer clearly leads the other seasons, especially August: 26 of the 125 videos were published in August, about 21% of the total (26 / 125 = 20.8%). Autumn is her next most productive season, with 36 videos published from September to November.</p>
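<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The sketch below groups upload months into seasons and computes each season's share. The article does not say how it defines seasons, so the meteorological convention used here is an assumption (June to August is summer, September to November is autumn). <code>SEASON_OF_MONTH</code> and <code>season_shares</code> are hypothetical names, and the sample months are illustrative only.</p>

```python
import pandas as pd

# Assumed meteorological seasons; the article does not define its seasons
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def season_shares(months: pd.Series) -> pd.Series:
    """Percentage of uploads per season (rounded to one decimal), largest first."""
    seasons = months.map(SEASON_OF_MONTH)
    return (seasons.value_counts(normalize=True) * 100).round(1)


# Hypothetical upload months, for illustration only
print(season_shares(pd.Series([8, 8, 7, 9, 10, 1])))

# The August share quoted in the text: 26 of 125 videos
august_share = round(26 / 125 * 100, 1)
print(august_share)  # 20.8
```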
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Publishing timeline (hour of day)</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Is there a pattern in when her videos go online?</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/07d9336e15bf45b0929519f2228da9f1~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=TBf67wgjPII0ockfv5z78L9L0YY%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The publishing timeline shows four peak hours: 12 noon, 4 p.m., 6 p.m. and 9 p.m. The 9 p.m. slot has the most uploads, 14 videos in total.</p>
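<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The article does not define a "peak", so this sketch assumes one: an hour whose upload count is higher than both neighbouring hours, wrapping around midnight. It counts uploads per hour and keeps the hours that meet that test. The timestamps and the <code>local_peaks</code> helper are hypothetical.</p>

```python
from collections import Counter
from datetime import datetime

# Hypothetical upload timestamps (not the real data)
times = ["2019-08-01 21:00", "2019-08-15 21:30", "2019-09-03 12:00",
         "2019-10-10 16:05", "2019-11-20 18:00", "2020-01-02 21:10"]

# Uploads per hour of day (what dt.hour.value_counts() does in pandas)
hour_counts = Counter(datetime.strptime(t, "%Y-%m-%d %H:%M").hour for t in times)


def local_peaks(counts_by_hour):
    """Hours whose count is strictly higher than both neighbouring hours (0-23, wrapping)."""
    counts = [counts_by_hour.get(h, 0) for h in range(24)]
    return [h for h in range(24)
            if counts[h] > counts[h - 1] and counts[h] > counts[(h + 1) % 24]]


busiest_hour, busiest_count = hour_counts.most_common(1)[0]
print(local_peaks(hour_counts))     # [12, 16, 18, 21]
print(busiest_hour, busiest_count)  # 21 3
```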
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Share of video types</strong></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/e51801652d784107ae0b54a021526df7~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=p1gWcZ%2BYpodRdgck0%2B%2Bcx54MOjk%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Unsurprisingly, food is the dominant video type at 87.2%, followed by handicraft videos at 12%. Beauty is the smallest category: only 1 of the 125 videos published so far (0.8%).</p>
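<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The shares above are simple proportions of the secondary category column. This sketch reproduces them with <code>value_counts(normalize=True)</code>. The category counts (109, 15 and 1) are back-calculated from the quoted percentages of 125 videos, and the English labels are placeholders for the real <code>cat2</code> values.</p>

```python
import pandas as pd

# Counts back-calculated from the quoted shares: 87.2% and 12% of 125 videos, plus 1 beauty video.
# The labels are English placeholders for the article's df.cat2 values.
cat2 = pd.Series(["food"] * 109 + ["handicraft"] * 15 + ["beauty"] * 1)

# value_counts(normalize=True) returns each category's fraction of the total
shares = (cat2.value_counts(normalize=True) * 100).round(1)
print(shares)
```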
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Daily ranking performance</strong></p>
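<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Of the 125 videos, 72 made it onto Bilibili's daily site-wide ranking. Seven of them reached the top 10 and 12 ranked between 50th and 100th. The largest group, 53 videos, ranked between 10th and 50th.</p>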
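<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The ranking buckets are built with <code>pd.cut</code>, whose bins are right-closed by default. As a result, the article's <code>bins=[1, 10, 30, 50, 100]</code> leaves rank 1, and any rank below 100th place, outside every bucket. This sketch uses hypothetical ranks and bins that cover every rank, with an assumed extra "101+" bucket.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical best daily ranks; NaN means the video never made the daily ranking
top_rank = pd.Series([1, 7, 10, 11, 45, 50, 99, 150, np.nan])
ranks = top_rank.dropna().astype(int)

# With the article's bins, rank 1 and rank 150 fall outside every interval and become NaN
dropped = pd.cut(ranks, bins=[1, 10, 30, 50, 100]).isna().sum()

# Starting at 0 keeps rank 1 in (0, 10]; np.inf adds a catch-all bucket beyond 100th place
bins = [0, 10, 30, 50, 100, np.inf]
labels = ["top 10", "11-30", "31-50", "51-100", "101+"]
buckets = pd.cut(ranks, bins=bins, labels=labels).value_counts().sort_index()

print(dropped)            # 2
print(buckets.to_dict())  # {'top 10': 3, '11-30': 1, '31-50': 2, '51-100': 1, '101+': 1}
```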
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Average performance across metrics</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Next, let's look at the average performance of Li Ziqi's videos on each metric.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/4aa0c0dba0a340c68715ca2fdd870761~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=AH4dpiB1TEryotrbwl3HnHuzz6o%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">On average, each video has 8,361 danmaku (the bullet comments that scroll across Bilibili videos), 52,965 likes, 32,690 coins, 8,455 favorites and 5,652 shares.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Which videos have the most plays, and which ones attract the most danmaku?</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Top 10 videos by plays</strong></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/7b6f18aed49449bb9608bb76657d7d1a~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=%2F7zJ22n%2FPJ7sTYNuXqT0uCOS8vY%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Let's take the lists one at a time, starting with the top 10 most-played videos. The most-played video is “I Hear Everyone Who Loves Luosifen Is Adorable!”, with more than 5.26 million plays. Luosifen, the river-snail rice noodle dish, really is a snack that has become famous nationwide.</p>
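<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the article's code, the top-10 lists are built with <code>sort_values(..., ascending=False).head(10)</code>. pandas' <code>DataFrame.nlargest</code> is a shorter equivalent. This sketch runs on a hypothetical mini-table that reuses only the article's column names.</p>

```python
import pandas as pd

# Hypothetical rows; only the column names match the article's DataFrame
videos = pd.DataFrame({
    "title": ["video A", "video B", "video C", "video D"],
    "view_num": [500, 120, 340, 80],
})

# The n rows with the largest view_num, highest first
top3 = videos.nlargest(3, "view_num")[["title", "view_num"]]

# Same rows as sorting in descending order and taking the head
same = videos.sort_values("view_num", ascending=False).head(3)[["title", "view_num"]]

print(top3["title"].tolist())  # ['video A', 'video C', 'video B']
```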
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">What are viewers saying in this video's danmaku?</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/f18ed1b0b028417f9a5b3be505bb20ec~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=oyyJUmJkWE02YGNw95%2FcAG5nkWU%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the most-played luosifen video, the terms mentioned most often in the danmaku are ingredients, such as "田螺" (river snail), "螺蛳" (snail), "豆角" (green beans), "辣椒" (chili pepper) and "豇豆" (cowpeas). "广西" (Guangxi), the home of luosifen, also comes up. Interestingly, viewers also mention other food uploaders who have made luosifen videos, such as "蛋黄派" (Danhuangpai).</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Top 10 videos by danmaku</strong></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/09b4d44a16654cd68415ddacd6387a0d~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=%2BY2Y2VlgZ%2Bvu0lan%2FxPi9gGDbhw%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Next come the top 10 videos by danmaku count. The video with the most danmaku is “So This Video Is Called The Life of Chili Peppers”, with more than 40,000 danmaku in total.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">What are viewers saying in this video's danmaku?</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/286bc28ea6164bd69b3ca0e91fd60828~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=yI3xzZs1z3QJeA1d5DJbFBSf65c%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The danmaku on this video are especially interesting: the most frequent terms are all wishes. They include "上岸" (passing the exam, literally "reaching the shore"), "考上" (getting admitted), "考研" (the postgraduate entrance exam), "成功" (success), "顺利" (everything going smoothly) and "加油" (keep going).</p>
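<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The article does not show how danmaku terms were counted. Here is a minimal sketch that assumes a simple approach: count how many comments contain each wish-related term. The comments are illustrative, not real data. The term list comes from the paragraph above.</p>

```python
from collections import Counter

# Illustrative danmaku lines, not real data
danmaku = ["考研上岸!", "许愿考上", "加油加油", "一定上岸", "好辣"]

# Wish-related terms mentioned in the text
wish_terms = ["上岸", "考上", "考研", "成功", "顺利", "加油"]

# Number of comments containing each term (a comment counts at most once per term)
term_counts = Counter({term: sum(term in line for line in danmaku) for term in wish_terms})

print(term_counts.most_common(1))  # [('上岸', 2)]
```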
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Now let's analyze Li Ziqi's video titles. They follow a distinctive pattern: almost every title is 【keyword】 followed by a short description. For example: 【小麦的一生】一株小麦,变化出扎根在每一个人记忆里的味道。 ("The Life of Wheat: a single stalk of wheat, transformed into flavors rooted in everyone's memory.")</p>
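<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because titles follow the 【keyword】description pattern, one regular expression can split them into their two parts. This mirrors what the article's cleaning step does with <code>str.extract</code> and <code>str.split</code>. <code>split_title</code> is a hypothetical helper, and titles without a 【】 prefix are returned unchanged as the description.</p>

```python
import re

# Keyword inside 【】 (non-greedy), then everything after it as the description
TITLE_PATTERN = re.compile(r"【(.*?)】(.*)")


def split_title(title: str):
    """Return (keyword, description); keyword is None when there is no 【...】 prefix."""
    m = TITLE_PATTERN.match(title)
    if m is None:
        return None, title
    return m.group(1), m.group(2)


keyword, desc = split_title("【小麦的一生】一株小麦,变化出扎根在每一个人记忆里的味道。")
print(keyword)  # 小麦的一生
print(desc)     # 一株小麦,变化出扎根在每一个人记忆里的味道。
```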
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/aa1a8dcb47d546c2bf9c1f8fb0d9b624~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=vch5T8a8rE4Q6fQ2JNHRXejgpAQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Word cloud of title keywords</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Start with the keyword word cloud. Apart from "李子柒" (Li Ziqi) herself, ingredients such as "桃花" (peach blossom), "腊味" (cured meat) and "豌豆" (peas) appear especially often. "手工" (handicraft) is also a high-frequency term. The "life" (一生) of a particular ingredient is another theme Li Ziqi loves to film.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/pgc-image/98ea09cc7bd04d50bb2d009c39719705~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=jsWacduy9j7riQEIqzzaHkH2mkI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Word cloud of title descriptions</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">What stands out in the title descriptions?</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/24c8893e0f684338942bc52b63fbcc24~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=KUxgMgD3dByW8wEnFg77f3LBAeE%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">"味道" (flavor, taste) appears most often, far more than any other word. Words such as "夏天" (summer), "千年" (a thousand years), "家里" (at home) and "记忆里" (in memory) also appear frequently.</p>
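<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The word clouds are generated from a list named <code>word_num_selected</code>, but the article does not show how it is built. This sketch shows one plausible way. It assumes the titles have already been segmented into words, for instance with a Chinese tokenizer such as jieba, and uses a small hypothetical stopword list. Repeated words are kept so the cloud can size words by how often they occur.</p>

```python
from collections import Counter

# Assumption: titles already segmented into words (segmentation step not shown in the article)
segmented = [["家里", "的", "味道"], ["夏天", "的", "味道"], ["记忆里", "的", "味道", "千年"]]

# Hypothetical stopword list; single-character tokens are dropped as well
STOPWORDS = {"的", "了", "和"}

tokens = [w for words in segmented for w in words
          if w not in STOPWORDS and len(w) > 1]
freq = Counter(tokens)

# Keep only the 100 most frequent words, but keep their repeats
top_words = {w for w, _ in freq.most_common(100)}
word_num_selected = [w for w in tokens if w in top_words]

print(freq.most_common(1))  # [('味道', 3)]
```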
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">02</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">A Python walkthrough: </strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">what are Li Ziqi's videos about?</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Here are the key steps of the analysis:</p>
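<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We used Python to collect information on the 125 videos Li Ziqi has published on Bilibili and analyzed it in the following steps:</p><ol style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><li>Data loading</li><li>Data cleaning</li><li>Data visualization</li></ol><h2 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">Data loading</strong></h2>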
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">First, load the dataset used for the analysis. It contains 125 samples and 11 fields: video title, primary category, secondary category, publish time, highest site-wide rank, total plays, cumulative danmaku, likes, coins, favorites and shares. Here is a preview:</p><pre><code># Import packages
import numpy as np
import pandas as pd
import re

# Load the data
df = pd.read_excel('./data/李子柒视频数据.xlsx')
df.head()</code></pre> <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/b9ab4d87d70045c2a22e6493c8aedf86~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=sFC90QgPLdBb%2FLo3xyoItqUo0No%3D" style="width: 50%; margin-bottom: 20px;"></div>
<h2 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">Data cleaning</strong></h2>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In this step we do some basic processing of the following fields:</p><ul style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><li>title: extract the keyword (the part inside 【】) and the description</li><li>top_rank: extract the rank number</li><li>view_num: extract the number</li><li>dm_num: extract the number</li><li>dianzan (likes): convert to a number</li><li>toubi (coins): convert to a number</li><li>shoucang (favorites): convert to a number</li><li>zhuanfa (shares): convert to a number</li></ul><pre><code># Convert strings such as '5.3万' (万 = 10,000) into plain numbers
def transform_num(x):
    str1 = str(x)
    if '万' in str1:
        return float(str1.strip('万')) * 10000
    else:
        return float(str1)

# Extract fields (expand=False makes str.extract return a Series, not a DataFrame)
df['title_1'] = df.title.str.extract('【(.*?)】.*', expand=False)   # keyword inside 【】
df['title_2'] = df.title.str.split('】').str[-1]                    # description after 】
df['top_rank'] = df.top_rank.str.extract(r'最高全站日排行(\d+)名', expand=False)
df['view_num'] = df.view_num.str.extract(r'(\d+)', expand=False)
df['dm_num'] = df.dm_num.str.extract(r'(\d+)', expand=False)
df['dianzan'] = df.dianzan.apply(transform_num)
df['toubi'] = df.toubi.apply(transform_num)
df['shoucang'] = df.shoucang.apply(transform_num)
df['zhuanfa'] = df.zhuanfa.apply(transform_num)

# Convert types
df['view_num'] = df.view_num.astype(int)
df['dm_num'] = df.dm_num.astype(int)
df['publish_time'] = pd.to_datetime(df['publish_time'])</code></pre><p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The processed data looks like this:</p><pre><code>df.head(2)</code></pre> <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/0d32c6a7f2944e52992f6c9bb42e7dd3~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723895246&x-signature=11Egn%2BUJWGZYL7C7jifgRGceWfQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
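<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Here is a quick check of <code>transform_num</code> on a few sample strings. It is paired with a vectorized variant that does the same conversion column-wise with pandas string methods. The sample values and the name <code>transform_num_vectorized</code> are hypothetical, and the vectorized version is an alternative, not the article's code. One design difference: unparseable values become NaN instead of raising an error.</p>

```python
import numpy as np
import pandas as pd


def transform_num(x):
    """Same logic as the article's helper: '5.3万' -> 53000.0, '8455' -> 8455.0."""
    str1 = str(x)
    if "万" in str1:
        return float(str1.strip("万")) * 10000
    return float(str1)


def transform_num_vectorized(col: pd.Series) -> pd.Series:
    """Column-wise variant: values that cannot be parsed become NaN instead of raising."""
    s = col.astype(str)
    has_wan = s.str.contains("万", regex=False)
    base = pd.to_numeric(s.str.replace("万", "", regex=False), errors="coerce")
    return base * np.where(has_wan, 10000, 1)


sample = pd.Series(["5.3万", "8455", "12万", "--"])
print(transform_num("12万"))  # 120000.0
print(transform_num_vectorized(sample))
```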
<h2 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">Data visualization</strong></h2>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In this part we visualize the results. First, import the required packages: pyecharts draws the interactive charts and stylecloud draws the word clouds. The key code is as follows:</p><pre><code># Import the required packages
from pyecharts.charts import Pie, Line, Tab, Map, Bar, WordCloud, Page
from pyecharts import options as opts
from pyecharts.globals import SymbolType
import stylecloud</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Videos published per year</h3><pre><code># Number of videos published per year
pub_year = df.publish_time.dt.year.value_counts().sort_index()

# Bar chart
bar0 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar0.add_xaxis(pub_year.index.tolist())
bar0.add_yaxis('', pub_year.values.tolist())
bar0.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: videos published per year'),
                     visualmap_opts=opts.VisualMapOpts(max_=50),
                     )
bar0.render()</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Videos published per month</h3><pre><code># Number of videos published in each calendar month (1-12)
pub_month = df.publish_time.dt.month.value_counts().sort_index()

# Bar chart
bar = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar.add_xaxis([str(m) for m in pub_month.index])
bar.add_yaxis('', pub_month.values.tolist())
bar.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: videos published per month'),
                    visualmap_opts=opts.VisualMapOpts(max_=30),
                    )
bar.render()</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Publishing timeline (hour of day)</h3><pre><code># Distribution of upload times by hour of day
pub_hour = df.publish_time.dt.hour.value_counts().sort_index()

# Prepare the data
x1_line1 = [str(h) for h in pub_hour.index]
y1_line1 = pub_hour.values.tolist()

# Line chart with the maximum and minimum points marked
line1 = Line(init_opts=opts.InitOpts(width='1350px', height='750px'))
line1.add_xaxis(x1_line1)
line1.add_yaxis('', y1_line1,
                markpoint_opts=opts.MarkPointOpts(data=[
                    opts.MarkPointItem(type_='max', name='Max'),
                    opts.MarkPointItem(type_='min', name='Min')
                ]))
line1.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: daily publishing timeline'),
                      visualmap_opts=opts.VisualMapOpts(max_=20)
                      )
line1.set_series_opts(label_opts=opts.LabelOpts(is_show=False),
                      linestyle_opts=opts.LineStyleOpts(width=3))
line1.render()</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Share of video types</h3><pre><code># Number of videos in each secondary category
cat_num = df.cat2.value_counts()

# (name, value) pairs for the pie chart
data_pair = [list(z) for z in zip(cat_num.index.tolist(), cat_num.values.tolist())]

# Ring-shaped pie chart
# Formatter placeholders: {a} series name, {b} data item name, {c} value, {d} percentage
pie1 = Pie(init_opts=opts.InitOpts(width='1350px', height='750px'))
pie1.add('', data_pair=data_pair, radius=['35%', '60%'])
pie1.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: share of video types'),
                     legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%'))
pie1.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%"))
pie1.render()</code></pre><p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Ranking performance of videos with ranking data</p><pre><code># Best daily site-wide rank of each ranked video
top_rank_num = df.top_rank.dropna().astype(int)

# pd.cut bins are right-closed, so include_lowest=True is needed to keep rank 1 in the first bin;
# ranks beyond 100 fall outside these bins and are not counted
cut_bins = [1, 10, 30, 50, 100]
top_num = pd.cut(top_rank_num, bins=cut_bins, include_lowest=True,
                 labels=['Top 10', '11-30', '31-50', '51-100']).value_counts()

# (name, value) pairs
data_pair_2 = [list(z) for z in zip(top_num.index.tolist(), top_num.values.tolist())]

# Pie chart
pie2 = Pie(init_opts=opts.InitOpts(width='1350px', height='750px'))
pie2.add('', data_pair=data_pair_2, radius=['35%', '60%'])
pie2.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: ranking performance of ranked videos'),
                     legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%'))
pie2.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}: {c} videos\nShare: {d}%"))
pie2.render()</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Average performance across metrics</h3><pre><code># Mean of each metric across all videos
df_num = df[['view_num', 'dm_num', 'dianzan', 'toubi', 'shoucang', 'zhuanfa']].mean()

# Bar chart; [1:] drops view_num so that only the five engagement metrics are plotted
bar3 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar3.add_xaxis(['Danmaku', 'Likes', 'Coins', 'Favorites', 'Shares'])
bar3.add_yaxis('', df_num.values.tolist()[1:])
bar3.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: average metrics per video'),
                     visualmap_opts=opts.VisualMapOpts(max_=50000),
                     )
bar3.render()</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Top 10 videos by plays</h3><pre><code># Top 10 most-played videos
view_top10 = df.sort_values('view_num', ascending=False).head(10)[['title', 'view_num']]
# Sort ascending so the most-played video ends up at the top once the axes are swapped
view_top10 = view_top10.sort_values('view_num')

# Horizontal bar chart
bar1 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar1.add_xaxis(view_top10.title.values.tolist())
bar1.add_yaxis('', view_top10.view_num.values.tolist())
bar1.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: top 10 videos by plays'),
                     yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(position='inside')),
                     visualmap_opts=opts.VisualMapOpts(max_=3000000),
                     )
bar1.set_series_opts(label_opts=opts.LabelOpts(position='right'))
bar1.reversal_axis()  # swap the axes to get horizontal bars
bar1.render()</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Top 10 videos by danmaku</h3><pre><code># Top 10 videos by danmaku count
dm_top10 = df.sort_values('dm_num', ascending=False).head(10)[['title', 'dm_num']]
dm_top10 = dm_top10.sort_values('dm_num')

# Horizontal bar chart
bar2 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar2.add_xaxis(dm_top10.title.values.tolist())
bar2.add_yaxis('', dm_top10.dm_num.values.tolist())
bar2.set_global_opts(title_opts=opts.TitleOpts(title='Li Ziqi on Bilibili: top 10 videos by danmaku'),
                     visualmap_opts=opts.VisualMapOpts(max_=40999),
                     )
bar2.set_series_opts(label_opts=opts.LabelOpts(position='right'))
bar2.reversal_axis()  # swap the axes to get horizontal bars
bar2.render()</code></pre>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">Video title word cloud</h3><pre><code>import stylecloud

# word_num_selected: list of segmented title words prepared beforehand (that step is not shown in the article)
stylecloud.gen_stylecloud(text=' '.join(word_num_selected),  # text must be a str
                          palette='tableau.Tableau_10',
                          collocations=False,
                          font_path=r'C:\Windows\Fonts\msyh.ttc',  # font that can render Chinese
                          icon_name='fas fa-heart',
                          size=768,
                          output_name='李子柒视频标题词云图.png'  # output image file
                          )</code></pre>