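Python Crawler Practice: Scraping the Rankings, Star Ratings, and More for 800+ Universities
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Preface</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the latest ranking of Chinese universities, Peking University moves ahead, Zhejiang University is only fourth, and the University of Science and Technology of China drops to eighth.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Five years on, the "Double First-Class" universities are about to face their first major assessment. It will also be the first official release since the evaluation criteria for universities were changed, so it has naturally drawn a lot of attention. Recently, many different organizations have published rankings of Chinese universities, but the rankings differ widely from one another and have sparked considerable debate among netizens.</p>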
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Project Goal</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Scrape the university rankings from Gaosan (高三网) and save them.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Target URL: http://m.gaosan.com/gaokao/265440.html</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/c3a572d89566430484b5347018cd5a03~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723904614&x-signature=pi3u79Sywr7NdToPv9KdeOtPBU0%3D" style="width: 50%; margin-bottom: 20px;"></div>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Environment Setup</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Python 3.6, PyCharm</p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Crawler Code</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Import the tools</p>
<pre>import requests  # HTTP client used to download the page
import parsel    # CSS/XPath selectors for parsing the HTML
import csv       # standard-library CSV writer</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Request the page data</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/f775d27aa085441aacd7e32d05844ba2~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723904614&x-signature=4QOlfT77NiulFom8Qpr554GIuow%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/67435f7b40ac49ca805efc4c5a1d5e1d~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723904614&x-signature=Gjzl4R20MVTVtJmM6EndQpcUb78%3D" style="width: 50%; margin-bottom: 20px;"></div>
<pre>url = 'http://m.gaosan.com/gaokao/265440.html'
headers = {
    # Send a browser-like User-Agent so the request looks like a normal visit
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# Decode with the encoding detected from the body so the Chinese text is not garbled
response.encoding = response.apparent_encoding</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Scrape the data</p>
<pre>selector = parsel.Selector(response.text)
# Every table row under the #page element describes one university
trs = selector.css('#page tr')
for tr in trs:
    dit = {}
    ranking = tr.css('td:nth-child(1)::text').get()
    dit['名次'] = ranking        # rank
    school = tr.css('td:nth-child(2)::text').get()
    dit['学校名称'] = school     # school name
    score = tr.css('td:nth-child(3)::text').get()
    dit['综合得分'] = score      # overall score
    star = tr.css('td:nth-child(4)::text').get()
    dit['星级排名'] = star       # star rating
    level = tr.css('td:nth-child(5)::text').get()
    dit['办学层次'] = level      # institution tier
    # csv_writer is the DictWriter created in the "Save the data" step
    csv_writer.writerow(dit)</pre>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/dd5892d0e51745249c8fa4522dd36543~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723904614&x-signature=zlozdvn4ppHSV50OVmfcXNtPOyY%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Save the data</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The file must be opened and the writer created before the scraping loop runs, and the file closed only after the loop has finished.</p>
<pre># newline='' stops the csv module from inserting blank lines on Windows
f = open('排名.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['名次', '学校名称', '综合得分', '星级排名', '办学层次'])
# ... the scraping loop from the previous step runs here ...
f.close()</pre>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Run the Code: Results Shown Below</h1>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/fb7707b2061b42b7a47eafde1b24b077~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723904614&x-signature=BD8X8DXNJ3kZbIowEOC3mOyXVMA%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/06c1116f84574ff48b069015d90c6783~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723904614&x-signature=QnkzINfZrpa1FSD4ctVEg%2BCe1L4%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/c765abf9502f47e5ab02a2e8532bb12f~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723904614&x-signature=0ZmUgL3R0APCP5fUtNNUar6cIBI%3D" style="width: 50%; margin-bottom: 20px;"></div>
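<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The three steps above (request, scrape, save) can be assembled in one possible order, sketched below. This is not the author's exact script: <code>clean_row</code>, <code>save_rows</code>, and <code>scrape</code> are hypothetical helper names, and the <code>timeout</code>, the <code>raise_for_status()</code> call, the header row, the skipping of incomplete rows, and <code>mode='w'</code> are additions made for illustration. The school-name column is assumed to be <code>学校名称</code>, the User-Agent is shortened, and the demo values at the bottom are made up.</p>

```python
import csv
import io

# Column names for the CSV; the school-name column is assumed to be 学校名称.
FIELDNAMES = ['名次', '学校名称', '综合得分', '星级排名', '办学层次']


def clean_row(cells):
    """Map five table cells to a row dict, or return None for an incomplete row.

    Assumption: header rows use <th> cells (so every td lookup yields None)
    or have fewer than five cells; such rows are skipped, not written.
    """
    if len(cells) != len(FIELDNAMES) or any(cell is None for cell in cells):
        return None
    return dict(zip(FIELDNAMES, (cell.strip() for cell in cells)))


def save_rows(rows, f):
    """Write a header line followed by every row to an open text file."""
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def scrape(url, path='排名.csv'):
    """Fetch the page, parse the table, and save it (needs network access)."""
    # Third-party imports live here so the helpers above work without them.
    import parsel
    import requests

    headers = {'User-Agent': 'Mozilla/5.0'}  # shortened placeholder value
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    response.encoding = response.apparent_encoding

    selector = parsel.Selector(text=response.text)
    rows = []
    for tr in selector.css('#page tr'):
        cells = [tr.css(f'td:nth-child({i})::text').get() for i in range(1, 6)]
        row = clean_row(cells)
        if row is not None:
            rows.append(row)

    # mode='w' rewrites the file on each run; the with block closes it for us.
    with open(path, mode='w', encoding='utf-8', newline='') as f:
        save_rows(rows, f)
    return rows


# Offline demonstration with made-up cell values (not real ranking data).
buffer = io.StringIO()
demo = [clean_row(['1', ' Example University ', '100', '8', 'Sample tier']),
        clean_row([None] * 5)]
save_rows([row for row in demo if row is not None], buffer)
print(buffer.getvalue())
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To run it against the real page, call <code>scrape('http://m.gaosan.com/gaokao/265440.html')</code>. Opening the file in a <code>with</code> block means it is closed even if parsing fails partway, which a separate <code>f.close()</code> call does not guarantee.</p>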