lbk60ox posted on 2024-11-1 22:57:28

Python Crawler: Scraping Ultra-HD Pictures from a Website


    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Short of good-looking desktop wallpapers? Let's scrape some ultra-HD pictures from a website.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">The site used for this crawler is:</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">http://www.netbian.com/index.htm (彼岸桌面). It hosts a lot of good-looking wallpapers, all of which can be downloaded in lossless high definition, so I used this site for practice.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As a beginner who is just starting out, whatever the quality of the code, simply having it run correctly from start to finish is enough to make you happy. It is like playing games: when we get positive feedback within a short time, we become more interested in playing.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Learning works the same way: as long as we get feedback from what we learn within a short time, our desire to keep learning stays strong. Finishing this crawler program from start to finish was the biggest gain, but in fact I got far more than that out of the process.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Good code should have the following traits</span></strong></p>
    <ul>
    <li><span style="color: black;">It meets the most critical requirements</span></li>
    <li><span style="color: black;">It is easy to understand</span></li>
    <li><span style="color: black;">It is well commented</span></li>
    <li><span style="color: black;">It uses consistent naming conventions</span></li>
    <li><span style="color: black;">It has no obvious security problems</span></li>
    <li><span style="color: black;">It has been thoroughly tested</span></li>
    </ul>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Take thorough testing as an example. Anyone who writes code regularly knows that even if your code shows no bugs most of the time, that only means it is stable in most situations; under certain conditions (a failure condition is met, a logic error is triggered, and so on) it will still go wrong. That is certain. As for the cause, different code fails for different reasons. If programs could be perfected in one go, the software we use would not need to be updated so often. I will not go through the rest of the reasoning here; it becomes clear with experience.</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Five qualities that good code usually has</span></p>
    <ol>
    <li><span style="color: black;">Easy to maintain</span></li>
    <li><span style="color: black;">Reusable</span></li>
    <li><span style="color: black;">Extensible</span></li>
    <li><span style="color: black;">Highly flexible</span></li>
    <li><span style="color: black;">Robust</span></li>
    </ol>
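    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">As a small illustration of what "thoroughly tested" can look like for a crawler, here is a minimal sketch that checks the link filter in isolation, without touching the network. The helper name pick_desk_links and the sample links are assumptions made up for this example; only the /desk/NNNN.htm pattern comes from the script in this post.</span></p>

```python
import re

# Pattern for detail-page links such as /desk/1234.htm
DESK_LINK = re.compile(r"/desk/\d{4}\.htm")


def pick_desk_links(hrefs):
    """Keep only the hrefs that point at a /desk/NNNN.htm detail page."""
    links = []
    for href in hrefs:
        match = DESK_LINK.search(href)
        if match:
            links.append(match.group(0))
    return links


def test_pick_desk_links():
    # Made-up sample: one valid link, three that must be rejected
    sample = ["/desk/1234.htm", "/index_2.htm", "http://example.com/", "/desk/12x4.htm"]
    assert pick_desk_links(sample) == ["/desk/1234.htm"]
    # An empty page must give an empty result rather than an error
    assert pick_desk_links([]) == []


test_pick_desk_links()
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Checks like this cover the "certain conditions" case: the empty list and the malformed link are exactly the inputs that rarely show up in a quick manual run.</span></p>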
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">After running my code I found that its time complexity is fairly high, so that is what I plan to improve, but it is not the only thing. There are also quite a few places where things are not used appropriately; I will work on those shortcomings step by step.</span></p>
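    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">One thing that can inflate the running time in a crawler of this shape is handing the same ever-growing list of links to the download step on every page, so earlier pictures get fetched again and again. The sketch below only counts downloads and does no real requests; count_downloads and the sample pages are hypothetical names and data invented for the illustration.</span></p>

```python
def count_downloads(pages, share_list):
    """Count how many downloads happen when crawling `pages`.

    pages: a list of pages, each page being a list of picture links.
    share_list: if True, one list accumulates links across all pages
    and is downloaded in full after every page.
    """
    collected = []
    downloads = 0
    for links in pages:
        if not share_list:
            collected = []          # start from an empty list for each page
        collected.extend(links)
        downloads += len(collected)  # the download step walks the whole list
    return downloads


# Three made-up pages with two pictures each
pages = [["a", "b"], ["c", "d"], ["e", "f"]]
shared = count_downloads(pages, share_list=True)
fresh = count_downloads(pages, share_list=False)
print(shared, fresh)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">With a shared list the number of downloads grows roughly with the square of the page count, while a fresh list per page keeps it proportional to the number of pictures.</span></p>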
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Experts passing by are welcome to leave suggestions for improving the code.</span></strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">The complete code is as follows</span></strong></p>
<pre><code>import os
import re
import time

import bs4
import requests
from bs4 import BeautifulSoup


def getHTMLText(url, headers):
    """Send a request to the target server and return the parsed response."""
    try:
        r = requests.get(url=url, headers=headers, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return BeautifulSoup(r.text, "html.parser")
    except requests.RequestException:
        # Return None so the caller can skip a page that failed to load
        return None


def CreateFolder():
    """Create the folder that stores the downloaded data."""
    while True:
        file = input("Enter a name for the folder to save the data in: ")
        if not os.path.exists(file):
            os.mkdir(file)
            break
        print("That folder already exists, please enter another name")

    # os.path.abspath(file) returns the absolute path of the folder
    return os.path.abspath(file)


def fillUnivList(ulist, soup):
    """Collect the link to the full-size page of every picture."""
    # find_all returns bs4 Tag objects; the first div with class "list"
    # holds the thumbnails
    divs = soup.find_all("div", "list")
    if not divs:
        return ulist
    for a in divs[0]("a"):
        if isinstance(a, bs4.element.Tag):
            hr = a.attrs.get("href", "")
            href = re.findall(r"/desk/\d{4}\.htm", hr)
            if href:
                ulist.append(href[0])

    return ulist


def DownloadPicture(left_url, page_list, path, headers=None):
    """Download the picture shown on each full-size page."""
    for right in page_list:
        url = left_url + right
        try:
            r = requests.get(url=url, headers=headers, timeout=10)
            r.encoding = r.apparent_encoding
            soup = BeautifulSoup(r.text, "html.parser")
            tag = soup.find_all("p")
            # Use the alt attribute of the img tag to name the saved picture
            name = tag[0].a.img.attrs["alt"]
            img_name = name + ".jpg"
            # Get the address of the picture
            img_src = tag[0].a.img.attrs["src"]
            img_data = requests.get(url=img_src, headers=headers, timeout=10)
        except (requests.RequestException, IndexError, AttributeError, KeyError):
            # Skip pages that fail to load or lack the expected markup
            continue
        img_path = os.path.join(path, img_name)
        with open(img_path, "wb") as fp:
            fp.write(img_data.content)
        print(img_name, " ******download finished!")


def PageNumurl(urls):
    """Append the listing-page URLs up to the page number the user enters."""
    num = int(input("Enter the last page number to crawl: "))
    for i in range(2, num + 1):
        u = "http://www.netbian.com/index_" + str(i) + ".htm"
        urls.append(u)

    return urls


if __name__ == "__main__":
    left_url = "http://www.netbian.com"
    urls = ["http://www.netbian.com/index.htm"]
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
    }
    start = time.time()
    # 1. Create the folder for the saved data
    path = CreateFolder()
    # 2. Decide how many pages to crawl and build the link of every page
    PageNumurl(urls)
    n = int(input("First page to visit: "))
    # Start from page n (1-based)
    for i in urls[max(n, 1) - 1:]:
        # 3. Fetch the HTML of the listing page
        soup = getHTMLText(i, headers)
        if soup is None:
            continue
        # 4. Collect the full-size page links of this page only
        page_list = fillUnivList([], soup)
        # 5. Download the originals
        DownloadPicture(left_url, page_list, path, headers)

    print("All downloads finished!", str(len(os.listdir(path))) + " pictures in total")
    end = time.time()
    print("Elapsed time: " + str(end - start) + " seconds")
</code></pre>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Run</span></strong></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/pgc-image/1012d6827e7e4e50b307e9d668ffffb7~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1729728483&amp;x-signature=o4urO8nN5EYglQlkMek1O2hUURU%3D" style="width: 50%; margin-bottom: 20px;"></div>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Part of the results are shown below:</span></strong></p>
    <div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/6a8344b821d74ef4ab62594e11eeb2f6~noop.image?_iz=58558&amp;from=article.pc_detail&amp;lk3s=953192f4&amp;x-expires=1729728483&amp;x-signature=zrqKvSgpps8vkgPNKlhcYFdzkfE%3D" style="width: 50%; margin-bottom: 20px;"></div>
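    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">Since each saved file is named after the picture's alt text, a title that contains characters such as / or ? could make open() fail on some systems. Here is a minimal sketch of a helper that cleans the name first; safe_filename and its default value are hypothetical and not part of the script above, and the character set assumes Windows-style file name restrictions.</span></p>

```python
import re


def safe_filename(title, default="untitled"):
    """Turn a picture title into a file name that is safe to pass to open()."""
    # Replace characters that Windows forbids in file names, plus control characters
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title)
    # Leading or trailing spaces and dots also cause trouble on Windows
    cleaned = cleaned.strip(" .")
    # Fall back to a default name when nothing usable is left
    return (cleaned or default) + ".jpg"


print(safe_filename("a/b:c"))
print(safe_filename(""))
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">In the script, this would stand in for the line that builds img_name from the alt attribute.</span></p>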
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;"><span style="color: black;">Let's learn Python and write code together. Keep it up! 奥利给!!!</span></strong></p>



