Batch-scraping web page images with Python and bs4 and saving them locally


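This example uses bs4 to scrape images from a web page. Parsing with bs4 is straightforward and the code is easy to write, though some basic knowledge of HTML is needed beforehand. The example scrapes wallpapers from a wallpaper site. (bs4 is a third-party library and must be installed before use, for example with pip install beautifulsoup4; the same applies to requests.)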

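Steps

Fetch the page source and extract the link to each child page (the href attribute of each a tag).
Follow each href to get the child page, then find the image download address in it (the src attribute of the img tag).
Download the image.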
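
The three steps can be tried offline before any request is sent. The sketch below is only an illustration: the PAGES dictionary, its URLs and HTML, and the helper names are made up for this example, and fake_get stands in for a real HTTP request.

```python
from bs4 import BeautifulSoup

# Hypothetical in-memory "site": URL -> HTML. None of this is the real markup.
PAGES = {
    "https://example.com/list/": (
        '<div class="TypeList">'
        '<a href="/pic/1.htm">one</a><a href="/pic/2.htm">two</a>'
        '</div>'
    ),
    "https://example.com/pic/1.htm": '<p align="center"><img src="https://img.example.com/a/1.jpg"></p>',
    "https://example.com/pic/2.htm": '<p align="center"><img src="https://img.example.com/a/2.jpg"></p>',
}

def fake_get(url):
    """Stand-in for requests.get(url).text in this offline sketch."""
    return PAGES[url]

def child_links(list_url):
    # Step 1: page source -> href of every child page
    soup = BeautifulSoup(fake_get(list_url), "html.parser")
    box = soup.find("div", class_="TypeList")
    return ["https://example.com" + a.get("href") for a in box.find_all("a")]

def image_src(child_url):
    # Step 2: child page -> src of the image
    soup = BeautifulSoup(fake_get(child_url), "html.parser")
    return soup.find("p", align="center").find("img").get("src")

# Step 3 would download each src; here we only collect the addresses.
sources = [image_src(link) for link in child_links("https://example.com/list/")]
print(sources)
```

Swapping fake_get for a real request turns this outline into the script developed in the rest of the article.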

First, import the required packages

import requests
from bs4 import BeautifulSoup

Prepare the URL and fetch the response

url = "https://umei.cc/bizhitupian/diannaobizhi/"
resp = requests.get(url)
resp.encoding = "UTF-8"
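
requests.get has no default timeout and does not raise on an HTTP error status, so a failed request can go unnoticed until parsing breaks. A minimal sketch of a more defensive fetch follows; the helper name fetch_html, the 10-second timeout and the User-Agent value are assumptions for illustration, not part of the original script.

```python
import requests

# Assumed header value; some sites reject the default requests User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch_html(url, session=None, timeout=10):
    """Return the decoded HTML of url, raising if the request fails."""
    getter = session or requests          # a requests.Session can be passed in
    resp = getter.get(url, headers=HEADERS, timeout=timeout)
    resp.raise_for_status()               # raises requests.HTTPError on 4xx/5xx
    resp.encoding = "UTF-8"               # same encoding the article sets
    return resp.text
```

With this helper, the three lines above would reduce to a single call such as fetch_html(url).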

Extract the data with bs4

main_page = BeautifulSoup(resp.text, "html.parser")  
alist = main_page.find("div", class_="TypeList").find_all("a")
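
find returns None when nothing matches, so the chained call above raises AttributeError if the page layout changes. A small sketch of a guarded version, run on a made-up HTML snippet (the markup and the function name list_links are illustrative assumptions):

```python
from bs4 import BeautifulSoup

def list_links(html):
    """Return the href of every <a> inside div.TypeList, or [] if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find("div", class_="TypeList")
    if box is None:          # layout changed, or the page is not what we expected
        return []
    # href=True skips <a> tags that have no href attribute
    return [a["href"] for a in box.find_all("a", href=True)]

sample = '<div class="TypeList"><a href="/a.htm">A</a><a>no link</a></div>'
print(list_links(sample))            # ['/a.htm']
print(list_links("<p>nothing</p>"))  # []
```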

Go to each child page
Extract the image download address
Download the image

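import os

os.makedirs("image", exist_ok=True)  # open() below fails if the folder is missing
count = 1
for a in alist:
    # prepend the domain to the href taken from the list page
    href = "https://umei.cc" + a.get('href')
    child_resp = requests.get(href)
    child_resp.encoding = "UTF-8"
    child_mainpage = BeautifulSoup(child_resp.text, "html.parser")
    # locate the centered <p>, then the <img> inside it
    child_alist = child_mainpage.find("p", align="center")
    image = child_alist.find("img")
    src = image.get("src")  # download address of the image
    image_resp = requests.get(src)
    image_name = src.split("/")[-1]  # last URL segment is used as the file name
    with open("image/"+image_name, mode="wb") as f:
        f.write(image_resp.content)  # .content holds the raw bytes
    print(f"Image {count} done: {image_name}")
    count = count + 1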
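
Two lines in the loop rely on plain string handling: the domain is prepended to href by concatenation, and the file name is taken with split. The standard library offers more tolerant alternatives. The sketch below uses made-up links, and the helper names absolute and file_name are illustrative.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

BASE = "https://umei.cc/bizhitupian/diannaobizhi/"

def absolute(href):
    # urljoin handles both site-relative and already-absolute links
    return urljoin(BASE, href)

def file_name(src):
    # Use only the path part, so a query string does not end up in the name
    return posixpath.basename(urlsplit(src).path)

print(absolute("/bizhitupian/diannaobizhi/1.htm"))
# https://umei.cc/bizhitupian/diannaobizhi/1.htm
print(absolute("https://example.com/x.htm"))
# https://example.com/x.htm
print(file_name("https://img.example.com/pics/abc.jpg?v=2"))
# abc.jpg
```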

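Improvements

Because the script sends many requests, the site may block your IP. To reduce that risk, have the program sleep for a while after each image is fetched, and print the elapsed time.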

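import time
time_start = time.time()  # record the start time before the loop
...
...
...
print(f"Elapsed: {time.time()-time_start}s")  # inside the loop, after each image
time.sleep(1)  # pause for one second before the next request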
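
To show where the timer and the pause sit relative to the loop, here is a stripped-down version. It is a sketch only: download_one is a hypothetical placeholder for the per-image work, and the delay is shortened so the example finishes quickly.

```python
import time

def download_one(item):
    """Hypothetical placeholder for fetching and saving one image."""
    return f"{item}.jpg"

def run(items, delay=1.0):
    done = []
    time_start = time.time()          # start the clock once, before the loop
    for count, item in enumerate(items, start=1):
        name = download_one(item)
        done.append(name)
        print(f"Image {count} done: {name}")
        print(f"Elapsed: {time.time() - time_start:.2f}s")
        time.sleep(delay)             # pause before the next request
    return done

results = run(["a", "b", "c"], delay=0.05)
```

enumerate(items, start=1) replaces the manual counter; the printed elapsed time includes the pauses from earlier iterations.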

Full source code:



import os
import time

import requests
from bs4 import BeautifulSoup

os.makedirs("image", exist_ok=True)  # open() below fails if the folder is missing
count = 1
url = "https://umei.cc/bizhitupian/diannaobizhi/"
resp = requests.get(url)
resp.encoding = "UTF-8"
main_page = BeautifulSoup(resp.text, "html.parser")
# every <a> inside div.TypeList links to one child page
alist = main_page.find("div", class_="TypeList").find_all("a")
time_start = time.time()  # start timing
for a in alist:
    href = "https://umei.cc" + a.get('href')  # child page address
    child_resp = requests.get(href)
    child_resp.encoding = "UTF-8"
    child_mainpage = BeautifulSoup(child_resp.text, "html.parser")
    # locate the centered <p>, then the <img> inside it
    child_alist = child_mainpage.find("p", align="center")
    image = child_alist.find("img")
    src = image.get("src")  # download address of the image
    image_resp = requests.get(src)
    image_name = src.split("/")[-1]  # last URL segment is used as the file name
    with open("image/"+image_name, mode="wb") as f:
        f.write(image_resp.content)  # .content holds the raw bytes
    print(f"Image {count} done: {image_name}")
    print(f"Elapsed: {time.time()-time_start}s")
    count = count + 1
    time.sleep(1)  # pause for one second before the next request

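Result

Done!

Take care when choosing the tag to locate: ideally pick one that is unique on the page, then filter down one layer at a time until you reach the data. This makes it less likely that you match the wrong element.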
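
One way to express this layer-by-layer narrowing is a CSS selector, which bs4 supports through select and select_one. The HTML below is invented for the example; only the class name TypeList comes from the script above.

```python
from bs4 import BeautifulSoup

html = """
<div class="nav"><a href="/home.htm">Home</a></div>
<div class="TypeList">
  <ul><li><a href="/pic/1.htm">one</a></li><li><a href="/pic/2.htm">two</a></li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Too broad: also matches the navigation link
all_links = [a["href"] for a in soup.find_all("a")]

# Narrowed: start from the unique container, then descend to the links
list_links = [a["href"] for a in soup.select("div.TypeList a")]

print(all_links)   # ['/home.htm', '/pic/1.htm', '/pic/2.htm']
print(list_links)  # ['/pic/1.htm', '/pic/2.htm']
```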

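Copyright notice: this article was collected by this site and published by admin on December 6, 2022 at 14:01:48; 1,886 characters in total.
When reposting, please credit: Batch-scraping web page images with Python and bs4 and saving them locally | 安云网 – AnYun.ORG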
