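Batch-scraping web page images with bs4 in Python and saving them locally
This post uses bs4 to scrape images from a web page. Parsing with bs4 is straightforward and needs only some basic HTML knowledge, so the code is easy to write. The example downloads wallpapers from a wallpaper site. (bs4 is a third-party library and must be installed before use, for example with pip install beautifulsoup4 requests.)
Steps
- First, import the required packages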
import requests
from bs4 import BeautifulSoup
- Prepare the URL and fetch the response
url = "https://umei.cc/bizhitupian/diannaobizhi/"
resp = requests.get(url)
resp.encoding = "UTF-8"
- Extract the data with bs4
main_page = BeautifulSoup(resp.text, "html.parser")
alist = main_page.find("div", class_="TypeList").find_all("a")
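To make the find(...).find_all("a") chain concrete, here is a minimal offline sketch run on a hand-written HTML fragment. The fragment, its links, and the variable names are made up for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the listing page; not the site's real markup.
sample_html = """
<div class="Nav"><a href="/about.htm">About</a></div>
<div class="TypeList">
  <ul>
    <li><a href="/bizhitupian/diannaobizhi/1.htm">Wallpaper 1</a></li>
    <li><a href="/bizhitupian/diannaobizhi/2.htm">Wallpaper 2</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# find() returns the first matching tag, or None when nothing matches.
type_list = soup.find("div", class_="TypeList")
# find_all() searches only the descendants of that tag,
# so the link inside the Nav div is not included.
links = type_list.find_all("a")
hrefs = [a.get("href") for a in links]
print(hrefs)
```

Because find() returns None when nothing matches, chaining .find_all() directly onto it raises AttributeError if the page layout changes, so checking the intermediate result first is a reasonable precaution.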
- Follow each link to its child page
- Extract the image download URL
- Download the image
count = 1
for a in alist:
    # Build the absolute URL of the child page from the href
    href = "https://umei.cc" + a.get('href')
    child_resp = requests.get(href)
    child_resp.encoding = "UTF-8"
    child_mainpage = BeautifulSoup(child_resp.text, "html.parser")
    # On the child page the image sits inside <p align="center">
    child_alist = child_mainpage.find("p", align="center")
    image = child_alist.find("img")
    src = image.get("src")
    # Download the image bytes and name the file after the last URL segment
    image_resp = requests.get(src)
    image_name = src.split("/")[-1]
    # Note: the image/ directory must already exist
    with open("image/"+image_name, mode="wb") as f:
        f.write(image_resp.content)
    print(f"Image {count} done: {image_name}")
    count = count + 1
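The loop above builds the child URL by string concatenation, which works when every href starts with "/". As a hedged alternative, the standard library's urljoin also copes with relative and already-absolute hrefs, and the file name can be taken from the URL path so that a query string does not end up in it. The helper names and the sample hrefs below are hypothetical, chosen only for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def child_url(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into an absolute URL."""
    return urljoin(page_url, href)


def image_filename(src: str) -> str:
    """Return the last path segment of an image URL, ignoring any query string."""
    return posixpath.basename(urlsplit(src).path)


page = "https://umei.cc/bizhitupian/diannaobizhi/"
# Root-relative href (hypothetical path)
print(child_url(page, "/bizhitupian/diannaobizhi/1.htm"))
# Href relative to the current directory (hypothetical path)
print(child_url(page, "2.htm"))
# Image URL with a query string (hypothetical URL)
print(image_filename("https://example.com/pics/abc.jpg?size=large"))
```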
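Improvements
Because the script sends many requests, pausing for a moment after each downloaded image helps avoid having the IP blocked by the site. The script can also report the elapsed time.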
import time
time_start = time.time()
...
...
...
print(f"Elapsed: {time.time()-time_start}s")
time.sleep(1)
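The same idea can be written as a small reusable helper. This is only a sketch: run_throttled and its parameters are hypothetical names, and the tasks here are stand-in lambdas rather than real downloads. Passing sleep and clock as parameters makes the pause easy to replace when testing.

```python
import time


def run_throttled(tasks, delay=1.0, sleep=time.sleep, clock=time.time):
    """Run each task in turn, pausing `delay` seconds after each one.

    Returns the list of results and the total elapsed time in seconds.
    """
    start = clock()
    results = []
    for task in tasks:
        results.append(task())
        sleep(delay)  # pause so requests are not sent back to back
    return results, clock() - start


# Stand-in tasks; in the scraper each task would download one image.
results, elapsed = run_throttled([lambda: "a.jpg", lambda: "b.jpg"], delay=0.1)
print(results, f"{elapsed:.2f}s")
```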
Full source:
import requests
from bs4 import BeautifulSoup
import time
count = 1
url = "https://umei.cc/bizhitupian/diannaobizhi/"
resp = requests.get(url)
resp.encoding = "UTF-8"
# Parse the listing page and collect the links to the child pages
main_page = BeautifulSoup(resp.text, "html.parser")
alist = main_page.find("div", class_="TypeList").find_all("a")
time_start = time.time()
for a in alist:
    href = "https://umei.cc" + a.get('href')
    child_resp = requests.get(href)
    child_resp.encoding = "UTF-8"
    child_mainpage = BeautifulSoup(child_resp.text, "html.parser")
    # On the child page the image sits inside <p align="center">
    child_alist = child_mainpage.find("p", align="center")
    image = child_alist.find("img")
    src = image.get("src")
    image_resp = requests.get(src)
    image_name = src.split("/")[-1]
    # Note: the image/ directory must already exist
    with open("image/"+image_name, mode="wb") as f:
        f.write(image_resp.content)
    print(f"Image {count} done: {image_name}")
    print(f"Elapsed: {time.time()-time_start}s")
    count = count + 1
    # Pause between downloads to reduce the risk of being blocked
    time.sleep(1)
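Result
Done!
- When choosing the tag to locate, prefer one that is unique on the page, then narrow down layer by layer until you reach the data. This makes it less likely that you match the wrong element.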
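The layer-by-layer narrowing described in the note can be sketched as a small function that checks each step before moving on. The function name find_image_src and the sample HTML are hypothetical; only the p align="center" and img lookups come from the script above.

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_image_src(html: str) -> Optional[str]:
    """Narrow down step by step: <p align="center"> first, then the <img> inside it."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("p", align="center")
    if container is None:
        return None  # the expected container is missing
    image = container.find("img")
    if image is None:
        return None  # the container holds no image
    return image.get("src")


# Hypothetical page: a logo outside the container and the wanted image inside it.
html = (
    '<img src="/logo.png">'
    '<p align="center"><img src="https://example.com/pics/abc.jpg"></p>'
)
print(find_image_src(html))
print(find_image_src("<p>no image here</p>"))
```

Searching the whole page for the first img would have returned the logo here; starting from the container avoids that.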