python

Published: 2021-05-22

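Python Web Scraping with Selenium (2)

Hi everyone, Gao Yang here!

Today we'll once again use the Douban Top 250 as our target and walk through how to use Selenium.

Link: https://movie.douban.com/top250

First, let's set the goal: for each film in the Top 250 we want the title, the director and cast, the rating, and the one-line review at the end.

Here is the code first: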

from selenium import webdriver
import time
import os

# Create the output folder '电影' ("movies") if it does not exist yet
path = '电影'
if not os.path.exists(path):
    os.makedirs(path)

weizhi = r"C:\Program Files (x86)\Mozilla Firefox\geckodriver.exe"
# Launch Firefox through geckodriver
driver = webdriver.Firefox(executable_path=weizhi)

# driver.maximize_window()
driver.get("https://movie.douban.com/top250")
moves = driver.find_elements_by_css_selector('div.info')

# Scroll to the bottom of the page
driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")

for move in moves:
    # Get the movie title
    move_name = move.find_element_by_css_selector("span[class='title']")
    move_name = move_name.text

    # Get the remaining information ("other")
    move_other = move.find_element_by_css_selector("div[class='bd']")
    move_other = move_other.text

    a = move_name + move_other + '\n\n\n'
    # Append to the text file; the with block closes the file for us
    with open('电影/电影名.txt', "a+", encoding='utf-8') as f:
        f.write(a)

time.sleep(1)
# Close the browser window
driver.close()
# Quit the browser
driver.quit()

Don't panic, let's go through it piece by piece:

# Check whether a folder named 电影 ("movies") exists and create it if it doesn't, which is why we need the os library here

import os

path = '电影'
if not os.path.exists(path):
    os.makedirs(path)
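A small aside on the folder check: since Python 3.2, os.makedirs accepts exist_ok=True, which folds the existence check and the creation into a single call. A minimal sketch of that variant; the helper name ensure_dir is made up for illustration and is not part of the original script:

```python
import os


def ensure_dir(path):
    """Create path if it is missing; do nothing if it already exists."""
    # exist_ok=True suppresses the error raised when the folder is already there.
    os.makedirs(path, exist_ok=True)
    return path
```

Calling ensure_dir('电影') twice in a row is harmless, so the script can be rerun without a separate os.path.exists test.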

# Use Firefox, started through geckodriver.exe

weizhi = r"C:\Program Files (x86)\Mozilla Firefox\geckodriver.exe"
# Launch Firefox
driver = webdriver.Firefox(executable_path=weizhi)

# Open the target page

driver.get("https://movie.douban.com/top250")

moves = driver.find_elements_by_css_selector('div.info')

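# This line says which element holds the content we want to scrape. Here we select the div elements whose class is info; see the screenshot.

We can see that everything we want lives inside this element, so we take it as our target. Because the page lists many films and each film has its own div.info element, the method name uses elements, with an s, meaning we collect every match on the page rather than only the first.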
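One caveat for readers on a current install: newer Selenium 4 releases no longer accept find_elements_by_css_selector or the executable_path argument. The sketch below shows how the same plural/singular lookup is spelled with the find_elements(by, value) form. The function names open_firefox and collect_movies are made up for illustration, and the locator is written as the plain string "css selector" (the value of By.CSS_SELECTOR) only so the sketch can be tried without a browser:

```python
def open_firefox(geckodriver_path):
    """Start Firefox the Selenium 4 way, with a Service object."""
    # Imported here so the rest of the sketch works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service
    return webdriver.Firefox(service=Service(executable_path=geckodriver_path))


# selenium.webdriver.common.by.By.CSS_SELECTOR is exactly this string.
CSS = "css selector"


def collect_movies(driver):
    """Return (title, details) pairs for every div.info block on the page."""
    results = []
    # Plural find_elements: a list with one entry per film.
    for block in driver.find_elements(CSS, "div.info"):
        # Singular find_element: only the first match inside this block.
        title = block.find_element(CSS, "span[class='title']").text
        details = block.find_element(CSS, "div[class='bd']").text
        results.append((title, details))
    return results
```

With a real browser this would be used as driver = open_firefox(weizhi) followed by driver.get(...) and collect_movies(driver).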

Next we pull out the specific items inside it.

# Scroll to the bottom of the page

driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")

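# Write a loop and look up the title by the element that contains it. We want the text inside the element, not the element itself, so we add .text at the end

for move in moves:
    # Get the movie title
    move_name = move.find_element_by_css_selector("span[class='title']")
    move_name = move_name.text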

# This works the same way as above

    # Get the remaining information ("other")
    move_other = move.find_element_by_css_selector("div[class='bd']")
    move_other = move_other.text
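The text of div.bd comes back as one multi-line string, so the rating and the one-line review from our goal are still mixed together. A rough sketch of splitting them apart follows. It assumes the first line holds the director and cast, that the rating sits at the start of its own line in the form 9.0, and that whatever follows is the review; this layout, the SAMPLE text, and the name parse_details are all assumptions made for illustration, not data scraped from the site:

```python
import re

# Made-up sample in the assumed layout, not real scraped data.
SAMPLE = (
    "导演: Director Name   主演: Actor Name\n"
    "1994 / Country / Genre\n"
    "9.0 12345人评价\n"
    "A one-line review."
)


def parse_details(text):
    """Split one div.bd text block into credits, rating and review."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    record = {"credits": lines[0] if lines else "", "rating": None, "quote": ""}
    for i, line in enumerate(lines):
        # A rating line starts with a number such as 9.0; a year like 1994 does not match.
        match = re.match(r"^(\d{1,2}\.\d)\b", line)
        if match:
            record["rating"] = float(match.group(1))
            # Everything after the rating line is treated as the review.
            record["quote"] = " ".join(lines[i + 1:])
            break
    return record
```

Inside the loop this would be called as parse_details(move_other); if the page layout differs from the assumption, rating simply stays None.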

# Join the title and the other information together

    a = move_name + move_other + '\n\n\n'

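# Save the content by appending it to a .txt file, encoded as UTF-8

    # The with block closes the file automatically, so no explicit close() is needed
    with open('电影/电影名.txt', "a+", encoding='utf-8') as f:
        f.write(a)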
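Because the file is opened in append mode, running the script a second time adds the same films again. One alternative, sketched here under the assumption that the titles and details were first gathered into a list of pairs, is to write everything in one go with mode "w". The function name save_movies is made up for illustration, and unlike the original it puts a line break between the title and the details:

```python
def save_movies(records, path):
    """Write all (title, details) pairs to path, replacing any earlier run."""
    # Mode "w" truncates the file first, so reruns do not pile up duplicates.
    with open(path, "w", encoding="utf-8") as f:
        for title, details in records:
            f.write(f"{title}\n{details}\n\n\n")
```

For the script in this post the call would look like save_movies(records, '电影/电影名.txt') after the loop has finished.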

# Wait 1 second

time.sleep(1)
# Close the browser window
driver.close()
# Quit the browser
driver.quit()
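The script only loads the first page of the list, which shows 25 films. To cover all 250, one option is to loop over the page URLs and repeat the same steps for each. The sketch below assumes the site pages through the list with a start query parameter in steps of 25; that parameter and the name page_urls are assumptions, not something shown in the script, so check the address bar in your browser before relying on it:

```python
BASE_URL = "https://movie.douban.com/top250"


def page_urls(total=250, per_page=25):
    """Yield one URL per results page, assuming a ?start=<offset> parameter."""
    for offset in range(0, total, per_page):
        yield f"{BASE_URL}?start={offset}"
```

Each URL would be passed to driver.get(...) in turn, with a short time.sleep between pages to go easy on the server.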

That's a wrap!

Gao Yang

https://gaoiyang.github.io/2021/05/22/python%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%ABSelenium(2)/

Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit Gao Yang when reposting!
