测试积点老人 posted on 2021-7-16 13:26:19

python selenium: when crawling with Scrapy, should the page after a click still be read from response?

The code is as follows:
import scrapy
from ..items import ShujukuItem
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains


class SjkSpider(scrapy.Spider):
    name = 'sjk'
    allowed_domains = ['fintechdb.cn']
    start_urls = ['http://www.fintechdb.cn/']

    def parse(self, response):
        item = ShujukuItem()
        location_data = input("Location: ")          # 所在地区
        field_data = input("Business field: ")       # 业务领域
        financing_data = input("Financing round: ")  # 融资轮次
        option = webdriver.ChromeOptions()
        option.add_argument('--headless')
        option.add_argument('--disable-gpu')
        option.add_argument('--disable-javascript')
        option.add_argument('blink-settings=imagesEnabled=false')
        option.add_experimental_option("excludeSwitches", ['enable-automation', 'enable-logging'])
        driver = webdriver.Chrome(options=option)  # chrome_options= is deprecated
        driver.get('http://www.fintechdb.cn/')
        # Note: these three XPaths are identical, so location/field/financing
        # all hold the same list; each filter group likely needs its own dl index.
        location = response.xpath('//*[@id="filter-tag"]/dl/dd/ul/li[*]/a/text()').getall()
        field = response.xpath('//*[@id="filter-tag"]/dl/dd/ul/li[*]/a/text()').getall()
        financing = response.xpath('//*[@id="filter-tag"]/dl/dd/ul/li[*]/a/text()').getall()
        a = location.index(location_data) + 1
        b = field.index(field_data) + 1
        c = financing.index(financing_data) + 1
        # print(a, b, c)
        loc = response.xpath(f'//*[@id="filter-tag"]/dl/dd/ul/li[{a}]/a/text()').get()
        fie = response.xpath(f'//*[@id="filter-tag"]/dl/dd/ul/li[{b}]/a/text()').get()
        fin = response.xpath(f'//*[@id="filter-tag"]/dl/dd/ul/li[{c}]/a/text()').get()
        # print(loc)
        # Click the three filters in the live browser.
        click_01 = driver.find_element_by_xpath(f'//*[@id="filter-tag"]/dl/dd/ul/li[{a}]/a')
        ActionChains(driver).move_to_element(click_01).click(click_01).perform()
        click_02 = driver.find_element_by_xpath(f'//*[@id="filter-tag"]/dl/dd/ul/li[{b}]/a')
        ActionChains(driver).move_to_element(click_02).click(click_02).perform()
        click_03 = driver.find_element_by_xpath(f'//*[@id="filter-tag"]/dl/dd/ul/li[{c}]/a')
        ActionChains(driver).move_to_element(click_03).click(click_03).perform()
        # while True:
        #     if response.xpath('//*[@id="home-load-more"]/@style').get() == 'display: block;':
        #         driver.find_element_by_id('home-load-more').click()
        #     else:
        #         break
        # These still parse Scrapy's original response, i.e. the page as it was
        # before the Selenium clicks above (this is the question below). They also
        # overwrite fie/fin from earlier and reuse one XPath for time/fie/fin.
        title = response.xpath('//*[@id="company-list"]/div[*]/a/div/div/text()').getall()
        time = response.xpath('//*[@id="company-list"]/div[*]/a/div/p/text()').getall()
        fie = response.xpath('//*[@id="company-list"]/div[*]/a/div/p/text()').getall()
        fin = response.xpath('//*[@id="company-list"]/div[*]/a/div/p/text()').getall()
        print(title)
        item['title'] = title
        item['time'] = time
        item['fie'] = fie
        item['fin'] = fin
        yield item
The question is about these lines:

      title = response.xpath('//*[@id="company-list"]/div[*]/a/div/div/text()').getall()
      time = response.xpath('//*[@id="company-list"]/div[*]/a/div/p/text()').getall()
      fie = response.xpath('//*[@id="company-list"]/div[*]/a/div/p/text()').getall()
      fin = response.xpath('//*[@id="company-list"]/div[*]/a/div/p/text()').getall()
These lines are supposed to scrape the page after Selenium has performed the clicks, so should they still use response.xpath, or something else?
Along the way I printed two sets of data to confirm the elements are located correctly, and I changed how the clicks are done to make sure they actually land,
but what gets printed is still the pre-click result.
So I'm wondering whether the problem is with response at this last step.

海海豚 posted on 2021-7-19 09:36:16

Take a look at this: https://www.cnblogs.com/fighter007/p/13720315.html
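That link covers combining Scrapy with Selenium. The usual pattern it relates to is a downloader middleware that drives the browser and hands Scrapy an HtmlResponse built from driver.page_source, so the response your spider sees is the rendered DOM. A minimal sketch of that pattern, not code taken from the linked post (the class name, settings key, and project path below are illustrative):

from scrapy.http import HtmlResponse
from selenium import webdriver

class SeleniumMiddleware:
    """Downloader middleware that fetches pages with Selenium, so spider
    callbacks receive the browser-rendered DOM instead of the raw download."""

    def __init__(self):
        option = webdriver.ChromeOptions()
        option.add_argument('--headless')
        self.driver = webdriver.Chrome(options=option)

    def process_request(self, request, spider):
        self.driver.get(request.url)
        # Returning an HtmlResponse short-circuits the normal download;
        # the spider's parse() now sees the rendered page.
        return HtmlResponse(url=request.url,
                            body=self.driver.page_source,
                            encoding='utf-8',
                            request=request)

You would also enable it in settings.py, e.g. DOWNLOADER_MIDDLEWARES = {'shujuku.middlewares.SeleniumMiddleware': 543} (the module path depends on your project layout), and quit the driver when the spider closes.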

qqq911 posted on 2021-7-19 10:56:25

What the browser returns can be stored directly in a variable.
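In other words: response is the static page Scrapy downloaded, so it never reflects the Selenium clicks; grab driver.page_source after the clicks, store it, and parse that instead. A minimal sketch of the idea, assuming it runs right after the ActionChains clicks inside parse():

from scrapy import Selector

# driver.page_source is the rendered, post-click DOM; wrapping it in a
# Selector lets the same XPath expressions keep working.
sel = Selector(text=driver.page_source)
title = sel.xpath('//*[@id="company-list"]/div[*]/a/div/div/text()').getall()

After that, the extraction no longer depends on response for the post-click data.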