The code is as follows:
```python
import scrapy
from ..items import ShujukuItem
import selenium
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains


class SjkSpider(scrapy.Spider):
    name = 'sjk'
    allowed_domains = ['fintechdb.cn']
    start_urls = ['http://www.fintechdb.cn/']

    def parse(self, response):
        item = ShujukuItem()
        # The typed values must match the filter link text shown on the page.
        location_data = input("Region: ")
        field_data = input("Business field: ")
        financing_data = input("Financing round: ")
        option = webdriver.ChromeOptions()
        option.add_argument('--headless')
        option.add_argument('--disable-gpu')
        option.add_argument('--disable-javascript')
        option.add_argument('blink-settings=imagesEnabled=false')
        option.add_experimental_option("excludeSwitches", ['enable-automation', 'enable-logging'])
        driver = webdriver.Chrome(chrome_options=option)
        driver.get('http://www.fintechdb.cn/')
        location = response.xpath('//*[@id="filter-tag"]/dl/dd[1]/ul/li[*]/a/text()').getall()
        field = response.xpath('//*[@id="filter-tag"]/dl/dd[2]/ul/li[*]/a/text()').getall()
        financing = response.xpath('//*[@id="filter-tag"]/dl/dd[3]/ul/li[*]/a/text()').getall()
        a = location.index(location_data) + 1
        b = field.index(field_data) + 1
        c = financing.index(financing_data) + 1
        # print(a,b,c)
        loc = response.xpath(f'//*[@id="filter-tag"]/dl/dd[1]/ul/li[{a}]/a/text()').get()
        fie = response.xpath(f'//*[@id="filter-tag"]/dl/dd[2]/ul/li[{b}]/a/text()').get()
        fin = response.xpath(f'//*[@id="filter-tag"]/dl/dd[3]/ul/li[{c}]/a/text()').get()
        # print(loc)
        click_01 = driver.find_element_by_xpath(f'//*[@id="filter-tag"]/dl/dd[1]/ul/li[{a}]/a')
        ActionChains(driver).move_to_element(click_01).click(click_01).perform()
        click_02 = driver.find_element_by_xpath(f'//*[@id="filter-tag"]/dl/dd[2]/ul/li[{b}]/a')
        ActionChains(driver).move_to_element(click_02).click(click_02).perform()
        click_03 = driver.find_element_by_xpath(f'//*[@id="filter-tag"]/dl/dd[3]/ul/li[{c}]/a')
        ActionChains(driver).move_to_element(click_03).click(click_03).perform()
        # while True:
        #     if response.xpath('//*[@id="home-load-more"]/@style/text()').get() == 'display: block;':
        #         driver.find_element_by_id('home-load-more').click()
        #     else:
        #         break
        title = response.xpath('//*[@id="company-list"]/div[*]/a/div/div[2]/text()').getall()
        time = response.xpath('//*[@id="company-list"]/div[*]/a/div/p[1]/text()').getall()
        fie = response.xpath('//*[@id="company-list"]/div[*]/a/div/p[2]/text()').getall()
        fin = response.xpath('//*[@id="company-list"]/div[*]/a/div/p[3]/text()').getall()
        print(title)
        item['title'] = title
        item['time'] = time
        item['fie'] = fie
        item['fin'] = fin
        yield item
```
The problem is with this part:

```python
title = response.xpath('//*[@id="company-list"]/div[*]/a/div/div[2]/text()').getall()
time = response.xpath('//*[@id="company-list"]/div[*]/a/div/p[1]/text()').getall()
fie = response.xpath('//*[@id="company-list"]/div[*]/a/div/p[2]/text()').getall()
fin = response.xpath('//*[@id="company-list"]/div[*]/a/div/p[3]/text()').getall()
```
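These lines are supposed to scrape the page after Selenium has performed the clicks. In that case, should they still use response.xpath, or should they use something else?
While debugging, I printed two sets of data to confirm that the elements were located correctly, and I changed the click method to make sure the clicks actually land.
But the printed result is still the result from the page before any click.
So I am wondering whether the problem is the use of response in this last part.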
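One possible direction, offered as a minimal sketch rather than a confirmed fix: `response` holds the HTML that Scrapy downloaded before any click, while the clicks only change the page inside the Selenium browser, so the data after the clicks would have to be read from `driver.page_source`. The example below uses only the standard library to illustrate that difference. `StubDriver`, `click_filter`, `titles_from_html`, and the sample markup are hypothetical stand-ins, not the real site or the real Selenium API.

```python
from html.parser import HTMLParser


class CompanyListParser(HTMLParser):
    """Collects the text of each <a> inside the element with id="company-list"."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # greater than 0 while inside #company-list
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
            if tag == "a":
                self.in_link = True
                self.titles.append("")
        elif dict(attrs).get("id") == "company-list":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if tag == "a":
                self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data.strip()


def titles_from_html(html):
    """Hypothetical helper: parse an HTML string and return the link texts."""
    parser = CompanyListParser()
    parser.feed(html)
    return parser.titles


class StubDriver:
    """Hypothetical stand-in for a Selenium WebDriver; only page_source is modelled."""

    def __init__(self, html):
        self.page_source = html

    def click_filter(self, filtered_html):
        # In a real browser the click lets the page's JavaScript rewrite the DOM.
        self.page_source = filtered_html


# Made-up markup standing in for the company list before and after filtering.
UNFILTERED = '<div id="company-list"><div><a>Alpha</a></div><div><a>Beta</a></div></div>'
FILTERED = '<div id="company-list"><div><a>Beta</a></div></div>'

response_text = UNFILTERED          # what Scrapy fetched before any click
driver = StubDriver(UNFILTERED)
driver.click_filter(FILTERED)       # the change happens only in the browser

stale = titles_from_html(response_text)       # still the unfiltered list
fresh = titles_from_html(driver.page_source)  # reflects the click
print(stale, fresh)
```

In the spider itself, the equivalent step would presumably be to build a selector from the browser's HTML, for example `scrapy.Selector(text=driver.page_source)`, and run the same XPath expressions on it instead of on `response`. The filtered list may also need a short wait before it appears in the page.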