51Testing Software Testing Forum

Title: Selenium crawler gets slower and slower as it runs. How can this be fixed?

Author: 测试积点老人    Time: 2021-6-17 13:45
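I recently wrote a small crawler with selenium. It loops over a list of queries: type the query into the search box, simulate a click on the search button, scrape the data from the results page that opens, then repeat until every query has been searched. When the crawler first starts the speed is acceptable, roughly one page per second, but the longer it runs the slower it gets.
Here is the code: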
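To put numbers on the slowdown, each pass through the loop could be timed. The following is a minimal sketch using only the standard library. `time_iterations`, `drift`, and the `step` callback are hypothetical names that do not appear in the original script; `step` stands in for one search-and-scrape pass.

```python
import time
from typing import Callable, Iterable, List


def time_iterations(items: Iterable[str], step: Callable[[str], None]) -> List[float]:
    """Call step(item) for every item and return how many seconds each call took."""
    durations = []
    for item in items:
        start = time.perf_counter()
        step(item)  # hypothetical: one search-and-scrape pass for this item
        durations.append(time.perf_counter() - start)
    return durations


def drift(durations: List[float], window: int = 20) -> float:
    """Mean duration of the last `window` calls divided by that of the first `window`."""
    if not durations:
        raise ValueError("no durations recorded")
    window = min(window, len(durations))
    head = sum(durations[:window]) / window
    tail = sum(durations[-window:]) / window
    return tail / head if head > 0 else float("inf")
```

A `drift` value well above 1 would confirm that later iterations take longer than early ones. Timing sub-steps separately (the wait, the click, the parsing, the cookie handling) in the same way could help narrow down which part is degrading.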
import random
from time import sleep

from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from tqdm import tqdm

# chrome_options, companyNames, getCompanyNames, taxpayerNumber and
# getTaxpayerNumber() are not shown in the post; they are assumed to be
# defined earlier in the script.

# Raw string so the backslashes in the Windows path are taken literally.
browser = webdriver.Chrome(executable_path=r"D:\GeckoDriver\chromedriver")
browser.get("https://www.qcc.com/")
# Set the browser window size
browser.maximize_window()

login = browser.find_element_by_xpath('/html/body/header/div/ul/li[10]/a')
login.click()
# sleep(30)
# print("30 seconds later")
x = input("Press y after logging in")
cookies = browser.get_cookies()
browser.quit()

# Second session: reuse the cookies from the manual login.
browser = webdriver.Chrome(executable_path=r"D:\GeckoDriver\chromedriver", options=chrome_options)
# , options = chrome_options
browser.get("https://www.qcc.com/")
for cookie in cookies:
    browser.add_cookie(cookie)
browser.get("https://www.qcc.com/")
browser.maximize_window()
qccinput = browser.find_element_by_css_selector("#searchkey")
# qccinput.clear()
# random.randint(0, len(companyNames)) includes the upper bound and can raise
# IndexError; random.choice picks a valid element directly.
qccinput.send_keys(random.choice(companyNames))
qccbutton = browser.find_element_by_css_selector(".index-searchbtn")
sleep(0.5)
qccbutton.click()
qccbutton = browser.find_element_by_css_selector(".input-group-btn")
sleep(0.5)
qccbutton.click()

pbar = tqdm(range(len(companyNames)))
for companyName, i in zip(companyNames, pbar):
    browser.forward()
    # browser.delete_all_cookies()
    # browser.refresh();
    # Save time: continue as soon as this element is present on the page
    lem = WebDriverWait(browser, 15, 0.5).until(EC.presence_of_element_located((By.ID, "searchKey")))
    seach = browser.find_element_by_css_selector("#searchKey")
    seach.clear()
    seach.send_keys(companyName)
    seachButton = browser.find_element_by_css_selector(".btn-primary")
    seachButton.click()
    response = browser.page_source
    html = etree.HTML(response)
    result = etree.tostring(html)
    cookies = browser.get_cookies()
    try:
        companyName = html.xpath(
            'normalize-space(/html/body/div[1]/div[2]/div[2]/div[3]/div/div[2]/div/table/tr[1]/td[3]/div/a[1])')  # tbody removed from the path
        urls = html.xpath(
            '/html/body/div[1]/div[2]/div[2]/div[3]/div/div[2]/div/table/tr[1]/td[3]/div/a[1]/@href')  # tbody removed from the path
        getCompanyNames.append(companyName)
        getTaxpayerNumber(urls, cookies)
    except Exception as r:
        getCompanyNames.append("Company not found")
        taxpayerNumber.append("Taxpayer number not found")
    # browser = webdriver.Chrome(profile)
    # send_command = ('POST', '/session/$sessionId/chromium/send_command')
    # browser.command_executor._commands['SEND_COMMAND'] = send_command
    # browser.execute('SEND_COMMAND', dict(cmd='Network.clearBrowserCache', params={}))
    # Reset the session cookies to the ones captured in this iteration.
    browser.delete_all_cookies()
    for cookie in cookies:
        browser.add_cookie(cookie)
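
One thing that could be tried when a single long-lived browser session degrades is to recycle it every N searches instead of reusing it for the whole run. Whether that cures the slowdown in this crawler is an untested assumption. The sketch below shows only the batching logic: `batches`, `run_in_batches`, and the three hooks `open_session`, `search_one`, and `close_session` are hypothetical names, not part of the original script or of selenium.

```python
from typing import Any, Callable, Iterator, List


def batches(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(
    items: List[Any],
    size: int,
    open_session: Callable[[], Any],
    search_one: Callable[[Any, Any], Any],
    close_session: Callable[[Any], None],
) -> List[Any]:
    """Process every item, using a fresh session for each batch.

    The hooks are hypothetical: open_session could start a new Chrome
    instance and re-add the saved login cookies, search_one could run one
    search-and-scrape pass, and close_session could call browser.quit().
    """
    results = []
    for batch in batches(items, size):
        session = open_session()
        try:
            for item in batch:
                results.append(search_one(session, item))
        finally:
            # Close the session even if a search raised.
            close_session(session)
    return results
```

With the hooks wired to the crawler, a call such as `run_in_batches(companyNames, 50, open_session, search_one, close_session)` would restart the browser every 50 companies; the batch size of 50 is an arbitrary illustration.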






