51Testing软件测试论坛

标题: Python爬虫的第二种姿势,Selenium框架案例讲解 [打印本页]

作者: lsekfe    时间: 2021-4-20 11:06
标题: Python爬虫的第二种姿势,Selenium框架案例讲解
selenium使用流程:
  1.环境安装:
  pip install selenium

  2.下载一个浏览器的驱动程序(谷歌浏览器)
  3.实例化一个浏览器对象基本使用
  代码
  from selenium import webdriver
  from lxml import etree
  from time import sleep
  if __name__ == '__main__':
      bro = webdriver.Chrome(r"E:\google\Chrome\Application\chromedriver.exe")
      bro.get(url='http://scxk.nmpa.gov.cn:81/xk/')
      page_text = bro.page_source
      tree = etree.HTML(page_text)
      li_list = tree.xpath('//*[@id="gzlist"]/li')
      for li in li_list:
          name = li.xpath('./dl/@title')[0]
          print(name)
      sleep(5)
      bro.quit()

  基于浏览器自动化的操作代码
  #编写基于浏览器自动化的操作代码
  - 发起请求: get(url)
  - 标签定位: find系列的方法
  - 标签交互: send_ keys( 'xxx' )
  - 执行js程序: excute_script('jsCod')
  - 前进,后退: back(),forward( )
  - 关闭浏览器: quit()1
  代码
请  https://www.taobao.com/  from selenium import webdriver
  from time import sleep
  bro = webdriver.Chrome(executable_path=r"E:\google\Chrome\Application\chromedriver.exe")
  bro.get(url='https://www.taobao.com/')
  #标签定位
  search_input = bro.find_element_by_id('q')
  sleep(2)
  #执行一组js代码,使得滚轮向下滑动
  bro.execute_script('window.scrollTo(0,document.body.scrollHeight)')
  sleep(2)
  #标签交互
  search_input.send_keys('女装')
  button = bro.find_element_by_class_name('btn-search')
  button.click()
  bro.get('https://www.baidu.com')
  sleep(2)
  bro.back()
  sleep(2)
  bro.forward()
  sleep(5)
  bro.quit()

  selenium处理iframe:
  - 如果定位的标签存在于iframe标签之中,则必须使用switch_to.frame(id)
  - 动作链(拖动) : from selenium. webdriver import ActionChains
  - 实例化一个动作链对象: action = ActionChains (bro)
  - click_and_hold(div) :长按且点击操作
  - move_by_offset(x,y)
  - perform( )让动作链立即执行
  - action.release( )释放动作链对象
  代码
  https://www.runoob.com/try/try.p ... eryui-api-droppable
  from selenium import webdriver
  from time import sleep
  from selenium.webdriver import ActionChains
  bro = webdriver.Chrome(executable_path=r"E:\google\Chrome\Application\chromedriver.exe")
  bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
  bro.switch_to.frame('iframeResult')
  div = bro.find_element_by_id('draggable')
  #动作链
  action = ActionChains(bro)
  action.click_and_hold(div)
  for i in range(5):
      action.move_by_offset(17,0).perform()
      sleep(0.3)
  #释放动作链
  action.release()
  bro.quit()

  selenium模拟登陆QQ空间
  代码
  https://qzone.qq.com/
  from selenium import webdriver
  from time import sleep
  bro = webdriver.Chrome(executable_path=r"E:\google\Chrome\Application\chromedriver.exe")
  bro.get('https://qzone.qq.com/')
  bro.switch_to.frame("login_frame")
  switcher = bro.find_element_by_id('switcher_plogin')
  switcher.click()
  user_tag = bro.find_element_by_id('u')
  password_tag = bro.find_element_by_id('p')
  user_tag.send_keys('1234455')
  password_tag.send_keys('qwer123')
  sleep(1)
  but = bro.find_element_by_id('login_button')
  but.click()

  无头浏览器和规避检测
  代码
  from  selenium import webdriver
  from time import sleep
  #实现无可视化界面
  from selenium.webdriver.chrome.options import Options
  #实现规避检测
  from selenium.webdriver import ChromeOptions
  #实现无可视化界面
  chrome_options = Options()
  chrome_options.add_argument('--headless')
  chrome_options.add_argument('--disable-gpu')
  #实现规避检测
  option = ChromeOptions()
  option.add_experimental_option('excludeSwitches',['enable-automation'])
  bro = webdriver.Chrome(executable_path=r"E:\google\Chrome\Application\chromedriver.exe",chrome_options=chrome_options,options=option)
  bro.get('https://www.baidu.com')
  print(bro.page_source)
  sleep(2)
  bro.quit()






欢迎光临 51Testing软件测试论坛 (http://bbs.51testing.com/) Powered by Discuz! X3.2