- from selenium import webdriver
- from time import sleep
- from selenium.webdriver.common.by import By
- from lxml import etree
- import requests
- import time
- headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'}
- driver = webdriver.Chrome()
- driver.get('https://www.timeshighereducation.com/')
- sleep(5)
- driver.find_element(By.XPATH,'//div[@class="col-sm-12"]/div[@class="navbar-button__wrapper"]/ul/li/a[@title="User account"]').click()
- sleep(5)
- driver.find_element(By.XPATH,'//div[@class="region region-secondary-navigation"]/section/div/ul/li/a[@id="modal-login"]').click()
- sleep(5)
- sleep(5)
- driver.switch_to.frame(driver.find_element(By.XPATH,'//div[@id="modal-content"]/iframe'))
- driver.find_element(By.XPATH,'//input[@placeholder="Username or email"]').send_keys("123456")
- driver.find_element(By.XPATH,'//input[@placeholder="Password"]').send_keys("123456")
- # driver.refresh()
- driver.find_element(By.XPATH,'//form[@class="user-login-form"]/div/input[@value="Log in"]').click()
- sleep(5)
- driver.switch_to.parent_frame()
- sleep(5)
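The fixed `sleep(5)` calls above always wait the full five seconds, even when the page is ready sooner, and still fail when it takes longer. A minimal sketch of the polling alternative follows. `wait_until`, `element_ready`, and their parameters are hypothetical names for illustration and are not part of Selenium. Selenium itself provides `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for "is the element there yet?": succeeds on the third check.
attempts = {"count": 0}


def element_ready():
    attempts["count"] += 1
    return "login-button" if attempts["count"] >= 3 else None


found = wait_until(element_ready, timeout=5.0, poll_interval=0.01)
print(found, attempts["count"])
```

With a real driver, the condition could be a lambda that calls `driver.find_elements(By.XPATH, ...)`, which returns an empty list instead of raising when nothing matches yet.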
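Parse the HTML string and extract the required information:
- def parse_html(html):
-     text = etree.HTML(html)
-     # Match rows by a single class token (assumed to be shared by all data rows);
-     # the exact string "odd row-1 js-row" can match at most one kind of row
-     node_list = text.xpath('//tbody/tr[contains(@class, "js-row")]')
-     # print(node_list)
-     for i in node_list:
-         try:
-             # rank (paths start with "./" so they are relative to the current row)
-             rank = i.xpath('./td[contains(@class, "rank")]/text()')
-             # name
-             name = i.xpath('./td[contains(@class, "name")]/a/text()')
-             # region
-             region = i.xpath('./td/div/div[@class="location"]/span/a/text()')
-             # ratio
-             # ratio = i.xpath('')
-             # collect the fields in a dict
-             items = {
-                 "rank": rank,
-                 "name": name,
-                 "region/country": region
-             }
-             print(items)
-         except Exception as exc:
-             # report the failure instead of silently ignoring it
-             print('failed to parse row:', exc)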
- def main():
-     # Fetch and parse the page source for pages 0 through 15
-     for page in range(0, 16):
-         # URL of each page
-         url = 'https://www.timeshighereducation.com/world-university-rankings/2022#!/page/'+ str(page) + '/length/25/sort_by/rank/sort_order/asc/cols/stats'
-         # fetch the page source
-         html = requests.get(url, headers=headers).text
-         # parse the page content
-         parse_html(html)
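In lxml, an XPath expression that starts with `/` is evaluated from the document root even when it is called on a row element, so row-level lookups in `parse_html` need a relative path such as `./td`. Matching `@class` against one exact string is also fragile when rows alternate between classes such as `odd` and `even`. The sketch below illustrates both points. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra packages, and the sample markup and the names `SAMPLE`, `has_class`, and `parse_rows` are hypothetical, not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the rendered rankings table.
SAMPLE = """
<table>
  <tbody>
    <tr class="odd row-1 js-row">
      <td class="rank sorting_1">1</td>
      <td class="name namesearch"><a>University A</a></td>
    </tr>
    <tr class="even row-2 js-row">
      <td class="rank sorting_1">2</td>
      <td class="name namesearch"><a>University B</a></td>
    </tr>
  </tbody>
</table>
"""


def has_class(element, token):
    """True if `token` is one of the element's space-separated class names."""
    return token in element.get("class", "").split()


def parse_rows(markup):
    root = ET.fromstring(markup.strip())
    rows = []
    # Visit every <tr> and filter on one class token, so that both
    # "odd row-1 js-row" and "even row-2 js-row" qualify.
    for tr in root.iter("tr"):
        if not has_class(tr, "js-row"):
            continue
        record = {}
        # "./td" is relative to the current row, not to the document root.
        for td in tr.findall("./td"):
            if has_class(td, "rank"):
                record["rank"] = (td.text or "").strip()
            elif has_class(td, "name"):
                link = td.find("./a")
                record["name"] = (link.text or "").strip() if link is not None else ""
        rows.append(record)
    return rows


rows = parse_rows(SAMPLE)
for row in rows:
    print(row)
```

With lxml, the equivalent relative lookup on a row element `i` would be `i.xpath('./td[contains(@class, "rank")]/text()')`.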
Program entry point:
- if __name__ == '__main__':
-     main()
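Why am I not getting any data? Could someone more experienced point me in the right direction? This is my first time working with this.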
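One likely cause, offered as a sketch and not a confirmed diagnosis: everything after `#` in a URL is a fragment, which the client keeps to itself and never sends to the server. All sixteen URLs built in `main()` therefore request the same document, and the `#!/page/...` routing suggests that the table rows are filled in by JavaScript in the browser, which `requests` does not run. The helper names `page_url` and `url_sent_to_server` below are hypothetical.

```python
from urllib.parse import urldefrag

BASE = "https://www.timeshighereducation.com/world-university-rankings/2022"


def page_url(page):
    """Build the per-page URL the same way main() does."""
    return (BASE + "#!/page/" + str(page)
            + "/length/25/sort_by/rank/sort_order/asc/cols/stats")


def url_sent_to_server(url):
    """Return the part of the URL that an HTTP request actually carries."""
    without_fragment, _fragment = urldefrag(url)
    return without_fragment


# Every page number collapses to the same request target.
requested = {url_sent_to_server(page_url(page)) for page in range(0, 16)}
print(requested)
```

If that is what is happening, one option is to load each URL in the already logged-in browser with `driver.get(url)` and pass `driver.page_source` to `parse_html`. The `requests` call also does not share the login cookies of the Selenium session, so the login steps have no effect on it.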