51Testing Software Testing Forum
Title:
Error when scraping Qzone!
Author:
测试积点老人
Time:
2022-3-21 10:35
Scraping Qzone fails with this error: urllib.error.URLError: <urlopen error [WinError 10061] No connection could be made because the target machine actively refused it>
from lxml import etree
from bs4 import BeautifulSoup
from selenium import webdriver
import time

# Use selenium
driver = webdriver.PhantomJS(executable_path="D:\\GHOST/bin/phantomjs.exe")
driver.maximize_window()

# Log in to Qzone
def get_shuoshuo(qq):
    driver.get('http://user.qzone.qq.com/{}/311'.format(qq))
    time.sleep(6.6)
    # Check whether the login box is shown
    try:
        driver.find_element_by_id('login_div')
        a = True
    except Exception:
        a = False
    if a:
        driver.switch_to.frame('login_frame')
        time.sleep(3.3)
        driver.find_element_by_id('switcher_plogin').click()
        time.sleep(3.3)
        driver.find_element_by_id('u').clear()
        time.sleep(3.3)
        driver.find_element_by_id('u').send_keys('your QQ number')
        time.sleep(3.3)
        driver.find_element_by_id('p').clear()
        time.sleep(3.3)
        driver.find_element_by_id('p').send_keys('your QQ password')
        time.sleep(3.3)
        driver.find_element_by_id('login_button').click()
        time.sleep(6.6)
    driver.implicitly_wait(3)
    # Check whether the owner's page has loaded
    try:
        driver.find_element_by_id('QM_OwnerInfo_Icon')
        b = True
    except Exception:
        b = False
    if b:
        driver.switch_to.frame('app_canvas_frame')
        content = driver.find_elements_by_css_selector('.content')
        stime = driver.find_elements_by_css_selector('.c_tx.c_tx3.goDetail')
        for con, sti in zip(content, stime):
            data = {
                'time': sti.text,
                'shuos': con.text
            }
            print(data)
        pages = driver.page_source
        soup = BeautifulSoup(pages, 'lxml')
    # Collect the session cookies into one string
    cookie = driver.get_cookies()
    cookie_dict = []
    for c in cookie:
        ck = "{0}={1};".format(c['name'], c['value'])
        cookie_dict.append(ck)
    i = ''
    for c in cookie_dict:
        i += c
    print('Cookies:', i)
    print("================Done================")
    driver.close()
    driver.quit()

if __name__ == '__main__':
    get_shuoshuo('target QQ number')
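
The two loops at the end of get_shuoshuo that build the cookie string could be collapsed into a single join. This is only a minimal sketch: it assumes the input is a list of dicts with 'name' and 'value' keys, which is the shape driver.get_cookies() returns, and the helper name cookies_to_header is hypothetical, not part of the original script. Unlike the original loop, it separates pairs with "; " and leaves no trailing semicolon.

```python
def cookies_to_header(cookies):
    """Join Selenium-style cookie dicts into one 'name=value; ...' string."""
    # Each cookie is expected to carry at least 'name' and 'value' keys;
    # any other keys (domain, path, ...) are ignored.
    return "; ".join("{0}={1}".format(c["name"], c["value"]) for c in cookies)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        {"name": "uin", "value": "o0123456"},
        {"name": "skey", "value": "@abcdef"},
    ]
    print("Cookies:", cookies_to_header(sample))
```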
Author:
qqq911
Time:
2022-3-22 10:18
You've been blocked by anti-scraping measures.
Author:
kallinr
Time:
2022-3-22 10:18
Anti-scraping.
Author:
jingzizx
Time:
2022-3-22 14:54
It didn't connect...
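
WinError 10061 is the Windows form of "connection refused": nothing accepted the TCP connection at the address being contacted. Before digging into the scraping logic, a small standard-library check like the sketch below can confirm whether a given address accepts connections at all. The function name can_connect and the host and port in the usage lines are assumptions for illustration; the thread does not say which address refused the connection.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises an OSError subclass on failure, e.g.
        # ConnectionRefusedError (what WinError 10061 maps to) or a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical target; replace with the address that is being refused.
    print(can_connect("127.0.0.1", 8910))
```

If the check returns False for an address on the local machine, the process that was supposed to be listening there is probably not running, which is a different problem from the remote site blocking the scraper.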
Welcome to the 51Testing Software Testing Forum (http://bbs.51testing.com/)