51Testing Software Testing Forum

Title: How do I get around my selenium crawler being detected?

Author: 测试积点老人 Time: 2020-7-13 11:32
Hi, I'm using selenium to scrape a website, and it is being identified as a crawler. Is there any way around this? My code is below:

import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# Start a Chrome session
browser = webdriver.Chrome()
# Wait up to 40 seconds when locating elements
browser.implicitly_wait(40)
browser.get("https://www.crunchbase.com/app/search/companies/")
# Pause for 60 seconds after requesting the page
time.sleep(60)


The page returns:

Pardon Our Interruption...
As you were browsing crunchbase accelerates innovation by bringing together data on companies and the people behind them. something about your browser made us think you were a bot. There are a few reasons this might happen:
You're a power user moving through this website with super-human speed.
You've disabled JavaScript in your web browser.
A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this support article.
To request an unblock, please fill out the form below and we will review it as soon as possible.


The site uses the anti-scraping service from http://distilnetworks.com.
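
Before trying any workaround, it can help to have the script itself notice when it got the block page rather than the real content. The following is only a rough sketch: the marker strings are copied from the block page text quoted above, and the `looks_blocked` helper and `BLOCK_PAGE_MARKERS` name are made up for illustration. The service may well change its wording, so treat this as a heuristic.

```python
# Marker strings copied from the block page quoted in this post (an assumption:
# the service may serve different wording at other times).
BLOCK_PAGE_MARKERS = (
    "Pardon Our Interruption",
    "made us think you were a bot",
)


def looks_blocked(page_source: str) -> bool:
    """Return True if the HTML contains any of the block-page marker strings."""
    lowered = page_source.lower()
    return any(marker.lower() in lowered for marker in BLOCK_PAGE_MARKERS)
```

With the `browser` object from the code above, this could be called as `looks_blocked(browser.page_source)` right after `browser.get(...)`.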

Author: 海海豚 Time: 2020-7-14 09:38
Take a look at this: https://blog.csdn.net/Python1996/article/details/99709167

Author: 郭小贱 Time: 2020-7-14 09:49
This might be worth a look for reference: https://www.zhihu.com/question/50738719

Author: bellas Time: 2020-7-14 09:54
See this link for reference: https://www.jianshu.com/p/5e34a8f95512

Author: qqq911 Time: 2020-7-14 10:19
Add request headers.
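
One possible reading of "add headers" with selenium and Chrome is to override the User-Agent header through a Chrome command-line argument. Below is a minimal sketch under that assumption; `build_chrome_arguments` and `make_browser` are hypothetical helper names, and the User-Agent value is whatever you choose to pass in. Whether this alone is enough to avoid the detection described in the first post is not something the thread confirms.

```python
def build_chrome_arguments(user_agent: str) -> list:
    """Build the Chrome command-line arguments that override the User-Agent."""
    return [f"--user-agent={user_agent}"]


def make_browser(user_agent: str):
    """Start Chrome with the given User-Agent (needs selenium and chromedriver)."""
    # Imported here so build_chrome_arguments works without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(user_agent):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

To try it, call `make_browser(...)` with the User-Agent string of a regular browser in place of `webdriver.Chrome()` in the original code, then check what the page sees with `browser.execute_script("return navigator.userAgent")`.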
