How do I use Python to get around anti-crawler blocking when visiting a website with selenium?

lsekfe posted on 2023-6-20 13:20:55

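On some websites, a page opened through the selenium webdriver fails to load because the site identifies the visitor as a crawler. The fix that has worked for me so far is as follows:
1. A Chrome browser controlled by selenium (in my case started manually from the command line) carries some extra variable values, for example window.navigator.webdriver. In a normal Chrome this is undefined (recent Chrome versions report false); in a Chrome opened by selenium it is true. The site's server sends down JS code that reads this value and reports it back to the site. If the value is true, the site treats you as a crawler and blocks your access, as shown in the figure below.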
http://www.51testing.com/attachments/2023/06/15326880_202306191507231Cvpn.jpg
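To see what the site's script would see, you can read the same property from the selenium side. This is only a rough sketch: looks_automated is a hypothetical helper name, and it assumes driver is an already created selenium WebDriver (or any object with a compatible execute_script method).

```python
def looks_automated(driver):
    """Return True if the page would see navigator.webdriver as true.

    `driver` is assumed to be a selenium WebDriver, or any object that
    offers a compatible execute_script(script) method.
    """
    # Read the same property a detection script would read in the page.
    value = driver.execute_script("return navigator.webdriver")
    # An unpatched automated Chrome reports true; anything else
    # (undefined becomes None in Python, or false) is not flagged.
    return value is True
```

Calling looks_automated(driver) before and after applying a workaround is one simple way to check whether the workaround took effect on the current page.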
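The code changes some parameters before the request is made so that the check is bypassed. For the details, you can read up on how websites detect selenium; any other values that need to be set can be added in the same way: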
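# My setup: Chrome is opened from the command line, and selenium then
# takes over that browser window.
from selenium import webdriver

options = webdriver.ChromeOptions()

# For Chrome versions before 79, use the following two lines:
# options.add_experimental_option("excludeSwitches", ["enable-automation"])
# options.add_experimental_option("useAutomationExtension", False)

# Attach to the Chrome that was opened from the command line.
# The address is an assumption: it must match the --remote-debugging-port
# value that Chrome was started with.
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(options=options)

# I am on the latest Chrome; for Chrome 79 and later, use this instead.
# The script is evaluated in every new document before the page's own scripts run.
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})
driver.get("put the URL of the site that is blocking you here")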
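
The post notes that other values can be overridden in the same way. One possible way to generalize the snippet is sketched below. build_override_script and install_overrides are hypothetical helper names, not part of selenium, and the sketch assumes driver is a Chromium-based selenium WebDriver that provides execute_cdp_cmd.

```python
import json


def build_override_script(overrides):
    """Build JavaScript that redefines properties on navigator.

    `overrides` maps a property name to a JavaScript expression (given as
    a string) that the property's getter should return.
    """
    lines = []
    for name, js_expr in overrides.items():
        # json.dumps produces a safely quoted JavaScript string literal.
        lines.append(
            "Object.defineProperty(navigator, %s, { get: () => %s });"
            % (json.dumps(name), js_expr)
        )
    return "\n".join(lines)


def install_overrides(driver, overrides):
    """Register the script so it is evaluated in every new document."""
    source = build_override_script(overrides)
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
```

With this sketch, install_overrides(driver, {"webdriver": "undefined"}) registers the same override as the post's snippet. It should be called before driver.get, because the script only affects documents created after it is registered. Which other properties a given site checks is something you have to find out yourself; any extra entries you add to the dict are your own assumptions.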


51Testing Software Testing Forum's Archiver
