51Testing Software Testing Forum
Title:
Crawler test
Author:
胖虎
Time:
2018-3-15 16:31
# coding=utf-8
'''
Download images with requests. The request must send a Referer header;
otherwise the correct result cannot be fetched.
'''
import os
import re

import requests

CHUNK = 1024  # bytes written per iteration while streaming the response


def download_images(x, y):
    URL_SEGMENT = '{0}/{1}'.format(x, y)
    # The first image is named 0.jpg; later ones are zero-padded: 001.jpg, 002.jpg, ...
    URL_FORMAT = 'http://img.zngirls.com/gallery/%s/{0:03d}.jpg' % URL_SEGMENT
    URL_FORMAT0 = 'http://img.zngirls.com/gallery/%s/{0}.jpg' % URL_SEGMENT
    # Use the "x/y" part of the URL as the local directory name, e.g. 21363-18304.
    numbers = re.compile(r'\d+/\d+')
    rl = numbers.findall(URL_FORMAT0)
    if not rl:
        return
    dirname = rl[0].replace('/', '-')
    if not os.path.isdir(dirname):
        os.makedirs(dirname)
    i = 0
    while True:
        url = URL_FORMAT0.format(i) if i == 0 else URL_FORMAT.format(i)
        print('url=', url)
        res = requests.get(
            url,
            headers={'Referer': 'http://www.zngirls.com/g/13080/2.html'},
            stream=True,
        )
        # Stop at the first response that is not 200 OK.
        if res.status_code != 200:
            break
        filename = os.path.join(dirname, '{0:03d}.jpg'.format(i))
        with open(filename, mode='wb') as f:
            for chunk in res.iter_content(CHUNK):
                f.write(chunk)
        i += 1


def main():
    download_images(21363, 18304)


if __name__ == '__main__':
    main()
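The naming rule in the script is easy to miss: image 0 is requested as 0.jpg, while every later image is zero-padded to three digits. Here is a minimal sketch of just that rule, with no network access. The helper name image_url is my own and is not part of the script above.

```python
def image_url(x, y, i):
    """Return the URL of image number i in gallery x/y, following the script's pattern."""
    base = 'http://img.zngirls.com/gallery/{0}/{1}'.format(x, y)
    # Image 0 is requested as 0.jpg; later images are zero-padded to three digits.
    name = '0' if i == 0 else '{0:03d}'.format(i)
    return '{0}/{1}.jpg'.format(base, name)


for i in range(3):
    print(image_url(21363, 18304, i))
```

Printing the first few URLs like this is a quick way to check the pattern before sending any requests.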
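The Referer header is the key point of the script. To see what is actually attached to a request without hitting the site, a request object can be built and inspected locally. This sketch uses the standard library's urllib.request instead of requests, purely so it runs with no extra packages; build_request is a hypothetical helper name, and the Referer value is the one from the script.

```python
import urllib.request

REFERER = 'http://www.zngirls.com/g/13080/2.html'  # value used in the script


def build_request(url, referer=REFERER):
    """Build (but do not send) a GET request that carries a Referer header."""
    return urllib.request.Request(url, headers={'Referer': referer})


req = build_request('http://img.zngirls.com/gallery/21363/18304/0.jpg')
print(req.get_method(), req.full_url)
print('Referer:', req.get_header('Referer'))
```

The same headers dict can be passed to requests.get, as the script does.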
Author:
海海豚
Time:
2018-3-16 13:32
Thanks for sharing!
Author:
libingyu135
Time:
2018-4-25 16:42
6666
Author:
梦想家
Time:
2018-5-8 10:19
Author:
Miss_love
Time:
2018-5-8 13:28
Author:
梦想家
Time:
2018-5-8 13:44
Welcome to the 51Testing Software Testing Forum (http://bbs.51testing.com/)