A simple web crawler test in Python

胖虎 posted on 2018-3-15 17:14:25

I learned a little Python a while back but never used it afterwards, so today I put together a small crawler to try it out..

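It has been so long that I forgot you have to select which file to run in PyCharm.. I kept wondering why I was getting errors and why even hello world would not run.. speechless..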

Here is the code from today's experiment.

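#coding=utf-8
# Written for Python 2: uses urllib.urlopen and the print statement
import urllib
import re

def getHtml(url):
    # Download the page and return its raw HTML source
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getImg(html):
    # Capture the src of each .jpg image that is followed by '" ></div><br>'
    reg = r'src="(.+?\.jpg)" ></div><br>'
    imgre = re.compile(reg)
    imgList = imgre.findall(html)
    x = 0
    for imgurl in imgList:
        # Save each image as 0.jpg, 1.jpg, ... in the current directory
        urllib.urlretrieve(imgurl, '%s.jpg' % x)
        x += 1
    # Return the matched URLs so the print below shows them instead of None
    return imgList

html = getHtml("https://tieba.baidu.com/p/5099605942?see_lz=1")

print getImg(html)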
I just grabbed the images from a random thread in the Hearthstone Tieba, one where someone was live-posting pack openings..

The getHtml() function fetches a web page and returns a copy of its source.
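For anyone on Python 3, where `urllib.urlopen` no longer exists, the same step might look roughly like the sketch below. It assumes the page is UTF-8 encoded. The name `get_html` and the `encoding` parameter are made up for illustration and are not from the code above.

```python
from urllib.request import urlopen


def get_html(url, encoding="utf-8"):
    """Fetch a URL and return the response body as text."""
    # urlopen returns a response object; read() yields raw bytes in Python 3
    with urlopen(url) as page:
        raw = page.read()
    # Decode so the result can be searched with a str regex pattern;
    # undecodable bytes are replaced instead of raising an error
    # (assumption: the page is UTF-8 unless another encoding is passed in)
    return raw.decode(encoding, errors="replace")
```

It would be called the same way as the original, for example `html = get_html("https://tieba.baidu.com/p/5099605942?see_lz=1")`.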

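The getImg() function uses a regular expression to pull the image addresses out of the source, stores the matches in a list, and saves each image to disk as 0.jpg, 1.jpg, and so on.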
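To see what the pattern in getImg() actually captures, here is a small Python 3 sketch run against a made-up HTML fragment. The sample markup, the `example.com` URLs, and the names `IMG_RE`, `extract_img_urls` and `numbered_names` are assumptions for illustration. Only the regular expression and the 0.jpg, 1.jpg naming come from the code above.

```python
import re

# Same pattern as in the post: capture the src value of a .jpg image
# that is immediately followed by '" ></div><br>'
IMG_RE = re.compile(r'src="(.+?\.jpg)" ></div><br>')


def extract_img_urls(html):
    """Return every image URL matched by IMG_RE, in page order."""
    return IMG_RE.findall(html)


def numbered_names(urls):
    """Pair each URL with the file name the post's loop would give it."""
    return [("%s.jpg" % i, url) for i, url in enumerate(urls)]


# Hypothetical fragment: two post images plus one icon that should be skipped
sample = (
    '<div><img src="https://example.com/a.jpg" ></div><br>'
    '<div><img src="https://example.com/b.jpg" ></div><br>'
    '<img src="https://example.com/icon.jpg">'
)

urls = extract_img_urls(sample)
for name, url in numbered_names(urls):
    print(name, url)
```

The icon is skipped because its src is not followed by `" ></div><br>`. One thing to watch: `.` also matches quote characters, so if a non-matching src appears earlier on the same line, `.+?` can stretch across it and capture more than one attribute. Writing `[^"]+?` in place of `.+?` is a stricter alternative.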

I still need to study regular expressions properly..

Today was just a small test to refresh my memory.

海海豚 posted on 2018-3-15 17:35:38

Thanks for sharing~

梦想家 posted on 2018-3-16 14:20:48

:victory:

libingyu135 posted on 2018-4-25 16:42:46

666

梦想家 posted on 2018-5-5 09:38:46

:victory:

一颗正经的小树 posted on 2018-5-5 23:02:29

:handshake:handshake

Miss_love posted on 2018-5-8 13:32:19

:lol


51Testing Software Testing Forum's Archiver
