51Testing Software Testing Forum
Title:
CentOS 7 pyspider environment setup
Author:
测试积点老人
Time:
2018-12-4 16:28
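PySpider is, in my opinion, a very convenient and powerful crawler framework. It supports multi-threaded crawling and rendering of JavaScript-driven pages, and it provides a web UI, retry on failure, scheduled crawling, and more, which makes it very pleasant to use.
Reference articles found online: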
http://www.jianshu.com/p/8eb248697475
http://cuiqingcai.com/2652.html
https://yq.aliyun.com/articles/75518
1. Environment:
Python version: 3.6.3
OS: CentOS 7.3
1.1. Set up the Python 3 environment:
# Install build dependencies
yum install -y ncurses-devel openssl openssl-devel zlib-devel gcc make glibc-devel libffi-devel glibc-static glibc-utils sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libcurl-devel
# Download Python
wget https://www.python.org/ftp/python/3.6.3/Python-3.6.3.tgz
# Extract
tar -xf Python-3.6.3.tgz
# Build and install (run from inside the extracted source directory)
cd Python-3.6.3
./configure --prefix=/usr/local/python3.6 --enable-shared
make && make install
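The download, extract, and build steps above can be collected into one function so that the version and install prefix are stated only once. This is a minimal sketch, not a tested installer: the function names `python_tarball_url` and `build_python` are hypothetical, and it assumes `wget`, `tar`, and the build dependencies listed earlier are already installed.

```shell
# Build the download URL for a given CPython release (same pattern as the wget line above).
python_tarball_url() {
    echo "https://www.python.org/ftp/python/$1/Python-$1.tgz"
}

# Hypothetical helper: download, extract, configure, and install CPython.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_python() (
    version="$1"   # e.g. 3.6.3
    prefix="$2"    # e.g. /usr/local/python3.6
    wget "$(python_tarball_url "$version")" &&
    tar -xf "Python-$version.tgz" &&
    cd "Python-$version" &&
    ./configure --prefix="$prefix" --enable-shared &&
    make &&
    make install
)

# Usage (as root on the target host):
#   build_python 3.6.3 /usr/local/python3.6
```

Chaining the steps with `&&` stops the build at the first failing command instead of continuing into `make install` with a broken tree.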
# Create the symlink and register the shared library path
ln -s /usr/local/python3.6/bin/python3 /usr/bin/python3
echo "/usr/local/python3.6/lib" > /etc/ld.so.conf.d/python3.6.conf
ldconfig
# Verify python3
[root@ceph-host-01 local]# python3
Python 3.6.3 (default, Oct 9 2017, 04:01:24)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

# Upgrade pip and link it
/usr/local/python3.6/bin/pip3 install --upgrade pip
ln -s /usr/local/python3.6/bin/pip /usr/bin/pip
1.2. Install pyspider
pip install pyspider
Importing the pycurl module in Python then fails with the following error:
ImportError: pycurl: libcurl link-time ssl backend (nss) is different from compile-time ssl backend (none/other)
Fix:
pip uninstall pycurl
export PYCURL_SSL_LIBRARY=nss
pip install pycurl
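The value of `PYCURL_SSL_LIBRARY` has to name the SSL backend that libcurl was linked against, which is `nss` in the error above. Rather than hard-coding it, the backend can be read from the first line of `curl --version`. This is a minimal sketch with a hypothetical helper name, `ssl_backend_from_banner`; it assumes the banner names the backend as `NSS/...`, `OpenSSL/...`, or `GnuTLS/...`, and that the system `curl` uses the same libcurl that pycurl will link against.

```shell
# Map a curl/libcurl version banner to a value for PYCURL_SSL_LIBRARY.
ssl_backend_from_banner() {
    case "$1" in
        *NSS/*)     echo nss ;;
        *OpenSSL/*) echo openssl ;;
        *GnuTLS/*)  echo gnutls ;;
        *)          echo unknown ;;
    esac
}

# Usage:
#   export PYCURL_SSL_LIBRARY="$(ssl_backend_from_banner "$(curl --version | head -n 1)")"
#   pip uninstall -y pycurl
#   pip install --no-cache-dir pycurl   # --no-cache-dir avoids reusing a previously built wheel
#   python3 -c 'import pycurl; print(pycurl.version)'
```

If the helper prints `unknown`, check the banner by hand before reinstalling.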
1.3. Install PhantomJS
Download from the official site:
http://phantomjs.org/download.html
wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
Extract (run these in /usr/local, because the symlink below points there):
yum -y install bzip2
bzip2 -d phantomjs-2.1.1-linux-x86_64.tar.bz2
tar -xf phantomjs-2.1.1-linux-x86_64.tar
mv phantomjs-2.1.1-linux-x86_64 phantomjs
ln -sv /usr/local/phantomjs/bin/phantomjs /usr/bin/phantomjs
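Before starting pyspider, it can help to confirm that everything installed in sections 1.1 to 1.3 is reachable on `PATH`. This is a small sketch with a hypothetical helper, `check_cmd`. With the custom prefix used above, pip will likely place the `pyspider` entry point under /usr/local/python3.6/bin, so a `missing: pyspider` line would suggest adding that directory to `PATH` or creating a symlink, as was done for `python3` and `pip`.

```shell
# Report whether a command can be found on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Check the tools this guide relies on.
for cmd in python3 pip pyspider phantomjs; do
    check_cmd "$cmd"
done
```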
1.4. Start pyspider
Because the server is exposed to the public internet, I created a configuration file, config.json, to enable login authentication for the web UI:
[root@ceph-host-01 local]# vim config.json

{
  "webui": {
    "port": "5000",
    "username": "abc",
    "password": "123456",
    "need-auth": true
  }
}
Start the process:
nohup pyspider --config config.json &
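The same config.json can be written non-interactively, which is convenient when the setup is scripted. This is a minimal sketch with a hypothetical helper, `write_pyspider_config`. It reproduces the `webui` keys shown above and restricts the file to its owner because it holds a plaintext password. It does not escape quotes or backslashes, so it assumes the username and password contain neither.

```shell
# Hypothetical helper: write a pyspider config.json that enables web UI authentication.
# Arguments: output file, username, password, optional port (default 5000).
write_pyspider_config() {
    cfg_file="$1"
    cfg_user="$2"
    cfg_pass="$3"
    cfg_port="${4:-5000}"
    cat > "$cfg_file" <<EOF
{
  "webui": {
    "port": "$cfg_port",
    "username": "$cfg_user",
    "password": "$cfg_pass",
    "need-auth": true
  }
}
EOF
    # The file contains a plaintext password, so make it readable by the owner only.
    chmod 600 "$cfg_file"
}

# Usage:
#   write_pyspider_config config.json abc 123456
#   nohup pyspider --config config.json &
```

A stronger password than the sample one is advisable on a host reachable from the public internet.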