Scrapy: A Python Crawler Framework

    Posted on 2018-12-28 16:44:39

    Scrapy makes it very convenient to collect data from the web: it does a great deal of the work for us, so we don't have to spend a lot of effort building everything ourselves.


    Items are containers that will be loaded with the scraped data.
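
    As a quick sketch (the item name and fields below are hypothetical examples, not from the original post), defining an Item looks like this:

    import scrapy

    class ProductItem(scrapy.Item):
        # Each declared field acts as a container for one piece of scraped data
        name = scrapy.Field()
        price = scrapy.Field()
        url = scrapy.Field()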


    Spiders are classes that you define and Scrapy uses to scrape information from a domain. They define an initial list of URLs to download.
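
    A minimal spider might look like the sketch below; the domain and the CSS selector are placeholders for illustration only:

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"
        # The initial list of URLs to download
        start_urls = ["https://example.com/"]

        def parse(self, response):
            # Called with the downloaded response for each start URL
            for title in response.css("h2::text").extract():
                yield {"title": title}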



    Scrapy Engine

    The engine is responsible for controlling the data flow between all components of the system, and triggering events when certain actions occur.

    It handles the data flow for the whole system and triggers events when certain actions occur.


    Scheduler

    The Scheduler receives requests from the engine and enqueues them for feeding them later when the engine requests them.

    It accepts requests sent over by the engine, pushes them onto a queue, and returns them when the engine requests them again.


    Downloader

    The Downloader is responsible for fetching web pages and feeding them to the engine which, in turn, feeds them to the spiders.

    It downloads web page content and returns it so it can be fed to the spiders.


    Spiders

    Spiders are custom classes written by Scrapy users to parse responses and extract items from them or additional URLs to follow. Each spider is able to handle a specific domain .

    You use spiders to define the parsing rules for a specific domain or set of pages.
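
    For illustration, a spider's parse method can yield both extracted items and follow-up requests; the site structure and selectors below are assumptions made for the sake of the sketch:

    import scrapy

    class BookSpider(scrapy.Spider):
        name = "books"
        start_urls = ["https://example.com/books/"]

        def parse(self, response):
            # Extract items from this response
            for book in response.css("article.book"):
                yield {
                    "title": book.css("h3::text").extract_first(),
                    "price": book.css(".price::text").extract_first(),
                }
            # Follow an additional URL (pagination) and parse it the same way
            next_page = response.css("a.next::attr(href)").extract_first()
            if next_page:
                yield response.follow(next_page, callback=self.parse)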


    Item Pipeline

    The Item Pipeline is responsible for processing the items once they have been extracted by the spiders. Typical tasks include cleansing, validation and persistence.

    It processes the items the spiders extract from pages; its main tasks are cleansing, validating, and storing the data.
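
    A sketch of a pipeline that does simple cleansing and validation (the "price" field is a hypothetical example); to take effect it would also need to be enabled in the project's ITEM_PIPELINES setting:

    from scrapy.exceptions import DropItem

    class PricePipeline:
        def process_item(self, item, spider):
            # Validation: drop items that arrive without a price
            if not item.get("price"):
                raise DropItem("Missing price")
            # Cleansing: normalize the value before it is stored
            item["price"] = float(str(item["price"]).strip())
            return item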


    Downloader middlewares

    Downloader middlewares are specific hooks that sit between the Engine and the Downloader and process requests when they pass from the Engine to the Downloader, and responses that pass from Downloader to the Engine. They provide a convenient mechanism for extending Scrapy functionality by plugging custom code.

    It is a hook framework between the Scrapy engine and the downloader; its main job is to handle the requests and responses passing between them.
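
    A sketch of a downloader middleware that touches every request and response passing through it; the header value is just an example, and the class would need to be registered in the DOWNLOADER_MIDDLEWARES setting:

    class CustomHeadersMiddleware:
        def process_request(self, request, spider):
            # Runs as each request passes from the engine to the downloader
            request.headers.setdefault("User-Agent", "my-crawler/1.0")
            return None  # None means: continue handling the request normally

        def process_response(self, request, response, spider):
            # Runs as each response passes from the downloader back to the engine
            spider.logger.debug("%s -> %s", request.url, response.status)
            return response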


    Spider middlewares

    Spider middlewares are specific hooks that sit between the Engine and the Spiders and are able to process spider input and output. They provide a convenient mechanism for extending Scrapy functionality by plugging custom code.

    It is a hook framework between the Scrapy engine and the spiders; its main job is to handle the spiders' response input and request output.
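
    A sketch of a spider middleware that simply counts what each spider yields; it would need to be registered in the SPIDER_MIDDLEWARES setting:

    class YieldCountMiddleware:
        def process_spider_input(self, response, spider):
            # Runs for each response before it is handed to the spider
            return None  # None means: keep processing the response

        def process_spider_output(self, response, result, spider):
            # Runs over everything the spider yields (items and new requests)
            count = 0
            for obj in result:
                count += 1
                yield obj
            spider.logger.info("%d objects yielded for %s", count, response.url)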

