Scrapy – Open Source Web Scraping Framework for Python
January 31, 2013

http://scrapy.org/
Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing. Features include:

a) Simple – Scrapy was designed with simplicity in mind, providing the features you need without getting in your way;
b) Productive – Just write the rules to extract the data from web pages and let Scrapy crawl the entire web site for you;
c) Fast – Scrapy is used in production crawlers to completely scrape more than 500 retailer sites daily, all on a single server;
d) Extensible – Scrapy was designed with extensibility in mind, so it provides several mechanisms to plug in new code without having to touch the framework core;
e) Portable, open-source, 100% Python – Scrapy is written entirely in Python and runs on Linux, Windows, Mac and BSD;
f) Batteries included – Scrapy comes with lots of functionality built in;
g) Well-documented & well-tested – Scrapy is extensively documented and has a comprehensive test suite with very good code coverage; and
h) Good community and commercial support – Scrapy has a healthy community and very good commercial support.

This will be added to the Web Data Extractors white paper.
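To make the "Productive" point above more concrete, here is a minimal sketch of what "writing the rules" can look like. It deliberately uses only the Python standard library and is not Scrapy's API. The rule format, the `RuleExtractor` class, the `extract` helper and the sample page are all hypothetical names invented for illustration. A framework like Scrapy would add crawling, scheduling and richer selectors on top of this idea.

```python
from html.parser import HTMLParser

# Hypothetical extraction rules: field name -> (tag, CSS class) to capture.
RULES = {
    "title": ("h1", "product-title"),
    "price": ("span", "price"),
}


class RuleExtractor(HTMLParser):
    """Collects the text of the first element matching each rule.

    Simplification: nested elements with the same tag name are not handled.
    """

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.item = {}
        self._field = None  # field currently being captured, if any

    def handle_starttag(self, tag, attrs):
        # The class attribute may be absent or valueless, hence the `or ""`.
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.rules.items():
            if tag == want_tag and want_class in classes and field not in self.item:
                self._field = field
                self.item[field] = ""
                break

    def handle_data(self, data):
        # Accumulate text only while inside a matched element.
        if self._field is not None:
            self.item[self._field] += data

    def handle_endtag(self, tag):
        # Stop capturing when the matched element's tag closes.
        if self._field is not None and tag == self.rules[self._field][0]:
            self.item[self._field] = self.item[self._field].strip()
            self._field = None


def extract(html, rules=RULES):
    """Apply the rules to one HTML page and return the extracted fields."""
    parser = RuleExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.item


# Made-up sample page, standing in for a downloaded retailer page.
SAMPLE_PAGE = """
<html><body>
  <h1 class="product-title">Blue Widget</h1>
  <span class="price"> $9.99 </span>
</body></html>
"""

if __name__ == "__main__":
    print(extract(SAMPLE_PAGE))
```

The point of the sketch is the separation of concerns: the `RULES` table is the only part that changes from site to site, while the code that applies the rules stays the same.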