artoo.js – The Client-Side Scraping Companion

Posted by Marcus Zillman

http://medialab.github.io/artoo/

artoo.js is a piece of JavaScript code meant to be run in your browser’s console to provide you with scraping utilities. This nice droid is loaded into the JavaScript context of any webpage through a handy bookmarklet that you can install instantly by dropping its icon onto your bookmarks bar. Features include:
a) Scrape everything, everywhere: invoke artoo in the JavaScript context of any web page;
b) Loaded with helpers: scrape data quickly and easily with powerful methods such as artoo.scrape;
c) Data download: make your browser download the scraped data with the artoo.save methods;
d) Spiders: crawl pages through ajax and retrieve the accumulated data with artoo’s spiders;
e) Content expansion: expand pages’ content programmatically thanks to the artoo.autoExpand utilities;
f) Store: stash persistent data in localStorage with artoo’s handy abstraction;
g) Instructions: record the instructions typed into the console and save them for later use;
h) jQuery: jQuery is injected alongside artoo in the pages you visit so you can handle the DOM easily;
i) Custom bookmarklets: use artoo as a framework and easily create custom bookmarklets to execute your code; and
j) Chrome extension: trying to scrape a nasty page abiding by some sneaky HTML5 rules? Here, have a Chrome extension.

This will be added to my Web Data Extractors White Paper and to Bot and Intelligent Agent Research Resources and Sites.
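
To give a feel for the artoo.scrape and artoo.save helpers mentioned above, here is a minimal sketch of how a scrape might look from the browser console. It assumes artoo has already been loaded through the bookmarklet and that artoo.scrape accepts a CSS selector plus a model object, as in the examples on the project's site. The scrapeLinks wrapper and the 'li' selector are hypothetical names chosen for illustration, not part of artoo.

```javascript
// Hypothetical helper: wraps artoo.scrape so the extraction model lives in one place.
// `artooInstance` is expected to be the global `artoo` object injected by the bookmarklet.
function scrapeLinks(artooInstance, selector) {
  // Each key of the model describes one field to extract from every matched element.
  return artooInstance.scrape(selector, {
    title: {sel: 'a'},              // content of the first <a> inside the element
    url: {sel: 'a', attr: 'href'}   // href attribute of that same <a>
  });
}

// Only runs where artoo is present (i.e. in a page where the bookmarklet was clicked).
if (typeof artoo !== 'undefined') {
  var links = scrapeLinks(artoo, 'li');  // 'li' is a placeholder selector
  artoo.savePrettyJson(links);           // asks the browser to download the result as JSON
}
```

The selector and the model would need adjusting to the structure of whatever page is being scraped.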
