Introducing JScrape - Java-based HTML Scraping API

http://apsquared.net/blog/2007/04/01/introducing-jscrape-java-based-html-scraping-api/

A few pieces of software I’ve worked on have required me to scrape data from existing websites. In general, the code to do this is ugly. I had been using the standard Java networking classes to fetch the page and then parsing it with the standard string-parsing routines, which made the parser ugly to write and even uglier to maintain. In search of a better way, I came across an article that discussed the use of XQuery to scrape HTML. I took the idea one step further and created an end-to-end API that fetches the data from a website and runs the query to return either a simple string or a list of objects. My API relies on a few other APIs, namely TagSoup, Saxon and Commons-HttpClient, but with just a few lines of code you can begin scraping web pages.
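To illustrate why a query is easier to maintain than hand-written string parsing, here is a minimal sketch. It is not JScrape's API and does not use Saxon or XQuery: it swaps in the JDK's built-in XPath support (`javax.xml.xpath`), and it assumes the page has already been cleaned into well-formed XHTML, which is the job TagSoup does in the pipeline described above. The class name `QueryScrapeSketch`, the `Quote` record and the sample markup are all hypothetical. It shows both result styles mentioned above: a single string and a list of objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QueryScrapeSketch {
    // Hypothetical record used to show the "list of objects" result style.
    record Quote(String symbol, String price) {}

    // Parse well-formed XHTML (assumed already cleaned, e.g. by TagSoup) into a DOM.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }

    // Single-string result: evaluate an XPath expression to its string value.
    static String queryString(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, doc).trim();
    }

    // List-of-objects result: select matching rows, then map each row to a Quote.
    static List<Quote> queryQuotes(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xpath.evaluate(
                "//tr[@class='quote']", doc, XPathConstants.NODESET);
        List<Quote> quotes = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            // Relative expressions are evaluated against the current row.
            String symbol = xpath.evaluate("td[1]", rows.item(i)).trim();
            String price = xpath.evaluate("td[2]", rows.item(i)).trim();
            quotes.add(new Quote(symbol, price));
        }
        return quotes;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical page fragment standing in for a fetched and cleaned page.
        String page = """
            <html><body><table>
              <tr class="quote"><td>ABC</td><td>12.34</td></tr>
              <tr class="quote"><td>XYZ</td><td>56.78</td></tr>
            </table><span id="last">12.34</span></body></html>
            """;
        Document doc = parse(page);
        System.out.println(queryString(doc, "//span[@id='last']"));
        for (Quote q : queryQuotes(doc)) {
            System.out.println(q.symbol() + " " + q.price());
        }
    }
}
```

When the page layout changes, only the query expressions need updating, rather than a chain of `indexOf` and `substring` calls. Running `main` should print `12.34`, then `ABC 12.34` and `XYZ 56.78`.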

Our first release of the API, along with some sample code showing how to scrape a stock price from Yahoo!, can be found here. Please remember that this is the first release and it is definitely still in an alpha stage. Let us know where you think the documentation and samples can be improved.

JScrape