Friday, March 27, 2009

Simple cron-like scheduler in Scala

Today in my Scala explorations I ran into the problem that I wanted a scheduler like Quartz or java.util.Timer. That should be easy in Scala, I thought, and came up with the following:

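private val timedActor = actor {
  // once a day
  while (true) {
    val c = Calendar.getInstance
    c.set(Calendar.HOUR_OF_DAY, 0) // midnight: the smaller fields must be cleared too
    c.set(Calendar.MINUTE, 0)
    c.set(Calendar.SECOND, 0)
    c.set(Calendar.MILLISECOND, 0)
    c.add(Calendar.DAY_OF_YEAR, 1) // next day
    // time left until the next midnight
    val sleepAmount = c.getTimeInMillis - Calendar.getInstance.getTimeInMillis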

Pretty slick, if rather heavyweight (check the comments for a better implementation). They should include something like this in one of their libraries...
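For comparison, here is a minimal sketch of a lighter-weight variant that hands the waiting to the JDK's ScheduledExecutorService instead of blocking an actor in a loop. It is not the implementation from the comments. The names DailyScheduler, millisUntilNextMidnight and scheduleDaily are made up for illustration, and it assumes a JVM with java.time (Java 8 or later), which did not exist when this post was written.

```scala
import java.time.{Duration, LocalDateTime}
import java.util.concurrent.{ScheduledExecutorService, ScheduledFuture, TimeUnit}

// Hypothetical helper, not part of any Scala or Java library.
object DailyScheduler {

  // Milliseconds from `now` until the start of the following day.
  // Uses local wall-clock time and ignores time zones and DST shifts.
  def millisUntilNextMidnight(now: LocalDateTime): Long = {
    val nextMidnight = now.toLocalDate.plusDays(1).atStartOfDay()
    Duration.between(now, nextMidnight).toMillis
  }

  // Runs `task` at the next midnight and then at a fixed 24-hour rate.
  // The fixed period is an assumption: it drifts by an hour across DST changes.
  def scheduleDaily(executor: ScheduledExecutorService)(task: () => Unit): ScheduledFuture[_] =
    executor.scheduleAtFixedRate(
      new Runnable { def run(): Unit = task() },
      millisUntilNextMidnight(LocalDateTime.now()),
      TimeUnit.DAYS.toMillis(1),
      TimeUnit.MILLISECONDS
    )
}
```

With an executor from java.util.concurrent.Executors.newSingleThreadScheduledExecutor(), a call such as DailyScheduler.scheduleDaily(executor) { () => println("once a day") } would register the job. Note that scheduleAtFixedRate suppresses later runs if the task throws, so the task should catch its own exceptions.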

Thursday, March 19, 2009

HTML Screen Scraping

For a little side project I found myself screen scraping some HTML sites for information. My first idea was to fetch the pages with the URL class and then parse them with TagSoup (see this Blog Entry for an example). This in fact worked quite well, and using XPath from Scala was a blast.

Nevertheless, the scraping sometimes didn't work, because for some weird reason the site I was scraping demanded a JavaScript-enabled browser (and submitting forms is no real fun with that approach). So I turned to HtmlUnit, which seems to be an even better screen-scraping tool.

Now what we really need is an HtmlUnit that gives us simple access to a TagSoup of the content...