Worked a bit on the scraping project today. This is one of my small projects, something I keep around for when I have a few hours to kill, but nothing too involved.
The idea is that I've had a Xanga account, followed by an LJ account, followed by Vox. Right now when I blog, my entry is owned by whichever host I'm using. In the case of Xanga, I outgrew the atmosphere a long time ago, and yet there's still no way to get my data out and transfer it somewhere else. There are nearly 4 years of entries in there; I refuse to abandon them.
So I've hacked up a little screen scraper in Ruby with Mechanize/Hpricot to get my blogs out. It was my first time playing around with jQuery-style selectors and XPath, and I gotta say, I was very pleasantly surprised. It's nice!
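For flavor, here's roughly what the XPath side of this looks like. It's only a minimal sketch, not the actual scraper: it uses REXML (which ships with Ruby) instead of Hpricot so it runs without extra gems, and the markup, class names, and the `extract_entries` helper are all made up for illustration. REXML also needs well-formed XML, which real blog pages rarely are.

```ruby
require 'rexml/document'

# Hypothetical markup standing in for a downloaded archive page.
SAMPLE_PAGE = <<~HTML
  <html><body>
    <div class="entry">
      <span class="date">2004-03-15</span>
      <div class="body">First post.</div>
    </div>
    <div class="entry">
      <span class="date">2004-03-17</span>
      <div class="body">Second post.</div>
    </div>
  </body></html>
HTML

# Pull the date and body text out of every entry block using XPath.
def extract_entries(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//div[@class='entry']").map do |entry|
    {
      date: REXML::XPath.first(entry, "span[@class='date']").text,
      body: REXML::XPath.first(entry, "div[@class='body']").text
    }
  end
end

entries = extract_entries(SAMPLE_PAGE)
entries.each { |e| puts "#{e[:date]}: #{e[:body]}" }
```

The nice part is that one expression like `//div[@class='entry']` grabs every entry on the page, and the relative paths inside the loop pick out the pieces of each one.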
Currently, I can download all my blog entries and save them off for later processing, ordered by entry date. So the next problem to solve is: what do I do with all that data (in my case, over 260 individual entries)? Vox is a PoS; it's way spammy and has ads. Etc., ad nauseam.
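The "save them off, ordered by date" step could look something like the sketch below. It assumes each scraped entry is a hash with a `:date` and a `:body`; the `entry_filename` and `save_entries` helpers, the filename scheme, and the sample data are all hypothetical, just one way to get files that sort chronologically on disk.

```ruby
require 'date'
require 'fileutils'
require 'tmpdir'

# Build a sortable filename: ISO date first, then a zero-padded running
# counter so two entries from the same day don't overwrite each other.
def entry_filename(date, index)
  format('%s-%03d.html', date.strftime('%Y-%m-%d'), index)
end

# Write each entry to its own file under dir, oldest first.
# Returns the filenames in the order they were written.
def save_entries(entries, dir)
  FileUtils.mkdir_p(dir)
  sorted = entries.sort_by { |e| e[:date] }
  sorted.each_with_index.map do |entry, i|
    name = entry_filename(entry[:date], i + 1)
    File.write(File.join(dir, name), entry[:body])
    name
  end
end

# Made-up sample data, deliberately out of order.
sample = [
  { date: Date.new(2005, 1, 2), body: '<p>Later entry</p>' },
  { date: Date.new(2004, 3, 15), body: '<p>Earlier entry</p>' }
]

# Use a throwaway temp directory here; a real run would pass a permanent path.
saved = Dir.mktmpdir { |dir| save_entries(sample, dir) }
puts saved
```

Putting the date at the front of the filename means a plain directory listing is already in chronological order, which keeps the later processing step simple.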
It doesn't make sense to jump to another provider so they can lock in my data again.
I've been considering publishing it through Google Base, their open database API, and perhaps building out a simple UI running directly on top of Base; it's got all the JavaScript hooks to query it.
Then I can self-host the JavaScript/HTML anywhere, even on a free host. It's minimal in size and all static content. All the business logic would get pushed to the front end. In fact, there would be *no* back end to this thing. The client would talk directly to GBase, so there's no bandwidth that I have to pay for. The advantages are that it's available to anyone and the user can pick how the data is displayed.
I wonder if an open blogging platform like this exists already? It should, but I've never heard of one.