I remember when I first saw Userland Radio, with its integrated weblogging system and aggregator, I thought *that's* what I wanted to create for my own system. But obviously the first pass was to just get the weblog, and then I'd worry about the next step. But the next step never happened. I just didn't really use an aggregator for probably the first year of blogging, so I never bothered getting around to writing that Java-based aggregator I had in mind called "Jagger." Now as I start to use Bloglines more and more, I'm starting to think to myself, "Why aren't aggregators an integral part of every weblog system?" Why doesn't Movable Type come with an aggregator? Why doesn't WordPress have a page in its admin screen just like Bloglines? Why are OSS projects like RawDog almost unheard of, but MT and WordPress practically household names?
We all know that weblogs and aggregators are the yin and yang of the new web publishing paradigm. Tied together with RSS, they are a pair. If that's the case, why are there so many weblog systems that you can install on your server but just a very limited number of aggregators? Why aren't they one and the same? I think I know the answer: Aggregation is hard.
That's the reason I've never done it. Grabbing the feeds in all their wildly different formats in a timed and efficient way, saving them to a database, comparing the feeds to make sure you don't display duplicates, etc. That's a lot of work. Publish a weblog? That's just HTML - some SQL and a scripting language and you've got a web-based CMS in an hour or so. But aggregators? Woof, what a pain. :-)
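To give a feel for just the duplicate-checking piece, here's a rough sketch in Java. Everything in it is my own invention for illustration: `FeedItem` and `DuplicateFilter` are hypothetical names, the key rule (guid first, then link, then title) is only one plausible choice, and an in-memory set stands in for what would really be a database table with a unique key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, stripped-down feed entry; a real parser would supply more fields. */
final class FeedItem {
    final String guid;   // may be null: many feeds omit it
    final String link;   // may be null
    final String title;

    FeedItem(String guid, String link, String title) {
        this.guid = guid;
        this.link = link;
        this.title = title;
    }
}

/** Remembers which items have already been stored so they are not shown twice. */
final class DuplicateFilter {
    // Assumption: kept in memory here; a database unique key would do this job for real.
    private final Set<String> seenKeys = new HashSet<>();

    /** Picks the most stable identifier an item offers: guid, then link, then title. */
    static String keyFor(FeedItem item) {
        if (item.guid != null && !item.guid.isEmpty()) {
            return "guid:" + item.guid;
        }
        if (item.link != null && !item.link.isEmpty()) {
            return "link:" + item.link;
        }
        return "title:" + item.title;
    }

    /** Returns only the items not seen in any earlier call, and remembers them. */
    List<FeedItem> keepNew(List<FeedItem> fetched) {
        List<FeedItem> fresh = new ArrayList<>();
        for (FeedItem item : fetched) {
            if (seenKeys.add(keyFor(item))) { // add() returns false for a known key
                fresh.add(item);
            }
        }
        return fresh;
    }
}
```

Even this toy version shows where the pain comes from: an edited post keeps its guid, so it gets treated as already seen, and feeds with no guid at all force you to fall back on something weaker.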
Seriously, if you look back two years ago when I started using Radio, then Blogger, I chafed at the idea of using someone else's program. And I wondered how long Blogger would be around:
Oof. I feel so guilty using Blogger. I'm a programmer, dammit and this stuff is so ridiculously easy to do myself. I can't believe I'm relying on a third party to do this. Especially Blogger. How does Blogger make any money from this service? I see no ads right now and I paid nothing to sign up. How is this possible? Well, in the long run it's not. So doing anything that relies on this service is pretty dumb.
(Hey - how could I know two years ago that Google of all companies would be 1) Insanely profitable and 2) Interested in Blogger?)
But now I'm doing the exact same thing two years later, but with an online aggregator instead of my weblog. I love Bloglines - it's so freakin' great and reliable (unlike Blogger circa 2002). I was using their mobile version while away from my computer yesterday and it's fantastic. All my *unread* feeds, wherever I go!! That's awesome! Even I, Mr. Mobility, got that buzz a cool new mobile service gives you. That sense of having my personal data with me wherever I go is incredibly powerful. It makes me realize that the iMobs vision I had last year is so incredibly on target.
Anyways, what I'm saying is that now that I have my new weblog system up and running, it's time to set my sights on what it will take to create my own personal online aggregator as well. I want it to work *just* like Bloglines, but rely on my own server resources. The cool thing is that now there are lots of different projects out there that can help do this without much effort on my part. Quartz, Informa and Sandler might get me started (we'll see how complete they are). I'm sure they're all very bloated, overly complex projects that need god-knows-what to function (do I see Hibernate mentioned on one of those pages??). Maybe I'll just write a small Servlet on my own which has a thread that simply checks the feed headers for 304s, skips the feeds that haven't changed, grabs those that have been updated, throws them into a table as best I can with regular XML-processing and goes from there.
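The "check for 304s" part is a plain HTTP conditional GET, and it can be sketched with nothing but the JDK. This is a minimal sketch, not a finished fetcher: `ConditionalFeedFetcher` and its method names are made up for this post, the ten-second timeouts are arbitrary, and I'm assuming the `ETag` and `Last-Modified` values from the previous fetch have been saved somewhere (a table column, say).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: fetches a feed only if the server says it has changed. */
final class ConditionalFeedFetcher {

    /** Adds the validators saved from the last successful fetch; either may be null. */
    static void addValidators(HttpURLConnection conn, String etag, String lastModified) {
        if (etag != null) {
            conn.setRequestProperty("If-None-Match", etag);
        }
        if (lastModified != null) {
            conn.setRequestProperty("If-Modified-Since", lastModified);
        }
    }

    /** Returns the feed body, or null when the server answers 304 Not Modified. */
    static byte[] fetchIfChanged(URL feedUrl, String etag, String lastModified)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) feedUrl.openConnection();
        conn.setConnectTimeout(10_000); // arbitrary timeouts, in milliseconds
        conn.setReadTimeout(10_000);
        addValidators(conn, etag, lastModified);
        try {
            int status = conn.getResponseCode();
            if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null; // nothing new, skip parsing entirely
            }
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected status " + status + " for " + feedUrl);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Reads a stream to the end into a byte array. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```

A fuller version would also read `conn.getHeaderField("ETag")` and `conn.getHeaderField("Last-Modified")` on a 200 response and save them for the next poll. Servers that send neither header will simply return the whole feed every time, which is where the duplicate checking comes back in.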
I guess that'd be a good start. I definitely don't like "pull based" aggregators, though. I want to see the updated feeds instantly when I get to the page, not have to wait each time for a new page, then figure out for myself what's been updated. I probably won't set off my own thread, but use something like Quartz to manage it - and I can call that from a Servlet on startup, which would be good. The great thing about Quartz is that it's meant to manage thousands of individual jobs efficiently - so I could write a very basic task that gets a single feed, analyzes it for changes, and updates a db, and hopefully let Quartz manage the rest of the pain.
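To show the shape of "one small task per feed, let a scheduler run them," here's a sketch that deliberately swaps Quartz for the JDK's own `ScheduledExecutorService`, so it runs without any extra jars. It is not Quartz code and makes no claims about Quartz's API. `FeedPoller` and the `checkFeed` callback are hypothetical names; the callback is where the fetch, compare and database update would go.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical poller: schedules one repeating check per feed on a small thread pool. */
final class FeedPoller {
    private final ScheduledExecutorService scheduler;

    FeedPoller(int workerThreads) {
        // A few worker threads are shared by all feeds; there is no thread per feed.
        this.scheduler = Executors.newScheduledThreadPool(workerThreads);
    }

    /** Starts polling every feed immediately, then again after each interval. */
    void start(List<String> feedUrls, Consumer<String> checkFeed, long interval, TimeUnit unit) {
        for (String url : feedUrls) {
            scheduler.scheduleWithFixedDelay(() -> {
                try {
                    checkFeed.accept(url);
                } catch (RuntimeException e) {
                    // An exception escaping the task would cancel its future runs,
                    // so one bad feed is logged instead of silently dropping out.
                    System.err.println("Feed check failed for " + url + ": " + e);
                }
            }, 0, interval, unit);
        }
    }

    /** Stops all polling and interrupts any check still in progress. */
    void stop() {
        scheduler.shutdownNow();
    }
}
```

In a web app, `start` would presumably be called once when the application comes up and `stop` when it shuts down, so the pages only ever read what the background tasks have already saved.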
We'll see how it works out. Are there any server-side Java-based aggregators out there that I don't know of yet?