XML Summary Information

Posted Saturday, April 26, 2003 11:45 pm

I'm doing development on my project now and I've decided to use XML documents for most of my data storage, for a couple of reasons. First, I'm working with XML in the first place. Second, I suspect it will actually add to the scalability of the app I'm developing. It's my hunch that transformations will be less costly to do and more easily distributed than queries to a normal centralized database. This could be wrong, but I'm going with it. :-)

I'm used to using databases for everything, so changing paradigms to working with XML documents is presenting a variety of problems. Mostly I'm quite happy with where XML has progressed in the past few years. In 1999 I did a bunch of stuff with XSL and it was A LOT harder and less flexible than it is today thanks to XPath. That generally goes for all the tools. But there are still some mental adjustments that I'm having to make in order to develop the application that I want with XML.

SQL is a wonderful language that I've been working with for literally 10 years. I'd honestly say that SQL is the most underrated language there is. Using it I can grab a subset of a large dataset with ease, and then aggregate it in a variety of ways to present useful information to the user. Mimicking this level of flexibility in XML is where I'm starting to run into issues.

When designing an app, I always think in terms of "documents and views" - most likely from my Lotus Notes days. I have an acronym that I use to remind myself of what I need to do for a certain feature in a web app: ALED: Add, List, Edit, Delete. This covers about 95% of the things you do in any data application.

So with XML, doing the document part is easy, since you're starting with an XML document in the first place. I can organize the information in individual documents, or as distinct sub-entries within one document (though that one file can only grow so large before it stops being practical to use; XML is NOT a database). In a weblog app, for example, I could organize the posts as different <weblog> entries in an XML file named by the date, or I could put each post in its own XML file named by the timestamp. Either way I still have to figure out how to eventually aggregate them to present them to a user or news aggregator. This is pretty simple with a SQL query: "select * from weblog where date > '04/04/2002'" or something similar. But with all these XML documents strewn all over the place, how do I get at the data like this? Additionally, if I wanted to summarize some of this information ("10 comments"), getting to that info will be different as well.
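To make the problem concrete, here's a rough sketch of what that SQL query turns into when the posts live in a folder of XML files. Everything about the layout is an assumption for illustration: a <weblog> root holding <post> elements, each with an ISO-format date attribute, a <title> child and zero or more <comment> children. The function name posts_since is made up too.

```python
import xml.etree.ElementTree as ET
from datetime import date
from pathlib import Path


def posts_since(folder, cutoff):
    """Return (date, title, comment_count) for every post newer than cutoff.

    Assumed (hypothetical) layout of each file:
      <weblog>
        <post date="2003-04-26"><title>...</title><comment>...</comment></post>
      </weblog>
    """
    results = []
    for path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(path).getroot()
        for post in root.iter("post"):
            posted = date.fromisoformat(post.get("date"))
            if posted > cutoff:
                title = post.findtext("title", default="")
                # The "10 comments" style summary: count the comment children.
                comment_count = len(post.findall("comment"))
                results.append((posted, title, comment_count))
    # Newest first, the way a weblog front page would list them.
    results.sort(key=lambda row: row[0], reverse=True)
    return results
```

Note that every call opens and parses every file in the folder, which is exactly the cost a "where" clause hides in a database, and it grows with the number of documents.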

I guess I could use a native XML database like Xindice, which would allow queries similar to this, but that's centralizing the data, which is what I'm trying to avoid. The other option is to create summary documents as I create the originals. The problem with this is that the data could get out of sync, where the original document has been changed but the summary document has not. I'm thinking this is similar to transactions, though, and I could set up my back-end objects to handle the whole process as a single transaction. The idea is that you'll have many more readers than writers. So do the processing once, when the data is entered or changed, and then all you have to do later is transform the data for presentation.
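Here's a minimal sketch of the summary-document idea, again with made-up names (save_post, rebuild_summary, summary.xml) and an assumed layout of one <post> file per entry. It isn't a real transaction: each file is swapped in with a rename so readers never see a half-written document, but a crash between the two writes would leave a stale summary until the next save.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

SUMMARY_NAME = "summary.xml"  # hypothetical file name


def _write_atomically(tree, target):
    """Write to a temp file in the same folder, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    os.close(fd)
    try:
        tree.write(tmp_name, encoding="utf-8", xml_declaration=True)
        os.replace(tmp_name, str(target))
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def rebuild_summary(folder):
    """Regenerate the summary from the post files so it can't drift from them."""
    folder = Path(folder)
    summary = ET.Element("summary")
    for path in sorted(folder.glob("*.xml")):
        if path.name == SUMMARY_NAME:
            continue
        post = ET.parse(path).getroot()
        ET.SubElement(
            summary,
            "entry",
            id=post.get("id"),
            title=post.findtext("title", default=""),
            comments=str(len(post.findall("comment"))),
        )
    _write_atomically(ET.ElementTree(summary), folder / SUMMARY_NAME)


def save_post(folder, post_id, title, comments):
    """The single write path: store the post, then refresh the summary."""
    folder = Path(folder)
    post = ET.Element("post", id=post_id)
    ET.SubElement(post, "title").text = title
    for text in comments:
        ET.SubElement(post, "comment").text = text
    _write_atomically(ET.ElementTree(post), folder / f"{post_id}.xml")
    rebuild_summary(folder)
```

Rebuilding the whole summary on every write is the lazy choice, but it fits the many-readers, few-writers assumption: writers pay the cost once, and readers only ever open summary.xml.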

Anyways, that's what I'm doing now. Thoughts on this would be GREAT. Am I smoking dope? On the right track? Thanks,

-Russ
