StAX: XML Pull API

Cool beans! I just saw this article by Elliotte Rusty Harold about StAX (the Streaming API for XML), a new "pull API" for XML being developed through the Java Community Process. I had no idea this was being worked on, and I think it's great!

If you're not down with XML Pull, I'm not surprised: I only started working with it in May, when I was writing my Jabber app for J2ME, which used the pull-based kXML parser. Here's Rusty's overview:

A pull API is based around the more familiar iterator design pattern rather than the less well-known observer design pattern. In a pull API, the client program asks the parser for the next piece of information rather than the parser telling the client program when the next datum is available. ... StAX shares with SAX the ability to read arbitrarily large documents. However, in StAX the application is in control rather than the parser. The application tells the parser when it wants to receive the next data chunk rather than the parser telling the client when the next chunk of data is ready. Furthermore, StAX exceeds SAX by allowing programs to both read existing XML documents and create new ones. Unlike SAX, StAX is a bidirectional API.

It's a super-intuitive way to deal with XML documents. I really liked how the kXML parser worked when I was using it, and I've always balked at SAX because of the pain of writing all those callback methods to check for different events as the SAX parser moves through the document. XML Pull just feels so much cleaner to use. But I thought it was a niche API/methodology that would never get farther than J2ME. I'm happy to see that work is being done to make it more mainstream.

In case you're still not grokking XML Pull, it's very simple: it works just like any Iterator you use in Java, or like common classes such as the JDBC ResultSet. Using StAX, you open an XMLStreamReader on the document you want to parse and then step through the XML tags by calling next() in a loop. In a JDBC ResultSet, next() moves your cursor to the next row; in StAX, next() moves you to the next tag. You then basically write a big if statement that examines the tag under the cursor and does something with that data. It works phenomenally well.
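
Here's a minimal sketch of that loop. It assumes the javax.xml.stream API in the form that was eventually standardized and bundled with the JDK, so details in earlier drafts may differ. The class name PullLoopSketch, the helper startTagNames() and the sample XML are made up purely for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX itself.
class PullLoopSketch {

    // Collects the local names of all start tags, in document order.
    static List<String> startTagNames(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The application drives the parser: nothing happens until we call next().
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample document.
        String xml = "<message to='russ'><body>Hello</body></message>";
        System.out.println(startTagNames(xml)); // prints [message, body]
    }
}
```

Compare that with SAX, where the same job means registering a handler and waiting for the parser to call you back.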

It's actually a *bit* more complicated, because the parser moves through all the different types of nodes, not just tags: text, CDATA, comments, etc. (attributes are read off the start tag they belong to). But it really doesn't take very long to understand how it works and push through any problems. If you need random access to a document, then obviously a DOM-based API will be better, but XML Pull is fantastic when you're using XML as a messaging medium: think XML-RPC, SOAP or Jabber (XMPP), where you have short XML documents with specific, ordered data and commands that you need to process. As for me, I'll probably think about using it in place of SAX when I need raw-speed access to XML docs but don't feel like messing with SAX (which I think is a PITA). Can't wait until the Apache group's XML stuff gets support for it!
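
To make the "different types of nodes" part concrete, here's a rough sketch of handling a short messaging-style document. Again it assumes the javax.xml.stream API as later standardized. The MessageSketch class, the summarize() helper and the Jabber-like sample stanza with its addresses are hypothetical, and a real XMPP client would need a lot more than this.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of StAX or any Jabber library.
class MessageSketch {

    // Returns "from -> to: body" for a single <message> document.
    static String summarize(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        String from = null;
        String to = null;
        StringBuilder body = new StringBuilder();
        boolean inBody = false;
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if ("message".equals(reader.getLocalName())) {
                            // Attributes are read from the start tag under the cursor.
                            from = reader.getAttributeValue(null, "from");
                            to = reader.getAttributeValue(null, "to");
                        } else if ("body".equals(reader.getLocalName())) {
                            inBody = true;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // Text may arrive in several chunks, so append rather than assign.
                        if (inBody) {
                            body.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if ("body".equals(reader.getLocalName())) {
                            inBody = false;
                        }
                        break;
                    default:
                        // Comments, processing instructions, etc. are ignored here.
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return from + " -> " + to + ": " + body;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up stanza and addresses.
        String xml = "<message from='a@example.org' to='b@example.org'>"
                + "<body>Hi there</body></message>";
        System.out.println(summarize(xml));
    }
}
```

The switch is the "big if statement" in slightly tidier clothes: one case per event type you care about, and everything else falls through to the default.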

Neato.

-Russ
