Jonathan was chatting to me over IM today about memes on Twitter. I happened to be watching the Roomatic stream the other day, and saw the "Little Known Facts" meme spread before my eyes. It was interesting: later on, people started linking to the Twitter Search page for that term, but at first it was just sort of being passed from person to person. Because I was simply watching everyone's "Palin" updates as they were posted live, I got to see it from above, so to speak, as it happened.
Here is a small Python script that does the same sort of live watching for any search term:
    from xml.dom import minidom
    import sys, time, urllib

    # Expect exactly one argument: the search term.
    if len(sys.argv) != 2:
        print "Please enter a search"
        raise SystemExit

    search = sys.argv[1]
    id = 0  # id of the newest update seen so far

    while True:
        # Only ask for updates newer than the last one printed.
        url = "http://search.twitter.com/search.atom?rpp=20&q=%s&since_id=%s" % (search, id)
        xml = urllib.urlopen(url)
        doc = minidom.parse(xml)
        entries = doc.getElementsByTagName("entry")
        if len(entries) > 0:
            # The feed is newest first; reverse it to print oldest first.
            entries.reverse()
            for e in entries:
                title = e.getElementsByTagName("title")[0].firstChild.data
                pub = e.getElementsByTagName("published")[0].firstChild.data
                # Assumes ids look like "tag:search.twitter.com,2005:<number>".
                id = e.getElementsByTagName("id")[0].firstChild.data.split(":")[2]
                # Assumes names look like "username (Full Name)".
                name = e.getElementsByTagName("name")[0].firstChild.data.split(" ")[0]
                print "> " + name + ": " + title + " [" + pub + "]"
        time.sleep(3)
A few things about that script:
First, it took a lot longer to write than you'd imagine, because I don't know Python that well yet. Most of the examples online use some sort of third-party library, which I think stinks for learning how the basic standard libs work.
Second, the basic standard libs for XML suck. Or rather, they're just stuck in the DOM/SAX past. Pythonistas need to check out PHP's SimpleXML to see how a nice, clean, usable XML lib should work.
Finally, I actually refactored the above code a couple of times to make it smaller once I had figured out what I wanted to do. Originally, I did normal DOM processing, iterating over all the elements and checking each one's node type. Then I went back and cheated and used minidom's getElementsByTagName() everywhere instead, which made the script cleaner and shorter, but is also sort of a really nasty thing to do, IMHO. Like I said, I didn't want to use a third-party lib like feedparser or the JSON stuff, which would have made it cleaner. But honestly, for something this simple I really shouldn't need to either.
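To make the difference between those two approaches concrete, here is a rough sketch of both pulling the titles out of a feed. It is not the original first draft, just an illustration: the SAMPLE feed is made-up data, the function names titles_by_walking and titles_by_tag_name are hypothetical, and it is written for Python 3 rather than the Python 2 of the script above.

```python
from xml.dom import minidom, Node

# Made-up sample data in the rough shape of an Atom search feed.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:search.twitter.com,2005:1001</id>
    <published>2008-09-04T12:00:00Z</published>
    <title>Little known fact: this is sample data</title>
    <author><name>someuser (Some User)</name></author>
  </entry>
</feed>"""


def titles_by_walking(doc):
    """Manual DOM traversal: visit each child and check its nodeType."""
    titles = []
    for entry in doc.documentElement.childNodes:
        # Skip the whitespace text nodes that sit between elements.
        if entry.nodeType != Node.ELEMENT_NODE or entry.tagName != "entry":
            continue
        for child in entry.childNodes:
            if child.nodeType == Node.ELEMENT_NODE and child.tagName == "title":
                # Join the text nodes inside <title>.
                text = "".join(
                    n.data for n in child.childNodes
                    if n.nodeType == Node.TEXT_NODE
                )
                titles.append(text)
    return titles


def titles_by_tag_name(doc):
    """The shortcut: let getElementsByTagName() do the searching."""
    return [
        e.getElementsByTagName("title")[0].firstChild.data
        for e in doc.getElementsByTagName("entry")
    ]


doc = minidom.parseString(SAMPLE)
print(titles_by_walking(doc))
print(titles_by_tag_name(doc))
```

Both functions return the same list for this sample. The shortcut is shorter, but getElementsByTagName() searches every descendant, so it would also pick up any nested element that happens to share the name, which is part of why it feels like cheating.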
Python's lack of ending braces still freaks me out. I'm starting to get sick of "unexpected indentation" errors already.