Twitter Search tracker in Python

Posted Monday, September 1, 2008 5:53 pm

Jonathan was chatting to me over IM today about memes on Twitter. I happened to be watching the Roomatic stream the other day and saw the "Little Known Facts" meme spread before my eyes. It was interesting because, although people later started linking to the Twitter Search page for that term, at first it was just being passed from person to person. Since I was simply watching every "Palin" update live as it was posted, I got to see it spread from above, so to speak, as it happened.

Anyway, the Roomatic stuff is written in JavaScript and trapped in a browser, so I thought it would be nice to port it to Python so that I could reuse it if I wanted to. It's not difficult, especially because the original Summize API has a "since_id" parameter that restricts the results to updates newer than the last one you've already seen, so you never have to worry about duplicates. That makes the script brain-dead simple to write.
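To see why since_id takes care of duplicates, here is a minimal offline sketch (Python 3) of the bookkeeping involved: remember the highest update ID seen so far and, on each poll, keep only updates newer than it. The function name new_updates and the (id, text) tuples are hypothetical, and the filtering happens client-side here purely for illustration; with since_id the search server does that filtering for you.

```python
def new_updates(batch, since_id):
    """Return (fresh, cursor) for one poll.

    Hypothetical helper: filters client-side what since_id filters server-side.
    batch    -- list of (update_id, text) tuples, newest first
    since_id -- highest update ID already seen (0 means none yet)
    """
    # Keep only updates strictly newer than the cursor
    fresh = [u for u in batch if u[0] > since_id]
    # Flip newest-first into chronological order for printing
    fresh.reverse()
    # Advance the cursor to the newest ID now seen (unchanged if nothing new)
    cursor = fresh[-1][0] if fresh else since_id
    return fresh, cursor


# Two overlapping polls: the second one returns update 3 again, plus a new one
first, cursor = new_updates([(3, "c"), (2, "b"), (1, "a")], 0)
second, cursor = new_updates([(4, "d"), (3, "c")], cursor)
print(first)   # [(1, 'a'), (2, 'b'), (3, 'c')]
print(second)  # [(4, 'd')]
```

The real script gets the same effect by remembering the last ID it printed and passing it back as since_id on the next request.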

Here it is:

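from xml.dom import minidom
import sys, time, urllib

if len(sys.argv) != 2:
    print "Please enter a search"
    raise SystemExit

# URL-encode the search term so spaces, '#' and '&' survive in the query string
search = urllib.quote_plus(sys.argv[1])

# ID of the newest update printed so far; 0 means nothing seen yet
last_id = 0

while True:
    url = "http://search.twitter.com/search.atom?rpp=20&q=%s&since_id=%s" % (search, last_id)
    doc = minidom.parse(urllib.urlopen(url))
    entries = doc.getElementsByTagName("entry")

    # Results arrive newest first; reverse them to print in chronological order,
    # which also leaves last_id holding the newest ID after the loop
    entries.reverse()

    for e in entries:
        title = e.getElementsByTagName("title")[0].firstChild.data
        pub = e.getElementsByTagName("published")[0].firstChild.data
        # The numeric update ID is the third colon-separated field of <id>
        last_id = e.getElementsByTagName("id")[0].firstChild.data.split(":")[2]
        # Keep only the first word of the author's name
        name = e.getElementsByTagName("name")[0].firstChild.data.split(" ")[0]

        print "> " + name + ": " + title + " [" + pub + "]"

    # Wait a few seconds before polling again
    time.sleep(3)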

A few things about that script:

First, it took a lot longer to write than you'd imagine, because I don't know Python that well yet. Most of the examples online use some sort of third-party library, which I think stinks when you're trying to learn how the basic standard libs work.

Second, the basic standard libs for XML suck. Or rather, they're stuck in the DOM/SAX past. Pythonistas need to check out PHP's SimpleXML to see how a nice, clean, usable XML lib should work.
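For comparison, Python 2.5 and later also ship xml.etree.ElementTree in the standard library, and its findall()/findtext() calls feel a lot closer to SimpleXML than minidom does. Below is a rough Python 3 sketch on a made-up one-entry feed (not real Twitter data); the only real wrinkle is that Atom elements live in the http://www.w3.org/2005/Atom namespace, so each tag name is written in ElementTree's {namespace}tag form.

```python
import xml.etree.ElementTree as ET

# Atom elements are namespaced, so ElementTree tag names carry the namespace URI
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical one-entry feed for illustration only
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:search.example.com,2005:101</id>
    <published>2008-09-01T17:45:00Z</published>
    <title>first update</title>
    <author><name>alice (Alice A)</name></author>
  </entry>
</feed>"""

root = ET.fromstring(FEED)
rows = []
for entry in root.findall(ATOM + "entry"):
    # findtext() hands back the element's text directly; no firstChild.data
    title = entry.findtext(ATOM + "title")
    pub = entry.findtext(ATOM + "published")
    # A simple path reaches the nested <author><name> element
    name = entry.findtext(ATOM + "author/" + ATOM + "name").split(" ")[0]
    rows.append((name, title, pub))

for name, title, pub in rows:
    print("> %s: %s [%s]" % (name, title, pub))
```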

Finally, I actually refactored the script a couple of times to make it smaller once I had figured out what I wanted to do. Originally, I did normal DOM processing, iterating over the child nodes and checking each one's nodeType. Then I went back, cheated, and used minidom's getElementsByTagName() everywhere instead, which made the script shorter and cleaner but is also sort of a nasty thing to do, IMHO. Like I said, I didn't want to use a third-party lib like feedparser or one of the JSON libraries, which would have made it cleaner. But honestly, for something this simple I shouldn't need to either.
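To make that contrast concrete, here is a small Python 3 sketch of both approaches on a made-up entry; the helper names are hypothetical. The "normal" walk looks only at direct children and has to skip the whitespace text nodes between tags by checking nodeType, while getElementsByTagName() searches every descendant. That is what makes the shortcut shorter, and also why it feels like cheating: it grabs the first matching element anywhere below, nested or not.

```python
from xml.dom import minidom, Node

# Hypothetical entry; the whitespace between tags becomes TEXT_NODE children
ENTRY = """<entry>
  <title>first update</title>
  <author><name>alice (Alice A)</name></author>
</entry>"""


def child_text_by_walking(parent, tag):
    """The 'normal' way: iterate direct children, keep only element nodes."""
    for node in parent.childNodes:
        if node.nodeType == Node.ELEMENT_NODE and node.tagName == tag:
            # Concatenate the element's text-node children
            return "".join(c.data for c in node.childNodes
                           if c.nodeType == Node.TEXT_NODE)
    return None


def child_text_by_tag(parent, tag):
    """The shortcut: search all descendants and take the first match."""
    return parent.getElementsByTagName(tag)[0].firstChild.data


entry = minidom.parseString(ENTRY).documentElement
print(child_text_by_walking(entry, "title"))  # first update
print(child_text_by_tag(entry, "title"))      # first update
# <name> is a grandchild, so the child-only walk misses it; the shortcut finds it
print(child_text_by_walking(entry, "name"))   # None
print(child_text_by_tag(entry, "name"))       # alice (Alice A)
```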

Python's lack of closing braces still freaks me out. I'm already getting sick of "unexpected indent" errors.

Enjoy!

-Russ
