Siri, and the Conversational User Interface


Years ago while living in Spain I had what I like to call my great epiphany. I was using the first true smartphone - the Nokia 7650 - to chat with a bunch of pals via IRC while shopping for a new laptop. The people I was chatting with (the original Mobitopians) were located literally around the world - in the UK, the US, New Zealand - even Madagascar if I remember correctly. And they were all helping me price check and compare models and features as I wandered around the store looking for just the right purchase.

That one experience led me to see the future of mobility way back in 2002: being constantly connected to my social network while mobile, having instant access to the web everywhere, having the power of a connected PC at all times. It was an astounding revelation at the time, and it led me along the career path I've had for the past decade.

As we've seen - mostly since the launch of the iPhone, it has to be said - this experience I had almost a decade ago has become reality for millions of people. For a while, I was like a raving lunatic, talking about how our phones were going to have all this incredible functionality, how we were all going to be connected everywhere, with instant access to information and entertainment at our fingertips. But I swear, people thought I was nuts. I was invited to speak at the first Web 2.0 event in San Francisco, and was literally ridiculed by Jeff Jarvis and most of the audience for suggesting that our mobile phones would become our iPods and more.

No, seriously. I saved the recording...

Hey, whatever. It's not like I was the most eloquent speaker on the topic, but the point is that the amazing experience I had back then - the stuff that seemed so incredibly futuristic - has become literally commonplace today. (In fact, the IRC client I was using then - Wireless IRC - was the precursor to Gravity, one of the most popular apps on the Symbian platform.)

Well, it's all commonplace today except for one bit - and that was the "Conversational User Interface" that I was using.

You see, back when I was looking for that laptop, I wasn't using an "app" to compare prices, or browsing the web myself. I was interacting with others who were doing that for me. It happened, as you would expect, via a conversation. I would ask a question, or post a brand name and product number, and my friends would quickly type that info into a search, scan for the relevant bits, and write back a summary of their findings and opinions. It was incredibly efficient from my mobile perspective (obviously, since someone else was doing all the work), and yet it's something that I've yet to see duplicated in any real way since then. I remember thinking very clearly that this would be an incredibly powerful way to interact directly with an information service itself, without the people in the middle...

Obviously, I'm not the only one who's thought that. Talking to a computer has been imagined since before HAL scared the crap out of people in 2001 (the movie, not the year). There have been plenty of IRC bots, IM bots, Twitter bots and more that could monitor conversations and respond to commands. But the bots, for the most part, have been pretty dumb. Even with people put back in the loop, there's been ChaCha and the various SMS answer services, but they're not as useful as you'd hope. YubNub tried to be the command line for the Internet, but never gained any traction. Then there was Google Wave, and its bots that could present themselves as embedded widgets in the conversation stream, rather than simply blurting out chunks of text. That part of Wave was actually quite powerful, but it was surrounded by such an insane amount of crud that it made the whole service bewildering. I'm sure there are tons of other examples that I'm just not thinking of right now.

In short, the conversational user interface has been this holy grail of sorts in computer science.

Tomorrow, with the launch of the iPhone 4S (may Steve rest in peace), we're going to get Siri, which may just be what we've all been waiting for. Check out this video from 2009 from the creators of Siri - it starts out with one of my favorite forward-looking videos, Apple's Knowledge Navigator, and then goes on to demo Siri as it was then, with a textual conversational user interface. It's pretty amazing that this stuff has been sitting on a shelf for the past few years waiting for Apple to take it mainstream, taking all the power of Siri's contextual search capabilities and adding voice recognition to make it a truly incredible package.

Apple's Siri does three really cool things. First, the voice recognition is great stuff, which makes sense given that it's powered by Nuance, a company that has purchased every company that even thinks about voice technologies. Second is the powerful contextual parsing of the user's commands and questions - it feels like IBM's Watson in a small portable package. And third, and most interesting to me, is the Conversational UI.

Why is it interesting? Well, honestly, there's no way most developers can emulate the voice recognition or the incredible algorithms that must power the contextual language parsing bits of Siri, right? But the conversational UI is something that can be done by anyone. In fact, well before I ever saw Siri or Wave, I created mockups of the same sort of functionality - embedding chunks of HTML into a command-line stream with searching, tweeting, adding reminders, etc. I've got notebooks *filled* with all sorts of schemes on how to make a service around this sort of UI. In my pitches to VCs back in 2008 or so, I went on at length about building Mowser out into an intelligent service using this sort of interface as its base. (Obviously, that didn't work out so well.)
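To make the idea of "embedding chunks of HTML into a command-line stream" a bit more concrete, here's a rough sketch in Python of how such a stream might be modeled. This isn't how Siri, Wave or my old mockups actually worked; it's just an illustration that assumes a stream holds two hypothetical kinds of entries, plain text lines and pre-built HTML widgets, and renders them in order as one HTML fragment.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class TextMessage:
    """A plain line of conversation, typed by the user or the service."""
    sender: str
    text: str


@dataclass
class WidgetMessage:
    """A reply that embeds a chunk of HTML (a hypothetical 'widget') in the stream."""
    sender: str
    html: str  # assumed to be trusted markup built by the service itself


Message = Union[TextMessage, WidgetMessage]


def render_stream(messages: List[Message]) -> str:
    """Render the conversation as a single HTML fragment, oldest entry first."""
    parts = []
    for msg in messages:
        who = escape(msg.sender)
        if isinstance(msg, TextMessage):
            # Plain text is escaped so a typed message can't inject markup.
            parts.append(f'<div class="msg"><b>{who}:</b> {escape(msg.text)}</div>')
        else:
            # Widgets are dropped in as-is, wrapped in their own container.
            parts.append(f'<div class="msg widget"><b>{who}:</b> {msg.html}</div>')
    return "\n".join(parts)


if __name__ == "__main__":
    stream = [
        TextMessage("me", "remind me to buy milk"),
        WidgetMessage("bot", '<ul class="reminders"><li>buy milk</li></ul>'),
    ]
    print(render_stream(stream))
```

The point of the split is that the conversation stays a simple, ordered log, while any reply can be richer than text when it needs to be.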

This is why it's so great that Siri has arrived. I think that Conversational UIs can be incredibly powerful, but so far they've been clunky and awkward, and most users would feel pretty uncomfortable using one. Maybe Siri will change that perception, and open up a whole new paradigm for interacting with our computers - beyond Windows/Mouse and Touch? It's an exciting thought.

The question is, can conversational UIs work without a super powerful contextual engine behind them? And I think the answer is, yes! Why not? To me, the contextual engine is simply command-line help on steroids. Rather than having to memorize specific commands, keywords or formatting like you normally have to do when typing in a command line, the contextual engine figures out what the hell you're asking by actually understanding the language.

But you know, there are lots of people out there who still happily use a command line daily to get real work done. Could you get to an 80% solution of Siri by simply monitoring for keywords and applying simple parsing rules? Add HTML5-powered widgets directly into the conversation stream, using the huge number of APIs out there, and you've got something pretty interesting. If you think about it, how far away is Facebook from doing this in its homepage stream already?
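Here's roughly what "monitoring for keywords and applying simple parsing rules" might look like, as a minimal Python sketch. The rules, phrasings and handler names below are all made up for illustration, and the "widgets" are just placeholder strings; a real service would call out to some search, weather or reminder API at that point and hand back an HTML chunk for the stream.

```python
import re
from typing import Callable, List, Tuple

# A rule pairs a regular expression with a handler that turns the match into a reply.
Rule = Tuple[re.Pattern, Callable[[re.Match], str]]


def remind(m: re.Match) -> str:
    # Hypothetical: a real service would store the reminder somewhere.
    return f"OK, I'll remind you to {m.group('task')}."


def weather(m: re.Match) -> str:
    # Placeholder for a widget built from some weather API's response.
    return f"[weather widget for {m.group('city').title()}]"


def search(m: re.Match) -> str:
    # Placeholder for a widget summarizing search results.
    return f"[search results widget for '{m.group('query')}']"


# Rules are tried in order; the first one that matches wins.
RULES: List[Rule] = [
    (re.compile(r"^remind me to (?P<task>.+)$", re.IGNORECASE), remind),
    (re.compile(r"\bweather in (?P<city>[a-z ]+?)\s*\??$", re.IGNORECASE), weather),
    (re.compile(r"^(?:search|find|look up) (?P<query>.+)$", re.IGNORECASE), search),
]


def respond(utterance: str) -> str:
    """Return the reply from the first matching rule, or a fallback message."""
    text = utterance.strip()
    for pattern, handler in RULES:
        match = pattern.search(text)
        if match:
            return handler(match)
    return "Sorry, I don't know how to help with that yet."


if __name__ == "__main__":
    for line in ["Remind me to buy milk",
                 "What's the weather in Madrid?",
                 "look up Nokia 7650",
                 "sing me a song"]:
        print(f"> {line}\n{respond(line)}")
```

It's obviously nowhere near real language understanding: anything phrased outside the rules falls through to the fallback. But that's exactly the command-line trade-off, where the user learns a handful of phrasings and the service stays dead simple.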

Okay, time to wrap this long rant up. I really hope that Apple adds Siri to the iPad someday; I don't use an iPhone (for obvious reasons), and I'd like to have more hands-on time with it. But even just knowing that it's out there makes me excited to see what's going to happen next with computing. It seems the days of us talking to our Digital Assistants or Agent Daemons as they guide us through our day are just around the corner, which is pretty exciting stuff to think about.

