Voice Typing

Posted Tuesday, July 31, 2012 12:53 pm

[image]

I wonder if it's possible to write an entire blog post just by using voice recognition? I think that trying to write something while speaking doesn't work very well, as there's something about the process of typing out the words and being able to edit as you go which makes the writing process clearer and smoother. However, it's amazing how quickly you can enter text on a tablet - as I am doing right now simply by speaking into it. I am able to type quite quickly on a virtual keyboard, but I'm honestly not sure I would be able to type as quickly as I am now speaking. The recognition of my words is quite accurate, and it allows me to not worry as much about whether or not I'm being recorded correctly. Android's new system presents the text as you say it, which is very interesting, though it can cause a pile-up of problems if you notice a mistake while you're speaking. I think that in general it's nice to be able to see the progress of what you're writing as you go. I wish it did have a few more commands like new paragraph or quotes, but in general it's able to quite quickly and cleanly capture everything I am saying.

I personally find it very hard to dictate what I am trying to write. First, hearing my own voice makes the process very strange. Also, I think that after so long of thinking through my fingers, having to think while speaking, but in a way that is spoken clearly, is unnatural in almost every way. Maybe I'll get used to it as I continue to use the functionality, especially since I can imagine that it will only get more and more accurate as the years go by. I can imagine that the days when people considered typing a fast way to write a paragraph or story or article will be a thing of the past. I'm not sure when that day will be, but just as regular books have been quickly replaced by ebooks - at least when it comes to online book sales - I think we'll be surprised at how quickly people adapt.

Doctors and other professionals have been doing things with dictation for years, so maybe it's just something that you have to get used to. However, right now, because the technology needs you to speak clearly and cleanly, pronounce everything as well as possible and speak close to the microphone, the process feels much less like you're speaking to someone who is taking down your words and more like you are trying to *type* with your voice, as strange as that may seem. I'll have to go back and correct mistakes. Though there aren't many, there are things that need to be better explained as I stumble through what I'm trying to say using my voice, and obviously I have to add in most of the punctuation.

I think it would be very cool if you were able to have the microphone stay on for longer so that you could simply say a word as you are correcting something and have that word just pop into the text where you are pointing. The whole process would be very multimodal - almost as if you were talking to someone and you needed to add a new word, so you pointed to the screen and the person typed out what you said. You could also do this with a regular computer - you normally need to move your hand back and forth between mouse and keyboard, so being able to just point and click, then speak whatever it is you want to add, is very interesting, I think.

As I continue to test this out right now, I'm becoming more comfortable, as if I'm talking to another person. Maybe it's about just simply getting used to the idea of not having to type out every single thing I say. Just like we had to get used to using a mouse instead of only using a keyboard in order to interact with a computer, and then just recently learning how to use a touch screen rather than physical buttons. People say, "Well I need a mouse for various reasons," and then just a couple years later they realize they can live without a mouse. I'm finding my voice is becoming less staccato and more sort of natural, as if I were speaking slowly and clearly to someone in another room or maybe in another language. I'm also pretty impressed with how quickly a significant quantity of text is appearing on the page. Remember, I am doing this all with my Android tablet, not with a keyboard. Considering the amount of text that I've done in just a very short amount of time - maybe 10 minutes or 15 minutes - it's quite an impressive way to produce a significant amount of text. You could almost write a book like that - just think about a story and start talking.

A line often attributed to Mark Twain goes, "I wanted to write you a short letter but I don't have time so I'll write you a long one instead," so obviously just being able to type a lot of text isn't really the ultimate goal for voice recognition. I'm just saying simply that it's quite amazing how much I've been able to put down on a page without typing a single letter. Now like I said, before I post this I'm going to go through and edit the text to make it clean and make it readable, but that's almost a good thing as well because it's a process I would normally go through. The question in my mind is how long and how hard it will be to make this seem like a post that is worth reading, one that has the quality of the written word, the turn of phrase or the play on words that might not normally be used when speaking but is more common in text. Also, when speaking I tend to say something that rambles on and on. Whereas when I type, I normally try to vary the pace and the length of the sentences to make sure that the reader isn't bored and that a paragraph is compelling. That's something I like to see when I read text, and something I like to do when I write it.

Again, the idea of voice recognition helping to make sure you don't have to take your hands off the mouse or keyboard is very intriguing to me. I wonder what it would be like to have a programming editor that was voice capable. Imagine being able to say what type of class or what type of function you were creating or looking for and have it automatically inserted at the point where your cursor is. It might be kind of weird to have your computer listening all the time while you are coding, and it might be weird for coworkers to be in the same room or the same area as you shout out random cryptic words for the computer to enter into a chunk of code, but I can imagine the productivity might be something that is worth thinking about.

I think one mistake that has happened in the past is the conflation of voice recognition with artificial intelligence. Though I'm sure there must be quite a bit of fuzzy logic in order to understand my voice, I think that the idea of the computer understanding everything I've said correctly and every command I give is a completely different problem. Yet they seem to be always combined. It probably goes back to 2001 and how HAL was both artificially intelligent and understood voice. But I think that voice is just another way of entering text. If we have the capability now to understand the text at a high level - which obviously we do, considering everything I'm writing now has been spoken and not written or typed - then we can start exploring ways of using voice in different forms than simply giving commands to a computer that we then expect to magically understand. Even though Siri is amazing, it's still not anywhere near the sort of artificial intelligence that we would need in order to be more natural with everything we want from a computer of whatever sort.

Microsoft has tried voice commands with the XBox - the Kinect is always listening for you to say the command "XBox!" and then it will start listening for basic instructions such as search, previous, next, and things like that. It doesn't really work all that well and it never seems to understand when you say "XBox!" the first time, but conceptually it seems to be what I'm thinking about, in that there is an opportunity to use voice as another way of interacting with computers. But we need a more standardized way of doing it. There could be certain keywords for how we all use voice, so we could do it naturally without thinking about it. Just like we might double click on applications to start them now, maybe we could instead point to something and actually *say* start. I can think of a variety of different standard commands that might be useful, such as cut, copy, paste or switch applications. The key would be that the interface needs to react instantly, just like it would with a button press or a tap on a touch screen. You can't wait for the system to slowly process the command or to send your voice to a central server and then back, because by that point you could have just done it by hand. The other thing is that the commands need to be universal, in that I need to be able to sit down at a PC or Mac or an Android tablet or what have you and be able to use the same commands, just as if I were double clicking or pinching to zoom or any of the other standard gestures that we now think of as normal and expected.
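To make the idea of a shared command vocabulary a little more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions: the class name, the action names and the handler are all made up, and it takes for granted that some recognizer has already turned the spoken word into text. The point is just that the same word list could sit on every platform, and that matching a word to an action is a plain local table lookup with no trip to a server.

```python
# Hypothetical sketch: none of these names come from a real voice API.
from typing import Callable, Dict, Optional

# One shared vocabulary: spoken word -> abstract action name.
# The words are the commands mentioned above; the action names are invented.
STANDARD_COMMANDS: Dict[str, str] = {
    "start": "activate",
    "cut": "cut",
    "copy": "copy",
    "paste": "paste",
    "switch": "switch_application",
}

Handler = Callable[[Optional[str]], str]


class VoiceCommandDispatcher:
    """Maps already-recognized words to platform handlers via a local lookup."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, action: str, handler: Handler) -> None:
        # Each platform (PC, Mac, tablet) would plug in its own handlers
        # for the same set of abstract actions.
        if action not in STANDARD_COMMANDS.values():
            raise ValueError(f"unknown action: {action}")
        self._handlers[action] = handler

    def dispatch(self, spoken: str, target: Optional[str] = None) -> Optional[str]:
        # Normalize the word, then do a dictionary lookup: no network call.
        action = STANDARD_COMMANDS.get(spoken.strip().lower())
        if action is None or action not in self._handlers:
            return None  # not a known command, so treat it as ordinary dictation
        return self._handlers[action](target)


# Usage: point at something (the target) and say "start".
dispatcher = VoiceCommandDispatcher()
dispatcher.register("activate", lambda target: f"launching {target}")
result = dispatcher.dispatch("Start", target="browser")
```

Anything that isn't in the table simply falls through as regular text, which keeps the command words from getting in the way of dictation.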

---

Note from a keyboard:

I wrote this the other evening using my Xoom when it updated to Android Jelly Bean, but forgot to post it until now. Using a tablet almost makes it less amazing, but it could have been done just as easily with a phone, since I wasn't actually using the bigger screen in any way. Imagine trying to tap that all out using a virtual keyboard on a phone - it wouldn't be hard or take too long, but I don't think it'd be overly comfortable to do either.

The text didn't take too long to edit, though I definitely needed a keyboard and mouse to do it quickly. I left it mostly as it was as I spoke it - getting rid of the occasional bad translation or odd wording on my part, and adding in punctuation as best as possible to make it read more clearly. That said, if I were going to use this to write something important, well, I wouldn't. I would use it the same way I would normally use a transcription of a speech or interview - I would keep it in another window for reference and the occasional copy/paste, and re-write it from scratch. Probably because I'm an old-school writer who writes via typing.

Think about it though - over the next decade as my kid goes through school and college, I'm sure the technology will only get better, and more deeply integrated into our computing devices, and he will most likely not have the hangup of having to use his fingers to 'write'. That's a pretty amazing thing to consider.

-Russ
