the spkydog koop: Mixing DTMF and voice input - Software Technology News, Tips, and Discussion

A lot of the call center applications you deal with today still rely on DTMF input. With the advent of VoiceXML, this is starting to change, and speech input is becoming pretty common. The interesting thing is that a lot of apps that use speech also mix in DTMF input. This makes sense in a use case where you need to be 100% certain of the user's intention (e.g. "Press 1 to confirm that you wish to sell 1000 shares of Microsoft..."), or as a fallback input mode when the speech recognizer generates a number of consecutive rejections or low confidence scores (e.g. the user is in a noisy environment). In many cases, though, applications seem to mix DTMF and speech for no apparent reason!
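To make the "legitimate fallback" case concrete, here is a minimal Python sketch of that policy: stay in speech mode until the recognizer has failed several turns in a row, then switch to the keypad. The class name, the threshold of three failures, and the 0.5 confidence cutoff are all assumptions made up for illustration; a real platform would expose its own counters and tuning knobs.

```python
from dataclasses import dataclass


@dataclass
class InputModePolicy:
    """Hypothetical helper: decides when to fall back from speech to DTMF."""
    max_failures: int = 3        # assumed threshold, not taken from any platform
    min_confidence: float = 0.5  # assumed cutoff for a "low confidence" result
    failures: int = 0
    mode: str = "speech"

    def record(self, recognized: bool, confidence: float = 0.0) -> str:
        """Record one recognition attempt and return the mode for the next turn."""
        if self.mode == "dtmf":
            return self.mode  # once we have fallen back, stay on the keypad
        if recognized and confidence >= self.min_confidence:
            self.failures = 0  # a good result resets the failure streak
        else:
            self.failures += 1  # rejection or low confidence counts as a failure
            if self.failures >= self.max_failures:
                self.mode = "dtmf"
        return self.mode


policy = InputModePolicy()
modes = [policy.record(ok, conf) for ok, conf in [
    (True, 0.9),   # clean recognition
    (False, 0.0),  # outright rejection
    (True, 0.3),   # recognized, but low confidence
    (True, 0.2),   # third failure in a row triggers the fallback
]]
print(modes)  # ['speech', 'speech', 'speech', 'dtmf']
```

The point of the sketch is that DTMF appears only after speech has demonstrably failed, which is a reason the caller can understand.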

For example, take a look at Hey Anita's handy Rapid Messaging Service demo on their website. This is a fantastic use of voice technology to accomplish a fairly ubiquitous mobile messaging service. But note how the UI is all speech-based until it asks you to confirm your recorded message, at which point it specifically asks the user to press 1. Why is this? Worst case, the VoiceXML built-in type boolean could have been used at this point so the user could say yes, or press 1.
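As a rough illustration of what a mixed-mode confirmation looks like from the application's side, the Python sketch below accepts either a spoken word or a key press and maps both to the same yes/no answer. It mimics the idea behind the VoiceXML boolean type (where 1 is affirmative and 2 is negative on the keypad); it is not VoiceXML itself, and the function name and tiny vocabulary are assumptions for the example.

```python
from typing import Optional

# Assumed vocabulary for this sketch; a real recognizer grammar would be richer.
YES_WORDS = {"yes", "yeah", "sure"}
NO_WORDS = {"no", "nope"}


def parse_confirmation(user_input: str) -> Optional[bool]:
    """Map a spoken word or a DTMF digit to True, False, or None (no match)."""
    token = user_input.strip().lower()
    if token == "1" or token in YES_WORDS:
        return True   # "yes" by voice or 1 on the keypad
    if token == "2" or token in NO_WORDS:
        return False  # "no" by voice or 2 on the keypad
    return None       # anything else is treated as a no-match


for answer in ["Yes", "1", "no", "2", "maybe"]:
    print(answer, "->", parse_confirmation(answer))
```

Either mode lands on the same result, so the prompt can offer both ("say yes or press 1") instead of forcing the caller onto the keypad.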

The point here is not that a UI has to be entirely voice-driven in order to be effective. It's obvious to us by now that not all tasks a UI needs to support are created equal. After all, multimodal user interfaces are getting a lot of attention in the industry at present. Rather, the question is: why do so many telephony-based IVR apps mix DTMF and speech in such unnatural ways?

On a somewhat related topic, I was recently trying to teach my mother (who has never owned a computer, and wouldn't know the Internet from a fish net!) how to use the WAP browser on her mobile phone to retrieve travel and weather info. Bad idea!

In a sudden burst of creativity, I simply entered the phone number of the Tellme portal (800-555-TELL) in her mobile phone's address book, dialed it and handed her the phone. Within seconds she was using speech to navigate the portal and browse the content she was looking for, and she has been a happy user ever since. On the other hand, my mother is one of those people who opts for a live operator the first chance she gets when she calls a traditional DTMF-based IVR application. It takes her a while to find the correct key to press on her keypad, and she is quickly frustrated after a menu or two.

Speech technology, properly applied, greatly enhances human/machine interaction. It puzzles me when I run into apps that break the paradigm for no obvious reason. Thoughts?

the spkydog koop

Wednesday, December 15, 2004

