<body>

Tuesday, November 22, 2005

The IVR Cheat Sheet

Today's Wall Street Journal had a brief article about Paul English's IVR Cheat Sheet. The IVR Cheat Sheet is a list of phone numbers and tips on how to bypass the IVR applications of numerous well-known companies and speak with a human call attendant.

There is some interesting usability data in this list. For example, according to the data being reported in the industry rags, the use of speech recognition greatly increases the odds of not needing to get a human call agent involved in the call when compared to traditional IVR apps using DTMF. Out of the 111 companies in the list, the tips associated with 17 of them involve speech recognition.

The list could also serve as a case study in a human factors textbook. Here are some of the more interesting (and entertaining) antipatterns. These examples range from navigating poorly designed call flows to applying some rather clever social engineering.

Ikea 800 434-IKEA 0000000 (hit "0" many times fast, if you do it once, or too slow, it will merely repeat the menu)

Sears 800-4-MY-HOME Silence: don't push numbers, just sit there and you will be placed at the front of the queue.

USPS 800-275-8777 7-3-2 or send them some junk mail

SBC 800-585-7928 Again, an (intelligent, this time) IVR wants YOUR phone number first.

Verizon DSL 800 567 6789 Say "I don't know it" then "technician"

Cingular 800-331-0500 For faster service, choose the option indicating that you are looking to close your account. You get the same ppl but an immediate answer.

As consumers become more savvy about this sort of information (I've come across this list several times in the past few months, prior to reading about it in the Journal this morning), another interesting exercise would be to identify which applications do not make it onto this list and others like it. Presumably, such applications are so easy to use that callers accomplish what they set out to do with no need to talk to a live call attendant. For example, how many of the apps appearing in this list are TellMe clients? I have a hunch the answer is very few.

Wednesday, November 16, 2005

Multimodal-Enabling the Web: The Secret Sauce...

IBM's Igor Jablokov came close to hitting the bullseye in his response to a "what's the secret sauce?" question in an interview on IBM's work in multimodal-enabling the web. Igor replied that "everything is ready". That is, there is a great markup for visual content (XHTML), a great markup for voice (VoiceXML), and the glue needed to tie them together (XML Events).
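For the curious, here is roughly how those three pieces fit together in a single XHTML+Voice page. Treat this as a hand-written hello-world sketch, not IBM's demo: the `sayHello` id and the page text are made up for illustration, and we have not verified it in Opera 8. The VoiceXML form lives in the XHTML head, and an XML Events attribute on a visual element wires a click to that form.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative X+V sketch; ids and text are hypothetical. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>X+V hello world</title>
    <!-- Voice: a VoiceXML form that speaks a prompt when it runs -->
    <vxml:form id="sayHello">
      <vxml:block>Hello from the voice side of the page.</vxml:block>
    </vxml:form>
  </head>
  <body>
    <!-- Visual: ordinary XHTML. Glue: XML Events attributes that
         run the voice form when this paragraph is clicked. -->
    <p ev:event="click" ev:handler="#sayHello">Click here to hear the greeting.</p>
  </body>
</html>
```

The point of the design is that neither markup has to know much about the other: the visual page and the voice dialog are authored separately, and the event attributes are the only coupling between them.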

Igor's not talking about vaporware here of course, as the Windows version of Opera 8 does in fact support XHTML+Voice, thanks to IBM's enablers. So we've got Opera 8 installed and are brimming with enthusiasm. The question for Igor is, can you post the URL to the X+V "where's the nearest Starbucks?" app so we can all try it out?

That's always the tough question - how do you attract a critical mass of web content developers to cool stuff like this? Whoever has the right answer to that question has the recipe for the secret sauce.

Read the whole interview.

Tuesday, November 15, 2005

Talking Printers?

Well, not exactly. While Epson did recently announce this multilingual text-to-speech synthesis chip, it's intended for applications such as mobile devices, accessibility devices, toys, etc. Embedded TTS actually has a fair amount of utility on portable devices, where situational impairments often constrain the user's ability to squint at a tiny screen. It's also easier to implement than ASR on a resource-constrained platform.

Monday, November 14, 2005

Talk to your Mac!

We recently posted about the rather seamless integration of speech recognition into Microsoft's Vista. While spkydog is not himself a Mac hack, for the sake of even and balanced reporting it's worth noting that Apple's OS X also has speech recognition capabilities.


While we don't have any direct experience with speech reco on OS X, it does look like it takes a bit of manual labor to get going, which is rather uncharacteristic for anything Apple! Perhaps some of you who are regular Mac users can let us know how well it works?

Read the OS X tip.

Friday, November 11, 2005

Startup Touts Speech Recognizer "Second to None"

Is speech technology the sauce needed for making search truly ubiquitous? You can look at this from a couple of angles. First, doing search from mobile devices is obviously fairly interesting, especially when you throw location into the mix. However, even with QWERTY keyboards it is rather tedious to enter keywords. Speech recognition is one way to ease this constraint. Another angle to consider is the fact that more and more of the content being served up these days is not simple text, but audio (think podcasts!) as well as video. Using speech recognition to index multimedia content is thus another interesting prospect.

We've blogged on the blinkx service in the past. SimonSays Voice Technologies Inc. is a two-man startup in Toronto that claims to have speaker-independent speech recognition technology that can transcribe audio with 98% accuracy, with of course the rather vague qualifier "under good audio conditions". It's hard to imagine two guys working in their spare time could produce technology that outperforms the stuff Scansoft/Nuance, Microsoft, and IBM have been spending millions on for decades, employing some of the most brilliant speech researchers in the world. Nevertheless, using speech recognition to index multimedia content is indeed an interesting problem to be working on these days, and whoever does crack this nut in a decisive way stands to profit greatly. There's plenty of evidence that the search guys (Google and Yahoo) have been thinking about this as well, but we'll save that for another time.

Read the article.

Monday, November 07, 2005

POTS Death Rattle: The Dinosaurs Strike Back!

If it isn't the Chinese government, the EU, and others of their ilk trying to wrest the Internet away from the hands of the USA and into the rather incompetent hands of the United Nations (it's getting hard for the commies to block those Google cache links ya know...), it's entities such as the Saudi state-owned telecom dinosaur trying to frustrate users of Skype and other discount VoIP services.

IEEE Spectrum magazine recently ran an interesting article about technology from a California company by the name of Narus, Inc. that operators can use to scan packets, identify VoIP sessions, and block or otherwise frustrate the users. The goal, of course, is to push those users onto the operator's legacy circuit-based services, or onto VoIP services that will inevitably be light years behind those offered by Skype, Vonage, and other VoIP pioneers.

Actually, it's not just telecoms in Middle East countries that are frantically trying to strike back against the rising tide of successful VoIP service providers who threaten to eat their lunch. According to the article, Vodafone has announced plans to block VoIP traffic in Germany. While there appears to be no evidence they are doing so, broadband providers in the USA can legally block VoIP traffic as well.

Read the IEEE Spectrum article.

Thursday, November 03, 2005

Review of Microsoft Speech Team's PDC Presentation

Microsoft Speech bloggers have posted links to their speech team's recent PDC presentations. If you're interested in keeping a finger on the pulse of what Microsoft is up to in the speech area, it's worthwhile to view this material, kindly made available online for everybody's benefit. In particular, you'll gain insight into how speech integrates into Vista and WinFx.

In general, Microsoft is doing a reasonably good job at incorporating speech technology into the Vista desktop. There is no doubt this will help push speech technology into the mainstream, as the initial speaker claimed.

The Speech Server presentation, though, was a real sleeper. Not only were the demos rather poorly executed, the technology itself is stuff most of us in this field were building and shipping in speech server products in the late 90s. It's amazing that a company of Microsoft's calibre would have people on stage giving prototype demos of use case scenarios that their competitors in the speech server industry were shipping as products 6+ years ago, especially since the technology is long past the early adopter stage and well entrenched in the form of mature W3C standards, not to mention market share. In 1 hour 21 minutes of talking about speech technology, if I'm not mistaken, not once did the "V" word get mentioned. Come to think of it, I'm not recalling any mention of the "S" word (as in SALT) either, but then again, it's likely that if they did say it, the naughty word filter on my XHTML+Voice browser would have beeped it out. ;-)

Robert Brown summarized 3 take-aways at the end of the presentation:

1. Windows Vista is a great speech platform.
2. WinFx has a very powerful speech API built into it.
3. Speech Server is a great way to deliver speech apps to users wherever they are via their "ubiquitous" phone.

We'd have to agree with and applaud the first two bullets, though we'd qualify the first by noting that Windows Vista is a great desktop speech platform. It's also cool to see a platform emerge that no doubt will be on desktops everywhere in the future and that has speech APIs designed into it as an integral feature, rather than an add-on API/library that got slapped on after the fact and is not well integrated with the rest of the platform. This will no doubt result in more speech-enabled desktop applications, since developers know the functionality will always be there on Vista. When speech was an add-on component, developers wouldn't bother using it in their apps, as making sure the speech functionality was available on the target platform meant lots of additional work.

As I've already suggested, I have to disagree with Brown's third take-away. VoiceXML is clearly the reigning champion these days in the rather healthy speech server industry, and until Microsoft Speech Server supports VoiceXML, they'll continue to find themselves rowing upstream against a mighty strong current.

All in all though, it's an interesting presentation and worth taking a look at.

View the presentation.

Tuesday, November 01, 2005

Little guy makes noise about XML patents

A small company by the name of Scientigo claims that XML infringes upon two of its patents. The company's CEO claims that after meeting with 47 large software companies that utilize XML in their products, he's confident the patents will command royalties. It seems there are some substantial hurdles Scientigo will need to overcome in terms of prior art, but then again, many of us thought the Eolas patent would not be an issue for Microsoft either. The potential ramifications here are enormous, given XML's widespread use, and that includes our darling VoiceXML!

According to an article in today's Wall Street Journal, on Monday the Supreme Court declined to hear Microsoft's appeal on the Eolas patent lawsuit.

Read the article.