the spkydog koop: December 2004 - Software Technology News, Tips, and Discussion

IBM's embedded ViaVoice in Opera's latest browser?

Here's an older article that seems to indicate it's IBM's speech technology that we'll find in the recent 8.0 beta from Opera. XHTML+Voice was mentioned in the same breath, so I guess we'll have to download it and check it out!

http://news.zdnet.com/2100-3513_22-5178061.html?tag=nl

Speech technology in Opera's new browser

Opera announced a beta release of their next browser for Windows. Among lots of other new features, the browser has integrated ASR and TTS support. I haven't downloaded it myself yet, but the press release mentions both command-and-control features and voice-enabled web content. XHTML+Voice support, perhaps? We'll be looking at this later, so stay tuned.

Mozilla's Firefox does seem to have stolen the show recently; on mobile devices, however, you'll have much better luck with Opera.

If you're an Opera user, you might want to check out Rijk van Geijtenbeek's blog. He's with Opera and is pretty active on their news server, and shares interesting tips in his journal.

Authoring Multimodal Web Content

For those of you who insist on being on the cutting edge, Opera has a fairly decent XHTML+Voice tutorial on their site, along with a number of helpful links to related sites. In case you weren't aware, the XHTML+Voice specification (aka X+V) was submitted to the W3C by IBM, Opera and Motorola, and later endorsed by the VoiceXML Forum. It cleanly combines XHTML, XML-Events, and VoiceXML to bring spoken interaction to the visual web.

Google Desktop Search Security Flaw?

This is a slight diversion from our topic, but a tidbit worth noting. In a Slashdot posting today I learned that a Rice University professor and two of his students have discovered a fairly nasty security hole in Google's desktop search tool. Evidently the news was first broken by the NY Times, in an article you can read for yourself:

http://www.nytimes.com/2004/12/20/technology/20flaw.html

For unorganized people like myself, Google desktop search is quite handy. However, despite all Google's brainpower, I'm wondering if security through obscurity isn't still a consideration. After all, if I can't find something that matters on my own computer, it should be even more difficult for a hacker who gains entry, right?

Hey kids, don't play hooky!

Here's a fun VoiceXML app to read about on a sunny Friday morning! Skip school these days and your Mom and the local sheriff get a call from a friendly VoiceGenie platform! I wonder if the pre-recorded audio uses a cranky, dour voice reminiscent of the attendance "cop" we had to reckon with in my high school days.

Nuance's Platform Play

Despite the positive tone of this recent press release from Nuance, it causes me to wonder just how long it will be before Scansoft gobbles up Nuance! What gives me pause about Nuance is not their core technology, as their ASR/TTS products are as good as, if not better than, their competition's. Rather, it's their business model that leaves one scratching his head. It seems that in recent years, instead of focusing on their core competency (i.e. speech technology), they've spread themselves rather thin by moving up the food chain and providing a complete turnkey voice platform. Not only has this put them in direct competition with many of their customers, but it also raises questions about whether or not they will be able to sustain the progress they've made in their core speech technologies and simultaneously establish the ubiquity in the marketplace they'll need as the technology becomes commoditized.

Scansoft, on the other hand, at least in terms of network-based, high-density speech resources, is focused on the core enabling technologies and seems much more eager to rely on third-party platform partners and integrators to help plant their technology everywhere. As a consequence, Scansoft's speech server products are much easier for developers to integrate and deploy. Furthermore, Scansoft tends to be much quicker to embrace open and emerging standards, such as MRCP and ETSI Aurora distributed speech recognition, not to mention solid support on a wider variety of computing platforms, including Linux.

In a certain sense, the situation is similar to the classic Apple vs. Microsoft situation in the PC industry. In the early days of the PC industry, Microsoft focused on the software and leveraged its relationship with "platform providers" to gain ubiquity. Apple, on the other hand, focused on being all things to everybody by offering a complete turnkey platform, and ultimately ended up seriously marginalized.

Mixing DTMF and voice input

A lot of the call center applications you deal with today still use DTMF input. With the advent of VoiceXML, this is starting to change, and speech input is becoming pretty common. The interesting thing is that a lot of apps that use speech also mix in DTMF input. This makes sense in a use case where you need to be 100% certain of the user's intention (e.g. "Press 1 to confirm that you wish to sell 1000 shares of Microsoft..."), or as a fallback input mode when the speech recognizer generates a number of consecutive rejections or low confidence scores (e.g. the user is in a noisy environment). In many cases, though, applications seem to mix DTMF and speech for no apparent reason!
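The fallback case described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular platform does it: the class name `InputModePolicy`, the 0.5 confidence cutoff, and the limit of three consecutive failures are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class InputModePolicy:
    """Switch a dialog from speech to DTMF after repeated recognition failures.

    All names and default values here are hypothetical, chosen for illustration.
    """
    confidence_threshold: float = 0.5   # assumed cutoff for a "low confidence" result
    max_consecutive_failures: int = 3   # assumed number of failures before falling back
    failures: int = 0
    mode: str = "speech"

    def record(self, confidence):
        """Record one recognition attempt and return the input mode to use next.

        `confidence` is the recognizer's score, or None if the utterance
        was rejected outright.
        """
        if self.mode == "dtmf":
            return self.mode  # once we have fallen back, stay on the keypad
        if confidence is None or confidence < self.confidence_threshold:
            self.failures += 1
        else:
            self.failures = 0  # a good result resets the streak
        if self.failures >= self.max_consecutive_failures:
            self.mode = "dtmf"
        return self.mode
```

The design choice worth noting is that only consecutive failures count, so a caller who is occasionally misheard stays in speech mode, while one stuck in a noisy environment is moved to the keypad.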

For example, take a look at Hey Anita's handy Rapid Messaging Service demo on their website. This is a fantastic use of voice technology to accomplish a fairly ubiquitous mobile messaging service. But note how the UI is all speech-based until it asks you to confirm your recorded message, at which point it specifically asks the user to press 1. Why is this? Worst case, the VoiceXML builtin type boolean could have been used at this point so the user could say yes, or press 1.

The point here is not that the UI must be entirely voice-driven in order to be effective. It's obvious to us by now that not all tasks a UI needs to support are created equal. After all, multimodal user interfaces are getting a lot of attention in the industry at present. Rather, the question is why so many telephony-based IVR apps mix DTMF and speech in such unnatural ways.

On a somewhat related topic, I was recently trying to teach my mother (who has never owned a computer, and wouldn't know the Internet from a fish net!) how to use the WAP browser on her mobile phone to retrieve travel and weather info. Bad idea!

In a sudden burst of creativity, I simply entered the phone number of the Tellme portal (800-555-TELL) in her mobile phone's address book, dialed it and handed her the phone. Within seconds she was using speech to navigate the portal and browse the content she was looking for, and she has been a happy user ever since. On the other hand, my mother is one of those people who opt for a live operator the first chance she gets when she calls a traditional DTMF-based IVR application. It takes her a while to find the correct key to press on her keypad, and she is quickly frustrated after a menu or two.
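To make the boolean suggestion concrete, here is a rough sketch of what such a confirmation step might look like. The dialog is hypothetical (it is not Hey Anita's actual markup, and the form and field names are invented); it relies on the VoiceXML 2.0 builtin boolean type, which by default accepts a spoken yes or no as well as DTMF 1 for yes and 2 for no. The Python wrapper, including the helper `field_types`, is only there to check that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical confirmation dialog; names and prompts are invented for illustration.
CONFIRM_DIALOG = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <field name="send_message" type="boolean">
      <prompt>
        Send this message? Say yes or press 1. Say no or press 2.
      </prompt>
      <filled>
        <if cond="send_message">
          <prompt>Sending your message.</prompt>
        <else/>
          <prompt>Message cancelled.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
"""


def field_types(document):
    """Parse a VoiceXML document and map each field name to its declared type."""
    root = ET.fromstring(document)
    return {
        field.get("name"): field.get("type")
        for field in root.iter(f"{{{VXML_NS}}}field")
    }
```

With a single boolean field the caller can stay in speech mode or reach for the keypad, and the application does not have to care which one was used.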

Speech technology, properly applied, greatly enhances the human/machine interaction. It puzzles me when I run into apps that break the paradigm, for no obvious reason. Thoughts?

Improving Customer Service with Speech Technology

In an article appearing today on the Communication Convergence site, Brian Garr of IBM Pervasive Computing provides some insightful recommendations to enterprises looking to use speech technology to improve customer service:

Don't rebuild when you can extend.

Plan to scale the business; don't paint yourself into a corner by locking into a proprietary system.

Use open standards.

Unify your customer experience across all horizontal touch points.

Recognize the value of conversational access and Natural Language Understanding technologies.

It's actually an interesting exercise to contrast the W3C's VoiceXML with Microsoft's SALT in the context of these recommendations.

First, in order to extend something it has to exist. A survey conducted earlier this year by the VoiceXML Forum indicates very little activity in terms of deployed SALT applications, and significant numbers of deployed VoiceXML apps. Before you dismiss this as merely the VoiceXML Forum's slant, it's been shown elsewhere that a large proportion of the participants in Microsoft's SALT initiative are active members of the VoiceXML community.

While SALT has been submitted to the W3C for consideration, it is not officially a standard (any company with W3C membership can propose anything it wants). VoiceXML 2.0 emerged this year as an official W3C Recommendation (i.e. a standard in W3C vernacular). Nevertheless, one could argue that SALT is not proprietary, in that the specification has been published, and there is a vehicle in place (i.e. the SALT Forum) for nurturing the technology. Besides the fact that this vehicle apparently ran out of gas over a year ago, the crux of the matter is that SALT is based on a few simple constructs, and non-trivial application development will require substantial tooling. It is clear where these tools are coming from. Not only are the tools proprietary, but they are also only supported on the tool vendor's proprietary platform.

Beneficial Voice App?

A colleague recently forwarded me a copy of some unsolicited junk mail from Microsoft inviting the reader to evaluate the Microsoft Speech Server. The letter was signed by a corporate vice president, and along with the typical sales hubris, had the following somewhat odd statement:

"Consider the benefits your business might accrue with one of the following applications: [...] remote officer access to law enforcement suspect database."

I suppose it depends what kind of business you are in, heh? ;-)

Microsoft Finally Embraces VoiceXML!

Most of you are probably aware that the rather clever Talk to Santa VoiceXML application is back this holiday season. But did you notice what portal is sponsoring it, or at least featuring it on their site? That's right, you are not seeing things: it's MSN Canada!

the spkydog koop

Friday, December 24, 2004

IBM's embedded ViaVoice in Opera's latest browser?

Speech technology in Opera's new browser

Tuesday, December 21, 2004

Authoring Multimodal Web Content

Monday, December 20, 2004

Google Desktop Search Security Flaw?

Friday, December 17, 2004

Hey kids, don't play hooky!

Thursday, December 16, 2004

Nuance's Platform Play

Wednesday, December 15, 2004

Mixing DTMF and voice input

Thursday, December 09, 2004

Improving Customer Service with Speech Technology

Wednesday, December 08, 2004

Beneficial Voice App?

Microsoft Finally Embraces VoiceXML!
