<body>

Thursday, January 26, 2006

World's Coolest Free Desktop VoiceXML Dev. Tooling (Voxeo Does it Again!!)

Once again spkydog assures you that those guys at Voxeo are seriously on the cutting edge! If authoring VoiceXML content is your bread and butter, this is likely the announcement you've been waiting years for... Voxeo just made a sneak preview of their new beta Prophecy 2006 tools available for free download.

So what's Prophecy 2006? Well, free online VoiceXML development studios such as Tellme, Voxeo, etc., have been around for quite a while and thousands of developers depend on them. This is meat and potatoes kind of stuff for VoiceXML developers, necessary but not really exciting. There is a good reason this hosting model for VoiceXML emerged in the industry. Speech resources and telephony interfaces (hardware and software) are typically costly and complex to install, configure, maintain, etc. So most developers simply let smart guys like those at Voxeo and Tellme take care of the gory details, sit at their desks and hack up web apps that serve up VoiceXML markup, and use their softphones (or toll free calls over PSTN) to test their apps hosted on one of these online developer studios. This was actually a key motivation behind VoiceXML to begin with, i.e. make IVR apps as portable and easy to develop as web apps.
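To make the "web apps that serve up VoiceXML markup" idea concrete, here is a minimal sketch in Python. It is not taken from Voxeo's or Tellme's tooling. The function names, the greeting text and the port are made up for illustration, and a real platform would fetch this document from whatever URL you map to a phone number.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.etree import ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_greeting(prompt_text: str) -> bytes:
    """Build a one-form VoiceXML 2.0 document that speaks prompt_text."""
    ET.register_namespace("", VXML_NS)
    root = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.0"})
    form = ET.SubElement(root, f"{{{VXML_NS}}}form")
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    prompt.text = prompt_text
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


class VxmlHandler(BaseHTTPRequestHandler):
    """Answer every GET with the same VoiceXML page (hypothetical app)."""

    def do_GET(self):
        body = build_greeting("Hello from your desktop VoiceXML app.")
        self.send_response(200)
        self.send_header("Content-Type", "application/voicexml+xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the sketch server until interrupted; the port is an arbitrary choice."""
    HTTPServer(("localhost", port), VxmlHandler).serve_forever()
```

Calling `serve()` and pointing a VoiceXML interpreter at `http://localhost:8080/` should, under these assumptions, get the greeting read back to the caller. The point is only that the app itself is ordinary web code, while the speech and telephony pieces live in the platform.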

So getting back to Prophecy 2006, what tons of developers have been looking for is a simple free download that lets them get a 1-2 port VoiceXML server up and running on their desktop without any special telephony hardware, and no need to buy speech ports from Nuance at thousands of dollars a pop. Some years ago, Motorola released the first such free tool kit (MADK), but it only supported a VoiceXML predecessor, VoxML (and maybe an early flavor of VoiceXML), and had a rather flaky interface based on the Microsoft Agent technology. Since then, similar tools have appeared, but they are either very expensive, or very limited (no real speech resources, just simulated textual input/output).

Voxeo's Prophecy platform is what you've been waiting for - a complete fully functional 2 port VoiceXML platform and softphone running on your desktop - all available for free download. If you need more ports, Voxeo indicates they will have very low cost turnkey solutions available.

The Prophecy beta comes in two flavors, with a TTS engine (and a very sweet sounding one at that!) or without. I tried the "without" version first but seemed to have problems getting it going with the MSFT SAPI TTS engine. I then grabbed the "TTS included" version and less than 3 minutes after the download completed (not bad, 300MB or so) I was happily talking to my very own VoiceXML interpreter. It worked like a charm, right out of the box. This is actually the key point here. With Prophecy, Voxeo has substantially lowered the bar when it comes to running your own local VoiceXML platform, from both the technical and economic perspective. The closest you could have gotten to this in the past was to either buy a turnkey system (not cheap) or try to assemble your own from the various open source components that are available (Open VXI, etc.). This is truly an industry first, and one that will without doubt stir up a lot of interest among the VoiceXML developer community.

In addition to the 100% conforming VoiceXML interpreter (the legendary Motorola VoxGateway), Prophecy includes support for CCXML and an app server with PSP, JSP and servlets. And, it probably goes without saying, Voxeo will be more than happy to host those cool apps you develop with Prophecy when the masses start to dial in, or sell you a nice easy to use turnkey solution for your enterprise.

Don't look any further, download your copy now: http://www.voxeo.com/prophecy2006/, and spread the word!

Wednesday, January 25, 2006

IBM's "Superhuman" Speech Initiative


A 3 year old child is typically a better speech recognizer than the most powerful computer(s) running the most advanced speech algorithms. IBM is striving to change that though. A PC Magazine article published today quotes IBM's David Nahamoo saying IBM's goal is to bring ASR performance up to the level of human beings within the next 5 years.

These sorts of goals have been set before, and billions of dollars spent since, but it has yet to happen. Skepticism aside, conventional speech recognition performance today is pretty decent (thanks to cheap memory and fast processors) when the vocabulary is restricted to a particular domain or context.

IBM's "Tales" project, also described in the article, involves an area we described as "hot" in a recent post - automatically recognizing human to human speech. A good solution to this particular problem could be quite useful in building searchable indexes of the reams of audio/video content that is produced. The IBM system can currently process television audio with 60-70% accuracy, but not in real-time. It takes around 4 minutes to deliver results at this level, and performance as high as 80% could be achieved with more processing (i.e. higher delay).

(Note: IBM's Les Wilson in the photo.)

Tuesday, January 24, 2006

W3C Voice Browser Working Group Publishes SCXML Draft

Here's a copy of the announcement Jim Larson emailed out this morning:

W3C releases working draft of State Chart XML (SCXML)

This working draft document describes State Chart XML (SCXML), a general-purpose event-based state machine language that can be used in many ways, including:

  • As a higher-level dialog language controlling VoiceXML 3.0's encapsulated speech modules (voice form, voice picklist, etc.)
  • As a voice application metalanguage, where in addition to VoiceXML 3.0 functionality, it may also control database access and business logic modules.
  • As a multimodal control language in the MultiModal Interaction framework [W3C MMI], combining VoiceXML 3.0 dialogs with dialogs in other modalities including keyboard and mouse, ink, vision, haptics, etc. It may also control combined modalities such as lipreading (combined speech recognition and vision), speech input with keyboard as fallback, and multiple keyboards for multi-user editing.
  • As the state machine framework for a future version of CCXML.
  • As an extended call center management language, combining CCXML call control functionality with computer-telephony integration for call centers that integrate telephone calls with computer screen pops, as well as other types of message exchange such as chats, instant messaging, etc.
  • As a general process control language in other contexts not involving speech processing.

You can review SCXML at http://www.w3.org/TR/scxml/. Comments for this specification are welcomed to www-voice@w3.org (archives).
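For readers who haven't worked with state charts, here is a rough Python sketch of what "event-based state machine" means in practice. It is not SCXML syntax and is not drawn from the working draft. The class, the call-flow states and the event names are all invented for illustration.

```python
class StateMachine:
    """A bare-bones event-driven state machine (illustrative, not SCXML)."""

    def __init__(self, initial, transitions):
        self.state = initial
        # Maps (current_state, event_name) -> next_state.
        self.transitions = transitions

    def send(self, event):
        """Deliver an event; events with no matching transition are ignored."""
        key = (self.state, event)
        if key in self.transitions:
            self.state = self.transitions[key]
        return self.state


# Hypothetical dialog flow for a simple voice application.
CALL_FLOW = {
    ("idle", "call.incoming"): "greeting",
    ("greeting", "prompt.done"): "collecting",
    ("collecting", "input.recognized"): "confirming",
    ("collecting", "input.nomatch"): "greeting",
    ("confirming", "caller.hangup"): "idle",
}
```

A controller would create `StateMachine("idle", CALL_FLOW)` and call `send()` as events arrive from the voice platform. A language like SCXML expresses the same table declaratively in XML and adds features this sketch leaves out, such as nested and parallel states.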

Monday, January 23, 2006

Bluetooth Headset Optimized for Speech Recognition


If you've grown frustrated with speech recognition performance using your bluetooth headset, you might want to take a look at VXI Corporation's recently announced BlueParrott TalkPro headset. This product, equipped with a quality noise canceling microphone, is supposed to deliver consistent performance, though I have no first hand experience with the product myself. The price tag is a bit higher than what most of us are likely to pay for a bluetooth headset though. My $50 Motorola bluetooth headset works fine, with acceptable recognition performance... at least for the phone's embedded speaker dependent recognizer.

If anybody has any first hand experience with the BlueParrott product, please let us know what you think.

Friday, January 06, 2006

Speech Technology at CES 2006



There were a number of products utilizing speech technology at the International CES show this week in Las Vegas. While there were no cool demos of the speech technology in Windows Vista during Gates' keynote, he did mention speech recognition in the last couple of minutes of his speech.

Using speech recognition to find and browse your music was one particularly interesting theme. At last year's CES, Gracenote and Scansoft (now Nuance) announced a partnership to speech-enable portable music players, home entertainment systems and automobile sound systems. This year Gracenote announced a music recommendation system called Gracenote Discover that helps consumers find and discover music that fits their personal tastes. In addition, Gracenote announced a speech recognition solution called MediaVOCS for hands-free control of your media collection and Playlist Plus for content recognition and auto creation of playlists on portable devices. Read the press release.

VoiceBox demonstrated a speech-enabled XM Satellite radio. Drivers can use this technology to surf oodles of satellite radio channels hands-free while driving. A video demonstration is available on www.voicebox.com.

Korean mobile phone manufacturer Pantech introduced two DMB (Digital Multimedia Broadcasting) phones that, among other things, are equipped with an embedded text-to-speech engine that can be used to read SMS messages when the user is situationally (or permanently) impaired.

Magellan's RoadMate 760 GPS Navigation system (see photo above) actually won the CES 2006 Innovations Award. The device introduces a number of industry firsts to this category, and in addition introduces SayWhere, a fairly sweet sounding female TTS voice that clearly articulates upcoming street names to you.

Let us know if we've missed anything worth noting!

Wednesday, January 04, 2006

Google caught up in voice services lawsuit

Rates Technology Inc has filed suit against Google for infringing on its VoIP patents. Damages from the suit could reach as high as $5B depending on how long litigation takes and how well the VoIP market does during that time.

Chicago patent attorney Peter Zura has posted an interesting analysis of the actual IP claims of the patents in question. Some interesting background on RTI is revealed in an April 2005 posting by TMC's Rich Tehrani.

In short, this sort of smells like another classic IP case involving a company that provides no apparent value add to society in terms of goods and services, but rather, causes those that do add value to spend (or forfeit) inordinate amounts of resources protecting the businesses they have created.

Read the news article.

Tuesday, January 03, 2006

Opera Mini to Include Google Search

Though Opera denies the rumors floating about that it is about to be acquired by Google (or is it Microsoft?), they have recently announced that they will be including Google search in Opera Mini. Google will also become the default search engine for Opera Mobile, the smartphone version of the Opera browser.

Opera Mini is essentially a free J2ME client that relies on a server in the network to preprocess webpages for rendering on small mobile screens. It's sort of an alternative to Opera Mobile that is less of a bandwidth and resource hog, and in theory should run on a large number of J2ME handsets already in use.

Read the article.

X+V demo on ABC News

IBM's Igor Jablokov recently demonstrated an X+V-enabled mobile phone on ABC News. Fabulous interview, Igor! Congrats.

Watch the demo.