
Thursday, December 22, 2005

Hacking/Skinning Your Furby

Tuesday I posted on FURBY speech technology. eecue followed up with a link to his blog detailing (with photos) his experience of actually skinning a new FURBY. This was too good to pass up; you have to check out eecue's posting.

Hey eecue, if you or anybody else figures out a way to modify your FURBY's speech grammars, please let us know! Imagine customizing your FURBY's linguistic capabilities with your own flavor of Furbish.... :-)

Another idea... integrate a Bluetooth radio and set up a J2ME client on your mobile phone so you can send arbitrary English text to the FURBY's onboard TTS engine. Why hasn't Hasbro already thought of this? Imagine the fun you could have speaking via your FURBY!
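For the curious, here is a rough sketch of what the phone-side client might look like. To be clear, everything here is invented for illustration: there is no real FURBY Bluetooth mod or protocol, so the length-prefixed ASCII framing and the RFCOMM channel are pure assumptions. (A real J2ME client would use the JSR-82 Bluetooth API; this sketch uses Python's stdlib RFCOMM socket support on Linux to keep it self-contained.)

```python
import socket

# Hypothetical wire format: we assume a FURBY-side mod that accepts
# length-prefixed ASCII frames of at most 20 payload bytes each and
# feeds them to the onboard TTS engine. This is NOT a real Hasbro
# or Sensory protocol -- it is a made-up example.
MAX_PAYLOAD = 20

def frame_text(text):
    """Split text into length-prefixed frames for the imagined TTS link."""
    data = text.encode("ascii")
    frames = []
    for i in range(0, len(data), MAX_PAYLOAD):
        chunk = data[i:i + MAX_PAYLOAD]
        # One length byte, then up to MAX_PAYLOAD bytes of text.
        frames.append(bytes([len(chunk)]) + chunk)
    return frames

def say_via_bluetooth(addr, channel, text):
    """Send framed text over an RFCOMM serial link (Linux AF_BLUETOOTH)."""
    sock = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_STREAM,
                         socket.BTPROTO_RFCOMM)
    try:
        sock.connect((addr, channel))  # addr like "01:23:45:67:89:AB"
        for frame in frame_text(text):
            sock.send(frame)
    finally:
        sock.close()
```

The framing function is the only part you could test without hardware; the Bluetooth address and channel would depend entirely on how a hypothetical FURBY mod advertised its serial port service.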

Anybody have any additional zany FURBY application ideas?

Tuesday, December 20, 2005

Furbish (and English) Speaking FURBYs!

You may recall Hasbro's hit toy from some years back - the FURBY. You might also have heard that the FURBY is back this year. The modern FURBY is outfitted with what Hasbro is calling EMOTO-TRONICs technology. This combination of robotics and speech technology makes the FURBY a lifelike and intelligent toy.

The tagline on Hasbro's FURBY website reads "The new FURBY can speak HUNDREDS of words and understand DOZENS of phrases and commands!" The FURBY is actually a bilingual critter in that it understands both its native language (Furbish) and some English. (German, French, Spanish, Japanese, and Italian versions are also available.)

While the Hasbro site offers little detail on the Furbish vernacular (besides an interactive demo), don't worry, there are plenty of FURBY fans filling the void. One site in particular offers in-depth documentation (and analysis) of Furbish, as well as English-to-Furbish and Furbish-to-English online dictionaries.

The FURBY's motor and linguistic brains are actually powered by Sensory, Inc.'s cutting-edge RSC-4128 IC. This single chip provides the FURBY's multilingual speech recognition/synthesis as well as coordinated, complex motor control, all at a very low price. (FURBYs retail for less than $40 USD.)

What you probably were not aware of is that FURBY toys were once banned from the NSA (National Security Agency) in Maryland because it was said they could "learn." Evidently an inability to learn is one of the prerequisites for a job with the NSA. ;-)

Dah doo-ay wah!! (Furbish for "big fun yeah!")

Thursday, December 15, 2005

Google About to Buy A Ticket to the Mobile Web?

Saw a posting on Slashdot today reporting rumors circulated by other bloggers (presumably in the know...) that Google is poised to gobble up Opera Software. Given Opera's reach on mobile devices, if this rumor proves true it could provide Google a convenient vehicle for gaining a toe-hold on the mobile web. It could also give XHTML+Voice some interesting momentum, since speech-enabling mobile XHTML browsers makes a lot of sense given the usability constraints, particularly when trying to fill out a form via a phone keypad. Another interesting angle, though, is that most of the text entry you do for existing Google apps (think search, Gmail, etc.) would typically require some sort of dictation engine, not grammar-based recognition.

Wednesday, December 14, 2005

SALTforum.org finally gets updated, but still no traffic!

You probably didn't notice it, but the SALTforum.org website finally got updated. Nevertheless, nobody seems to care. (In case you don't remember, SALT = "Speech Application Language Tags", a markup language Microsoft introduced as an alternative to VoiceXML that essentially never went anywhere.) The update is a slide deck from a SALT tutorial given earlier this year at SpeechTEK West. As far as we know, this is the first update to the site since June 2003.

Taking a look at Amazon's Alexa.com traffic stats makes for a fairly interesting exercise. Among the Alexa Toolbar community, SALTforum.org ranks 2,710,256, while the VoiceXML Forum's website (voicexml.org) ranks 328,060. In terms of page views, voicexml.org attracts several hundred thousand views daily, with spikes exceeding 1 million when newsworthy events occur. There is not enough traffic on SALTforum.org to even get it on the charts.

Another interesting statistic: first in SALTforum.org's "people who visit this site also visit" list is voicexml.org. There is no trace of the SALT Forum in the voicexml.org statistics.

Just for kicks, I took a look at Alexa's traffic stats for bluetooth.org and compared them to voicexml.org's. I would think that Alexa toolbar users are typical consumers, and since Bluetooth is a fairly popular consumer technology these days (does anybody still have a mobile phone without Bluetooth?), bluetooth.org would enjoy a fair amount of traffic. Surprisingly, it currently ranks 1,100,715, somewhere in between voicexml.org and SALTforum.org. This is surprising because VoiceXML today is primarily installed out of sight in machine rooms (Opera 8's X+V support is one notable exception, of course!), and you would think the greater awareness of Bluetooth among consumers would naturally translate into more traffic on a Bluetooth technology site.

Of course, these statistics represent the Alexa user community interests, which may or may not be representative of the wider web audience. If there is a correlation with the wider audience (seems reasonable), then I suppose the SALTforum.org vs. voicexml.org data seems to suggest that SALT is indeed a dead horse. I'm not sure what to make of the voicexml.org vs. bluetooth.org comparison. It seems to sort of suggest that VoiceXML has fairly significant momentum in the industry.

Any thoughts?

Tuesday, December 13, 2005

Speech Recognition Frustrated by Increasing Noise Levels in Hospitals

A recent study from Johns Hopkins University reports that noise levels in hospitals have steadily increased over the past five years. A great deal of the noise is in the same frequency range as human speech. The study found that the increased noise not only slows patient healing rates and contributes to staff stress levels, but also frustrates attempts to introduce speech recognition technology.

Read the article.

Tuesday, December 06, 2005

Human to Human Speech Recognition: The Next Frontier

Not so long ago, Mari Ostendorf, noted speech researcher and Professor of Electrical Engineering at the University of Washington, articulated a rather interesting observation:

"If you think of the amount of time you spend talking as opposed to reading documents, you'll realize that you spend much more time talking," Ostendorf said. "We have this speech data that is a huge potential information source, and it's largely untapped. It really is the next generation data source."

Ostendorf went on to explain that automated speech recognition works quite well today when humans speak to computers, but points out that recognizing human-to-human speech is an entirely different ball game.

"When people talk to one another, they speed up, they slow down, they get excited, they get bored, they show emotion, the pitch goes up, the pitch goes down, there are interruptions, the speech overlaps, the speaker changes -- there is a lot of variability," she said. There are "ums" and "ahs," repetition and hesitation. It's not just a matter of what we say, but how we say it. "We don't notice these disfluencies -- they just pass us by; we filter them out," Ostendorf said. "But they are there. And a computer has to do something with them."

We tend to think of speech recognition technology as a user interface enabler. However, if we look at the amount of audio now available on the Internet (think podcasts, radio/tv programming, etc.) it seems that speech recognition will play an increasingly important role in indexing and searching the vast amounts of audio and video content being served on the web.

Services such as www.blinkx.tv are industry pioneers in this area. Google's recent poaching of widely known speech technologists and executives (K.F. Lee - Microsoft, Mike Cohen - Nuance co-founder, TV Raman - IBM, Bill Byrne - SAP) is also interesting. The same can be observed of Yahoo.