<body>

Tuesday, December 06, 2005

Human to Human Speech Recognition: The Next Frontier

Not so long ago, Mari Ostendorf, noted speech researcher and Professor of Electrical Engineering at the University of Washington, articulated a rather interesting observation:

"If you think of the amount of time you spend talking as opposed to reading
documents, you'll realize that you spend much more time talking," Ostendorf
said. "We have this speech data that is a huge potential information source, and
it's largely untapped. It really is the next generation data source."

Ostendorf went on to explain that automated speech recognition works quite well today when humans speak to computers, but pointed out that recognizing human-to-human speech is an entirely different ball game.

"When people talk to one another, they speed up, they slow down, they get
excited, they get bored, they show emotion, the pitch goes up, the pitch goes
down, there are interruptions, the speech overlaps, the speaker changes -- there
is a lot of variability," she said. There are "ums" and "ahs," repetition and
hesitation. It's not just a matter of what we say, but how we say it. "We don't
notice these disfluencies -- they just pass us by; we filter them out,"
Ostendorf said. "But they are there. And a computer has to do something with
them."

We tend to think of speech recognition technology as a user interface enabler. However, given the amount of audio now available on the Internet (think podcasts, radio/TV programming, etc.), it seems likely that speech recognition will play an increasingly important role in indexing and searching the vast amounts of audio and video content being served on the web.

Services such as www.blinkx.tv are industry pioneers in this area. Google's recent poaching of widely known speech technologists and executives (K.F. Lee - Microsoft, Mike Cohen - Nuance co-founder, TV Raman - IBM, Bill Byrne - SAP) is also interesting. The same can be observed of Yahoo.
