<body>

Wednesday, January 25, 2006

IBM's "Superhuman" Speech Initiative


A 3-year-old child is typically a better speech recognizer than the most powerful computers running the most advanced speech algorithms. IBM is striving to change that. A PC Magazine article published today quotes IBM's David Nahamoo as saying that IBM's goal is to bring automatic speech recognition (ASR) performance up to the level of human beings within the next 5 years.

Goals like this have been set before, and billions of dollars have been spent pursuing them, but human-level recognition has yet to happen. Skepticism aside, conventional speech recognition performs pretty decently today (thanks to cheap memory and fast processors) when the vocabulary is restricted to a particular domain or context.

IBM's "Tales" project, also described in the article, involves an area we described as "hot" in a recent post: automatically recognizing human-to-human speech. A good solution to this particular problem could be quite useful for building searchable indexes of the reams of audio/video content being produced. The IBM system can currently process television audio with 60-70% accuracy, but not in real time. It takes around 4 minutes to deliver results at this level, and accuracy as high as 80% could be achieved with more processing (i.e. a longer delay).
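
The article doesn't say how IBM scores those accuracy figures. A common convention in ASR is word accuracy: one minus the word error rate, where errors are the substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer's output. Here is a minimal sketch under that assumption. The function names and the sample sentences are made up for illustration and have nothing to do with IBM's system.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word-level edits (substitutions, deletions,
    insertions) needed to turn the reference into the hypothesis."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (word errors / number of reference words)."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - word_errors(reference, hypothesis) / n


# Hypothetical example: a human transcript vs. a recognizer's output.
reference = "the senate passed the budget bill late on tuesday night"
hypothesis = "the senate past the budget bill late tuesday night"
print(f"{word_accuracy(reference, hypothesis):.0%}")  # prints 80%
```

In this made-up example the recognizer gets one word wrong ("past" for "passed") and drops one ("on"), so 2 errors against 10 reference words gives 80%. Note that a heavy run of inserted words can push this score below zero, which is one reason results are often reported as an error rate instead.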

(Note: IBM's Les Wilson in the photo.)
