Voice recognition technology has been around for a few years now and is something we’ve all encountered to some degree – be it in an automated call to a bank or via a built-in feature on a mobile phone designed to save time when dialling a number.
But what does the future hold?
Some commentators believe that voice recognition technology is fast approaching the point where it can be used in handheld internet-enabled devices to remotely control everything from TV recording to household heating using voice prompts.
However, news today runs counter to this: a major voice-to-text business is alleged to be deceiving customers by employing humans in call centres to carry out the bulk of its conversions, effectively bypassing its specialist voice recognition supercomputer.
So what is the truth about voice recognition technology – is it really on the verge of becoming the next big thing? Or is it still at an embryonic stage where reliability can only be assured by human intervention?
Writing for tech site Physorg.com, Troy Wolverton envisages a bold Star Trek-like future in which he can use his voice to control everything from switching on the lights in his apartment to recording sports matches on the TV.
In pursuit of this he quotes a couple of industry insiders who share his vision and claim that the required technology to make it happen is just around the corner.
Todd Mozer, the CEO of US-based speech-recognition company Sensory, is one such person. According to Wolverton, Mozer foresees a world filled with speech-controlled internet devices, or SCIDs.
Mozer is backed up by Bill Meisel, the editor of industry newsletter Speech Strategy News. Meisel believes that mobile phones will, before long, be used as "universal remotes" that will be able to carry out all sorts of tasks from programming TiVos to setting alarm clocks.
Significantly, Wolverton notes that many developers are getting around the inherent problems of voice recognition by deliberately limiting the number of words that can be recognized – essentially narrowing the terms of reference and making them specific to the task in hand.
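To illustrate why a narrow vocabulary helps, here is a minimal sketch in Python. It assumes the speech engine has already produced a rough, possibly misheard text transcription, and simply snaps that text to the nearest entry in a short list of permitted commands. The command names and the `match_command` helper are hypothetical, invented for illustration; they are not taken from Sensory, Spinvox or any other product mentioned here.

```python
from difflib import get_close_matches

# Hypothetical command set for a voice-controlled home device.
# These phrases and action names are illustrative assumptions only.
COMMANDS = {
    "lights on": "LIGHTS_ON",
    "lights off": "LIGHTS_OFF",
    "record match": "RECORD_MATCH",
    "set alarm": "SET_ALARM",
}


def match_command(heard, cutoff=0.6):
    """Map a noisy transcription to the closest permitted command.

    Returns the action name, or None when nothing in the small
    vocabulary is similar enough to the text that was heard.
    """
    cleaned = heard.lower().strip()
    # Ask for the single best match at or above the similarity cutoff.
    candidates = get_close_matches(cleaned, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[candidates[0]] if candidates else None
```

With only four phrases to choose from, a misheard "lites on" still lands on the lights-on command, while an open-ended request such as "what is the weather tomorrow" is rejected rather than guessed at. A general-purpose dictation system has no such short list to fall back on.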
It would appear that this is precisely where voice-to-text company Spinvox has come unstuck. The BBC reports today that it has information suggesting that Spinvox has been using call centre staff in South Africa and the Philippines to transcribe the bulk of its messages.
Aside from the data protection ramifications of sending potentially sensitive data outside the EU, Spinvox’s claim that it uses a specialist voice recognition supercomputer known affectionately as ‘D2’ or ‘The Brain’ to do its work looks to be on shaky ground following the BBC’s revelations.
Spinvox has declined to comment on just how many messages are passed over from the D2 to its human operatives, who the company describes as "conversion experts".
However, having sourced a reported £120 million from investors, the company has been left with no small amount of egg on its face by the revelations.
Throwing slang, accents and intonation into the mix
What seems apparent is that narrow-field voice recognition technology, of the sort used in automated telephony systems, works well. Challenging a computer with an unlimited dictionary, along with numerous variations and permutations of slang, accent and intonation, is, at least for the time being, more than likely to confuse it.