Speech recognition: Your smartphone gets smarter

Computerworld - When we were kids, my friends and I used to play a game where we fantasized about which technologies from Star Trek were most likely to be real-world inventions within our lifetimes. The transporter and warp drive -- not likely. But the communicator, the voice-commanded computer and the universal translator -- very likely.

When speech recognition arrived on the computer desktop, it seemed like a great idea -- but for most people, it wasn't a replacement for the keyboard and mouse. Now speech recognition technology is being put to use in a whole new environment: phones. And its presence there is further driving its use and development in directions it might never have headed on the desktop.

Speech recognition first appeared as a primitive technology in the 1950s, as little more than a curiosity. In the early 1960s, IBM's Shoebox device could recognize 16 spoken words and could respond to simple mathematical requests, such as "three plus four total."

DragonDictate by Dragon Systems was probably the first speech-recognition program for the PC, released in the early 1980s for DOS computers. It could recognize only individual words, spoken one at a time. It evolved over time into the product Dragon NaturallySpeaking (now in Version 11 and owned by Nuance Communications), which can transcribe text spoken in a normal conversational voice and speed.

Speech recognition on the desktop had two big limitations. First, in order for the program to work with a high degree of accuracy, it had to be trained to recognize the speech patterns of the user. Windows Vista's and Windows 7's native speech-to-text technology, and third-party products like Dragon NaturallySpeaking, still require a user-training period to be useful.

The second limitation was the prevalence of the keyboard. Most people were already in the habit of typing, not talking, and so speech control faced the same uphill battle for adoption as the Dvorak keyboard layout. Why learn to use Dvorak when plain old QWERTY was readily available and worked fine?

Abhi Rele, senior product manager of Microsoft's TellMe team, a group responsible for developing speech recognition technologies for multiple environments, concurs on this point: "In the desktop environment, users have easy access to other interaction modalities -- namely, keyboard and mouse -- and therefore the use of speech is primarily targeted towards speech enthusiasts."

What speech-controlled computing needed for broader adoption was two things -- better out-of-the-box usage and a venue where speech was already king, so to speak. One such venue has been on the rise for a long time: mobile phones.

Matt Revis, vice president of product management and marketing at Nuance, explains the differences between the desktop and mobile environments like this: "The desktop is a stationary environment focused entirely on desktop use cases, and so speech for the desktop follows that task flow: supporting office apps, Web browsing, communications, etc. In mobile, speech is more directed to supporting a variety of lifestyle scenarios: professionals on the go, out-and-about fun, hands-free [calling] and so on."

Gartner analyst Tuong Nguyen agrees that voice makes more sense in a mobile context. "From a usage perspective," he says, "the value of voice recognition on a handheld device is much greater. It adds a user-friendly, intuitive method of input."

This is certainly true, Nguyen adds, if the alternative to speaking a simple declarative statement is to dig down through a slew of menus or struggle with tiny on-screen keyboards: "With the growing adoption of touch-only devices (no physical keys), voice recognition is used to enhance data entry/input. It also supports hands-free requirements or legislation."

Speech recognition works by making statistical models of spoken language. "To recognize spoken words," says Google product manager Amir Mane, "we compare the input speech to a statistical model of the language and try to find the closest match -- the system's best guess at what the user said."

Statistical models of a language require a great deal of storage to be practical. "[They] must cover all of the fundamental sounds of the language (phonemes), all of the words, and all of the different ways that the words can be strung together in the spoken language," Mane says. On top of that, there are accents, variations in sex and age, regional pronunciations, word choices ("soda" vs. "cola" vs. "pop") and so on.

Mane notes that Google Voice Search's statistical model requires three elements: acoustic models, language models and a lexicon. "An acoustic model is created by taking audio recordings of speech and the transcriptions of what was said, and using the two to create a representation of the phones -- the basic components of all words in a given language," he says.
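To make the idea of a "best guess" concrete, here is a minimal sketch in Python of how a recognizer might weigh an acoustic score against a language score for each candidate transcription. It is not Google's implementation: the candidate phrases, the probabilities and the function name `best_guess` are all invented for illustration, and a real system would search a vastly larger space of hypotheses.

```python
import math

# Hypothetical scores, invented for illustration only.
# "acoustic": log-probability that the audio matches the candidate's sounds.
# "language": log-probability that the candidate is a plausible phrase.
CANDIDATES = {
    "recognize speech": {
        "acoustic": math.log(0.40),
        "language": math.log(0.050),
    },
    "wreck a nice beach": {
        "acoustic": math.log(0.45),
        "language": math.log(0.001),
    },
    "recognized peach": {
        "acoustic": math.log(0.15),
        "language": math.log(0.002),
    },
}


def combined_score(scores):
    """Add the two log-probabilities (equivalent to multiplying probabilities)."""
    return scores["acoustic"] + scores["language"]


def best_guess(candidates):
    """Return the candidate transcription with the highest combined score."""
    return max(candidates, key=lambda text: combined_score(candidates[text]))


if __name__ == "__main__":
    print(best_guess(CANDIDATES))
```

In this toy data, "wreck a nice beach" is the slightly better acoustic match, but the language score makes "recognize speech" the overall winner, which is the kind of correction the statistical model is meant to provide.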

The language model involves figuring out what words are likely to follow other words, and using that as a way to improve recognition accuracy. "The word 'empire' will be followed by the words 'state' or 'strike' [as in The Empire Strikes Back] more often than it is followed by the words 'diverse' or 'guava,' " Mane explains. Collecting data from the field helps continuously improve the language model and the lexicon.
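The "what word is likely to come next" idea can be sketched with a simple bigram count model. The Python example below is a toy under stated assumptions, not the model Google uses: the four-sentence corpus is made up, and `train_bigrams` and `next_word_probability` are hypothetical helper names. Production language models are trained on far more text and use smoothing so that unseen word pairs do not get a probability of exactly zero.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
CORPUS = [
    "the empire state building",
    "the empire strikes back",
    "the empire state of mind",
    "a diverse empire",
]


def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    """Estimate P(nxt | prev) from raw counts; 0.0 if prev was never followed by anything."""
    total = sum(counts[prev].values())
    if total == 0:
        return 0.0
    return counts[prev][nxt] / total


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    for word in ("state", "strikes", "guava"):
        print(word, next_word_probability(model, "empire", word))
```

With this corpus, "state" and "strikes" receive nonzero probability after "empire" while "guava" receives none, so a recognizer unsure between similar-sounding words would lean toward the pairing it has seen before.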

Google isn't the only company crowdsourcing its recognition data. Speech-recognition app Vlingo puts cookies on users' phones to continuously build speech models based on users' own feedback, combined with models based on similar speakers.
