Speech recognition technology is ready to be consumed by the masses – so, what’s next?

Speech recognition is in great shape: accuracy levels are good and improving all the time. Progress is no longer confined to easy scenarios; the technology now handles noisier, harder conversational use cases, making it practical for real-world applications. This is supported by the ability to deploy the technology in scalable ways that meet business needs, with on-premises models as well as public cloud offerings.

The way it is consumed is getting easier too. Speech recognition can now support multiple accents and dialects within a single model, avoiding the challenge of managing separate deployments for the diverse world we live and operate in. Speech technology is not just for English either: it supports native speakers of a growing range of languages. Its capabilities are ever increasing, enabling businesses to operate globally with the same scale and support that they would have in the English-speaking world.

There is always room for improvement in any industry. Non-English support for speech recognition often lags behind English, especially once accents and dialects are taken into account. With support for multiple languages comes the challenge of knowing which language is being spoken, so the ability to detect and decipher the language itself is a growing need. Language identification and code switching (speakers moving between languages mid-conversation) are becoming increasingly important to the adoption of speech technology, but remain a challenge for most providers. Personalization to specific users and use cases is still very much a challenge, but the foundations have been laid with features such as custom dictionaries, and it is expected to improve in the short term.
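To make the language-identification challenge concrete, here is a minimal, self-contained sketch of one classic text-based approach: comparing ranked character-trigram profiles of an utterance (post-transcription) against tiny per-language reference profiles. This is an illustrative toy only; the seed texts and function names are invented for the example, and production systems identify language directly from the audio, which is a much harder problem.

```python
from collections import Counter

def trigram_profile(text, top=50):
    """Rank the most frequent character trigrams in a text sample."""
    text = " " + text.lower() + " "
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return [g for g, _ in grams.most_common(top)]

def rank_distance(profile, reference):
    """Out-of-place distance between two ranked trigram profiles;
    trigrams missing from the reference incur a fixed penalty."""
    penalty = len(reference)
    return sum(
        abs(i - reference.index(g)) if g in reference else penalty
        for i, g in enumerate(profile)
    )

def identify(utterance, references):
    """Return the reference language whose profile is closest."""
    profile = trigram_profile(utterance)
    return min(references, key=lambda lang: rank_distance(profile, references[lang]))

# Toy reference profiles built from tiny hypothetical seed texts.
references = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "de": trigram_profile("der schnelle braune fuchs springt über den faulen hund und die katze"),
}
print(identify("the dog and the fox", references))
```

Even this toy shows why code switching is hard: a single utterance mixing two languages pulls the profile toward both references at once, so segment-level rather than utterance-level decisions are needed.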

It’s not just words that convey meaning in conversation. Sentiment, speaker identity, hesitation and non-speech sounds all provide additional context and meaning. There is still work to do here to enable the wider meaning of speech to be determined.

Ultimately, what we really want is to truly understand the spoken word, not just transcribe it. That is the journey the technology is now very much on. Understanding means supporting continuous intelligence within businesses; enabling it in real time allows actions to be taken in line rather than out of band. Understanding also means using all the available context: looking wider than just the words, listening to sounds and sentiment, and drawing on any images, video and text that accompany the conversation to reach its deeper meaning. As speech technology continues to develop, we expect to see a broader range of usable outputs from speech analysis, such as call steering, detailed sentiment and extended voice-control capabilities.

All of this advancement requires more and more data to be processed. The long pole here is having enough labeled data to support the learning required. We are undertaking research to make labeling less human-intensive and to enable much faster, continuous learning. These developments will unlock the power of understanding that will form the next big step in speech recognition technology.