From Douglas Rain’s commanding Hal in “2001: A Space Odyssey” to Scarlett Johansson’s dulcet tones in “Her”, people have always had high hopes for voice recognition.
But as with a lot of technology, what we imagined, what we wanted, and what we got are very different things.
To be fair, voice recognition has come a long way. Only a few years ago, Siri and her compatriots Google Now and Cortana were the stuff of science fiction. Voice recognition software itself has been around for decades, but integrating it into smartphones had to wait for a big step forward in mobile processing power.
But voice recognition is imperfect. Although Siri can now beatbox, getting her to call Tom instead of Mom can still be a challenge. Factors like pronunciation and speaking speed affect how well a voice recognition system understands what was said. For example, the US version of Siri struggles with accents, a point of frustration for many smartphone users.
How does voice recognition work?
Spoken sounds, words, or phrases are converted into electrical signals, which are then digitized into coded patterns that the system can assign meaning to. An acoustic word model compares the input against every word in its vocabulary, scores how similar each one sounds, and determines which word was most likely spoken. A language model, i.e., a model of syntax and semantics, then evaluates the match at the sentence level to determine the most likely sequence of words. Finally, the system outputs the word sequence that scores best under both models as the recognized speech.
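To make the two-model decision concrete, here is a minimal Python sketch of the final scoring step. It assumes the front end has already turned the audio into a short list of candidate words per stretch of speech, each with an acoustic log-probability; the vocabulary, the scores, the bigram table, and the function names (`acoustic_score`, `lm_score`, `decode`) are all hypothetical toy values for illustration, not output from Siri or any real engine. Real recognizers use far larger models and search the hypotheses with techniques such as Viterbi or beam search rather than trying every combination as this sketch does.

```python
import itertools
import math

# Hypothetical acoustic-model output: for each stretch of audio, candidate
# words with log P(audio | word). Toy numbers for illustration only.
ACOUSTIC = [
    {"call": math.log(0.8), "cold": math.log(0.2)},
    {"tom": math.log(0.55), "mom": math.log(0.45)},
]

# Hypothetical bigram language model: log P(word | previous word).
# "<s>" marks the start of the sentence; unseen pairs get a small floor.
BIGRAM = {
    ("<s>", "call"): math.log(0.6),
    ("<s>", "cold"): math.log(0.1),
    ("call", "mom"): math.log(0.3),
    ("call", "tom"): math.log(0.05),
    ("cold", "mom"): math.log(0.01),
    ("cold", "tom"): math.log(0.01),
}
UNSEEN = math.log(1e-4)


def acoustic_score(words):
    """Word-level score: how well each word matches its stretch of audio."""
    return sum(slot[w] for slot, w in zip(ACOUSTIC, words))


def lm_score(words):
    """Sentence-level score: how plausible the word sequence is."""
    score = 0.0
    prev = "<s>"
    for w in words:
        score += BIGRAM.get((prev, w), UNSEEN)
        prev = w
    return score


def decode(lm_weight=1.0):
    """Return the candidate word sequence with the best combined score."""
    candidates = itertools.product(*(slot.keys() for slot in ACOUSTIC))
    return max(
        candidates,
        key=lambda ws: acoustic_score(ws) + lm_weight * lm_score(ws),
    )


print(decode())               # ('call', 'mom')
print(decode(lm_weight=0.0))  # ('call', 'tom')
```

With the acoustic model alone (`lm_weight=0.0`), "tom" sounds slightly more likely, but the toy language model considers "call mom" a far more common phrase and overrides it. This is one way a system can end up calling Mom when you asked for Tom.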
Big investments will help overcome software issues
The voice recognition market for smartphones in the US is relatively small at the moment. It was worth $461.9 million in 2014 and is expected to grow to $601.6 million in 2019.
Voice recognition market for smartphones in the US ($ millions)
Source: Technavio, 2015
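How conservative is that forecast? One common way to express it is as a compound annual growth rate (CAGR); the metric is our choice here, not one the Technavio figures are quoted with. A short Python sketch of the arithmetic, using the 2014 and 2019 values quoted in the text:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Technavio figures quoted in the text, in US$ millions (2014 -> 2019).
start_2014 = 461.9
end_2019 = 601.6

rate = cagr(start_2014, end_2019, years=2019 - 2014)
print(f"Total growth: {end_2019 / start_2014 - 1:.1%}")  # Total growth: 30.2%
print(f"CAGR: {rate:.1%}")                               # CAGR: 5.4%
```

In other words, the forecast works out to roughly 30% growth over five years, or about 5.4% a year.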
But despite these relatively conservative numbers, the market has huge growth potential for two reasons.
First, although voice recognition is a feature on every Android and iOS device, not every smartphone user uses it, which leaves a huge untapped market ripe for the picking. Second, although voice recognition systems can recognize various languages and determine the authenticity of a voice, they still lack accuracy, and closing that gap would remove a key barrier to wider adoption.
To address these challenges, vendors are investing heavily in voice recognition, both to improve the technology and to raise awareness and interest among potential customers.
Watch for more crowdfunded software on the market
The voice recognition market for smartphones in the US is dominated by a few well-established companies. Nuance is the only prominent vendor developing products for mobile OEMs like Samsung, BlackBerry, and Acer, while the major smartphone vendors have proprietary software: Apple has Siri, Microsoft has Speech and Cortana, Google has Google Now, and Samsung has S Voice.
The one thing small vendors have going for them is how imperfect the technology still is. Existing voice recognition software still needs work on accuracy, user-friendliness, efficiency, and response time. This opens the door for small vendors to seek outside funding, whether through crowdfunding or early-stage investors, to get new projects up and running. A few recently funded voice recognition start-ups include:
- In 2013, Beyond Verbal, a start-up based in Israel, raised $2.8 million in its first round of funding led by Genesis Angels
- In 2014, Cord Project, a New York-based start-up, raised $1.8 million in seed funding led by Metamorphic Ventures and Lerer Hippeau Ventures
- In 2015, Uniphore, an India-based start-up, raised its Series A round of funding from Kris Gopalakrishnan (co-founder of Infosys)
Despite all this work to make voice recognition better, the personality and emotional responses of virtual assistants like Samantha (Johansson's character in "Her") and Hal remain pure science fiction (although in the latter's case, that's probably a good thing).
And while the technology is certainly not bad now—in the past 24 hours alone, Google Now has reliably made calls, identified songs and even found recipes online for me—voice recognition software stands to get a lot better over the next few years.