In 2011, Apple's Siri became one of the first truly successful NLP assistants in the hands of consumers. It followed a period of voice-driven systems that reacted only to specific commands, which most users found frustrating because of their limited ability to "understand" what the user wanted. Other companies quickly followed suit. After first integrating NLP-driven systems into smartphones and tablets, several companies began promoting dedicated devices to talk to, such as Amazon's Echo Dot or Google's Home. Auto manufacturers have also adopted the technology, integrating robust, hands-free driving assistants into their cars. With this growth, it's important to understand how the technology works at a basic level so that you can find the best way to leverage it.
How Voice-Driven Natural Language Processing Systems Work
At the start of the processing chain, the Automatic Speech Recognition (ASR) module translates the user's spoken utterances into digital text. A voice-command system takes those words and matches them against predefined commands, each of which invokes a specific action. For example, responding "Yes" to the prompt "Do you want to hear your balance?" would cause the system to read back or display the user's bank balance.
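To make the rigidity of a pure voice-command system concrete, here is a minimal sketch (the command table, action names and `handle_transcript` function are all hypothetical, not from any particular product): the transcribed text must match a predefined command exactly, or nothing happens.

```python
# Hypothetical sketch of a voice-command system: the ASR output
# (already transcribed to text) is matched against a fixed command table.
COMMANDS = {
    "yes": "read_balance",
    "no": "end_session",
}

def handle_transcript(transcript: str) -> str:
    """Map an exact (normalized) transcript to a predefined action."""
    action = COMMANDS.get(transcript.strip().lower())
    if action is None:
        return "unrecognized"  # rigid systems fail on anything unexpected
    return action
```

Saying "Yes" works, but a natural reply like "Sure thing" falls through to "unrecognized" — exactly the limitation that made early voice-driven systems so frustrating.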
A Natural Language Processing (NLP) system adds a layer that takes those words, tries to extract the "intent" (i.e., the meaning of what the user is trying to achieve), and matches that intent to a predefined action. Because it employs machine learning techniques, the user's phrasing doesn't have to match predefined expressions exactly. It just has to be "close enough" for the NLP system to decode the meaning correctly. By integrating a feedback loop, an NLP engine can improve the accuracy of its interpretation of the user's utterances as well as expand the vocabulary and wording the system understands. For example, a well-trained system would produce the same information for the utterances "Who is an expert in big data?", "Who can help me with big data?" and "I need help with big data."
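Real systems use trained statistical models, but the "close enough" idea can be illustrated with a toy word-overlap score (the intent names, example phrasings and Jaccard-similarity approach below are illustrative assumptions, not how any production engine actually works):

```python
# Hypothetical sketch of intent matching: instead of exact command strings,
# each intent lists example phrasings, and the user's utterance only needs
# to be "close enough" (here: highest word-overlap score) to one of them.
INTENT_EXAMPLES = {
    "find_expert": [
        "who is an expert in big data",
        "who can help me with big data",
        "i need help with big data",
    ],
    "get_weather": [
        "what is the weather like",
        "will it rain today",
    ],
}

def classify_intent(utterance: str) -> str:
    words = set(utterance.lower().replace("?", "").split())
    best_intent, best_score = "unknown", 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            example_words = set(example.split())
            # Jaccard similarity: shared words / total distinct words
            score = len(words & example_words) / len(words | example_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent
```

A new phrasing such as "I need some help with big data" still lands on `find_expert`, because it overlaps heavily with one of the stored examples; the feedback loop described above would, in a real engine, add such confirmed phrasings to the training data over time.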
By combining NLP with a dialog manager, it is possible to create a system that can hold a human-like conversation with a back-and-forth of prompts, questions and answers. However, passing the Turing test (a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human) is a milestone yet to be achieved.
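At its simplest, a dialog manager is a state machine that remembers where the conversation is and chooses the next prompt accordingly. The sketch below is a deliberately minimal assumption (states, prompts and the balance figure are invented for illustration), reusing the banking example from earlier:

```python
# Hypothetical sketch of a minimal dialog manager: a state machine that
# tracks conversation state and decides what to say next.
class DialogManager:
    def __init__(self):
        self.state = "start"

    def respond(self, user_input: str) -> str:
        text = user_input.strip().lower()
        if self.state == "start":
            self.state = "awaiting_confirmation"
            return "Do you want to hear your balance?"
        if self.state == "awaiting_confirmation":
            self.state = "done"
            if text in ("yes", "yeah", "sure"):
                return "Your balance is $1,234.56."  # placeholder figure
            return "Okay, maybe later."
        return "Is there anything else I can help with?"
```

Production dialog managers track far richer state (slots, history, clarifying questions), but the core loop — interpret, update state, prompt — is the same.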
After the correct intent is established, the application logic typically retrieves data from a back-end system or another data source and passes the result back to the dialog manager. The resulting response can be a non-verbal action such as switching off a light, a written response such as displaying today's weather forecast on the device's screen, a verbal response such as reading stock market quotes, or any combination of the three.
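One way to picture this step is a dispatcher that routes each intent to a handler and packages the result into the three output channels just described. Everything below — the function names, the stub back-end call, and the response dictionary shape — is an illustrative assumption:

```python
# Hypothetical sketch of the application-logic step: an intent is routed to
# a handler that fetches data, and the result is packaged into any mix of
# a non-verbal action, on-screen text, and spoken output.
def fetch_forecast(city: str) -> str:
    # Stand-in for a real back-end call (weather API, database, etc.)
    return f"Sunny and 72F in {city}"

def handle_intent(intent: str, slots: dict) -> dict:
    if intent == "get_weather":
        forecast = fetch_forecast(slots.get("city", "your area"))
        return {"action": None, "display": forecast, "speech": forecast}
    if intent == "lights_off":
        # Non-verbal action only: nothing to display or say
        return {"action": "switch_off_lights", "display": None, "speech": None}
    return {"action": None, "display": None,
            "speech": "Sorry, I didn't understand that."}
```

The dialog manager would then render whichever fields are present: trigger the action, show the text, or speak the response.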
Digital assistants like Siri, Microsoft Cortana, Amazon Alexa and Google Assistant combine NLP capabilities with intelligently mined data they have gathered about the user (or other users) and attempt to present a response in the right personal context. For instance, when asked "What's the weather like this afternoon?", the system will try to determine the user's current location and respond with the weather forecast for that city.
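The personalization step can be sketched as filling in details the user left unsaid from a stored profile. The profile contents and function below are hypothetical stand-ins for the mined data a real assistant would use:

```python
# Hypothetical sketch of contextual personalization: when the utterance
# omits a detail (here, the city), fall back to the user's profile so a
# vague question still gets a concrete, personal answer.
def resolve_weather_query(slots: dict, profile: dict) -> str:
    city = slots.get("city") or profile.get("location", "your area")
    # Placeholder forecast; a real assistant would call a weather service.
    return f"This afternoon in {city}: partly cloudy, 68F"
```

So "What's the weather like this afternoon?" (no city mentioned) resolves against the profile's location, while "What's the weather in Boston?" overrides it.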
That is how these systems work at a basic level. However, they certainly have their flaws and limitations. If you want to gain a better understanding of how different NLP applications can succeed or fail, you can read our other blog post about what we learned when we challenged our employees to create their own NLP demos.