Conversational AI – Design considerations
Introduction
Conversational bot design is one of the most active areas of AI computing, and an essential consideration for making products smart and digitally inclusive. With rapid progress in AI, and specifically in NLP, language interpretation has improved considerably since Siri was first introduced on the iPhone 4S in 2011, making near-natural conversation possible.
Today, chatbots have proliferated as extensions of web applications. In fact, adding a voice or chat interface is the fastest way to make an application AI-ready, and chatbots are also a key strategy for the mobile-first digital economy.
Some well-known use cases where AI-powered conversational bots have vastly improved the user experience include the following:
Pre-sales bots: Conversational bots that guide a prospective customer toward a purchase decision, converting a window shopper into a buyer, are widely used. Many customers are hesitant to interact with a real person but find it perfectly natural to interact with a bot.
Co-pilot: In-car voice assistants are already widely used. They help reduce driver distraction by assisting with non-critical driving tasks, for example cabin comfort control, infotainment, and navigation assistance.
Voice-assisted maintenance: A voice bot is a great help for technicians on the shop floor. It can guide them step by step through a fault repair that would otherwise require referring to user guides, manuals, and technical drawings.
Virtual doctor: Robotic healthcare assistants that converse with patients for first-level interaction and guide them to the next step with a doctor are already in use. During the Covid-19 pandemic, they were put to great use to isolate suspected patients with symptoms while enforcing social distancing.
Conversational AI – Technical background and recent advances
To build an intelligent conversational agent, understanding user intent is key. There are many parts to this challenge and many variables to solve for. Human comprehension of language is complex, and not all of it is verbal. As human listeners, we also take in the speaker’s facial expressions and hand and body movements, collectively called ‘body language’, which unfortunately lies outside the purview of NLP. Language understanding has the following key parts, each of which needs to be solved separately to reach the holy grail:
– Understanding semantics (lexical)
– Understanding syntax
– Understanding context (both short and long term)
Several shallow and deep learning techniques have been very successful in solving parts of the language understanding problem. If we were to pick the three most important advances that have driven NLP's leap forward, they would be:
– Word embedding
– Recurrent Neural Networks (RNNs)
– Attention
Word embedding: In a typical dictionary of any language, words are arranged alphabetically or in some other order that does not preserve semantic proximity, that is, how close their meanings are. Word embedding is an intelligent way to assign each word in the vocabulary a numeric code (a vector) so that words with similar meanings, such as synonyms, receive similar codes. When words are represented in this code space, they show distinct patterns according to their meaning, which is extremely useful in language processing.
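To make "similar codes" concrete, the sketch below measures closeness between word vectors with cosine similarity and looks up nearest neighbours. The five 3-dimensional vectors are hypothetical, hand-picked values for illustration only; real embeddings (word2vec or GloVe, for example) are learned from large text corpora and typically have a few hundred dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
# Real embeddings are learned from text and are much higher-dimensional.
EMBEDDINGS = {
    "happy": [0.90, 0.80, 0.10],
    "glad": [0.85, 0.75, 0.20],
    "joyful": [0.80, 0.90, 0.15],
    "car": [0.10, 0.20, 0.95],
    "truck": [0.15, 0.10, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(word, k=2):
    """Return the k (word, similarity) pairs closest to `word`."""
    query = EMBEDDINGS[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for word, score in nearest_neighbours("happy"):
        print(f"{word}: {score:.3f}")
```

With these toy vectors, "glad" and "joyful" come out as the nearest neighbours of "happy", while "car" sits closest to "truck": meaning-related words cluster in the code space.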
RNN: An RNN is a type of neural network that, unlike convolutional networks, is designed specifically to process sequences. This makes it particularly suitable for language processing, because language is made up of sequences of words, expressions, and phonemes. RNN variants support many language tasks, such as sentence-to-sentence translation, sentiment analysis, and sentence auto-completion.
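A minimal sketch of a plain (Elman-style) RNN forward pass shows how a sequence is processed step by step. The three-word one-hot vocabulary and the weight values are hypothetical, chosen only for illustration; a real network learns its weights by backpropagation through time and uses far larger dimensions.

```python
import math

# Hypothetical toy vocabulary, one-hot encoded.
VOCAB = ["not", "good", "bad"]

# Hypothetical fixed weights (input size 3, hidden size 2).
W_XH = [[0.5, -0.3, 0.8],
        [0.2, 0.7, -0.4]]
W_HH = [[0.6, -0.2],
        [0.3, 0.5]]
B_H = [0.0, 0.1]


def one_hot(word):
    """Encode a vocabulary word as a one-hot vector."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_forward(words):
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

    Returns the hidden state after each word.
    """
    hidden = [0.0] * len(B_H)  # h_0 starts as all zeros
    states = []
    for word in words:
        x_t = one_hot(word)
        pre = [
            a + b + c
            for a, b, c in zip(matvec(W_XH, x_t), matvec(W_HH, hidden), B_H)
        ]
        hidden = [math.tanh(z) for z in pre]  # new state depends on the old one
        states.append(hidden)
    return states


if __name__ == "__main__":
    print(rnn_forward(["not", "good"])[-1])
    print(rnn_forward(["good", "not"])[-1])
```

Because the hidden state is carried from one step to the next, "not good" and "good not" end in different final states; this is what lets an RNN capture word order.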
Attention: Attention was introduced as an improvement to RNN-based networks. Instead of relying only on a single fixed-size hidden state, it lets the model focus on the relevant parts of a long sequence. This, in turn, helps solve the context part of the problem in any conversation flow.
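The core computation can be sketched in a few lines. This uses the scaled dot-product form later adopted by Transformers (early RNN attention often used an additive score instead). The 2-dimensional "turn encodings" below are hypothetical stand-ins for what an encoder would produce for earlier turns of a dialogue.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query, keys, values):
    """Attend from one query vector over a sequence of key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return output, weights


if __name__ == "__main__":
    # Hypothetical encodings of three earlier turns in a conversation.
    keys = [
        [1.0, 0.0],  # "I ordered a phone last week"
        [0.0, 1.0],  # "Nice weather today"
        [0.9, 0.1],  # "It still hasn't arrived"
    ]
    values = keys  # for simplicity, the values reuse the key vectors
    query = [1.0, 0.0]  # current turn: "Where is my order?"
    output, weights = scaled_dot_product_attention(query, keys, values)
    print([round(w, 3) for w in weights])
```

The query about the order puts most of its weight on the two order-related turns and the least on the small talk about the weather, which is how attention keeps relevant context in focus across a long exchange.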
Building on these techniques, and on attention in particular, researchers developed neural network architectures for end-to-end language processing called Transformers, which replace recurrence with self-attention.
BERT, GPT, and ERNIE are Transformer-based models from Google, OpenAI, and Baidu respectively. They come pre-trained on large, wide-ranging text corpora. There are domain-specific models as well, for example SciBERT for scientific papers and BioBERT for biomedical language.
These pre-trained models can be further customized (fine-tuned) for more specific tasks in conversational bots.
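Full fine-tuning updates the pre-trained network's weights. A lighter form of customization keeps the pre-trained representations frozen and trains only a small task-specific layer on top, for example an intent classifier. The sketch below shows that lighter variant under clearly stated assumptions: the hand-made `WORD_VECTORS` table and the averaging `encode` function are hypothetical stand-ins for a real pre-trained encoder such as BERT, and the "trained" part is a simple nearest-centroid rule over a handful of made-up labelled examples.

```python
# Hypothetical stand-in for a frozen pre-trained encoder: hand-made word vectors.
WORD_VECTORS = {
    "buy": [0.9, 0.1, 0.0],
    "order": [0.8, 0.2, 0.1],
    "price": [0.7, 0.3, 0.0],
    "broken": [0.1, 0.9, 0.1],
    "repair": [0.0, 0.8, 0.2],
    "fault": [0.1, 0.85, 0.0],
    "hello": [0.0, 0.1, 0.9],
    "hi": [0.1, 0.0, 0.95],
}
DIM = 3

# Hypothetical labelled examples for the bot-specific task.
TRAINING_EXAMPLES = [
    ("I want to buy this", "purchase"),
    ("what is the price", "purchase"),
    ("my screen is broken", "support"),
    ("need a repair", "support"),
    ("hello there", "greeting"),
    ("hi", "greeting"),
]


def mean(vectors):
    """Component-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def encode(sentence):
    """Frozen 'encoder': average the vectors of known words."""
    vectors = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    return mean(vectors) if vectors else [0.0] * DIM


def fit_centroids(examples):
    """The only trained part: one mean embedding per intent label."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(encode(text))
    return {label: mean(vecs) for label, vecs in grouped.items()}


def predict(centroids, text):
    """Assign the intent whose centroid is closest (squared Euclidean)."""
    vec = encode(text)

    def sq_distance(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))

    return min(centroids, key=sq_distance)


if __name__ == "__main__":
    centroids = fit_centroids(TRAINING_EXAMPLES)
    print(predict(centroids, "I would like to order one"))
```

Keeping the encoder frozen makes the customization cheap and data-light. The trade-off is that the representations themselves never adapt to the domain, which is what full fine-tuning adds.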
Conversational Bot design
As discussed earlier, conversational bot design needs to handle several aspects: detecting intent, understanding short- and long-term context, generating a response, handling emotion, and speech synthesis. Voice-based conversation is more complex still, because the design must also handle dialects, filler words (such as ‘yum’, ‘yaa’, ‘uff’), and pronunciation mistakes. There are modules and submodules to handle each of these aspects, and when they are arranged in the proper sequence and in the right places, together they make a capable conversation engine. Response generation is one such key block: it generates the most appropriate response given the immediate history of the conversation. In a sense, this module plays the most vital role in keeping the conversation engaging and leading it to a successful close without sounding annoying or repetitive.
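A toy version of such a modular engine chains filler-word removal, intent detection with short-term context, and a response generator that avoids repeating itself. Everything here is hypothetical and deliberately simple: the `FILLERS`, `INTENT_KEYWORDS` and `RESPONSES` tables and the `ConversationEngine` class are illustrative names, and keyword matching stands in for a trained intent model.

```python
import re
from collections import deque

# Hypothetical filler words to strip from (transcribed) user speech.
FILLERS = {"yum", "yaa", "uff", "um", "uh"}

# Hypothetical keyword rules; a real bot would use a trained intent model.
INTENT_KEYWORDS = {
    "purchase": {"buy", "order", "price"},
    "support": {"broken", "repair", "fault"},
}

# Several phrasings per intent, so the bot need not repeat itself.
RESPONSES = {
    "purchase": ["Which model are you interested in?",
                 "Shall I check availability for you?"],
    "support": ["Sorry to hear that. What seems to be the problem?",
                "Let's fix it step by step."],
    "unknown": ["Could you tell me a bit more?",
                "I didn't quite catch that."],
}


def clean(utterance):
    """Lower-case, tokenise, and drop filler words."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return [t for t in tokens if t not in FILLERS]


def detect_intent(tokens, previous_intent):
    """Keyword match; a follow-up with no keywords keeps the previous intent."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if keywords & set(tokens):
            return intent
    return previous_intent or "unknown"


class ConversationEngine:
    """Chains the modules and keeps a short history of (intent, reply) pairs."""

    def __init__(self, history_size=4):
        self.history = deque(maxlen=history_size)

    def respond(self, utterance):
        tokens = clean(utterance)
        previous_intent = self.history[-1][0] if self.history else None
        intent = detect_intent(tokens, previous_intent)
        recent_replies = {reply for _, reply in self.history}
        candidates = RESPONSES[intent]
        # Prefer a reply not used recently; fall back to the first one.
        reply = next((r for r in candidates if r not in recent_replies), candidates[0])
        self.history.append((intent, reply))
        return reply


if __name__ == "__main__":
    engine = ConversationEngine()
    for text in ["uff I want to buy a phone", "yaa the cheaper one", "my charger is broken"]:
        print(f"> {text}\n{engine.respond(text)}")
```

Here "yaa the cheaper one" contains no intent keyword, so the engine falls back on the short-term context (the previous purchase intent) and picks a different phrasing from its last reply. Because each stage is a separate function, any stage can be swapped for a stronger model without touching the others.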
Conclusion
Designing a Conversational AI system is not a one-off process but a continuous journey. As in any natural system, and even more so in a conversational setting, relationships evolve and new things are learned. The system therefore needs to be designed so that it keeps learning on its own: new keywords are coined, new topics emerge, new ways of saying things appear, and the trending topic changes from day to day. Peppering the conversation with such tidbits helps it feel more real and more human-like.
Conversational AI is going to be everywhere, if it is not already: in our homes, cars, banks, shops, and hospitals, driving a business opportunity of $15 billion by 2024. Much of this rides on how successful we, as AI designers, are in making the conversational experience feel real through the right choice of tools and techniques, and a passion for perfection.
Tata Elxsi has developed a conversational engine that uses a Lego-block concept for its various components. Using these building blocks, we can develop a voice- or chat-based conversation for any given context. So far, it has been used to develop bots for retail sales, call steering in call centers, Q&A, voice-based attendance systems, and more.
Author:
Biswajit Biswas
Chief Data Scientist, Tata Elxsi
Published in Telematics Wire