In the wake of numerous advances in Artificial Intelligence (AI) across many high-tech fields, chatbots have grown more sophisticated, applying concepts such as statistical learning to natural language processing (NLP) to add new layers of complexity and realism to chatbot systems.
Here at AllyO, we are in the midst of an ongoing effort to improve the reliability and fluidity of our chatbot conversations by replacing our existing text-matching techniques (checking for specific keywords in user messages) with cutting-edge AI methodologies for a more realistic conversation flow.
Whenever new technologies are introduced, we must verify that the new system really does outperform our legacy system. To achieve that, we want to define some metrics by which we can comparatively assess the performance of both systems.
These comparison results help us identify problems with the new system that may have been overlooked, and pinpoint where it requires improvement.
One way we accomplish this is by using intent detection as a metric.
Our chatbot operates by identifying the intent of a user’s message in order to extract the right information from the conversation flow. For example, when a user says ‘My phone number is 416-555-4567,’ we identify the intent of this message to be phone_number. We then use this information to inform our chatbot about how to progress the conversation. This idea extends to every type of sentence from which we need to extract conversation information.
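To make the contrast concrete, here is a minimal sketch of what keyword/text-matching intent detection of this kind might look like. The patterns, intent names (other than phone_number from the example above), and function names are illustrative assumptions, not AllyO's actual code:

```python
import re
from typing import Optional

# Hypothetical legacy-style rules: an intent is assigned when one of its
# patterns appears anywhere in the user's message.
LEGACY_PATTERNS = {
    "phone_number": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # illustrative
}

def legacy_intent(message: str) -> Optional[str]:
    """Return the first intent whose pattern matches the message, else None."""
    for intent, pattern in LEGACY_PATTERNS.items():
        if pattern.search(message):
            return intent
    return None

print(legacy_intent("My phone number is 416-555-4567"))  # phone_number
```

The brittleness is visible immediately: a message like ‘you can reach me on my cell’ carries the same intent but matches no pattern, which is exactly the gap a learned model is meant to close.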
This intent identification makes for an appropriate metric for testing our two systems. We extract a large number of user messages from our conversation logs and feed them back through our system twice: once through the legacy system, and once through the new AI system. We then collect the results and compare them. Easy!
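The replay test can be sketched as follows. This is a simplified illustration under my own assumptions (the function names, the shape of the logged data, and the bucket labels are all hypothetical), not the actual test harness:

```python
# Run each logged (message, expected_intent) pair through both systems and
# bucket it by which system predicted the expected intent correctly.
def compare_systems(logged_messages, legacy_intent, ai_intent):
    buckets = {"both": 0, "legacy_only": 0, "ai_only": 0, "neither": 0}
    for text, expected in logged_messages:
        legacy_ok = legacy_intent(text) == expected
        ai_ok = ai_intent(text) == expected
        if legacy_ok and ai_ok:
            buckets["both"] += 1
        elif legacy_ok:
            buckets["legacy_only"] += 1   # the interesting regression cases
        elif ai_ok:
            buckets["ai_only"] += 1
        else:
            buckets["neither"] += 1
    return buckets
```

The "legacy_only" bucket is the one worth watching: those are the cases where the old system is right and the new one is not, which is precisely the subset discussed next.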
As expected, the results overwhelmingly favor the new system at identifying intent. But there is an interesting subset of test cases where an intent is identified correctly by our legacy system and incorrectly by our new system. These cases are scarce (they make up only 0.625% of all test cases), but they must be addressed.
Over the past few weeks, the NLP team has been working through a cyclic process: run the tests, identify the problematic cases, apply fixes, and re-run the tests. Ideally, this cycle repeats until we are wholly confident that our new system outperforms the old one on every metric we have defined.
The data shows an average improvement of 23.37% over the old text-matching system at correctly identifying intents; that is, the new system is 23.37% more likely to identify an intent than the old one. This figure varies as we narrow down to specific intents, but we expect the improvement to remain positive for every intent.
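For illustration, here is one plausible way such a figure could be computed from the replay results, assuming the improvement is measured as the difference in correct-identification rates. The exact definition used internally may differ, and the counts below are made up:

```python
# Hypothetical improvement metric: difference between the two systems'
# correct-identification rates, expressed in percentage points.
def improvement_pct(legacy_correct: int, ai_correct: int, total: int) -> float:
    legacy_rate = legacy_correct / total
    ai_rate = ai_correct / total
    return (ai_rate - legacy_rate) * 100

# Invented example counts, purely to show the arithmetic:
print(round(improvement_pct(6000, 8337, 10000), 2))  # 23.37
```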
Now that’s quite an improvement! But the real improvement cannot merely be represented as a percentage. By introducing the AI system, we offer users a way to interact realistically with the chatbot, responding with natural phrases just as they would in ordinary conversation.
The user is no longer restricted to the fixed set of keywords the old system required. By integrating more sophisticated AI methods into our chatbot technology, we deliver a more delightful and engaging user experience, and that, in my opinion, is the biggest benefit our AI system has to offer.
Stay up-to-date with the latest insights and trends in AI recruiting, brought to you by the AllyO Blog!