Hybrid QA – Making Information Retrieval Fast, Reliable, and Accessible

An Untapped Resource

As younger generations increasingly avoid phone calls in favor of chat, many companies recognize the need to keep up with demand for text-based services. Recent advances in artificial intelligence have made such services easier to provide by automating processes that previously only a person could handle. This shift has not only saved considerable time and resources, but also created new opportunities to use data that was previously inaccessible at scale. These are just a few of the benefits that Question Answering (QA) systems offer.

How Do Question-Answering Systems Work?

Question answering systems allow a user to ask a natural language question and get a relevant answer back. Their use cases go beyond search engines: some companies use QA technology to conduct policy compliance reviews, provide internal or client-facing support, and analyze correspondence patterns to identify potential issues or support triage. Because users can ask questions in plain English (or their language of choice), they do not need to learn a query language to access the database, which makes the system accessible to more users regardless of technical expertise.

There are typically two ways of building a QA system, each with its advantages and disadvantages:

On one hand, neural networks work well for question matching, but they tend to support a limited range of topics and are expensive to create and maintain because of the vast amounts of training data they require. Applications that rely on neural networks are also riskier to use, since their performance depends heavily on the quality of that data, which is hard to verify. Companies with high-stakes operations understandably cannot take chances with a black box.

On the other hand, QA can be done with document indexing and a Lucene keyword search, the approach behind Elasticsearch and Apache Solr, which are both built on Lucene. This rule-based method is less expensive and more transparent, but tends to suffer in recall because of the granularity of the texts in the index. If the documents are divided into paragraphs, the returned answer may be missing context mentioned earlier in the document. If whole documents are indexed, the context is preserved, but the extra text is likely to add enough noise to cause false positives, which lower precision.
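To see the granularity trade-off concretely, here is a minimal sketch in Python. It is not Lucene: a plain Boolean AND match stands in for Lucene's ranked scoring, and the stopword list, the helper names (`tokenize`, `build_index`, `keyword_search`), and the two policy documents are all hypothetical. The same documents are indexed once paragraph by paragraph and once whole.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "for", "of", "to", "on"}


def tokenize(text: str) -> set[str]:
    """Lowercase, split into alphanumeric tokens, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def build_index(units: list[str]) -> list[set[str]]:
    """Store each retrievable unit (a paragraph or a whole document) as a set of terms."""
    return [tokenize(u) for u in units]


def keyword_search(index: list[set[str]], question: str) -> list[int]:
    """Return ids of units that contain every content word of the question (Boolean AND)."""
    terms = tokenize(question)
    return [i for i, unit_terms in enumerate(index) if terms <= unit_terms]


# Hypothetical documents; paragraphs are separated by blank lines.
docs = [
    "The travel policy applies to all full-time employees.\n\n"
    "Employees must book flights through the approved agency.",
    "Our safety policy covers fire drills.\n\n"
    "Take the stairs, not the elevator.\n\n"
    "Staff travel to the offsite on foot; no flights are needed.",
]
paragraphs = [p for d in docs for p in d.split("\n\n")]
question = "What is the travel policy for flights?"

print(keyword_search(build_index(paragraphs), question))  # []
print(keyword_search(build_index(docs), question))        # [0, 1]
```

With the paragraph index, no single paragraph contains all of travel, policy, and flights, so the relevant policy is missed. With the whole-document index, both documents match, including the safety document, where the words appear in unrelated sentences: a false positive.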

The Hybrid Approach – A “Forest” of Information

At Lymba, we address these common QA issues by combining a text-based index with a knowledge graph. First, we use the document structure to generate a text-based index. Then, we use a knowledge graph to train our NLP tools and create a semantic profile of the documents. Finally, we run our NLP pipeline to understand the semantic context of the question and look for information with similar semantic context in the documents before returning the answer. This lets us connect disparate pieces of information within each document, so that every document becomes a tree of information; across a collection of documents, these trees form a “forest” of information.
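The idea of a document as a tree of information can be illustrated with a toy example. This is a hypothetical sketch, not Lymba's pipeline: the mini knowledge graph (`TERM_TO_CONCEPT`, `BROADER`), the `Node` tree, and the scoring rule (number of shared concepts) are assumptions chosen for illustration. Each paragraph's semantic profile combines its own concepts with those of its ancestor headings, so a paragraph can match a question whose words it never uses, and the answer comes back together with its context.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mini knowledge graph: surface terms -> concepts, concepts -> broader concepts.
TERM_TO_CONCEPT = {
    "hotel": "Lodging", "hotels": "Lodging", "lodging": "Lodging",
    "air": "AirTravel", "flight": "AirTravel", "flights": "AirTravel",
}
BROADER = {"Lodging": "BusinessTravel", "AirTravel": "BusinessTravel"}


def concepts(text: str) -> set[str]:
    """Map words to concepts, then add each concept's broader concept."""
    found = {TERM_TO_CONCEPT[w] for w in re.findall(r"[a-z]+", text.lower()) if w in TERM_TO_CONCEPT}
    return found | {BROADER[c] for c in found if c in BROADER}


@dataclass
class Node:
    """A document, section heading, or paragraph in a (hypothetical) document tree."""
    text: str
    children: list["Node"] = field(default_factory=list)


def rank_paragraphs(root: Node, question: str) -> list[tuple[int, list[str]]]:
    """Score each paragraph (leaf) by the number of concepts it shares with the question."""
    q = concepts(question)
    results: list[tuple[int, list[str]]] = []

    def walk(node: Node, inherited: set[str], path: list[str]) -> None:
        profile = inherited | concepts(node.text)  # own concepts plus ancestors' concepts
        path = path + [node.text]                  # keep headings as context for the answer
        if not node.children:
            results.append((len(q & profile), path))
        for child in node.children:
            walk(child, profile, path)

    walk(root, set(), [])
    return sorted(results, key=lambda r: r[0], reverse=True)


policy = Node("Travel Policy", [
    Node("Lodging", [Node("Book rooms only at approved properties, at most 150 USD per night.")]),
    Node("Air travel", [Node("Economy class is required for trips under six hours.")]),
])

for score, path in rank_paragraphs(policy, "How much can I spend on a hotel?"):
    print(score, " > ".join(path))
```

The paragraph about approved properties never mentions hotels, yet it ranks first: the toy graph maps both "hotel" and the "Lodging" heading to the same concept, and the paragraph inherits that concept from its section. The air-travel paragraph gets partial credit through the shared broader concept. A fuller hybrid system would also fold in keyword scores from the text index.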

Using a knowledge graph in conjunction with the document index has many benefits in terms of efficiency and performance. While traditional QA systems tend to have high precision and low recall, we can leverage the context of documents at different levels of granularity, which increases both precision and recall. Because our approach is rule-based, it does not require training data, which makes it more transparent and faster to prototype and test. It is also customization-friendly: the ontology or knowledge base used to create the graph can be either pre-populated or built from scratch and tailored to a specific domain.
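For reference, precision is the share of returned passages that are relevant, and recall is the share of relevant passages that are returned. The short Python function below computes both; the function name and the passage IDs are invented for illustration and do not describe any real system or measurement.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one question.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four passages in the collection actually answer the question.
relevant = {"p1", "p2", "p3", "p4"}
print(precision_recall({"p1"}, relevant))              # (1.0, 0.25)
print(precision_recall({"p1", "p2", "p3"}, relevant))  # (1.0, 0.75)
```

The first retrieval set is the high-precision, low-recall pattern described above; the second raises recall without giving up precision.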
