What does Natural Language Processing (NLP) mean?

Natural Language Processing (NLP) allows a machine to process everyday written language, as opposed to a programming language. This can be incredibly helpful with large volumes of text-centric data - the kind of data most companies have but are often unable to fully utilize.

Steps in K-Extractor Pipeline

We process data by pushing it through a sequential pipeline, where each step performs a deeper analysis of the text.

The first step is Document Preprocessing, which determines what the document is: its structure and format, whether it’s a PDF or a Word document. Any type of text document can be processed, including those with tables. The document then moves through the subsequent steps: text segmentation, part-of-speech tagging, concept extraction, word sense disambiguation, syntactic parsing, etc.
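To make the sequential-pipeline idea concrete, here is a minimal sketch in Python. The stage names mirror the steps described above, but the implementations are deliberately naive placeholders - this is not Lymba's actual code, just an illustration of how each stage enriches the document state produced by the previous one.

```python
# Hypothetical sketch of a sequential NLP pipeline: each stage takes the
# document state from the previous stage and adds a deeper layer of analysis.

def preprocess(doc):
    # Stand-in for document preprocessing: normalize whitespace.
    doc["text"] = " ".join(doc["raw"].split())
    return doc

def segment(doc):
    # Naive sentence segmentation on periods.
    doc["sentences"] = [s.strip() for s in doc["text"].split(".") if s.strip()]
    return doc

def pos_tag(doc):
    # Placeholder tagger: every token gets the dummy tag "UNK".
    doc["tags"] = [[(tok, "UNK") for tok in s.split()] for s in doc["sentences"]]
    return doc

PIPELINE = [preprocess, segment, pos_tag]

def run(raw_text):
    doc = {"raw": raw_text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run("Revenue grew 5%. Costs fell.")
print(result["sentences"])  # ['Revenue grew 5%', 'Costs fell']
```

A real pipeline would append further stages (concept extraction, word sense disambiguation, syntactic parsing, and so on) to the same list, each reading the annotations left by earlier stages.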

There are also over 86 standard types of entities the system looks for out of the box, which happens in the concept extraction step. A lot of heavy lifting occurs here, including named entity recognition, collocation identification, event detection, and identification of temporal expressions, to name a few.
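As a toy illustration of what concept extraction looks like, the sketch below recognizes three made-up entity types with regular expressions. K-Extractor's 86 built-in types are far richer than this; the patterns here are purely for illustration.

```python
import re

# Toy concept extraction: a few rule-based recognizers standing in for
# a full suite of built-in entity types. Patterns are illustrative only.
PATTERNS = {
    "DATE": re.compile(r"\b(?:19|20)\d{2}\b"),                      # bare years
    "MONEY": re.compile(r"\$\d+(?:\.\d+)?(?:\s?(?:million|billion))?"),
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),
}

def extract_entities(text):
    found = []
    for label, pat in PATTERNS.items():
        for m in pat.finditer(text):
            found.append((label, m.group(), m.start()))
    # Sort by position in the text.
    return sorted(found, key=lambda e: e[2])

ents = extract_entities("Earnings rose 12% to $3.1 billion in 2023.")
print(ents)
```

A production system layers many such recognizers (plus statistical models) and resolves overlaps between them, but the output shape - typed, positioned spans - is the same idea.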

We also look for 26 different types of semantic relations, so when you’re starting out, the system is already looking for these connections between entities. These were developed over years of research to create efficient yet robust solutions that give you useful extraction with just the vanilla, out-of-the-box system.

The other steps in the pipeline include coreference resolution - which links pronouns back to the nouns they refer to. You can also run sentiment classification or topic detection.

We can convert the data into RDF triples, which means it can be integrated into a graph database. Or we can index it in the system and draw on that. Ultimately, how the data is stored is up to you and your use case.
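To show what "converting to RDF triples" means in practice, here is a minimal sketch that serializes extracted facts into N-Triples syntax, the line-based RDF format most graph stores can load directly. The namespace and predicate names are made up for illustration.

```python
# Sketch: turn extracted (subject, predicate, object) facts into RDF
# N-Triples lines. Namespace and predicates are hypothetical.
NS = "http://example.org/kb/"

def to_ntriple(subj, pred, obj):
    # Resource objects get angle brackets; everything else is a string literal.
    if obj.startswith("http"):
        o = f"<{obj}>"
    else:
        o = f'"{obj}"'
    return f"<{NS}{subj}> <{NS}{pred}> {o} ."

facts = [
    ("AcmeCorp", "hasEarnings", "$3.1 billion"),
    ("AcmeCorp", "locatedIn", "http://example.org/kb/Texas"),
]

triples = [to_ntriple(*f) for f in facts]
print("\n".join(triples))
```

In a real deployment you would use an RDF library and proper datatypes, but the end result is the same: each extracted fact becomes one edge in the graph.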

Workflow for System Training with an Ontology

This shows how the system is trained. We employ an ontology to provide the system the domain knowledge.

Example:

If you were to ask what’s “driving” our earnings, you are not talking about “driving” a car. In this case, the ontology would define “drive” as “what’s contributing to”.
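The role the ontology plays here can be pictured as a lookup from a word plus its context to a domain-specific sense. The sketch below is a deliberately tiny, invented stand-in for that idea:

```python
# Toy ontology lookup: resolve a verb to its domain sense based on context.
# The sense labels and entries are invented for illustration.
FINANCE_ONTOLOGY = {
    ("drive", "earnings"): "contribute-to",
    ("drive", "car"): "operate-vehicle",
}

def resolve_sense(verb, context_noun):
    return FINANCE_ONTOLOGY.get((verb, context_noun), "unknown")

print(resolve_sense("drive", "earnings"))  # contribute-to
```

An actual ontology encodes concept hierarchies and relations rather than a flat table, but the payoff is the same: domain knowledge disambiguates what a word means in your documents.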

If you do not currently have an ontology, we offer an ontology builder. To build one, we take a seed file - or any taxonomy or existing ontology base you already have - along with your documents, and concepts are automatically extracted to build the ontology. We then leverage that ontology in the NLP system to understand your documents. Our premise is that to get an ROI from enterprise NLP, you need to be able to rapidly customize the system, and that’s what these tools help you do.

Once the ontology is finalized, we process the documents and the training rules are automatically generated. We provide tools for annotating the data and for making manual adjustments to the rules. Once that’s complete, you have a K-Extractor system tailored to your use case.

Post K-Extractor Querying Modules

With Lymba, we have a few tools that can help query the database. One is semantic-based search, which allows us to do comparisons - for example, how relevant is this small piece of text within this larger body of text? This is very helpful for regulatory comparisons, among other things.

Or, for example, the earlier question about what’s “driving” the earnings - a very conceptual type of question. We run that search through the NLP pipeline to build a semantic profile of the query, compare it to our semantically indexed data, and retrieve the information with the highest relevance.
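The retrieve-by-relevance step can be sketched with a simple vector comparison. The example below profiles the query and each passage as a bag of words and ranks by cosine similarity; a real semantic profile would use disambiguated concepts and relations rather than raw words, so this is only an illustration of the ranking mechanics.

```python
import math
from collections import Counter

# Sketch: rank indexed passages against a query by cosine similarity
# of bag-of-words "profiles". Real semantic profiles are far richer.

def profile(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = profile("what is driving earnings")
passages = [
    "strong cloud sales are driving earnings growth",
    "the CEO drove to the airport",
]
ranked = sorted(passages, key=lambda p: cosine(query, profile(p)), reverse=True)
print(ranked[0])
```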

You can also use our natural-language-to-SPARQL query tool (NL2Query), which pulls answers from your graph database. The difference is that the graph holds concrete data points. This tool takes your natural language question, converts it to a SPARQL query, and finds the data in the database.
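A drastically simplified version of that question-to-query step might look like the following. It matches one question template and fills in a SPARQL query; the `kb:` namespace and predicate naming are hypothetical, and NL2Query itself is of course far more general than a single regex.

```python
import re

# Very simplified sketch of a natural-language-to-SPARQL step:
# match a question template and fill in a query skeleton.
# Namespace and predicate names are hypothetical.

def nl_to_sparql(question):
    m = re.match(r"what is the (\w+) of (\w+)\??", question, re.IGNORECASE)
    if not m:
        return None  # question doesn't fit this template
    prop, entity = m.groups()
    return (
        "PREFIX kb: <http://example.org/kb/>\n"
        f"SELECT ?value WHERE {{ kb:{entity} kb:{prop.lower()} ?value }}"
    )

print(nl_to_sparql("What is the revenue of AcmeCorp?"))
```

The generated query would then be executed against the triple store, returning the concrete value attached to that entity in the graph.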

The nature of your question may be either concrete or more conceptual. We also have a hybrid Q&A approach that first looks at the graph and then falls back to the index for your answer.

These are the query modules you can use with the K-Extractor NLP pipeline. Once the data is processed, it can go other places: we can send it to a graph and use the tools there, you can get alerts, have your documents summarized, pull key insights from documents, and more. It is all designed around you and your use case. This is just to show that there are many ways to get at the knowledge once we’ve extracted it.

Hopefully you now have a better understanding of what NLP is. Thanks for watching from LYMBA. Please reach out to us with any questions or for help on your next project.
