Semantics and Documents

A look at how natural language processing (NLP) can improve the way insights are derived from text documents.

Introduction

In today’s world, where artificial intelligence (AI) is becoming more commonplace, we are starting to see this technology let us do things that were previously unimaginable. What is interesting to note here is that as these applications of AI mature in their different forms, people are coming up with more innovative ways of using them to make our lives a little bit easier. One such application of AI is NLP, and we’ll be taking a look at how it can help us gain better insights from our text documents.

Why use semantics for search?

Whether it is in the professional, academic or any other sphere of our modern society, we likely have to deal with text documents in one way or another. From personal experience, this usually involves me searching through documents to find those few lines of text that answer my question or at the very least point me towards an answer, and I often find myself wishing that there was a better way to do this.

An obvious solution might be to use keywords to search through the documents, and while this is useful, there is room for improvement. For example, what happens if the exact word I search for is not present in the document? Is there a way to make a computer “understand” the text in a document, so that when I search it will “know” what I am looking for, even if the specific words in my search term do not appear in the document? The short answer to these questions is yes, there is a way, and it’s called semantic searching.
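To make the keyword limitation concrete, here is a minimal Python sketch. The three sample documents and the `keyword_search` helper are made up for illustration and are not taken from any real system. A search for "fever" finds nothing, even though the first document clearly describes one.

```python
# Hypothetical sample documents, invented for this illustration.
documents = [
    "The patient reported a persistent cough and high temperature.",
    "Hammer the nail into the wooden beam.",
    "Quarterly revenue grew by ten percent.",
]


def keyword_search(query, docs):
    """Return the documents that contain every query term as a substring, ignoring case."""
    terms = query.lower().split()
    return [doc for doc in docs if all(term in doc.lower() for term in terms)]


# "fever" never appears literally, so the relevant document is missed.
fever_hits = keyword_search("fever", documents)
# "cough" does appear literally, so the same document is found.
cough_hits = keyword_search("cough", documents)

print(fever_hits)
print(cough_hits)
```

The document about a high temperature is exactly what someone searching for "fever" wants, but a literal match has no way of knowing that.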

So how do we do this?

To get an appreciation of just how useful and potentially game-changing semantic searching is, we need to take it back to a time when computers, as we know them today, did not exist. How did people find answers to their questions? Well, legend has it that people had to walk into a library, find a book on the topic, then proceed to painstakingly read through the pages to find what they were looking for. I know, that’s insane, right?

Fast forward to today and most simple questions can be answered in a matter of seconds using any respectable search engine. However, there are certain situations where we may have thousands, if not tens of thousands, of documents related to work or academics that we can’t conduct a simple Google search on. So what do we do if we need to get information out of them? This is where the power of semantic search shines through.

Instead of doing a somewhat limited search on specific keywords that will probably bring back tens or even hundreds of results, depending on the number of documents, we could use a phrase to conduct a semantic search. This would then return a ranked list of sentences that match the search phrase, even if none of the specific words used in the search phrase are present in any of the documents.

This incredible searching ability is made possible by NLP in the sense that a machine (computer) first “understands” the words in the search phrase and how they relate to each other. For example, the word nail could be related to the human body or construction, depending on how it is used in a sentence. In the same way that you and I can tell the difference from context, our machines can do so using NLP. The machine then proceeds to extract a similar “understanding” from every sentence it searches through to find the closest match. Hopefully, at this point, you feel like you get how useful this all is, but perhaps you still need an example of something like this in action to make it all sink in. Deep learning cafe’s got you covered.
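Before the real-world example, here is a toy Python sketch of the ranking idea described above. It is an illustration only, not how any production system works: the word vectors in `TOY_VECTORS` are hand-written assumptions standing in for the embeddings a trained NLP model would produce, and the `embed`, `cosine` and `semantic_search` helpers are hypothetical names. Each sentence is turned into a vector, and sentences are ranked by how closely their vector points in the same direction as the query's.

```python
import math

# Hypothetical hand-written word vectors. The three axes loosely stand for
# (medicine, construction, finance). A real system would get its vectors
# from a trained language model rather than a lookup table like this.
TOY_VECTORS = {
    "fever": (0.9, 0.0, 0.0),
    "temperature": (0.8, 0.1, 0.0),
    "cough": (0.9, 0.0, 0.0),
    "patient": (1.0, 0.0, 0.0),
    "nail": (0.4, 0.6, 0.0),
    "hammer": (0.0, 1.0, 0.0),
    "beam": (0.0, 0.9, 0.0),
    "revenue": (0.0, 0.0, 1.0),
    "percent": (0.0, 0.1, 0.8),
}


def embed(text):
    """Average the vectors of the known words in the text; unknown words are ignored."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return (0.0, 0.0, 0.0)
    return tuple(sum(axis) / len(known) for axis in zip(*known))


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query, sentences, top_k=3):
    """Return up to top_k (score, sentence) pairs, most similar to the query first."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(s)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


sentences = [
    "The patient reported a persistent cough and high temperature.",
    "Hammer the nail into the wooden beam.",
    "Quarterly revenue grew by ten percent.",
]

results = semantic_search("fever", sentences)
for score, sentence in results:
    print(f"{score:.2f}  {sentence}")
```

The query "fever" shares no words with any of the sentences, yet the sentence about a cough and a high temperature ranks first because its vector points in nearly the same direction. Averaging fixed word vectors cannot tell the two senses of "nail" apart; models that take the surrounding words into account are what make that possible.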

Show me what you’ve got!

One obvious use of semantic search that I’ve been alluding to throughout this post is research. When conducting any form of research, it is almost inevitable that we will encounter multiple documents with valuable information. Finding a way to narrow things down could save time and potentially money.

Coronavirus research is no exception, and Kaggle released a data set that contains 44,000 research papers on the topic. At Deep learning cafe, we saw an opportunity to put our NLP skills to the test and built a COVID-19-explorer that is capable of using a semantic search to return the five most relevant papers from the 44,000 that are available. The results of this explorer can be seen in the video here.

In closing.

What’s important to keep in mind here is that this is just one application of NLP. At the core, what we are doing here is adding some depth to how our machines “understand” our text documents, going beyond the ones and zeros that represent them. There are many more use cases for NLP, especially in business, and perhaps this post has you asking the question, “I wonder if this could be useful for my business?” If that is the case, the team at Deep learning cafe wants to hear from you, so get in touch. We might be able to help you take your business to new levels with NLP.

 
