What is an Insight Engine?
Traditional Search is, at its core, a simple matching engine: a user types in words and the search engine finds web pages or documents that include those words. The more unique words the user supplies, the closer the results are likely to be to what the user is looking for. This simple technique is powerful when combined with ranking algorithms based on user behavior, such as those Google employs.
An Insight Engine replaces this simple matching functionality with concept-based search. Instead of typing in a few keywords, and often having to refine the search with more keywords, an individual asks a question in a natural way. Results are returned based on the concept(s) in the question. Companies use Insight Engines to help employees and customers find the information they are looking for about that company.
This 2-part blog series is designed to provide a basic understanding of Insight Engines and explain how they are becoming the future of Search.
Traditional Search vs. Insight Engines
The power of keyword-based search algorithms should not be underestimated. Google has made a large fortune from very fast and relevant keyword searching, refining its results based on user behavior.
However, the limitations of Traditional Search are inherent to the way the algorithm functions: it is stuck matching keywords to documents or webpages. For more complex topics, a searcher must resort to an iterative process of search, read, discover new keyword phrases, run more searches, read more, discover more, and search again. Eventually the searcher either learns what they needed to know or gives up.
Insight Engines aim to eliminate this iterative search process and provide immediate access to accurate and relevant information.
Natural Language Search, The Basis of Insight Engines
Natural-Language Processing (NLP) is the area of Machine Learning focused on extracting useful data from natural language. Natural language inputs can be in the form of a spoken question, news article, or an Amazon review. NLP includes a broad set of tools and analyses, each of which focuses on a very specific aspect of language.
An Insight Engine uses NLP for at least two purposes. The first is to understand the question asked. The most relevant NLP tools for parsing a question are Sentence Breaking, Named Entity Recognition (NER), Topic Segmentation & Recognition, and Speech Recognition. The second is to return a response once the question has been parsed. Constructing a response uses many of the same tools, along with Question Answering and Automatic Summarization.
An Insight Engine, based on Natural Language Search, is the next generation of Enterprise Search. The purpose is to move beyond keywords and into concepts. For simple questions, an answer is provided directly. For more complex or vague questions, relevant documents are provided.
Questions can be asked in a natural way, just as you would ask a friend or speak to Siri, Alexa, or Google Voice. Natural Language Search begins by parsing the question into keywords, named entities, and dates. Keywords are mapped to topics (as a proxy for concepts). Named entities and dates act as both a filter on the search results and additional search terms. Personalization can be viewed as a final filter of the results so that, among all the concepts, people, places, and times, only those relevant to the user at that moment are returned.
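As a rough illustration of the parsing step, here is a minimal Python sketch that splits a question into keywords and dates. The stopword list, the RELATIVE_YEARS table, and the parse_question function are all hypothetical and deliberately tiny. A real Insight Engine would rely on a full NLP library, and named entity recognition is left out here entirely. The question is the one used in the example later in this post, and current_year=2018 is an assumption chosen so that "next year" resolves to 2019.

```python
import re
from dataclasses import dataclass, field

# Tiny illustrative stopword list (assumption); real systems use an NLP library.
STOPWORDS = {"what", "are", "my", "the", "a", "an", "is", "of", "for", "in", "do", "i", "how"}

# Relative date phrases mapped to an offset in years (hypothetical and incomplete).
RELATIVE_YEARS = {"next year": 1, "this year": 0, "last year": -1}

YEAR_PATTERN = r"\b(?:19|20)\d{2}\b"


@dataclass
class ParsedQuestion:
    keywords: list = field(default_factory=list)
    years: list = field(default_factory=list)


def parse_question(question: str, current_year: int) -> ParsedQuestion:
    text = question.lower()
    parsed = ParsedQuestion()

    # Resolve relative date phrases to concrete years and strip them from the text.
    for phrase, offset in RELATIVE_YEARS.items():
        if phrase in text:
            parsed.years.append(current_year + offset)
            text = text.replace(phrase, " ")

    # Pick up explicit four-digit years such as "2019", then strip them too.
    parsed.years.extend(int(y) for y in re.findall(YEAR_PATTERN, text))
    text = re.sub(YEAR_PATTERN, " ", text)

    # Whatever remains, minus stopwords, is treated as keywords.
    parsed.keywords = [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]
    return parsed


parsed = parse_question("What are my health benefit options next year?", current_year=2018)
print(parsed.keywords, parsed.years)  # ['health', 'benefit', 'options'] [2019]
```

The keywords would then be mapped to a topic, and the year would serve as both a filter and an extra search term.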
The potential of the Insight Engine can be best demonstrated with an example contrasting Traditional Search to Natural Language Search.
Complete Answers + Context
Jane Doe, an employee of a large company, wants to know “What are my health benefit options next year?”
With Traditional Search, she must convert this question herself into a keyword phrase like health benefit options to initiate a keyword search. The search then returns documents containing health, benefit, or options. Matches on health will include results on how to stay healthy. Matches on benefit will include vacation time and college tuition reimbursement. Stock options and compensation will match the options keyword.
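To make the problem concrete, the following Python sketch mimics a bare keyword match over a handful of made-up documents. The document titles and text and the keyword_search function are invented for illustration, and the sketch ignores ranking entirely, which any real search engine would apply.

```python
# Hypothetical mini-corpus: title -> body text.
documents = {
    "Staying healthy at work": "tips for good health and exercise",
    "Vacation and tuition benefits": "vacation time and tuition reimbursement benefit",
    "Stock options overview": "employee stock options and compensation",
    "Health plan guide": "health benefit options for medical dental and vision",
}


def keyword_search(query, docs):
    """Return (title, matched terms) for every document sharing any query term."""
    terms = set(query.lower().split())
    hits = []
    for title, body in docs.items():
        matched = terms & set(body.lower().split())
        if matched:
            hits.append((title, sorted(matched)))
    return hits


for title, matched in keyword_search("health benefit options", documents):
    print(title, matched)
```

All four documents come back, even though only the last one is about what Jane actually asked. Each of the other three matches on a single keyword taken out of context.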
With Natural Language Search, “What are my health benefit options next year?” becomes health benefit options + 2019. The term health benefit options is converted into a topic. Topic mapping links health benefit options to maternity leave, dental coverage, vision benefits, short term disability, and more. Then the filter of 2019 is applied so that documents from prior years are excluded.
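One simple way to picture topic mapping plus a date filter is the Python sketch below. The TOPIC_MAP contents come from the example above, but the Document class, the concept_search function, the sample documents, and the idea of a hand-written lookup table are all assumptions made for illustration. A production system would more likely learn topics from data than hard-code them.

```python
from dataclasses import dataclass

# Hand-written topic map (assumption); real engines would learn these links.
TOPIC_MAP = {
    "health benefit options": [
        "maternity leave",
        "dental coverage",
        "vision benefits",
        "short term disability",
    ],
}


@dataclass
class Document:
    title: str
    text: str
    year: int


def concept_search(topic, year, docs):
    """Expand the topic into related phrases, then keep only documents for the given year."""
    phrases = [topic] + TOPIC_MAP.get(topic, [])
    return [
        d for d in docs
        if d.year == year and any(p in d.text.lower() for p in phrases)
    ]


# Hypothetical documents for illustration only.
docs = [
    Document("Dental coverage 2019", "Dental coverage changes for the new plan year", 2019),
    Document("Dental coverage 2018", "Dental coverage summary", 2018),
    Document("Stock options 2019", "Stock options vesting schedule", 2019),
    Document("Short term disability", "How short term disability claims work", 2019),
]

results = concept_search("health benefit options", 2019, docs)
print([d.title for d in results])  # ['Dental coverage 2019', 'Short term disability']
```

The stock options document is dropped because it matches no phrase in the topic, and the 2018 dental document is dropped by the year filter.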
With search personalization the results get even better. For example, paternity leave is pruned from Jane’s results.
Elimination of Information Silos
Now Jane wants to know how many vacation days she has accrued. However, all the HR documents post vacation time in hours.
An acceptable Insight Engine response would be to conceptualize vacation days, find related documents, extract the number of hours, convert hours to days, and return “Employees with less than 5 years get 10 days of paid vacation. Employees with 5 to 15 years get …”
A better result would be to personalize the response: “Jane Doe has 4.3 of her 10 vacation days remaining.”
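The hours-to-days conversion behind such an answer is simple arithmetic. The Python sketch below assumes an 8-hour workday and made-up hour balances (34.4 hours remaining out of 80). Neither figure comes from a real HR system, and hours_to_days and vacation_summary are hypothetical names.

```python
HOURS_PER_WORKDAY = 8  # assumption: one vacation day equals 8 hours


def hours_to_days(hours, hours_per_day=HOURS_PER_WORKDAY):
    """Convert an hour balance, as stored in HR documents, into days."""
    return hours / hours_per_day


def vacation_summary(name, remaining_hours, annual_hours):
    """Build a personalized one-line answer from hour balances."""
    remaining = hours_to_days(remaining_hours)
    total = hours_to_days(annual_hours)
    # ':g' drops trailing zeros, so 10.0 prints as 10 and 4.3 stays 4.3.
    return f"{name} has {remaining:g} of {total:g} vacation days remaining."


print(vacation_summary("Jane Doe", remaining_hours=34.4, annual_hours=80))
# Jane Doe has 4.3 of 10 vacation days remaining.
```

The hard part in practice is not the division but finding and extracting the hour balances from the right documents and systems.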
The ideal result would be one where the Insight Engine traverses the information silos and finds information related to Jane’s projects at work and returns “Jane Doe has 4.3 of her 10 paid vacation days remaining. The following critical events are scheduled for Project ChatBot over the next 6 months. All vacation requests within 2 weeks of the start of critical events will be denied. To request an exception, click here.”
In part 2 of this blog series, we’ll cover the basics of building an Insight Engine, which include collecting raw data, data storage, data pipelines, data enrichment, and machine learning.