Unlocking AI's Power with Document Intelligence and Advanced Search

Problem Statement:

In the fast-paced business world, quick access to accurate and actionable information is crucial.

If you're a CXO, you understand the pain of waiting forever to gain insights from vital business documents such as Master Service Agreements (MSAs), Statements of Work (SOWs), personnel files, legal documents, and countless others. That's where Document Intelligence comes in.

What we wanted to achieve and the samples we used

We wanted to put some elements of Document Intelligence to the test using building blocks available on Azure: Form Recognizer, Cognitive Search, semantic ranking, and GPT-4. Our goal was to compare the results of Hybrid search (Text + Vector) against Vector search alone across different query types. We built on the following two excellent references, which are also prerequisites for reading this post.

Enterprise ChatGPT sample: https://github.com/azure-samples/azure-search-openai-demo/tree/main/

Hybrid + Semantic article (for understanding the semantic ranker and the different query types): https://techcommunity.microsoft.com/t5/azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167


Cognitive Search and Vector Search with Semantic Ranking

Hybrid (Vector + Text) search and semantic ranking are the two building blocks we rely on here. Text search matches the keywords of a query against the indexed documents. Vector search converts queries and documents into embeddings and retrieves the documents whose embeddings are closest to the query's, so it can match on meaning even when the wording differs. Hybrid search runs both and merges the two result lists. Semantic ranking then re-orders the top results by how well they match the intent and context of the query. Combined, these techniques aim to give more precise and more complete results for document intelligence tasks.
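
To make the merge step concrete, here is a minimal sketch in Python, not the service's actual implementation. It ranks toy documents by cosine similarity (the vector side) and fuses that list with a keyword result list using Reciprocal Rank Fusion, which is the method Azure's documentation describes for hybrid queries. The document ids, the 2-D embeddings, the keyword hit list, and the constant k=60 are all assumptions made for illustration. The semantic re-ranking step is not modeled.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_ranking(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda d: -cosine(query_vec, doc_vecs[d]))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed constant for this sketch.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy data: 2-D "embeddings" and a made-up keyword result list.
doc_vecs = {"msa_02": [0.9, 0.1], "msa_05": [0.7, 0.3], "msa_07": [0.2, 0.8]}
query_vec = [1.0, 0.0]
keyword_hits = ["msa_07", "msa_02", "msa_11"]

vector_hits = vector_ranking(query_vec, doc_vecs)
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print("vector only:", vector_hits)
print("hybrid     :", fused)
```

In this toy run, a document that ranks well in both lists rises to the top of the hybrid list, while a document found by only one retriever still appears further down. That is the behavior that lets hybrid search return slightly more results than vector search alone.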


Sample Data : 20 MSA documents created from data available on the internet.
System Prompt : We used all three options to compare the results for different kinds of queries (see the query types in the Hybrid + Semantic article referenced above). The semantic ranker was enabled in every run. Here is the comparison.

Results:

Query type | Query | Hybrid (Vector + Text) + Semantic Ranker | Vector + Semantic Ranker | Results
Short queries | Payment terms for customer XYZ | [image] | [image] | Vector has better-summarized results
Short queries | Key contacts for ABC | [image] | [image] | Extremely similar
Concept-seeking queries | What do you think are the most critical pieces of information in the MSAs for the customer Elephant | [image] | [image] | Similar again
Concept-seeking queries | What could be the areas we need to be most careful about across all the customer MSAs? | [image] | [image] | Similar again; the preferred ordering could depend on the audience
Fact-seeking queries | Find all the Non-Technical, Project Management resources with greater than 10 years of experience, and their skillsets summarized. Look for this information across customers. | [image] | [image] | Very similar again, but much better organized and slightly more results in Hybrid
Fact-seeking queries | Find all the sections where our monetary liability is greater than 100,000 USD | [image] | [image] | Better summarized with Hybrid
Keyword queries | XYZ Key contacts | [image] | [image] | Neither does better
Keyword queries | Non-compete clauses | [image] | [image] | Similar

Laying the Future Path

In our testing so far, both modes produce very close results. We will post more datasets with detailed domain scenarios by next week, and we will also try to test the outputs without semantic ranking. The blend of semantic and vector search in Azure Cognitive Search with GPT-4 has the potential to change how organizations handle massive volumes of contract-related data, so it is worth preparing now to get the most out of these tools.

Reach out to us at contact@neuralnext.ai to see this demo or to build your own.
