Unlocking AI's Power with Document Intelligence and Advanced Search
Problem Statement:
In the fast-paced business world, quick access to accurate and actionable information is crucial.
If you're a CXO, you know the pain of waiting for insights from vital business documents such as Master Service Agreements (MSAs), Statements of Work (SOWs), personnel files, legal documents, and countless others. That's where Document Intelligence comes in.
What we wanted to achieve and Samples we used
We wanted to put some elements of Document Intelligence to the test using Azure building blocks: Form Recognizer, Cognitive Search, semantic ranking, and GPT-4. Our goal was to compare results from Hybrid Search (Text + Vector) against Vector search alone across different query types. We relied on the following two references, which are also prerequisites for reading this post.
Enterprise ChatGPT sample: https://github.com/azure-samples/azure-search-openai-demo/tree/main/
Hybrid + Semantic article (for understanding the semantic ranker and the different query types): https://techcommunity.microsoft.com/t5/azure-ai-services-blog/azure-cognitive-search-outperforming-vector-search-with-hybrid/ba-p/3929167
Cognitive Search and Vector Search with Semantic Ranking
Hybrid (Vector + Text) search and semantic ranking are the two pieces we combine here. Text search matches the keywords in a query against the indexed documents. Vector search represents queries and documents as numeric embeddings and retrieves the documents whose embeddings lie closest to the query's, so it can match on meaning even when the wording differs. Hybrid search runs both and merges the two result lists, and the semantic ranker then reorders the top results based on the intent and context behind the query. Combined, these techniques aim for more precise and more complete results in document intelligence tasks.
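To make the idea of merging text and vector results concrete, here is a minimal sketch rather than what the service runs internally. It ranks documents by cosine similarity to a query embedding, then fuses that ranking with a keyword ranking using Reciprocal Rank Fusion (RRF), the method Azure's documentation describes for hybrid queries. The document ids, the two-dimensional toy embeddings, the keyword ranking, and the constant `k=60` are all assumptions made up for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_rank(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    return sorted(doc_vecs, key=lambda d: (-cosine(query_vec, doc_vecs[d]), d))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids; k=60 is an assumed, commonly used constant."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy embeddings for three MSA documents
doc_vecs = {
    "msa_a": [1.0, 0.0],
    "msa_b": [0.7, 0.7],
    "msa_c": [0.0, 1.0],
}
query_vec = [1.0, 0.2]

vector_ranking = vector_rank(query_vec, doc_vecs)
# Hypothetical keyword ranking, as a text search might return it
keyword_ranking = ["msa_b", "msa_c"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print("vector only:", vector_ranking)
print("hybrid (fused):", fused)
```

In this toy case the vector ranking alone puts `msa_a` first, while the fused ranking promotes `msa_b` and `msa_c` because both lists contain them. That is the kind of difference the Hybrid and Vector columns in the results table are meant to surface.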
Sample Data: 20 MSA documents created from data available on the internet.
System Prompt: We used all three options to compare the results for different kinds of queries; the comparison follows. (See the referenced article for the query types.) The semantic ranker was enabled in every run.
Results:
Query type | Query | Hybrid (Vector + Text) + Semantic Ranker | Vector + Semantic Ranker | Results |
---|---|---|---|---|
Short queries | Payment terms for customer XYZ | (screenshot) | (screenshot) | Vector gives a better-summarized answer |
Short queries | Key contacts for ABC | (screenshot) | (screenshot) | Extremely similar |
Concept-seeking queries | What do you think are the most critical pieces of information in the MSAs for the customer Elephant | (screenshot) | (screenshot) | Similar again |
Concept-seeking queries | What could be the areas we need to be most careful about across all the customer MSAs? | (screenshot) | (screenshot) | Similar again; the preferred ordering may depend on the audience |
Fact-seeking queries | Find all the Non-Technical, Project Management resources with greater than 10 years of experience, and their skillsets summarized. Look for this information across customers. | (screenshot) | (screenshot) | Very similar again, but Hybrid is much better organized and returns slightly more results |
Fact-seeking queries | Find all the sections where our monetary liability is greater than 100,000 USD | (screenshot) | (screenshot) | Better summarized with Hybrid |
Keyword queries | XYZ Key contacts | (screenshot) | (screenshot) | Neither does better |
Keyword queries | Non-compete clauses | (screenshot) | (screenshot) | Similar |
Laying the Future Path
The blend of hybrid search and semantic ranking in Azure Cognitive Search with GPT-4 could change how organizations handle large volumes of contract-related data. In our testing so far, the two configurations produce very close results. We will be posting more datasets with detailed domain scenarios by next week, and we will also try to test the outputs without semantic ranking.
Reach out to us at contact@neuralnext.ai to see this demo or build your own.
- LLMs + Hybrid (Text + vector + Semantic Ranking) Search - For getting better results
- Document Intelligence - Simplified