This is the second post in our Knowledge Graph RAG series.
Post 1 introduced how traditional text-only RAG fails on messy enterprise documents, and showcased webAI’s proprietary vision-plus-language knowledge graph solution.
Coming next, we take our KG RAG out of the lab and onto the factory floor, publishing industry-specific comparisons (manufacturing, aviation, healthcare) against leading LLMs and enterprise AI solutions.
Here in post 2, we detail our 94% accuracy win on the RobustQA benchmark, confirming webAI’s performance advantage even on pure text retrieval tasks.
RobustQA represents one of the industry's most rigorous benchmarks for evaluating text-based retrieval-augmented generation (RAG) systems. Used across the AI research community to assess how effectively systems retrieve and use relevant information to answer complex questions, it provides a standardized framework for measuring retrieval accuracy and generation quality.
Our preliminary results demonstrate 94%+ accuracy on RobustQA, significantly surpassing current state-of-the-art solutions. These initial findings represent a conservative validation of our Knowledge Graph (KG) RAG approach and strongly suggest even greater performance potential as we scale our testing.
This benchmark performance translates directly into tangible business value. Organizations using our KG RAG solution experience dramatically improved information retrieval accuracy, leading to faster decision-making, reduced operational errors, and enhanced productivity across knowledge-intensive workflows.
Consider a real-world scenario: An aviation maintenance team needs to quickly locate specific procedures across thousands of technical manuals. Traditional RAG systems often fragment critical information during processing, leading to incomplete or inaccurate responses. Our approach preserves document integrity, ensuring maintenance personnel receive complete, contextually accurate guidance—potentially preventing costly delays or safety issues.
Industries experiencing the most significant impact include Aviation, Manufacturing, Healthcare, Legal Services, and Financial Services—sectors where document complexity and accuracy requirements are paramount. Early customer feedback consistently highlights our system's ability to handle intricate technical documentation while maintaining precision that traditional solutions simply cannot match.
Our RobustQA evaluation methodology differs fundamentally from conventional approaches, addressing a critical limitation that has plagued traditional RAG systems.
The RobustQA dataset consists of documents that are already quite small—typically paragraph-sized excerpts or smaller passages extracted from real-world PDFs. Each document contains a "text" field representing an unstructured block of natural language. However, the standard RobustQA approach takes these already-small documents and further chunks them into even smaller segments of at most 100 tokens each, fragmenting context and adding unnecessary complexity. The traditional HIT@5 metric then measures whether the correct answer can be generated from any of the top five retrieved chunks.
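The conventional pipeline described above can be sketched as follows. This is an illustrative simplification, not the benchmark's reference implementation: whitespace tokenization stands in for the actual tokenizer, and plain substring matching stands in for the answer-matching step.

```python
def chunk_passage(text: str, max_tokens: int = 100) -> list[str]:
    """Split a passage into chunks of at most max_tokens tokens.

    Whitespace tokenization is an illustrative stand-in for the
    benchmark's real tokenizer.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def hit_at_5_chunks(answer: str, ranked_chunks: list[str]) -> bool:
    """Chunk-level HIT@5: does the answer appear in any of the
    top-five retrieved chunks? (Substring matching simplifies the
    benchmark's generation-based answer check.)"""
    return any(answer.lower() in chunk.lower() for chunk in ranked_chunks[:5])
```

Note the failure mode this design invites: an answer phrase that straddles a 100-token boundary no longer appears intact in any single chunk, so no chunk can score a hit.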
Here's where our approach diverges significantly: We treat each original document (passage) as a complete retrieval unit, eliminating the need for further chunking entirely. Our KG RAG pipeline recognizes that these documents are already appropriately sized and doesn't fragment them further.
From an engineering perspective, our approach searches within the top 5 complete documents/passages to find answers, rather than searching through artificially fragmented 100-token chunks. This dramatically reduces storage requirements while eliminating noise introduced by over-segmentation of content that's already at an optimal size. Our HIT@5 metric refers to whether the complete answer exists within any of the top five entire documents—preserving semantic coherence that chunking destroys.
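To make the contrast concrete, here is a minimal sketch of document-level HIT@5 alongside a hypothetical passage whose answer phrase straddles the standard 100-token chunk boundary; the passage, answer, and substring matching are illustrative assumptions, not our production pipeline.

```python
def hit_at_5_documents(answer: str, ranked_docs: list[str]) -> bool:
    """Document-level HIT@5: the complete answer must appear intact
    in at least one of the top-five retrieved passages."""
    return any(answer.lower() in doc.lower() for doc in ranked_docs[:5])


# Hypothetical passage: the answer phrase begins at token 98, so a
# 100-token chunker would split it across two chunks, while the
# intact passage keeps it whole.
passage = " ".join(
    ["filler"] * 98
    + ["hydraulic", "pressure", "relief", "valve"]
    + ["filler"] * 50
)
answer = "hydraulic pressure relief valve"
```

Retrieval over the intact passage finds the answer; chunk-level matching over the same text, split at 100 tokens, cannot, because "hydraulic pressure" and "relief valve" land in different chunks.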
We have tested our approach across the complete RobustQA dataset, with results showing consistently stable, high accuracy performance. Our comprehensive evaluation demonstrates the robustness of our method across diverse document types and query complexities.
This approach delivers significant efficiency gains. By avoiding unnecessary fragmentation of already-appropriate document sizes, we reduce storage overhead while maintaining superior retrieval accuracy. Traditional systems create artificial boundaries within coherent passages, often splitting critical context that our method preserves intact.
While RobustQA provides valuable validation, we must acknowledge its inherent limitations. The benchmark evaluates only text-based retrieval, which doesn't capture the full scope of our KG RAG capabilities.
RobustQA cannot evaluate multimodal retrieval—our system's core differentiator. Real-world documents contain complex visual elements: technical diagrams, data tables, charts, and interconnected multimedia content. Traditional text-only approaches fragment these relationships, losing critical contextual information.
The technical difference is profound: Text-only retrieval systems process documents linearly, often missing the spatial and visual relationships that convey meaning in technical documentation. Our multimodal approach maintains these relationships, enabling more accurate interpretation of complex materials where text, images, and structured data work together to convey complete information.
This limitation makes our upcoming real-world tests particularly important—they will demonstrate our full multimodal advantage in scenarios that more accurately reflect actual enterprise document environments.
Having demonstrated strong performance across the complete RobustQA dataset, we're now implementing additional validation layers to reinforce and expand upon these comprehensive findings.
Our systematic approach includes cross-validation with alternative question sets, stress testing under various retrieval scenarios, and comparative analysis against other benchmark datasets. Each validation layer is designed to confirm the consistency of our high-accuracy performance across different evaluation frameworks.
From a technical process standpoint, we're conducting iterative testing cycles that extend beyond RobustQA to include domain-specific benchmarks while maintaining rigorous evaluation standards. This methodical approach ensures that our 94%+ accuracy represents sustainable performance across diverse evaluation contexts.
These expanded validation efforts will provide deeper insights into our system's performance characteristics while establishing comprehensive evidence of our approach's superiority over traditional chunking-based RAG architectures.
While RobustQA validation is crucial, our upcoming content will showcase where our KG RAG solution truly excels: real-world, multimodal document processing.
Future posts will feature direct head-to-head comparisons against leading industry solutions, including OpenAI's o3, Claude, and specialized enterprise RAG platforms. These comparisons will demonstrate our performance advantages in realistic scenarios.
For example, in a recent test using F-18 technical flight manuals—documents rich with diagrams, tables, and complex technical specifications—our webAI KG RAG achieved 95% accuracy compared to o3's 80%. This 15-point advantage reflects our system's ability to process and interpret multimodal content that traditional text-based approaches cannot effectively handle.
We're preparing additional industry-specific demonstrations across aviation maintenance, manufacturing protocols, healthcare documentation, and legal case analysis. Each will illustrate how our unique multimodal capabilities translate into measurable performance improvements for real enterprise workflows.
To provide clearer insight into our methodology and enhance technical credibility, upcoming posts will include comprehensive visual documentation of our testing processes.
These visuals will demonstrate our product performing technical retrieval tasks in real-world scenarios, providing transparency into our methodology while showcasing the practical implementation of our KG RAG approach.
Our initial 94%+ accuracy on RobustQA strongly validates the effectiveness of our KG RAG solution and our fundamental approach of eliminating chunking-related noise. This benchmark success, achieved through our passage-level retrieval strategy, demonstrates measurable advantages over traditional RAG architectures.
RobustQA provides legitimate interim validation while we prepare more comprehensive real-world demonstrations. Our ongoing validation efforts will further reinforce these results, building toward conclusive evidence of our system's superior performance across diverse document types and enterprise scenarios.
The combination of strong benchmark performance and our unique multimodal capabilities positions us to deliver transformative value for organizations dealing with complex, information-rich documentation.
Want to learn more? Check out the video about our RobustQA benchmarking efforts.
Ready to see how our KG RAG solution can transform your organization's document processing capabilities?
Sign up for our upcoming webinar!
How Leading Manufacturers Are Using Private AI and Knowledge Graph RAG to Power SOPs, QA, and Inspections
The webinar will feature live demonstrations, explore real-world enterprise applications, and present detailed head-to-head comparisons with leading AI solutions. Experience firsthand how our KG RAG approach transforms complex document processing across industries.