webAI’s Knowledge Graph RAG bests ChatGPT o3 in Real World Manufacturing Test

August 4, 2025

Key Takeaways

webAI KG RAG dramatically outperformed ChatGPT o3 (95% accuracy vs. 60%) on complex multimodal manufacturing documentation.

Traditional RAG fails with multimodal documents, missing critical details from diagrams, tables, and images.

webAI’s proprietary graph-driven retrieval integrates visual and textual data, ensuring complete, accurate, and verifiable responses.

Manufacturers see real-world benefits: reduced downtime, fewer operational errors, and improved compliance readiness.

This is the third post in our Knowledge Graph RAG series.

Post 1 explored why traditional RAG falls short on multimodal enterprise documents and introduced webAI’s proprietary vision-plus-language knowledge graph solution.

Post 2 detailed our benchmark results, highlighting a 94% accuracy win on RobustQA.

Now, we move from benchmarks to real-world scenarios, testing webAI directly against OpenAI’s ChatGPT o3 on manufacturing-specific documentation.

The TL;DR: webAI Knowledge Graph RAG answered 95% of questions correctly in a head-to-head challenge against ChatGPT o3 on a complex manufacturing manual. o3 struggled with the same document and delivered correct answers only 60% of the time.

This isn’t academic benchmarking. This is real-world, high-stakes retrieval.

Why Traditional RAG Struggles in Manufacturing

Manufacturing organizations rely heavily on complex, dense documents. These lengthy technical manuals, detailed SOPs, intricate schematics, and engineering diagrams aren’t simple blocks of text; they're packed with diverse visual and spatial information critical for safe and efficient operations.

Yet traditional retrieval-augmented generation (RAG) systems have fundamental flaws when dealing with these multimodal sources. These limitations include:

Image blindness and loss of visual context
Traditional RAG typically relies on basic Optical Character Recognition (OCR) to convert images into text, discarding spatial and visual cues. Ignoring these multimodal elements essentially throws away a thousand words of context with every neglected image. This can lead to critical misinterpretation, including missed essential warnings or incorrectly identified machine parts.
Fragmented, incomplete context
Legacy RAG systems often segment information into small text chunks, losing contextual coherence. In manufacturing documentation, vital instructions are frequently spread across diagrams, tables, and references (Knollmeyer et al., 2023). Recent analysis confirms this shortcoming, noting traditional RAG frequently provides incomplete or ambiguous responses, forcing engineers back into manual cross-referencing.
Maintenance overhead and slow retrieval
Conventional RAG solutions demand extensive vector databases and cumbersome maintenance workflows. The result is slow queries, latency spikes, and brittle deployments that hamper operations on the factory floor.

Manufacturers increasingly acknowledge these pain points as they experience operational inefficiencies and safety risks, such as extended downtimes, increased error rates, and slower response during critical incidents.
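
The fragmentation problem can be sketched in a few lines. The example below is a toy illustration, not any particular RAG product: `chunk_fixed`, `retrieve`, the excerpt text, and the 80-character window are all hypothetical. It shows fixed-size chunking cutting a warning in half and separating a step from the table it points to, so a top-1 keyword lookup returns only part of what an operator needs.

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Split text into consecutive fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def retrieve(query_terms: set[str], chunks: list[str], k: int = 1) -> list[str]:
    """Toy scorer: rank chunks by how many query terms they contain."""
    def score(chunk: str) -> int:
        lowered = chunk.lower()
        return sum(term in lowered for term in query_terms)
    return sorted(chunks, key=score, reverse=True)[:k]


# Hypothetical excerpt: the step, the warning, and the table it cites
# sit next to each other in the source document.
manual = (
    "Step 4: Set the spindle to the speed listed in Table 7 before cutting. "
    "Warning: wear eye protection. "
    "Table 7: Material | Speed. Aluminum | see machine limits."
)

chunks = chunk_fixed(manual, size=80)
top = retrieve({"spindle", "speed"}, chunks)

print(top[0])
# The best-scoring chunk mentions Table 7 but does not contain the table itself,
# and the warning text is cut off at the chunk boundary.
print("Table 7:" in top[0], "wear eye protection" in top[0])
```

Larger or overlapping windows move the cut points around but do not remove them, which is why the step-to-table link has to be represented explicitly somewhere.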

webAI’s Proprietary Knowledge Graph RAG Solution

webAI’s solution addresses these fundamental challenges head-on through our unique multimodal Knowledge Graph RAG approach:

Multimodal ingestion
Our platform integrates text, diagrams, tables, and images into a unified retrieval pipeline. Unlike traditional RAG, no information modality is excluded or ignored (Sobhan et al., 2025). This comprehensive ingestion ensures critical details from diagrams, machine schematics, and visual annotations remain intact and retrievable.

Structured graph-based retrieval
By representing multimodal content as interconnected nodes and edges, our knowledge graph directly connects related data points, enabling precise retrieval across multiple documents or sections. This structured approach ensures the system consistently finds and combines all relevant information, surpassing traditional methods that depend solely on approximate text similarity.

Explainability and source citation
Every answer generated by our system points back to precise document pages and graph nodes, providing complete transparency and reducing the risk of hallucination or false information. This traceability is critical for compliance, audits, and trust.

*Visualization of webAI’s KG RAG graph for a complex manufacturing manual*
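
As a rough illustration of the ideas above, here is a minimal sketch of graph-based retrieval with page citations. It is not webAI's implementation, which is proprietary: the `Node` type, the `NODES` and `EDGES` contents, the page numbers, and the keyword seeding are all assumptions made for the example. The point is only that once text, images, and tables are nodes linked by edges, a query that matches a paragraph can also pull in the figure that paragraph refers to, and every result carries a page to cite.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str      # "text", "image", or "table"
    page: int
    content: str


# Hypothetical graph; not the schema or content of any real manual.
NODES = {
    "t1": Node("t1", "text", 12, "Press POWER ON to start the control."),
    "i1": Node("i1", "image", 12, "Figure: power button icon on the pendant."),
    "tb1": Node("tb1", "table", 40, "Table: restore menu options."),
}
EDGES = {
    "t1": ["i1"],   # the paragraph refers to the figure
    "i1": ["t1"],
    "tb1": [],
}


def retrieve(query: str, max_hops: int = 1) -> list[Node]:
    """Seed on keyword matches, then follow edges up to max_hops away."""
    terms = query.lower().split()
    seeds = [n.node_id for n in NODES.values()
             if any(t in n.content.lower() for t in terms)]
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node_id, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in EDGES.get(node_id, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return sorted((NODES[i] for i in seen), key=lambda n: (n.page, n.node_id))


def cite(nodes: list[Node]) -> list[str]:
    """Render one citation string per retrieved node."""
    return [f"{n.kind} node {n.node_id}, page {n.page}" for n in nodes]


hits = retrieve("start control", max_hops=1)
print(cite(hits))
```

With `max_hops=0` the same query returns only the matching paragraph; allowing one hop adds the linked figure, which a similarity search over text chunks alone would have no reason to return.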

Test Setup: webAI vs. ChatGPT o3

We wanted to evaluate our solution in a real-world manufacturing context, so Senior Solutions Engineer Keith Tenzer conducted an extensive test using a detailed 400-page milling operations manual from Haas Automation, Inc.

Haas builds some of the most popular and widely used CNC milling machines in the world. The manual was representative of the type of complex documentation regularly encountered in industrial manufacturing settings, packed with diagrams, operational tables, and procedural instructions.

Keith executed a side-by-side evaluation, comparing:

webAI’s Knowledge Graph RAG: running locally on an Apple MacBook with an open-source Llama 4 model
ChatGPT o3: OpenAI’s state-of-the-art reasoning model, running on cloud infrastructure (OpenAI has not published its parameter count)

Both systems ingested the exact same milling manual. Keith then posed 20 practical, manufacturing-specific questions covering numeric lookups, image recognition, procedural explanations, and nested logical reasoning.

*Haas Vertical Mill in the VF series, retailing for over $200,000*

Results at a Glance

Before we dive into the details, here’s a quick overview of our evaluation comparing webAI’s KG RAG directly against ChatGPT o3:

Overall Accuracy
webAI KG RAG: 19 / 20 (95%) ✔️
ChatGPT o3: 12 / 20 (60%) ✖️

Image-based Questions (5 total questions)
webAI KG RAG: 5 / 5 (100%) ✔️
ChatGPT o3: 0 / 5 (0%) ✖️

Sources Provided
webAI KG RAG: Every answer ✔️
ChatGPT o3: None ✖️

ChatGPT o3 struggled exactly where legacy RAG research predicted: interpreting visual data and handling multi-hop context. webAI’s single incorrect response came on a deliberate trick question designed as an edge case for the underlying LLM, not a limitation of our knowledge graph or retrieval approach.

*The sort of complex manual content where traditional RAG struggles, and webAI’s KG RAG excels.*

Comprehensive Review: Execution Highlights and Key Insights

Let's unpack specific scenarios from our head-to-head test to understand precisely why webAI’s KG RAG significantly outperformed ChatGPT o3.

Numeric Table Lookup: Spindle Speed

Query: “What is the spindle speed of the milling machine?”

webAI KG RAG: Precisely identified "1,500 RPM," and included a clear page citation, enabling immediate verification.
ChatGPT o3: Also returned the correct "1,500 RPM," demonstrating it handles straightforward numeric lookups well enough when provided clearly structured textual data.

Insight: While basic numeric retrievals are within reach of both systems, webAI’s consistent source citation adds significant trust and auditability—an essential factor in manufacturing, where accuracy must always be verifiable.

*webAI delivering the answer and citation quickly and cleanly.*

Visual Identification: Power-Button Icon

Query: “Describe the visual icon used for the power button.”

webAI KG RAG: Correctly described it as "a black vertical line," directly referencing and linking to the original visual in the documentation.
ChatGPT o3: Responded with “The image is a green, square (slightly-rounded corner) push button with the words ‘POWER ON’ in white, all caps, centered on the face.” Worse than an honest “I don’t know,” o3 delivered a convincing but incorrect answer.

Insight: o3’s failure clearly demonstrates limitations typical of traditional text-centric RAG approaches, underscoring the importance of multimodal context. webAI’s embedded visual and textual data, fused within a single proprietary knowledge graph, allows the model to effectively interpret and retrieve spatial-visual information, preserving accuracy where traditional approaches fall short.

*Again, webAI delivering the answer and citation.*

*o3 not knowing the correct answer and making something up.*

Detailed UI Retrieval: Restore-Menu Options

Query: “List all options displayed on the machine’s restore-menu pop-up.”

webAI KG RAG: Accurately retrieved and listed all six items: "System data, User data, Programs, Offset, Macros, ATM & Network," each clearly linked back to the corresponding visual source within the manual.
ChatGPT o3: Provided an incomplete response—listed four options, missed the critical "Macros" option, and added extraneous, incorrect entries, reflecting confusion or misunderstanding.

Insight: Traditional RAG methods frequently fail when critical context resides in images or visual tables. webAI’s multimodal graph ensures visual context isn't stripped away or fragmented, offering comprehensive retrieval that aligns exactly with the source material, significantly reducing potential operational confusion or error.

Nested Logic Retrieval: WHILE Loops in G-code

Query: “Explain how nested WHILE loops are implemented in the milling machine’s programming language.”

webAI KG RAG: Provided a complete, precise explanation, including critical index identifiers required for nesting loops, accompanied by code snippets directly sourced from the manual.
ChatGPT o3: Offered a partially correct but incomplete explanation, omitting crucial details (index identifiers), rendering its answer imprecise and potentially misleading.

Insight: This scenario highlights the strength of knowledge graph-driven multi-hop retrieval. webAI’s structured graph not only retrieves the primary context but also naturally traverses related, nested details—essential in industrial settings where accurate implementation of programming instructions is safety-critical.
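
To show why the index identifiers matter, here is a small sketch that checks loop pairing in macro code. It assumes the common Fanuc-style form `WHILE [condition] DOn ... ENDn`, where `n` is the loop index, and it assumes an inner loop must use a different index from the loop that encloses it. Both are assumptions for illustration; the Haas manual remains the authority on exact syntax and nesting limits. The function `check_nesting` and the sample programs are hypothetical.

```python
import re

# Assumed macro form: WHILE [cond] DOn ... ENDn, with n as the loop index.
DO_RE = re.compile(r"\bDO(\d+)\b")
END_RE = re.compile(r"\bEND(\d+)\b")


def check_nesting(program: str) -> list[str]:
    """Return problems with DOn/ENDn pairing; an empty list means well-formed."""
    problems: list[str] = []
    open_loops: list[int] = []
    for lineno, line in enumerate(program.splitlines(), start=1):
        code = line.split("(")[0]  # ignore parenthesized comments
        for match in DO_RE.finditer(code):
            index = int(match.group(1))
            if index in open_loops:
                problems.append(f"line {lineno}: DO{index} reused while still open")
            open_loops.append(index)
        for match in END_RE.finditer(code):
            index = int(match.group(1))
            if open_loops and open_loops[-1] == index:
                open_loops.pop()
            else:
                problems.append(f"line {lineno}: END{index} does not close the innermost loop")
    problems.extend(f"DO{index} never closed" for index in open_loops)
    return problems


good = """#1 = 0
WHILE [#1 LT 3] DO1
#2 = 0
WHILE [#2 LT 2] DO2
#2 = #2 + 1
END2
#1 = #1 + 1
END1"""

bad = """WHILE [#1 LT 3] DO1
WHILE [#2 LT 2] DO1
END1
END1"""

print(check_nesting(good))
print(check_nesting(bad))
```

An explanation of nested loops that leaves out the indices would not distinguish the first program from the second, which is the gap the test question was probing.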

Edge-Case Reasoning: Trick Question on FOR Loops

Query: “Explain the syntax of a FOR loop in the milling machine’s programming language.” (Note: FOR loops do not exist in this language.)

webAI KG RAG: Responded cautiously, indicating uncertainty and acknowledging it couldn’t find conclusive evidence of a FOR loop, effectively avoiding falsehood.
ChatGPT o3: Answered correctly.

Insight: This was the one question webAI was scored as missing, yet the scenario primarily tested LLM reasoning rather than retrieval. It still shows how grounding answers in the documented material mitigates hallucinations: even when uncertain, webAI does not fabricate, which is crucial for trustworthiness in industrial compliance and safety contexts.
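
The behavior described here, declining to answer when the source offers no support, can be sketched as a simple gate in front of answer generation. This is a hypothetical illustration rather than webAI's logic: `grounded_answer`, the whole-word matching rule, the sample passage, and its page number are all assumptions.

```python
import re


def words(text: str) -> set[str]:
    """Lowercased word tokens, so 'for' does not match inside 'before'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounded_answer(required_terms: list[str], passages: list[tuple[int, str]]) -> dict:
    """Answer only from passages containing every required term; otherwise abstain."""
    needed = {term.lower() for term in required_terms}
    support = [(page, text) for page, text in passages if needed <= words(text)]
    if not support:
        return {"status": "not found in source", "sources": []}
    return {"status": "grounded", "sources": [page for page, _ in support]}


# Hypothetical passage and page number, for illustration only.
passages = [(101, "WHILE loops repeat a block while the condition holds.")]

print(grounded_answer(["for"], passages))
print(grounded_answer(["while"], passages))
```

A gate like this trades some recall for safety: it will not invent a FOR loop, but it also cannot state outright that the construct does not exist unless the source says so.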

Summary of Execution Results

Across the complete set of queries, the results clearly showed webAI’s decisive advantage:

Overall Accuracy: webAI KG RAG correctly answered 19 out of 20 questions (95%), while ChatGPT o3 answered only 12 correctly (60%).
Multimodal Understanding: webAI achieved perfect accuracy on image-embedded questions, clearly outperforming o3’s text-only retrieval model, which achieved 0% accuracy on image-based retrieval.
Explainability and Transparency: webAI provided full source citations for every answer, a capability entirely lacking in ChatGPT o3.
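
For readers who want to check the arithmetic, the short script below recomputes the percentages from the counts reported in this post. It also derives a non-image split, which rests on one assumption the post does not state outright: that the five image-based questions are a subset of the 20. The names `reported`, `percent`, and `non_image` are introduced here for illustration.

```python
# Scores as reported in this post: (correct, total).
reported = {
    "webAI KG RAG": {"overall": (19, 20), "image": (5, 5)},
    "ChatGPT o3": {"overall": (12, 20), "image": (0, 5)},
}


def percent(correct: int, total: int) -> int:
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)


def non_image(scores: dict) -> tuple[int, int]:
    """Remaining questions, assuming the image questions are a subset of the 20."""
    (overall_correct, overall_total) = scores["overall"]
    (image_correct, image_total) = scores["image"]
    return overall_correct - image_correct, overall_total - image_total


for system, scores in reported.items():
    rest = non_image(scores)
    print(system, percent(*scores["overall"]), percent(*scores["image"]), rest, percent(*rest))
```

Under that assumption, the non-image questions come out to 14 of 15 for webAI and 12 of 15 for o3, so the image-based questions account for five of o3's eight misses.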

The test confirms exactly what previous research highlighted: Traditional RAG implementations consistently falter when multimodal context, structured reasoning, and detailed precision are needed.

webAI’s Knowledge Graph RAG consistently preserved critical multimodal information, accurately navigated complex document relationships, and delivered precise, verifiable answers quickly, directly addressing key manufacturing challenges that traditional methods cannot overcome.

Want to see the full test? Check out Keith’s video walkthrough on YouTube.

Why These Results Matter on the Factory Floor

In manufacturing, accuracy translates directly into real operational impacts. The difference between webAI’s 95% and o3’s 60% accuracy is not merely academic. It’s mission-critical.

Specifically, this level of superior retrieval means:

Reduced Downtime: Faster, precise retrieval enables engineers and maintenance teams to find and act on critical procedural information immediately, significantly reducing unplanned downtime.
Lower Error Rates: Reliable multimodal retrieval reduces manual cross-checking, misunderstandings, and costly operational mistakes. BMW, for instance, reported a 20% reduction in unplanned downtime after adopting graph-powered AI solutions, demonstrating real-world ROI from improved accuracy and efficiency.
Compliance and Audit Readiness: webAI’s solution inherently provides clear, verifiable sourcing for every generated answer, simplifying compliance processes and ensuring documentation meets regulatory and safety standards.

Looking Ahead: A New Era of AI Retrieval in Manufacturing

This milling-machine manual test is only the first of many industry-specific head-to-heads planned. In the coming weeks, we will publish similar detailed evaluations across aviation maintenance, healthcare SOPs, and legal documentation—comparing webAI’s KG RAG directly against leading cloud-scale solutions.

The evidence is already clear: traditional RAG simply cannot keep pace with multimodal enterprise documentation. webAI’s proprietary fusion of vision and language into a single knowledge graph represents the next generation of retrieval solutions—more accurate, more reliable, and tailored directly to the complex demands of manufacturing and other industrial domains.

Join Us on August 27: See webAI’s KG RAG Live

Experience firsthand how webAI can transform your multimodal document workflows:

How Leading Manufacturers Use Private AI & Knowledge Graph RAG
August 27, 2025 · 2 PM EST
Live demonstrations · Real-world industry Q&A · Head-to-head AI comparisons

Register Now

Bring your own challenging PDFs. webAI’s Knowledge Graph RAG will handle the rest.

Author Information

Pete Willhoite