Agentic AI Accuracy Benchmark: Complex Document Comprehension vs. Competing Solutions

Sponsor: octonomy AI GmbH

Abstract

octonomy AI commissioned Tolly to evaluate the accuracy of octonomy Agentic AI against three competing AI solutions in answering 50 complex knowledge questions. The questions were derived from a production enterprise documentation library spanning more than 1,000 pages of real-world materials, including annotated diagrams, performance curves, multi-variable data tables, and cross-referenced specifications.

The questions were designed to require interpretation of complex source material, the kind of documentation found across every industry, rather than simple text extraction. The benchmark spanned four question-complexity categories, each testing a distinct AI reasoning capability: multi-document reasoning, precision data extraction from graphical sources, visual and spatial interpretation, and complex structured data navigation.

The majority of answers could only be obtained by reading values from graphs, interpolating between data points on curves, cross-referencing information across multiple documents, or interpreting annotated drawings. These challenges mirror complex knowledge work across every industry and vertical. octonomy AI accurately answered 96% of the questions, whereas the accuracy of the other solutions evaluated ranged from 58% down to 26%.