Agentic AI Accuracy Benchmark: Complex Document Comprehension vs. Competing Solutions
Abstract
octonomy AI commissioned Tolly to evaluate the accuracy of octonomy Agentic AI against three competing AI solutions in answering 50 complex knowledge questions. The questions were derived from a production enterprise documentation library spanning 1,000+ pages of real-world materials, including annotated diagrams, performance curves, multi-variable data tables, and cross-referenced specifications.
The questions were specifically designed to require interpretation of complex source material, the kind of documentation found across every industry, rather than simple text extraction. The benchmark spanned four question complexity categories testing distinct AI reasoning capabilities: multi-document reasoning, precision data extraction from graphical sources, visual and spatial interpretation, and complex structured data navigation.
The majority of answers could only be obtained by reading values from graphs, interpolating between data points on curves, cross-referencing information across multiple documents, or interpreting annotated drawings. These challenges mirror complex knowledge work across every industry and vertical. octonomy AI accurately answered 96% of the questions, whereas the accuracy of the other solutions evaluated ranged from 58% down to 26%.