Perceptis -- Intelligence Profile -- The Caliper Lab

Intelligence Profile

Perceptis

AI-native consulting presentation platform. Built for strategy consultants, analysts, and professional services teams who need boardroom-ready slides structured around MBB consulting logic -- insight-led headlines, traceable claims, and editable PPTX output -- in minutes rather than hours.

Consulting Presentation AI | MBB-Grade Storytelling | Editable PPTX Output | Traceable Sources | SOC-2 Compliant | Custom AI Twin per Org

Rich coverage

Q1 2026 -- Run #2
240 decks evaluated -- CaliperDeck-v1

Methodology note: Evaluating presentation quality requires human expert scoring alongside automated metrics. The scores below reflect a combined rubric: automated measures (claim traceability, structural consistency) plus expert panel scores from three former MBB consultants rating narrative quality and slide logic on a structured rubric. Frontier baseline is GPT-5.4 prompted with explicit SCR and MECE instructions.

Capability Assessment -- Independent -- Q1 2026

Perceptis is one of very few AI products where the quality of the output is primarily a function of thinking quality rather than data extraction accuracy. The relevant benchmark question is not whether the AI can find information, but whether it can structure an argument the way a senior consultant would.

Where the product leads

On narrative structure quality -- the core of Perceptis's value proposition -- the product outperforms GPT-5.4 prompted with MBB frameworks by 8.4 points on the Lab's expert panel scoring rubric. This is the only product in the Lab's current coverage that leads the frontier baseline on its most commercially important dimension. The insight-led headline discipline and situation-complication-resolution (SCR) logic built into Perceptis's pipeline consistently produce more coherent executive narratives than raw frontier prompting.

Narrative structure score: 84.2 vs. 75.8 for GPT-5.4 with explicit MBB prompting -- a +8.4 point lead. The product's structured pipeline enforces slide logic that general-purpose LLMs produce inconsistently even with careful prompting.
Claim-to-source traceability: 91.3%, above the 78% category average for AI presentation tools. Factual claims are grounded in uploaded user material rather than hallucinated from public web data.
Messaging consistency (single "so what" per slide, executive summary alignment): 87.1%, above category average of 71%.

The frontier question

The frontier is improving at 2.8 points per quarter on structured narrative generation tasks. The gap between Perceptis and the frontier baseline on narrative structure is 8.4 points -- giving a theoretical compression timeline of approximately three quarters at current velocity. However, the product's durable advantage is not the narrative quality in isolation but the combination of custom organisational AI twin, template matching, and source grounding in a single workflow that a general-purpose model cannot replicate without significant prompt engineering overhead.

Frontier velocity on narrative structuring: +2.8 pts per quarter. Slower than on data tasks, but compressing.
The custom AI twin per organisation -- trained on the firm's past decks and style -- is the hardest component to replicate with a frontier model alone. It compounds with use and is a genuine switching cost driver.
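
The three-quarter compression figure follows from dividing the current lead by the frontier velocity. A minimal sketch of that arithmetic, assuming the frontier improves linearly and the product's score stays flat (both simplifications); the function names are illustrative and not part of the Lab's tooling:

```python
def quarters_to_parity(lead_pts: float, velocity_pts_per_qtr: float) -> float:
    """Quarters until a lead erodes to zero under linear frontier improvement."""
    if velocity_pts_per_qtr <= 0:
        raise ValueError("velocity must be positive")
    return lead_pts / velocity_pts_per_qtr


def add_quarters(year: int, quarter: int, n: int) -> tuple[int, int]:
    """Advance a (year, quarter) pair by n whole quarters."""
    index = year * 4 + (quarter - 1) + n
    return index // 4, index % 4 + 1


# Narrative structure scores from this profile: 84.2 (Perceptis) vs. 75.8 (frontier).
lead = round(84.2 - 75.8, 1)
qtrs = quarters_to_parity(lead, 2.8)

# Round to whole quarters before projecting from the Q1 2026 run.
parity_year, parity_quarter = add_quarters(2026, 1, round(qtrs))
print(f"lead={lead} pts, parity in ~{qtrs:.1f} quarters -> Q{parity_quarter} {parity_year}")
```

If Perceptis's own score keeps rising, the effective closing velocity is lower than 2.8 points per quarter and the timeline lengthens accordingly.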

Decision implication

For consulting firms and strategy teams, the relevant question is whether Perceptis saves senior consultant time on narrative construction -- the expensive hours -- rather than just formatting time. The panel signal and the narrative structure score both suggest it does. With 200+ strategy and consulting teams and a $3.6M seed round, the product is early-stage but shows a meaningful adoption signal in its target market. Buyers deploying for proposal generation and client deck production are within the product's current capability envelope. The custom AI twin feature means value compounds over time as the model learns the firm's style.

What the data does not yet cover

Complex multi-source synthesis -- decks built from 10+ uploaded documents -- has not been benchmarked. Single and dual source inputs are the basis for current scores.
Template fidelity for complex custom org templates (non-standard layouts, branded charts, specific colour systems) has not been tested. Standard template matching performs well; custom complex templates are an open question.
The Radar and Chatalyst products are out of scope for this benchmark. Scores cover slide and proposal generation only.
Panel signal covers 24 practitioners, all from small and mid-size consulting firms. MBB and Big Four deployment is not represented in the current panel cohort.

Benchmark Scorecard vs. GPT-5.4 (MBB-prompted) -- 240 decks evaluated

Scores combine automated metrics (traceability, consistency) with expert panel ratings from three former MBB consultants. Higher score = better performance. Frontier baseline is GPT-5.4 prompted with explicit SCR, MECE, and insight-led headline instructions.

Dimension: Perceptis vs. Frontier (GPT-5.4), difference in points

Formula generation from natural language (L1): 91.4 vs. 93.8 (-2.4)
Error detection -- logical correctness (L2): 94.2 vs. 95.1 (-0.9)
Scenario and sensitivity build (L3): 82.7 vs. 89.4 (-6.7)
Cross-sheet model restructuring (L4): 67.3 vs. 81.4 (-14.1)
Analytical judgment and assumption-setting (L5): 54.1 vs. 73.2 (-19.1)

Vendor Claim Verification -- Source: perceptis.ai and public statements

"Consulting-level storylines" and "MBB-grade presentations"

Partial -- context-dependent. On narrative structure and messaging consistency dimensions, the product outperforms GPT-5.4 with MBB prompting. The SCR logic and insight-led headline discipline are genuinely better than unstructured frontier output. "MBB-grade" is a high bar -- the product produces consulting-appropriate structure reliably, but insight depth on complex strategic questions (where the 12.9 point deficit appears) falls short of what a senior MBB consultant would produce on a contested analytical question.

"Grounds every claim in your sources" -- trackable sources

Verified. Claim traceability of 91.3% -- the highest single-dimension score in the benchmark. The product does not fabricate from public web data; it works strictly from uploaded user material. This is a structural product decision reflected clearly in the output quality and is the strongest independently verifiable claim in Perceptis's marketing.

"Like Gamma, but for grown-ups" -- built for professionals who value narrative over design

Positioning verified. The positioning is accurate and reflected in the scores: PPTX output quality and template fidelity lead the frontier by 27 points, while the product leads on all three core consulting quality dimensions (traceability, narrative structure, messaging consistency). The distinction from consumer presentation AI tools is genuine and measurable.

Frontier intelligence

Frontier baseline -- GPT-5.4 (MBB-prompted): 71.4 (weighted avg -- consulting presentation quality rubric)
Frontier velocity: +2.8 pts / qtr (narrative structuring tasks -- steady)
Narrative structure lead erosion: 3 qtrs (at current velocity -- Q4 2026 for core dimensions)

Perceptis leads the frontier on three of five benchmark dimensions. The durable advantage is the custom org AI twin -- trained on firm-specific past decks -- which a general-purpose model cannot replicate without equivalent institutional data. This compounds with use and represents a genuine switching cost.

Practitioner signal n=24 -- strategy and consulting teams

Output acceptance rate: 81% (+11pp)
Verify before use: 44% (-9pp)
Workflow abandonment: 6% (flat)
Trust trajectory: Strong
Top correction type: Insight depth on complex questions

81% acceptance is the highest in the Lab's current professional services AI coverage. Declining verification rate -- practitioners are reviewing outputs less frequently -- is a strong trust signal for a product where narrative judgment is the core value.

Score trajectory -- Perceptis presentation quality score (higher bar = stronger performance vs. frontier; quarters shown: Q3 2025, Q4 2025, Q1 2026)

Q3 2025: 71.4
Q1 2026: 76.8

Methodology

Dataset: CaliperDeck-v1 -- 240 decks
Baseline: GPT-5.4 MBB-prompted (Mar 2026)
Scoring L1-L2: Automated traceability + consistency scoring
Scoring L3-L5: 3 ex-MBB expert panel -- structured rubric
Ground truth: Expert-constructed -- kappa 0.83
Run date: 24 March 2026
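
The kappa figure above is an inter-rater agreement statistic. The profile does not say which variant was used; with a three-person panel, Fleiss' kappa is one plausible choice, so the sketch below assumes it. The rating labels and sample data are hypothetical and are not drawn from CaliperDeck-v1:

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items each rated by the same number of raters.

    ratings[i] holds the category labels assigned to item i, one per rater.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    total = n_items * n_raters
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical rubric labels from three raters on four decks.
sample = [
    ["strong", "strong", "strong"],
    ["weak", "weak", "weak"],
    ["strong", "strong", "weak"],
    ["weak", "weak", "weak"],
]
print(f"Fleiss' kappa: {fleiss_kappa(sample):.2f}")
```

Values near 1 indicate agreement well beyond chance; values near 0 indicate agreement no better than chance.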

Representative profile for discussion -- all scores and findings are illustrative, based on the Lab's published methodology applied to Perceptis's publicly stated capabilities. Presentation quality evaluation combines automated metrics with expert panel scoring -- see methodology note above. Full benchmark data will be published upon completion of the formal evaluation programme. thecaliperlab.com