Verified context
Restormel Connect serves knowledge-graph context to AI applications with a verification chain attached to every claim. This page defines exactly what "verified" means on this API, which guarantees the pipeline enforces, and — most importantly — how you can check each one yourself. It is written for the person auditing a deployment, not only the developer integrating it.
Every claim marked supported carries the quoted evidence span, its character offsets, and a content hash of the exact source version it
was bound against. A skeptical reader can open the cited source and check the quote themselves. If a
surface cannot show that chain, it does not say "verified".
At a glance
| Guarantee | Mechanism | How you check it |
|---|---|---|
| A claim is never "supported" without locatable evidence | Deterministic evidence binding (quote + offsets + source-version hash) | Re-find the quote at its offsets in the cited source; compare the hash |
| Misattributed claims are structurally caught | Binding runs against the cited source only — a quote from elsewhere fails | The envelope's evidence is empty and the state is not supported |
| A missing verdict is never a pass | Fail-safe coverage finalizers: omitted or unparseable verdicts become coverage gaps | Unjudged claims surface as unverified, never as supported |
| Uncertainty is flagged, not blended | The judge may abstain; abstention and low confidence route to review | Per-claim state + verification_summary counts on every response |
| Every retrieval is auditable after the fact | Provenance trace recorded per query (included and excluded claims, with reasons) | Export the trace: GET /connect/v1/traces/{trace_id}/export?format=json |
| Graph quality is held to a published bar | G2 gate: ≥ 90% supported, ≤ 2% unsupported across validated claims | Quality report on every ingest job; webhook on threshold breach |
| Scoring rules are inspectable, not implicit | Versioned verification rule sets (six weighted dimensions, named policies) | GET /connect/v1/verification-rules |
What "verified" means: the five states
Verification is two-layered, per the Evidence-Bound Verification design. Layer 1 is deterministic and model-free: at ingest, every extracted claim must bind a quoted evidence span to exact character offsets in the cited source version, recorded with that version's SHA-256 content hash. Anyone can re-run this check at any time — if the source changed or the quote is not where it was bound, the check fails. Layer 2 is a narrow entailment judgment: a model is asked only "does this bound span entail this claim?", and it may abstain. The judge runs on a different model family than the extractor, so the system that writes claims never grades its own work. Every verdict is recorded with the judge's model id, prompt version, and timestamp, append-only — re-judging adds history, it never rewrites it.
| State | Meaning | Requires |
|---|---|---|
| supported | Evidence-bound and entailed | Layer 1 pass and Layer 2 entailed |
| inferred | Entailed, but no directly bound span; always labeled as inference | Layer 2 entailed; Layer 1 partial |
| unverified | Judge abstained, low confidence, or no bindable evidence | Routed to the human review queue |
| contradicted | Evidence entails the negation | Review; excluded from strict retrieval |
| excluded | Remediation or operator decision | Reversible soft-exclude: the record is kept, out of active use |
The asymmetry is deliberate: a claim with no locatable evidence in its cited source can never be supported, whatever any judge said about it. This is what closes the
misattribution hole — a claim that is true somewhere else in the corpus but cited to the wrong source fails
the deterministic binding, no judgment required. Graphs verified before this design (or imported from
elsewhere) are normalized through the same rule: a legacy-affirmed claim without a bound span is served as inferred at best.
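The state table and the asymmetry rule can be read as a small decision function. The sketch below is one interpretation, not the pipeline's actual code: the outcome labels ("bound", "partial", "none", "entailed", "contradicted", "abstain"), the `MIN_CONFIDENCE` cut-off, and the choice to treat a claim with no bindable evidence as unverified are all assumptions made for illustration.

```python
from typing import Literal, Optional

Binding = Literal["bound", "partial", "none"]             # Layer 1 outcome (labels assumed)
Verdict = Literal["entailed", "contradicted", "abstain"]  # Layer 2 outcome (labels assumed)

# Hypothetical cut-off: this page does not state the real low-confidence threshold.
MIN_CONFIDENCE = 0.7


def resolve_state(binding: Binding, verdict: Optional[Verdict],
                  confidence: float = 0.0, operator_excluded: bool = False) -> str:
    """Map Layer 1 and Layer 2 outcomes to one of the five states."""
    if operator_excluded:
        return "excluded"
    # A missing verdict, an abstention, or low confidence is never a pass.
    if verdict is None or verdict == "abstain" or confidence < MIN_CONFIDENCE:
        return "unverified"
    if binding == "none":
        return "unverified"  # no bindable evidence, whatever the judge said
    if verdict == "contradicted":
        return "contradicted"
    # The verdict is "entailed": only a fully bound span can be supported.
    return "supported" if binding == "bound" else "inferred"
```

The property worth noticing is that no combination of inputs returns supported unless `binding` is "bound", which is the asymmetry described above.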
The verification chain on every response
Retrieval responses (POST /connect/v1/retrieve and POST /connect/v1/graph) carry a verified-claim envelope per returned unit, plus a per-state summary in the response metadata so a consumer can gate on "anything
non-supported in this context?" without scanning every claim:
{
  "claim": { "id": "claim:7rk2…", "text": "Virtue is a mean between two vices." },
  "state": "supported",
  "evidence": [
    {
      "quote": "virtue is a mean between two vices, the one involving excess, the other deficiency",
      "offsets": [18204, 18289],
      "source_ref": "source:nicomachean_ethics",
      "source_hash": "9f2c41…e7",
      "match": "exact"
    }
  ],
  "judge": {
    "model": "gemini-2.0-flash",
    "prompt_version": 1,
    "confidence": 0.93,
    "at": "2026-06-10T12:04:11.000Z"
  },
  "citation": "Nicomachean Ethics, Book II",
  "trace_ref": "/connect/v1/traces/3f6f9a3a-…",
  "trust_score": 88
}
Three honesty rules govern this envelope. Evidence is never fabricated: if a span could not be bound, the
array is empty and the state says so. Judge attribution is never invented: if a claim has not been judged,
the field is omitted. And anything looser than an exact quote match is labeled
(normalized or fuzzy), never hidden.
Requests can also pass require_verified (or an explicit verification_policy) to exclude non-supported claims from the context
entirely — exclusions are then counted and recorded in the trace, not silently dropped.
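On the consumer side, the summary gate and the honesty rules could be checked along these lines. This is a minimal sketch under stated assumptions: the summary is taken to be a mapping of state name to count (the page does not spell out its exact shape), the envelope fields follow the example above, and both function names are hypothetical.

```python
def non_supported_count(summary: dict) -> int:
    """Total claims in any state other than 'supported'.

    Assumes summary looks like {"supported": 12, "inferred": 1, ...}.
    """
    return sum(count for state, count in summary.items() if state != "supported")


def honesty_violations(envelope: dict) -> list:
    """Return the reasons, if any, that an envelope breaks the rules above."""
    problems = []
    evidence = envelope.get("evidence", [])
    if envelope.get("state") == "supported":
        if not evidence:
            problems.append("supported without bound evidence")
        if "judge" not in envelope:
            problems.append("supported without a recorded judge verdict")
    for item in evidence:
        # Every bound span should say how closely the quote matched.
        if item.get("match") not in ("exact", "normalized", "fuzzy"):
            problems.append("evidence match type is not labeled")
    return problems
```

A consumer that wants strict behaviour without passing require_verified could refuse to use a context whenever `non_supported_count` is above zero or any envelope reports a violation.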
Fail-safe gates in the ingest pipeline
The pipeline that builds the graph (extract → relate → group → embed → validate → remediate → store) fails safe, not open. The gates an auditor should know about:
- Coverage finalizers. Validation and entailment run in batches against live models, and models sometimes omit items or return malformed output. Any claim the judge did not return a verdict for is finalized as a coverage gap — recorded as an abstention and routed to review. An omission can never default to a pass.
- Abstention is an outcome, not an error. The entailment judge is explicitly allowed to answer "cannot verify". Abstentions and low-confidence verdicts land in the review queue; they are never laundered into a softer passing grade.
- Remediation cannot resurrect. Claims flagged weak or unsupported go through a repair pass; repaired text must re-bind its evidence before it can return to circulation, and claims that cannot be supported are soft-excluded — reversibly, with the record and its history retained.
- Verification cannot silently rot. Because the Layer-1 check is deterministic over hashed content, it is re-runnable at read time. If a source version changes, bindings against the old hash fail the re-check rather than continuing to vouch for text that no longer exists.
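The coverage-finalizer gate can be illustrated with a short sketch. It is not the pipeline's implementation: the verdict labels, the `coverage_gap` flag, and the function name are assumptions chosen to show the idea that every submitted claim leaves the batch with an explicit outcome.

```python
VALID_VERDICTS = {"entailed", "contradicted", "abstain"}  # labels assumed for illustration


def finalize_coverage(claim_ids, raw_verdicts):
    """Give every submitted claim an explicit outcome.

    raw_verdicts maps claim id -> whatever the judge returned for that claim,
    which may be missing entirely or malformed.
    """
    finalized = {}
    for claim_id in claim_ids:
        raw = raw_verdicts.get(claim_id)
        verdict = raw.get("verdict") if isinstance(raw, dict) else None
        if isinstance(verdict, str) and verdict in VALID_VERDICTS:
            finalized[claim_id] = {"verdict": verdict, "coverage_gap": False}
        else:
            # Omitted or unparseable: record an abstention and route to review.
            finalized[claim_id] = {"verdict": "abstain", "coverage_gap": True}
    return finalized
```

Iterating over the submitted ids rather than the returned verdicts is the design choice that matters here: a claim the judge silently dropped still gets a record, and that record is an abstention.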
The validator itself is measured against planted ground truth, not assumed. The most recent benchmark
(2026-06-10, cross-model routing: extraction on gpt-4o-mini, validation on Llama-3.3-70B) measured 100% recall on planted fabricated and misattributed
claims, a 14.5% false-flag rate on known-good claims, and 0% affirm rate on claims the validator was never
shown — that last probe being the direct test that the fail-safe gates hold under real model behaviour.
Numbers are point-in-time, tied to those model versions, and re-measured when models or routes change.
The G2 quality bar
A graph is not "done" because ingest finished; it is done when it clears the published bar. The G2 gate
requires at least 90% of validated claims supported and at most 2% unsupported. Every ingest job's quality report states the trust score (0–100)
and the supported / weak / unsupported breakdown, so the bar is checkable per run rather than asserted
globally. The trust score weighs verification coverage and embedding coverage most heavily, alongside
structural health (orphan rate, vector index presence, relation balance), minus a penalty for
high-severity issues. You can register a webhook (job.quality_below_threshold)
to be notified when a run lands under your threshold — quality failures are pushed to you, not buried in a log.
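If you want to re-check the G2 bar from a quality report yourself, the arithmetic is small. The sketch below assumes the denominator is supported + weak + unsupported (the breakdown the report states) and that a run with no validated claims should fail; both are assumptions, and the function name is hypothetical.

```python
def passes_g2(supported: int, weak: int, unsupported: int) -> bool:
    """Check the published bar: at least 90% supported, at most 2% unsupported."""
    validated = supported + weak + unsupported
    if validated == 0:
        return False  # assumption: nothing validated fails safe rather than passing
    # Integer arithmetic avoids floating-point edge cases right at the thresholds.
    enough_supported = supported * 100 >= 90 * validated
    few_unsupported = unsupported * 100 <= 2 * validated
    return enough_supported and few_unsupported
```

For example, a run with 90 supported, 8 weak, and 2 unsupported claims sits exactly on the bar and passes, while 95 supported, 2 weak, and 3 unsupported fails on the unsupported limit despite the higher supported share.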
Provenance traces and export
Every retrieval query produces a structured audit trace answering "why did the agent get this context?" — the question regulators and internal audit actually ask. The trace records the query, the verification policy in force, the seed claims chosen, the graph expansion, and a per-claim verdict for everything the engine considered: included claims with their verification state and trust score, and excluded claims with the reason they were dropped (verification gate, confidence gate, duplicate). Traces are retained for 90 days and are workspace-scoped.
- GET /connect/v1/traces/{trace_id}: the versioned trace document
- GET /connect/v1/traces/{trace_id}/export?format=json: downloadable export for audit files
The trace_ref on every verified-claim envelope links the claim back to the
exact query that served it, so a finding in an AI system's output can be walked back to context, claim,
evidence span, and source version in four steps.
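Going from an envelope's trace_ref to the export endpoint is a matter of appending the export path. A minimal sketch, assuming trace_ref is a path relative to your API host; the host in the usage note is a placeholder, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def trace_export_url(base_url: str, trace_ref: str, fmt: str = "json") -> str:
    """Build the export URL for the trace an envelope points at."""
    query = urlencode({"format": fmt})
    return base_url.rstrip("/") + trace_ref.rstrip("/") + "/export?" + query
```

With a placeholder host, `trace_export_url("https://api.example.test", "/connect/v1/traces/abc123")` gives `https://api.example.test/connect/v1/traces/abc123/export?format=json`. Authentication is not covered on this page and is left out of the sketch.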
Verification rules are public configuration
The reasoning-quality scoring behind verification is not an implicit prompt; it is a versioned rule set. Each rule set defines six weighted dimensions — logical structure, evidence grounding, counterargument coverage, scope calibration, assumption transparency, internal consistency — with per-dimension passing thresholds and named policies (strict / balanced / lenient) that set the overall pass and weak-claim thresholds. Workspaces may override weights per domain pack; the override is itself inspectable.
- GET /connect/v1/verification-rules: the rule set active for your workspace
- GET /connect/v1/verification-rules/built-in: the built-in "Restormel Core v1" definition
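To make the shape of a rule set concrete, here is a sketch of how six weighted dimensions and a named policy could combine into a grade. Every number in it is invented for illustration, including the weights, the per-dimension minimum, and the policy thresholds, and the way per-dimension thresholds interact with the overall score is an assumption. The real values are whatever the verification-rules endpoint returns for your workspace.

```python
# Hypothetical weights and thresholds, for illustration only.
WEIGHTS = {
    "logical_structure": 20,
    "evidence_grounding": 30,
    "counterargument_coverage": 10,
    "scope_calibration": 15,
    "assumption_transparency": 10,
    "internal_consistency": 15,
}
DIMENSION_MIN = 50  # assumed single per-dimension passing threshold
POLICIES = {
    "strict": {"pass": 80, "weak": 65},
    "balanced": {"pass": 70, "weak": 50},
    "lenient": {"pass": 60, "weak": 40},
}


def grade(scores: dict, policy: str = "balanced") -> str:
    """Grade per-dimension scores (0-100) as 'pass', 'weak', or 'fail'."""
    thresholds = POLICIES[policy]
    overall = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS) / sum(WEIGHTS.values())
    all_dimensions_pass = all(scores[d] >= DIMENSION_MIN for d in WEIGHTS)
    if overall >= thresholds["pass"] and all_dimensions_pass:
        return "pass"
    if overall >= thresholds["weak"]:
        return "weak"
    return "fail"
```

Under these made-up numbers, a claim scoring 70 on every dimension passes the balanced policy but is only weak under strict, which is the kind of difference the named policies exist to express.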
Auditing a claim yourself
- Retrieve context and read the verified_claims envelopes; check metadata.verification_summary for anything non-supported.
- For any claim, take evidence[0].quote, offsets, and source_hash.
- Fetch the cited source version and confirm the quote sits at those offsets and the content hashes to the recorded value. A mismatch is a finding: the claim should not be in a supported context.
- Follow trace_ref and export the trace; confirm the verification policy and that exclusions carry reasons.
- Fetch the active verification rules and confirm the policy thresholds match what your deployment claims to enforce.
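The quote-and-hash step above can be scripted once you have the cited source version as text. The sketch below rests on assumptions you should confirm against the canonical schema: that offsets are a half-open [start, end) range over the decoded text, that source_hash is the full hex SHA-256 of the source's UTF-8 bytes, and that the evidence item is an exact match. The function name is hypothetical.

```python
import hashlib


def check_binding(source_text: str, evidence: dict) -> list:
    """Re-run the deterministic check for one evidence item; return any findings."""
    findings = []
    start, end = evidence["offsets"]
    # Assumption: offsets index the decoded text as a half-open [start, end) range.
    if source_text[start:end] != evidence["quote"]:
        findings.append("quote not found at recorded offsets")
    # Assumption: the recorded hash is hex SHA-256 over the UTF-8 encoded source.
    digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    if digest != evidence["source_hash"]:
        findings.append("source content hash mismatch")
    return findings


# Usage with a tiny made-up source, not real API output.
source = "In Book II, virtue is a mean between two vices."
quote = "virtue is a mean between two vices"
start = source.index(quote)
item = {
    "quote": quote,
    "offsets": [start, start + len(quote)],
    "source_hash": hashlib.sha256(source.encode("utf-8")).hexdigest(),
}
findings = check_binding(source, item)  # an empty list means the binding holds
```

An empty result means the binding still holds for that source version; any entry in the list is a finding to raise against the claim.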
Related actions
- API reference — the OpenAPI spec covering retrieve, graph, traces, and verification-rules endpoints
- Connect first graph onboarding — build the graph these guarantees apply to
- Dashboard home — ingest setup, graph store, and quality reports for your workspace
Engineering reference: docs/decisions/evidence-bound-verification.md in the repository is the
canonical design record this page summarizes; the envelope's canonical schema is @restormel/contracts (verified-claim.ts).