Architecture Overview
The integrity layer for AI-agent-led software delivery · April 2026
1. The technical situation
Software delivery produces evidence in four siloed systems: requirements (tickets, specs, docs), code (the repo), builds (CI), deployments (the pipeline). Agent-led delivery adds two evidence streams on top: agent sessions (which agent authored which code, in which session) and commit trailers (the structured agent attribution carried through git history into the build). Every stream anchors to the file path within a repo and within a commit, which is the entity Idora joins them through. No system holds the joined record connecting a requirement to the artifact built from the code that satisfies it, by which agent, in which session; that record is reconstructed from six sources every time the question is asked.
Idora’s architecture treats the joined record as a first-class artifact: a context graph where every step writes a tamper-evident receipt, joined through the File node that already exists in both the verification stream and the execution stream. The proof that a requirement was satisfied in production becomes a single graph traversal once the full receipt chain is populated.
The graph compounds. Every push writes new verification and execution receipts, and under agent-led delivery pushes arrive at agent cadence, a multiple of human-paced delivery. Established Seam–File mappings accumulate routing history (prior queries, file paths, prior determinations), which routes repeat verifications to the warm path, where the LLM reads known files directly rather than searching. The graph gets denser and operationally cheaper per push; switching cost grows monotonically with usage and grows faster under agent volume. The architecture is designed to compound, not just to capture.
Most context graphs expose data and let callers reason over it. Idora returns structured decisions and carries graph-state metadata (ingestion lag, graph maturity, schema version) so callers can gate confidence on the answer’s substrate. This shifts the integrity guarantee from “the data is correct” to “the decision is correct given the substrate at this moment”: auditable, reproducible, version-bound.
2. The architectural keystone: the File bridge
Verification receipts EVALUATED files. Execution receipts CONSUMED files. The file path is the join key. Both streams write into the same Neo4j graph; the file path is already shared between them. Idora doesn’t construct the relationship. It reveals it.
This is what makes the cross-stream provenance claim a single graph traversal:
MATCH path = (rs:RequirementSource)-[:CONTAINS]->(s:Seam)
<-[:VERIFIED_BY]-(v:Receipt {type:'verify-code-generation', determination:'conforms'})
-[:EVALUATED]->(f:File)
<-[:CONSUMED]-(b:Receipt {type:'build'})
-[:PRODUCED]->(a:Artifact)
WHERE rs.externalId = 'PROJ-456'
RETURN path
One pattern, one traversal, end-to-end provenance from requirement to deployed artifact. In a relational store this requires a multi-table JOIN across specs, receipts, files, and artifacts. In Idora’s graph it is a single Cypher pattern executed against a graph that gets denser with every push. The same pattern with determination: 'does_not_conform' traverses the violation chain at incident time, surfacing which file’s verification failed and which deployed artifact carried the violation forward.
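The join that the Cypher pattern expresses can also be sketched outside the graph. The following is a minimal Python illustration, not Idora's implementation: it joins a verification stream and an execution stream on the (commit, file path) pair. The receipt shapes are simplified from the receipt examples in §4; the requirement field on the verification receipt, the sample receipt ids, and the provenance function are assumptions introduced for this sketch.

```python
from collections import defaultdict

# Hypothetical, simplified receipts. The "requirement" field is an assumption
# made for this sketch; the other field names follow the examples in section 4.
verification_receipts = [
    {"id": "rcpt:v1", "type": "verify-code-generation", "commit": "a3f8c1d",
     "determination": "conforms", "requirement": "PROJ-456",
     "inputs": [{"path": "src/services/auth.service.ts"}]},
]
execution_receipts = [
    {"id": "rcpt:b1", "type": "build", "commit": "a3f8c1d",
     "inputs": [{"path": "src/services/auth.service.ts"}],
     "outputs": [{"path": "bin/Release/api-server.so"}]},
]


def provenance(requirement, verifications, executions, determination="conforms"):
    """Join both streams on (commit, file path) and return proof rows."""
    # Index every build by each file it consumed.
    consumed_by = defaultdict(list)
    for build in executions:
        for f in build["inputs"]:
            consumed_by[(build["commit"], f["path"])].append(build)

    rows = []
    for v in verifications:
        if v["requirement"] != requirement or v["determination"] != determination:
            continue
        for f in v["inputs"]:
            # The file path within a commit is the bridge between the streams.
            for build in consumed_by.get((v["commit"], f["path"]), []):
                for out in build["outputs"]:
                    rows.append((requirement, v["id"], f["path"], build["id"], out["path"]))
    return rows


rows = provenance("PROJ-456", verification_receipts, execution_receipts)
```

Passing determination="does_not_conform" to the same function mirrors the violation-chain variant of the query described above.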
The business consequence: today, when a regulator or board member asks “show me proof that requirement X was implemented and shipped to production, by which agent, in which session,” the answer requires manual reconstruction across six evidence streams and typically takes days to weeks. Idora makes this a single API call returning a verifiable proof chain. Compliance documentation moves from human-mediated reconstruction to graph traversal.
Every other architectural commitment in Idora exists to protect this bridge: the receipts must be tamper-evident or the proof is a reconstruction, not a proof; the envelope must be read-only or the graph’s contents are no longer authoritative; the surface must return decisions rather than raw data or callers re-do the traversal in their own code; the Skill must translate natural language to structured calls or the determinism erodes at the consumption boundary.
3. Three layers, one direction of flow
Consumption layer. The v1 surface has two consumption clients of the same envelope. Enterprise coding agents (Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI) consume the envelope at edit time through structured tool calls, with Idora Skills mediating contextual intent into the appropriate call shape. The Idora Agent (Idora’s own MCP client, calibrated for human query at incident, release review, audit, and reliability-interrogation time) loads caller-context Skills (incident_traversal, release_review, compliance_query, reliability_query) that translate human queries into structured envelope calls. Idora Skills are SKILL.md packages built to Anthropic’s open Agent Skills standard, distributed through Anthropic’s Skills Directory and Vercel’s skills.sh. CI direct consumption is architecturally supported by the envelope and roadmap-paced as a separate consumption pattern; v1 release-time consumption flows through the Idora Agent.
Public MCP envelope. Streamable HTTP transport, OAuth 2.0 device flow, tenant-scoped requests, per-tenant rate limits, audit log on every call. Audit log entries are themselves Receipts in the graph: every consumption event of the integrity graph becomes part of the integrity graph it queried. The envelope routes to seven tools and two resources. Every tool returns a decision, not raw data. The two resources, pipeline_status and schema, are URI-addressable read-only references: pipeline_status exposes ingestion state for cold-start awareness, and schema exposes the current receipt and graph schemas for Skill builders and tooling. Every response carries a metadata block: ingestion_lag_seconds, graph_maturity (push count and days active), evaluator (model and prompt version), envelope_version, schema_version. The envelope is read-only.
A representative tool response (example values):
{
"tool": "release_trust",
"input": {"ref": "v2.1"},
"result": {
"determination": "does_not_conform",
"confidence": "high",
"blockers": [
{"requirement": "AUTH-007",
"since": "2026-04-10",
"proof_chain_id": "rcpt:7f3e...",
"code_provenance": {"framework": "claude-code", "session": "sess_01HXKM2R8N9P3Q5V"}},
{"requirement": "DATA-014",
"since": "2026-04-12",
"proof_chain_id": "rcpt:9c2a...",
"code_provenance": {"framework": "cursor", "session": "sess_01HXM7K3VP9Q2W4N"}},
{"requirement": "API-031",
"since": "2026-04-15",
"proof_chain_id": "rcpt:3e7f...",
"code_provenance": {"framework": "claude-code", "session": "sess_01HXN8R5XS2P7T6M"}}
],
"conforming_count": 28,
"total_required": 31
},
"metadata": {
"ingestion_lag_seconds": 12,
"graph_maturity": {"push_count": 847, "days_active": 102},
"evaluator": {
"model": "claude-sonnet-4-20250514",
"prompt_version": "2026-04-21.1"
},
"envelope_version": "1.0.0",
"schema_version": "1.0.0"
}
}
The metadata block is non-decorative. Callers, particularly agents, gate confidence on it. A response from a graph still in its first week of compounding reads differently from one six months in.
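One way a caller might gate on the metadata block is sketched below. This is an illustration under stated assumptions, not a documented Idora client: the gate_confidence function and its three thresholds (maximum lag, minimum push count, minimum days active) are hypothetical and would be chosen by each caller. Only the metadata field names come from the example response.

```python
def gate_confidence(response, max_lag_seconds=60, min_pushes=100, min_days=30):
    """Decide whether a tool response rests on a fresh, mature enough graph.

    The thresholds are hypothetical caller-side policy, not Idora defaults.
    """
    meta = response["metadata"]
    maturity = meta["graph_maturity"]
    reasons = []
    if meta["ingestion_lag_seconds"] > max_lag_seconds:
        reasons.append("ingestion_lag")
    if maturity["push_count"] < min_pushes or maturity["days_active"] < min_days:
        reasons.append("graph_maturity")
    return {"trusted": not reasons, "reasons": reasons}


# Metadata values taken from the example response.
mature = {"metadata": {"ingestion_lag_seconds": 12,
                       "graph_maturity": {"push_count": 847, "days_active": 102}}}
# A hypothetical tenant in its first week.
week_one = {"metadata": {"ingestion_lag_seconds": 12,
                         "graph_maturity": {"push_count": 20, "days_active": 5}}}

mature_gate = gate_confidence(mature)
week_one_gate = gate_confidence(week_one)
```

A caller that receives a gate with trusted set to False can still act on the determination, but has an explicit reason to soften how it reports the result.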
Context graph. Neo4j AuraDB. Nodes for RequirementSource, Seam, File, Receipt, Artifact. Edges for CONTAINS, MAPS_TO, VERIFIED_BY, EVALUATED, CONSUMED, PRODUCED, PARENT_OF. The File node is the bridge described in §2. The graph compounds: MAPS_TO edges accumulate verification_count on every push, making established connections cheaper to verify and the moat structurally harder to replicate over time.
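The compounding behavior of MAPS_TO edges can be illustrated with a small in-memory model. This is a sketch only: the MapsToIndex class and its methods are hypothetical, and the real edges live in the graph rather than in a dictionary. It shows the two behaviors the text describes: a Seam with no established mapping routes to the cold path, and every verification increments verification_count on the Seam–File edge.

```python
class MapsToIndex:
    """Toy model of MAPS_TO edges keyed by (seam_id, file path)."""

    def __init__(self):
        self._edges = {}  # (seam_id, path) -> verification_count

    def route(self, seam_id):
        """Return ("warm", known_paths) if mappings exist, else ("cold", [])."""
        paths = sorted(p for (s, p) in self._edges if s == seam_id)
        return ("warm", paths) if paths else ("cold", [])

    def record_verification(self, seam_id, path):
        """Create the edge on first use and increment its count on every push."""
        key = (seam_id, path)
        self._edges[key] = self._edges.get(key, 0) + 1
        return self._edges[key]


index = MapsToIndex()
first_route = index.route("AUTH-007")            # no edge yet: cold path
index.record_verification("AUTH-007", "src/services/auth.service.ts")
count = index.record_verification("AUTH-007", "src/services/auth.service.ts")
second_route = index.route("AUTH-007")           # edge exists: warm path
```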
The flow reads down. Intent enters at the top, structured calls cross the envelope, traversals execute against the graph, structured decisions return, the consuming agent narrates back to the user. Enterprise coding agents narrate edit-time grounding into the developer’s IDE session; the Idora Agent narrates incident, release, and audit responses into the human reviewer’s workflow.
4. The receipts that ground every claim
Most integrity claims in software delivery rest on logs (mutable, redactable) or attestations (signed but not chained). Idora’s claims rest on content-addressed receipts joined through a graph: every receipt grounded in cryptographic identity, every traversal grounded in receipts. The proof is structurally honest because the substrate is.
The graph is only as authoritative as the evidence it holds. Idora writes two types of tamper-evident, content-addressed receipts on every push.
Verification receipt, produced when a requirement is evaluated against code:
{
"id": "rcpt:7f3e8a2c...",
"schema_version": "1.0.0",
"type": "verify-code-generation",
"commit": "a3f8c1d",
"determination": "does_not_conform",
"confidence": "high",
"verification_dimensions": {
"mapping_clarity": "high",
"verification_clarity": "high"
},
"spec_section": "§auth.rotation",
"spec_file": "specs/gates/AUTH_SPEC.md",
"requirement_text": "token rotation every 24h · no static keys in production",
"implementation_description": "no rotation logic found · static keys present at L47, L82",
"inputs": [
{"path": "src/services/auth.service.ts",
"hash": "a1b2c3d4...",
"code_provenance": {"framework": "claude-code", "session": "sess_01HXKM2R8N9P3Q5V"}}
],
"evaluator": {
"model": "claude-sonnet-4-20250514",
"prompt_version": "2026-04-21.1"
}
}
Execution receipt, produced on every build, test, deploy:
{
"id": "rcpt:b8d4f1a9...",
"schema_version": "1.0.0",
"type": "build",
"commit": "a3f8c1d",
"exit_code": 0,
"command_line": "dotnet build src/Idora.Pipeline.csproj -c Release",
"inputs": [
{"path": "src/services/auth.service.ts", "hash": "a1b2c3d4..."}
],
"outputs": [
{"path": "bin/Release/api-server.so", "hash": "9f8e7d6c..."}
],
"parent_ids": ["rcpt:5a8c2e9c..."]
}
The receipt’s evaluator.prompt_version field is the audit reference: any determination produced by Idora can be reproduced by replaying the same prompt version against the same code commit. The verification_dimensions field surfaces the verifier’s structured reasoning: Mapping Clarity (does the code map to the requirement?) and Verification Clarity (does the mapped code satisfy the requirement?), both anchored to a calibrated band system. When code_provenance is present on inputs, the receipt records the agent that authored the code being verified; Idora’s audit trail captures both sides of the verification: the code’s provenance and the evaluator’s identity. The id field on each receipt is its content-addressed identity: the SHA-256 of the canonical form. The same identifier appears in tool responses as proof_chain_id, making the chain from operational decision to grounding receipt directly traversable.
The two verification dimensions carry distinct epistemic load. mapping_clarity measures whether the code unambiguously maps to the requirement under verification (an explicit reference in the code, an architectural seam named in the spec, or a literal interpretation of the requirement’s identifier in the implementation). verification_clarity measures whether the mapped code provably satisfies the requirement, distinct from whether the mapping exists. A row reading mapping_clarity: explicit · verification_clarity: high means the verifier has both located the code that should satisfy the requirement and confirmed the code does satisfy it; a row reading mapping_clarity: explicit · verification_clarity: indeterminate means the mapping is found but the satisfaction is uncertain. The two-dimension separation matters because conflating them produces the documented LLM-as-evaluator failure mode where the verifier asserts conformance based on the presence of related code rather than its actual satisfaction of the requirement.
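The separation can be made concrete with a small reading function. This is a hedged sketch: read_row and its three return labels are hypothetical, and only the band values named in the paragraph above (explicit, high, indeterminate) are handled specifically. Any other band is treated conservatively, which is an assumption in keeping with the anti-inflation posture rather than documented behavior.

```python
def read_row(mapping_clarity: str, verification_clarity: str) -> str:
    """Interpret the two dimensions without letting one stand in for the other."""
    located = mapping_clarity == "explicit"
    satisfied = verification_clarity == "high"
    if located and satisfied:
        return "located_and_confirmed"
    if located:
        # Related code exists, but its presence alone is not conformance.
        return "located_but_unconfirmed"
    # Without a clear mapping, verification clarity is not consulted at all.
    return "not_located"


confirmed = read_row("explicit", "high")
uncertain = read_row("explicit", "indeterminate")
```

The second branch is the point of the design: a found mapping with uncertain satisfaction never collapses into a conformance claim.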
Each receipt’s identity is its SHA-256 hash. Modify any hashed field and the result is a different receipt with a different ID. The graph holds the original. There is no amendment path.
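Content addressing of this kind can be sketched in a few lines. The canonical form is not specified in this document, so the sketch assumes one: JSON with sorted keys and compact separators, hashing every field except id. The receipt_id and verify functions are hypothetical names introduced for illustration.

```python
import hashlib
import json


def receipt_id(receipt: dict) -> str:
    """Derive a content-addressed id from an assumed canonical JSON form."""
    # Assumption: every field except "id" is hashed.
    body = {k: v for k, v in receipt.items() if k != "id"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "rcpt:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(receipt: dict) -> bool:
    """A receipt is intact only if its stored id matches its recomputed id."""
    return receipt.get("id") == receipt_id(receipt)


build = {"schema_version": "1.0.0", "type": "build", "commit": "a3f8c1d", "exit_code": 0}
build["id"] = receipt_id(build)

# Changing a hashed field yields a different identity; the stored id no longer matches.
tampered = dict(build, exit_code=1)
intact_ok = verify(build)
tampered_ok = verify(tampered)
```

Because the id is derived from the content, an edited receipt cannot keep its original id; it can only become a different receipt.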
Receipt integrity is one half of the proof claim. The other half is the integrity of the verifications themselves. Idora’s verifier is engineered against documented LLM-as-evaluator failure modes: bias toward generative register, confidence inflation under uncertainty, hallucination on missing context. Verification prompts apply bright-line dimensions (mapping clarity, verification clarity) with anti-inflation discipline: the verifier defaults to lower confidence when evidence is ambiguous, and verification artifacts carry the prompt version that produced them so determinations are reproducible across prompt evolution.
This is what makes the proof a proof rather than a reconstruction. When idora.proof walks the chain from requirement to deployed artifact, every receipt along the path is tamper-evident by construction. The graph nodes the receipts reference (RequirementSource, Seam, File, Artifact) inherit integrity from the receipt chain that establishes them: a Seam without a verification receipt has no provable status; a File without an execution receipt has no provable participation in any build.
Confidence calibration is anchored to band systems with anti-inflation discipline across Seams (4-band), Findings (3-band), and Relationships (4-band).
5. Why the surface is fixed at seven tools
The envelope exposes seven tools (release_trust, file_context, proof, gap_detection, drift, browse, bulk_proof) and two resources (pipeline_status, schema). Nine Skills ship on top of the envelope at v1: five base Skills shipped with the @idora/skill package and four caller-context Skills (incident_traversal, release_review, compliance_query, reliability_query) that the Idora Agent loads to translate human queries at incident, release review, audit, and reliability-interrogation time. This is the entire v1 surface.
The constraint is designed, not provisional. Every tool returns a decision rather than raw data, which means callers act on conclusions rather than reasoning over graph payloads. Adding a tool means adding a new decision shape; the bar is whether it expresses a question that cannot be composed from the existing seven. No Cypher or general query language is exposed, because exposing one shifts the integrity guarantee from the graph to the caller’s query correctness. No natural language interface lives at this layer. Natural language belongs in the Skill, which translates intent into structured calls, and in the agent’s narration of structured responses to humans. Keeping the layers clean preserves the auditability of every determination.
Tools compose. A release_trust returning blockers naturally chains into proof for each blocker; a file_context query naturally precedes proof for the specific verification of interest; drift composes with gap_detection to surface what changed in coverage between two refs. Skills orchestrate these compositions on behalf of users. A surface this size is learnable in an afternoon, callable correctly the first time, and stable enough to be relied on across versions. For per-tool semantics, parameters, composition patterns, and use-case examples see The Product Surface.
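The first composition above, release_trust chaining into proof for each blocker, might look like the following from a Skill's point of view. This is a sketch under assumptions: call_tool stands in for whatever MCP client function issues a tool call, the proof tool's proof_chain_id parameter is assumed rather than documented here, and fake_call_tool is a stub with hypothetical payloads so the example runs without a server.

```python
def explain_blockers(call_tool, ref):
    """Ask release_trust for a decision, then fetch a proof for each blocker."""
    trust = call_tool("release_trust", {"ref": ref})
    result = trust["result"]
    proofs = []
    for blocker in result.get("blockers", []):
        # Assumed parameter name; see The Product Surface for the real shape.
        proof = call_tool("proof", {"proof_chain_id": blocker["proof_chain_id"]})
        proofs.append({"requirement": blocker["requirement"], "proof": proof})
    return {"determination": result["determination"], "proofs": proofs}


def fake_call_tool(name, args):
    """Stub envelope returning hypothetical payloads for the two tools used."""
    if name == "release_trust":
        return {"result": {"determination": "does_not_conform",
                           "blockers": [{"requirement": "AUTH-007",
                                         "proof_chain_id": "rcpt:7f3e..."}]}}
    if name == "proof":
        return {"result": {"proof_chain_id": args["proof_chain_id"], "receipts": []}}
    raise ValueError(f"unknown tool: {name}")


summary = explain_blockers(fake_call_tool, "v2.1")
```

Passing the call function in as a parameter keeps the composition logic independent of any particular MCP client library.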
6. What this architecture gives up
Architectural discipline costs something. The doc would not be honest without naming what.
Read-only MCP means agents cannot drive corrections through the same surface they query. When Claude Code surfaces a violation via release_trust, it cannot then call back through MCP to acknowledge or remediate. Remediation flows through the push pipeline (the developer fixes the code, the next push writes a new conforming receipt). This is the right tradeoff. Write-via-MCP would mean an agent could mutate the integrity graph it just queried, collapsing the tamper-evidence guarantee. But it does mean the consumption surface and the correction surface are distinct.
A fixed seven-tool surface means new query patterns require new tools rather than ad-hoc Cypher. The discipline of expressing every supported question as a structured decision is what keeps determinations auditable across versions. The cost: Idora moves slower on adding novel query shapes than a system exposing raw query access would.
LLM-produced confidence on graph entities is calibrated but not algorithmically derived. Confidence on Seams, Findings, and Relationships is anchored to band systems with anti-inflation discipline (4-band on extraction and relationships, 3-band on findings, matching the convention appropriate to each entity). This is more rigorous than the field standard but is not the same as a separate evaluator-agent pass producing independent confidence. Separate evaluator passes are deferred to future work if dogfooding shows systematic miscalibration.
The architecture compounds, which means new tenants start sparse. The graph’s value increases with push density. A tenant in week one has a structurally complete but evidentially sparse graph. Idora addresses this through the metadata block (callers gate confidence on graph maturity) and through the Skill’s narration patterns (any MCP client loading an Idora Skill softens framing when maturity is low). The early-tenant experience is a real product cost.
The File bridge has scope boundaries. The File bridge joins within a repo and within a commit. Verifications across file renames recover via cold-path discovery (the LLM searches for the renamed file and a fresh MAPS_TO edge forms) but the warm-path routing history doesn’t follow the rename automatically. Cross-repo file identity is Layer 2 scope. Logic that lives outside files (runtime configuration applied via cloud consoles, feature flags toggled at runtime, ephemeral consensus) is not captured in Layer 1; Layer 2 ingests external evidence sources to extend coverage. The File bridge is robust for file-mediated delivery, which is the dominant case in April 2026 software engineering, and honestly bounded for cases where the file is not the right unit of analysis.
Idora verifies decisions formalized in source: specs, requirements, architectural records, tickets. Conversational decisions that exist only in meetings or chat threads are out of scope. The product creates pressure to formalize, which is a feature for compliance and audit: more formalized decisions mean more provable delivery. Decisions that aren’t written down can’t be verified by any system; Idora positions itself explicitly as the integrity layer for the formalized portion of decisions, which is the portion that matters for audit and regulatory contexts.
The verification engine currently runs on Anthropic’s Claude Sonnet 4 (claude-sonnet-4-20250514), chosen for literal-interpretation discipline that aligns with the anti-inflation verification posture. The model choice is tracked through prompt-version provenance on every verification artifact; when alternative evaluators become operationally competitive (or when model upgrades materially change calibration), the historical record preserves comparability. Model-strategy rationale and the calibration discipline for evaluator transitions are documented in the forthcoming Verifier Reliability Card.
The Verifier Reliability Card is the canonical methodological disclosure for evaluator behavior. Its structure carries: dogfooded corpus methodology and sample composition; false-conform and false-block rates broken down by requirement type and code complexity regime; calibration plots showing predicted versus observed conformance distributions; indeterminate distribution analysis including the operational floor and the reasons indeterminate determinations are produced; named failure modes the verifier is engineered against (generative-register bias, confidence inflation under uncertainty, hallucination on missing context); prompt version history with calibration deltas across transitions; a reproducibility harness allowing third-party replay of any verification at a specific prompt version against a specific commit. The card publishes ahead of design partner pilots and updates on a published cadence as the corpus grows and prompt versions evolve.
Idora runs Idora in production. The substrate’s receipts include every Idora push, every verification of Idora’s own requirements against Idora’s own code, every build and deploy of Idora’s own infrastructure. The operational discipline disclosed in this document (prompt version tracking on every verification, content-addressed receipts on every push, calibration anchored to band systems with anti-inflation discipline) is dogfooded discipline, not aspirational discipline. The Reliability Card’s evidence base at v1 launch is the dogfooded corpus produced by Idora’s own operational use.
These tradeoffs are deliberate. Architectural choices that don’t cost anything are usually choices that don’t matter.
7. Where the architecture is going
Layer 1 (today) generates evidence: verification and execution receipts joined by the File bridge, exposed through the seven tools.
Layer 2 extends scope to cross-repo and cross-service integrity, and ingests external evidence sources through the same receipt pipeline.
Layer 3 adds agent accountability receipts: every agent action at a pipeline boundary (commit creation, deploy approval, infrastructure mutation, configuration change) becomes a tamper-evident record carrying agent identity, action context, and the policy that authorized the action. The same receipt pipeline writes them; the same Neo4j graph stores them; the same MCP envelope exposes them. This extends “what shipped matches what was decided” to “what the agent did matches what it was authorized to do.”
Each layer expands scope, not architecture. No new database. No separate product. The same graph, the same receipts, the same surface, extended.
For the substrate detail (node types, edge semantics, the file bridge in full graph context) see Idora Context Graph. For per-tool semantics and parameters see The Product Surface.