Structural Read · 01

Baz

Idora stakeholder brief · May 5, 2026


TL;DR

Baz is the most architecturally adjacent product to Idora identified to date in the agent-era code verification space. Both treat requirements as first-class artifacts, both publish “verification layer” positioning, and both are designed against the thesis that the post-divide volume of AI-generated production code requires verification infrastructure that did not exist as a category before.

The architectural difference is structural, not closeable as a feature, and reflects when each system was designed relative to the shift in the substrate that defines AI software production.

Idora’s substrate joins two compounding receipt streams in one graph: verification receipts (decomposed requirements mapped to code) and execution receipts (build, test, and deploy artifacts content-addressed and tied to commits). Baz produces neither as durable substrate. This brief compares the two systems on the verification surface where they touch, while noting Idora’s substrate also joins execution receipts into the same graph.

The framing the brief uses throughout, pre-divide and post-divide, refers to the architectural shift in AI software production that became measurable through 2025 and reached public consensus in early 2026. Observable markers: AGENTS.md as a Linux Foundation standard (December 2025), Anthropic Skills format adoption across Microsoft, OpenAI, Cursor, Vercel, and Supabase, the SaaSpocalypse market reaction wiping $1T+ from SaaS market cap through April 2026, Thoughtworks Tech Radar Volume 33 placing spec-driven development in Assess (November 2025), and shipped production tooling for the new pattern (GitHub Spec-Kit, Tessl, OpenSpec, BMAD). The shift is observable in industry data, not a single technologist’s frame.

  • Baz was architected pre-divide (founded end of 2023, first product GA January 2025). Built against the requirement-source landscape and consumption assumptions of 2023-2024: tickets, design tools, knowledge wikis as inputs; humans installing GitHub Apps or GitLab integrations as the consumption pattern. Output centers on inline PR or MR comments with code locations, with extensions to sandboxed auto-fix and asynchronous incident-response PRs. Each evaluation against connected sources builds a temporary index from a small set of referenced documents.
  • Idora is post-divide infrastructure built using post-divide methods (founded 2025). Architected after the substrate shift, against a landscape that includes spec-as-code at scale (AGENTS.md, GitHub Spec-Kit, OpenSpec, BMAD, Tessl), agent-runtime distribution as the first-class channel (Skills directory, MCP envelope), and post-deploy verification persistence as a category that did not exist as a need before AI started writing production code at human-team volume.

The two architectures answer different questions. Baz answers: did this PR or MR satisfy the requirements its connected sources describe at the time of review? Idora answers: does the production system match what was decided across all sources of intent (tickets, design, in-repo specs, schemas) and does that record persist queryably across releases as substrate that platform engineering, agents, CI, audit, and trust-establishment surfaces all read from?

The two products are complementary in deployment, not competing for the same buyer or workflow position. A team can run both: Baz at the PR or MR gate, Idora as the post-deploy substrate. Baz is well-built for the pre-divide question it was designed against. Idora is purpose-built for the post-divide question that the SaaSpocalypse and the spec-driven-development wave together created.


What problem does the differentiation actually solve in May 2026?

The post-divide enterprise problem is not “do AI-generated PRs pass review?” That problem is well-served by Baz, CodeRabbit, Greptile, Qodo, and the model labs themselves. The problem the divide created, the one that did not exist as a category before AI started writing production code at human-team volume, is verification infrastructure that survives outside the PR moment.

Three concrete versions of the same problem:

  1. A platform engineering team running coding agents in production at scale. The agents repeat themselves on every task because they have no substrate to query about the production state. “What does the auth system do here” is re-derived from raw code by every agent on every run. The team is paying for re-derivation that compounds with agent volume, and the agents produce inconsistent answers because each run starts from scratch. This is the post-merge gap appearing as agent-runtime cost and as agent-output drift, felt today by teams running Claude Code, Cursor, and internal agent frameworks at production volume.
  2. A team practicing spec-driven development at scale. The team has 200+ specifications in the repository: markdown specs, OpenAPI files per service, ADRs, schema files, AGENTS.md hierarchies. Each PR potentially touches a subset. The team’s coding agents need to know which specs apply to a given file before editing, and the team’s CI needs to gate on whether what shipped matches what was decided across the spec corpus. Per-PR temporary-index patterns cannot fit hundreds of specs into a model context window per evaluation, and the records they produce cannot be queried after merge. This is engineering hygiene at agent-coding volume, not future audit pain.
  3. A platform engineering team that wants its agents to be accountable for what they produce. When an agent writes code that ships and surfaces problems weeks later, the team needs to answer “what was the agent’s reasoning” and “what version of the model produced this code.” Per-PR review systems do not capture agent reasoning as queryable substrate. The team’s own agents cannot answer questions about prior agent work. Agent accountability requires substrate; per-PR architecture does not produce it.

These three are different surface manifestations of the same architectural need: a persistent graph substrate that decomposes intent into atomic verification units once, maps them to code persistently, and traverses selectively per query. That substrate is what Idora is built on. The post-merge gap is felt today by platform engineering teams running coding agents at scale, observable in agent re-derivation cost, agent-output drift, and the inability to answer cross-release questions about the team’s own production system. The substrate is also the primitive that makes verification economically and architecturally workable at the volumes the post-divide world produces.


Companies at a glance

Dimension | Baz | Idora
Founded | End of 2023 | 2025
Headquarters | Tel Aviv with NYC presence | Houston, Texas
Stage as of May 5, 2026 | $8M seed (January 2025); no public Series A | Stealth; no capital raised
Headcount | 24 employees (PitchBook) | Founder plus core engineering
Investors | Battery Ventures and Boldstart Ventures co-led the seed; Vermillion, Secret Chord, Fusion VC participating | None (pre-fundraise)
External validation | Code Review Bench #1 in Precision (February 2026), ahead of OpenAI, Anthropic, Google, and Cursor | In production internally; design partner pilots opening

The asymmetry is real and worth surfacing. Baz is materially further along on capital, headcount, market validation, and customer references. Idora is materially further along on architectural specificity for the post-divide problem, distribution model native to agent runtimes, and a substrate designed to compound across releases rather than evaluate per-PR.

The argument the brief makes is not that Idora is at parity with Baz today on stage and resources. The argument is that Idora’s architectural foundation is built for a different problem, one that emerges with measurable urgency in the post-divide window, and that the foundation is not closeable as a feature addition for any system designed against the pre-divide question. Stage and resources matter for go-to-market velocity. Substrate matters for what the system can answer, which is the question this brief examines.


What Baz does

Baz positions itself as “agents that operate your codebase” (baz.co, May 2026): a portfolio of coding agents covering code review, bug detection, standards enforcement, security fixes, and incident response. The primary consumption surface is the code-review boundary on GitHub and GitLab, where output across review-time agents is the same shape: an inline PR or MR comment with code locations and, for behavior-dependent checks, sandbox screenshots attached.

Products: Reviewer (AI code review on every PR / MR), Custom Reviewers (auto-generated from past review conversations), Spec Reviewer (requirements from connected ticketing systems, design sources, knowledge bases), Spec Reviewer for Backend (sandbox runtime evaluation against spec), Pixel Perfect (Figma-to-code visual verification), Fixer (sandboxed auto-fix agent that applies and validates fixes in an ephemeral runtime), SRE-Agent (asynchronous incident-response PRs), AI Coding Guidelines (AGENTS.md ingestion as auditable review guidelines, December 2025).


Requirement sources: comparison

What each system documents as a supported requirement input:

Requirement source | Baz (documented) | Idora (documented)
Jira | ✓ | ✓
Azure DevOps | ✓ | ✓
Linear | ✓ (May 2025) | ✓
Monday | ✓ | ✓
Shortcut | ✓ | ✓
YouTrack | ✓ | ✓
Figma (design) | ✓ via MCP | ✓
Notion | ✓ | ✓
Fibery | ✓ | ✓
In-repo markdown specs | Not documented | ✓ as RequirementSource node
OpenAPI / Swagger | Not documented | ✓ as RequirementSource node
Schema files (proto, GraphQL SDL) | Not documented | ✓ as RequirementSource node
Architectural decision records (ADRs) | Not documented | ✓ as RequirementSource node
RFC-style design docs in repo | Not documented | ✓ as RequirementSource node
AGENTS.md hierarchies | ✓ via AI Coding Guidelines reviewer (December 2025) | ✓ as RequirementSource node

Idora ingests the external requirement sources Baz supports, treated identically as RequirementSource nodes alongside the in-repo spec corpus. The ticketing-and-design surface is not a Baz capability that Idora lacks; it is part of the substrate Idora handles, plus more.

The “Not documented” entries above reflect a verified absence across baz.co, docs.baz.co, baz.co/changelog, and baz.co/resources as of May 4, 2026. Baz’s published documentation describes ingestion of external systems with bounded per-PR scope. The pattern, per the Baz changelog, is “the agent will build a temporary index with all the referenced documents to build a single checklist to validate if code was implemented according to spec.” This is a per-PR index of a small referenced subset, not a persistent corpus-scale substrate.


Product surface: comparison

What each system exposes for consumption, and who is documented to consume it:

Surface dimension | Baz (documented) | Idora (documented)
Primary distribution channel | GitHub Marketplace (verified GitHub App) and direct GitLab integration (September 2025) | Anthropic’s plugin marketplace (Skills) and Public MCP envelope
Primary product surface | GitHub App or GitLab integration installed by humans on a repository | Read-only HTTP MCP envelope: seven tools, two resources, OAuth, tenant-scoped, audit-logged
Output shape | Inline pull request or merge request comments with code locations and (for behavior checks) sandbox screenshots | Structured determinations: conforms, does_not_conform, or indeterminate, with proof chains; receipts ground informational and audit queries
Documented primary consumer | Human reviewers reading PR or MR comments inside GitHub or GitLab | Agents, CI pipelines, and any MCP-compatible runtime calling structured arguments directly
Skills role in the product | Baz ingests skill directories as a first-class authoring unit (January 2026); review prompts ship through the Awesome Reviewers registry; consumption remains via the GitHub App, not via Anthropic’s Skills surface | Idora Skills are a first-class consumption mode for the product, sharing one surface with direct MCP envelope calls
CLI documented usage | “AI-assisted manual code review”, humans walking through PRs | Not the primary surface; envelope is the primary surface
MCP server documented usage | “High-signal, secure AI code reviews in any IDE or terminal”, human-driven IDE/terminal sessions | Read-only HTTP transport, programmatic consumption by agents and CI
Documented post-deploy / programmatic determination surface | Not documented | Public MCP envelope returning determinations callers act on
Documented historical-comparison query surface | Not documented as a programmatic query surface; Engineering Impact dashboard provides team-level historical tracking via UI | MCP envelope tools return cross-release queries (drift detection between two refs, proof chains for any deployed artifact, release readiness against governing requirements)

The pattern is consistent across every dimension. Baz’s documented surfaces terminate at human consumption: humans install the GitHub App or GitLab integration, humans read PR or MR comments, humans walk through reviews in the CLI, humans invoke reviews from IDEs through the MCP server. Idora’s documented surfaces are consumed primarily by non-human callers: agents read determinations before editing, CI pipelines gate on them, MCP-compatible runtimes call the envelope with structured arguments. Reviewers are named as one of three consumption personas in Idora’s published documentation, alongside agents and CI.
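
As a rough illustration of what gating on a determination could look like for a CI caller, the sketch below models the three documented outcomes (conforms, does_not_conform, indeterminate) and a gate over them. Only those three values come from the comparison above; the Determination fields, the gate function, and the handling of indeterminate results are assumptions made for illustration, not Idora’s published schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # The three outcome values named in the product-surface comparison.
    CONFORMS = "conforms"
    DOES_NOT_CONFORM = "does_not_conform"
    INDETERMINATE = "indeterminate"


@dataclass(frozen=True)
class Determination:
    requirement_id: str                  # hypothetical identifier
    status: Status
    proof_chain: tuple[str, ...] = ()    # hypothetical: receipt ids backing the status


def gate(determinations: list[Determination], allow_indeterminate: bool = False) -> bool:
    """Return True when a pipeline may proceed under this (assumed) policy."""
    for d in determinations:
        if d.status is Status.DOES_NOT_CONFORM:
            return False
        if d.status is Status.INDETERMINATE and not allow_indeterminate:
            return False
    return True
```

Treating indeterminate as blocking by default is a choice made in this sketch; the source does not say how callers are expected to handle it.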

This is the product-surface manifestation of the same divide framing applied to architecture. Baz’s surface reflects a pre-divide assumption: humans install tools, humans read outputs. Idora’s surface reflects a post-divide assumption articulated in Idora’s own published essay “The New Scarcity”:

“The integrity layer needs to be where agents already look for capability, not where humans go to download a SaaS app. Agent-runtime distribution is not a future channel; it is the channel.”

Baz uses Skills on the authoring side: skill directories are ingested as the unit of authoring for internal AI reviewer rule sets (January 2026), and the Awesome Reviewers registry distributes review-prompt content to Anthropic’s Skills ecosystem. Consumption of the Baz product itself remains via the GitHub App and (since September 2025) GitLab integration, not via Anthropic’s Skills surface. Idora’s Skills usage is consumption-side: Idora Skills load into Claude Code, Cursor, and other Skills-compatible runtimes as the primary surface through which agents and humans query the substrate.


What “running both” looks like operationally

A team adopting both systems installs Baz as a GitHub App or GitLab integration and configures Idora to read from CI execution events. The two systems are fully independent today: no integration touchpoints, no shared state, no conflicts.

Baz attaches at the code-review boundary. Reviewers open a PR or MR, Baz comments inline on unmet requirements with code locations and (for behavior checks) sandbox screenshots. The artifact is a PR or MR comment thread in the host’s data layer; the consumer is a human reviewer.

Idora attaches at the CI execution layer post-merge. Push, build, test, and deploy events produce content-addressed receipts in the integrity graph. The artifact is a queryable proof chain; the consumer is an agent reading determinations before editing, a CI pipeline gating on conformance, or a reviewer attaching receipt chains to audit packages.
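
For readers less familiar with content addressing, a minimal sketch of the general technique follows: a receipt’s identifier is a hash of its own canonical contents, and each receipt points at its predecessor, so any later edit is detectable. The field names (kind, commit, artifact, parent) and the function names are hypothetical; the brief does not publish Idora’s receipt format.

```python
import hashlib
import json
from typing import Optional


def receipt_id(body: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def make_receipt(kind: str, commit: str, artifact: str, parent: Optional[str] = None) -> dict:
    # Hypothetical fields: event kind, commit sha, artifact digest, previous receipt id.
    body = {"kind": kind, "commit": commit, "artifact": artifact, "parent": parent}
    return {"id": receipt_id(body), **body}


def verify_chain(receipts: list[dict]) -> bool:
    """Check that every id matches its content and every parent link is intact."""
    previous_id = None
    for receipt in receipts:
        body = {k: v for k, v in receipt.items() if k != "id"}
        if receipt_id(body) != receipt["id"] or receipt["parent"] != previous_id:
            return False
        previous_id = receipt["id"]
    return True
```

Because the identifier is derived from the content, two pipelines that record the same event produce the same id, and a modified receipt fails verification without any external signature.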

The two systems produce different artifacts for different consumers at different points in the software production lifecycle. There is no overlap in workflow position, no integration burden to coordinate, and no operational cost to running both. A team can adopt one, the other, or both without architectural conflict.

Idora runs on Idora. Every push the Idora team makes flows through the same integrity graph that customer pushes do, every release runs through the same determination surface, and every drift between spec and deployment surfaces first in our own substrate. Substrate trust is structural, not rhetorical.


Where Baz overlaps Idora

Real overlaps, named directly:

  1. Both treat requirements as first-class artifacts mapped to code.
  2. Both produce structured outputs with pointers to specific code locations.
  3. Both reach across the requirements-to-runtime axis (tickets, PRs, sandbox observation).
  4. Both engage with the Skills format under Anthropic’s open standard (Idora as a consumption surface; Baz as authoring-unit ingestion and Awesome Reviewers registry distribution).

Where Idora is structurally different

The pre-divide / post-divide framing locates these distinctions architecturally; what follows are the specific structural differences within that frame. Four distinctions, ordered by architectural durability:

  1. Human-tool distribution vs agent-runtime distribution. Baz’s primary distribution is GitHub Marketplace, where engineering managers install the GitHub App. Idora’s primary distribution is Anthropic’s plugin marketplace (Skills) and the Public MCP envelope, where agent runtimes load capability. The two products are distributed through different channels because they were designed for different primary consumers.
  2. Pre-merge gate vs post-deploy substrate. Baz operates at the pull request or merge request boundary; once a PR or MR merges, Baz’s record is done. The artifact is a PR or MR comment in the host’s data layer, assembled from a temporary index per evaluation, framed as suggestion rather than determination. Idora produces receipts that persist indefinitely after deploy. The artifact is a content-addressed, tamper-evident receipt in a persistent graph substrate, decomposed from the full requirement corpus once and reused via traversal, returned as a determination downstream systems can gate on programmatically. Different point in the lifecycle, different artifact form, different data structure, different consumption pattern. All four are surface manifestations of one substrate distinction.
  3. Sandbox runtime evaluation vs cryptographic determination. Baz’s verdicts are evaluator outputs (non-deterministic across sandbox runs). Idora’s receipts are reproducible against pinned versions.
  4. PR / MR boundary vs CI execution layer. Baz operates at the code-review boundary across GitHub pull requests and GitLab merge requests through native integrations with each host. Idora reads from the CI execution layer regardless of host (GitHub Actions, GitLab CI, Buildkite, ArgoCD) and produces receipts independent of which code host the repository lives on. The distinction is about where each system attaches to the workflow (review boundary vs execution layer), not about which code hosts each can read from.
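
One possible reading of “reproducible against pinned versions” (distinction 3) is that a verdict is recorded under a key derived from every input that could change it, so asking again with the same pinned inputs returns the recorded answer instead of a fresh evaluator run. The sketch below illustrates that reading only; evaluation_key, VerdictStore, and the evaluator callback are hypothetical names, and the brief does not describe Idora’s actual mechanism.

```python
import hashlib
import json
from typing import Callable


def evaluation_key(requirement: str, code_digest: str,
                   model_version: str, prompt_version: str) -> str:
    # Every input that can change the verdict is part of the key.
    pinned = {
        "requirement": requirement,
        "code": code_digest,
        "model": model_version,
        "prompt": prompt_version,
    }
    return hashlib.sha256(json.dumps(pinned, sort_keys=True).encode("utf-8")).hexdigest()


class VerdictStore:
    """Records one verdict per pinned input set and replays it on later queries."""

    def __init__(self, evaluator: Callable[[str, str], str]):
        self._evaluator = evaluator   # assumed to be a possibly non-deterministic judge
        self._verdicts: dict[str, str] = {}
        self.evaluator_calls = 0

    def verdict(self, requirement: str, code_digest: str,
                model_version: str, prompt_version: str) -> tuple[str, str]:
        key = evaluation_key(requirement, code_digest, model_version, prompt_version)
        if key not in self._verdicts:
            self.evaluator_calls += 1
            self._verdicts[key] = self._evaluator(requirement, code_digest)
        return key, self._verdicts[key]
```

Changing the model or prompt version yields a new key and a new record, which keeps older verdicts attributable to the versions that produced them.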

The substrate distinction (and why it is structural)

The most important architectural distinction collapses four surface differences (requirement-source breadth at scale, post-deploy persistence, audit survivability, and agent-runtime-native consumption) into one substrate question.

Why spec-as-code at scale requires substrate, not integrations. A team practicing spec-driven development at meaningful scale has dozens to hundreds of specification files in the repository. For a verification system to operate against this surface, three architectural primitives are required:

  • Persistent decomposition. Hundreds of spec files cannot be re-parsed into atomic verification units on every PR. The token economics are prohibitive and the cross-run determinism is poor. Decomposition has to happen once, be stored as structured artifacts, and be reused.
  • Persistent mapping. The map from decomposed requirements to code locations is itself a substrate that took compute to build. Re-deriving it per PR review wastes compute and prevents the system from getting cheaper as it grows.
  • Selective traversal. Given a PR diff, the system needs to identify which subset of requirements applies. With hundreds of seams, the entire corpus cannot fit in a context window. The graph must resolve “which requirements govern these files” via traversal, not re-scanning.
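
The three primitives above can be sketched in a few lines, under heavy simplification. In this toy version, decomposition is a stand-in (one non-empty line of a spec becomes one verification unit), the mapping is a plain inverted index from file path to unit ids, and traversal is a lookup over changed files. RequirementGraph and all of its method names are hypothetical and are not taken from Idora’s documentation.

```python
import hashlib
from collections import defaultdict


class RequirementGraph:
    def __init__(self) -> None:
        self._units_by_spec: dict[str, list[str]] = {}            # spec digest -> unit ids
        self._units_by_file: dict[str, set[str]] = defaultdict(set)  # file path -> unit ids
        self.decompose_calls = 0

    def ingest_spec(self, text: str) -> list[str]:
        """Persistent decomposition: each distinct spec content is decomposed once."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in self._units_by_spec:
            self.decompose_calls += 1
            # Stand-in for real decomposition: one non-empty line = one unit.
            lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
            self._units_by_spec[digest] = [f"{digest[:8]}:{i}" for i in range(len(lines))]
        return self._units_by_spec[digest]

    def map_unit(self, unit_id: str, path: str) -> None:
        """Persistent mapping: remember which file a unit governs."""
        self._units_by_file[path].add(unit_id)

    def governing(self, changed_files: list[str]) -> set[str]:
        """Selective traversal: only units mapped to the changed files are returned."""
        result: set[str] = set()
        for path in changed_files:
            result |= self._units_by_file.get(path, set())
        return result
```

The point of the sketch is the cost profile: re-ingesting an unchanged spec does no decomposition work, and a diff touching two files resolves its governing units without reading the rest of the corpus.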

Why this is not closeable as a feature for any per-PR architecture. The same three primitives are also what enable post-deploy verification persistence: receipts compounding across releases, queryable any time after the fact, with cryptographic provenance and consistent re-evaluation against pinned versions. The substrate that makes spec-as-code workable at scale is the same substrate that makes post-deploy queryability possible. Adding support for hundreds of in-repo specifications, or for cross-release verification queryable years after deploy, requires building these substrate primitives. They are a foundation, not a feature. Any system designed around per-PR temporary indexing was designed for a different problem scope; expanding that scope means rebuilding the substrate.

Why this rebuild is the harder kind of rebuild. A per-PR system carrying customers cannot rebuild the substrate as an incremental refactor. The substrate primitives change what the system is: persistent decomposition replaces per-PR re-parsing, persistent mapping replaces re-derivation, graph traversal replaces flat-corpus inclusion. These are not adjacent to the existing architecture; they are alternative foundations. The product team faces a structural choice: continue the per-PR product (which has a working business) while building the substrate alongside it as a separate system, or migrate the existing product onto a new substrate while preserving customer commitments. Both paths typically take 18-24 months at the founding-engineer level of focus, and during that period the company is partially defending the existing value proposition while partially building one that supersedes it. The architectural rebuild is genuinely possible. The market and customer dynamics around it are what make it hard, and what make purpose-built post-divide infrastructure structurally distinct from architecturally-extended pre-divide infrastructure.

What compounds in each architecture. Baz compounds on reviewer state (Reviewer Memory accumulating from feedback patterns, Module Memory capturing module-level implementation details, Custom Reviewers learning from PR history) and on team-level signal (Engineering Impact tracking detected issues over time, reported bugs as lagging-signal feedback). These are real compounding assets that improve Baz’s review quality as usage accumulates. Idora compounds on a different dimension: verification records (content-addressed receipts tied to pinned model and prompt versions) and code-state proof chains (graph traversal across releases joining requirements to artifacts to deployments). Both architectures compound; the architectural distinction is which dimension. Reviewer-state compounding improves the quality of PR-time review. Verification-record compounding produces audit-survivable cross-release proof. The two are not substitutes; they are different compounding axes serving different functions.
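
A cross-release query over stored verification records can be pictured as a drift check between two refs: given the recorded status of each requirement at each ref, report what changed. The sketch assumes each ref’s records are already available as a mapping from requirement id to status string; that shape and the drift function are hypothetical simplifications, not Idora’s query interface.

```python
from typing import Optional


def drift(before: dict[str, str],
          after: dict[str, str]) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return requirements whose recorded status differs between two refs.

    A value of None means the requirement had no record at that ref.
    """
    changed: dict[str, tuple[Optional[str], Optional[str]]] = {}
    for requirement_id in sorted(before.keys() | after.keys()):
        old, new = before.get(requirement_id), after.get(requirement_id)
        if old != new:
            changed[requirement_id] = (old, new)
    return changed
```

The comparison only works if records from the earlier ref still exist when the later one ships, which is the sense in which verification records compound.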

Why this matters in May 2026 specifically. The post-divide window is not a hypothetical projection. The shift is observable in industry data: AGENTS.md became a Linux Foundation standard (December 2025) and was adopted by Microsoft, OpenAI, Cursor, Vercel, and Supabase. Anthropic’s Skills format ships natively in Claude Code and Cowork. GitHub Spec-Kit, Tessl, OpenSpec, and BMAD have shipped production tooling for the spec-driven pattern through 2025. Thoughtworks Tech Radar Volume 33 (November 2025) placed spec-driven development in Assess. The market reaction to AI agent execution arriving at production volume, the SaaSpocalypse, wiped $1T+ from cumulative SaaS market cap through April 2026, with Atlassian reporting its first-ever decline in enterprise seats. Each of these is an observable data point, none coordinated, all converging on the same shift. The verification infrastructure category is forming because the substrate it serves has already changed.

What this brief brackets out. The comparison above operates on the verification-receipt dimension where Idora and Baz touch. Idora’s full substrate joins verification receipts to execution receipts: content-addressed records of build, test, and deploy artifacts, written into the graph from CI pipeline ingestion, joined to verification receipts through commit-and-seam mappings. The bridge between the two streams is what makes Continuous Integrity the layer after CI and CD rather than a sharper PR-time verifier. The substrate-rebuild barrier compounds across both dimensions: a per-PR architecture rebuilding to substrate must build persistent decomposition, persistent mapping, selective traversal, and a push-triggered execution-receipt pipeline. Each dimension is a foundation, not a feature.
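
The join between the two receipt streams can be illustrated with commit as the only join key. The brief describes “commit-and-seam mappings” without defining seams, so this sketch leaves them out; the record shapes and both function names are hypothetical.

```python
from collections import defaultdict


def join_on_commit(verification: list[dict], execution: list[dict]) -> dict[str, dict[str, list[dict]]]:
    """Group both receipt streams under the commit they refer to."""
    joined: dict[str, dict[str, list[dict]]] = defaultdict(
        lambda: {"verification": [], "execution": []}
    )
    for receipt in verification:
        joined[receipt["commit"]]["verification"].append(receipt)
    for receipt in execution:
        joined[receipt["commit"]]["execution"].append(receipt)
    return dict(joined)


def unverified_deploys(joined: dict[str, dict[str, list[dict]]]) -> list[str]:
    """Commits that have a deploy receipt but no verification receipt."""
    return sorted(
        commit
        for commit, streams in joined.items()
        if any(r["kind"] == "deploy" for r in streams["execution"])
        and not streams["verification"]
    )
```

A question such as “which deployed commits have no verification record” needs both streams in one structure, which is the kind of query neither stream answers alone.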

The per-PR temporary-index pattern is well-fit for the bounded inputs Spec Reviewer was designed against, typically one or two referenced tickets plus a few wiki pages. Within that scope, the architecture works. At the scope of full spec-as-code corpora and post-deploy persistence, a different substrate foundation fits.


What Baz does well

Direct read, no minimization:

  1. Spec Reviewer’s runtime sandbox evaluation is technically impressive evidence at the PR-time gate.
  2. Ticketing system integrations are deep across Jira, Azure DevOps, Monday, Shortcut, YouTrack.
  3. Custom Reviewers (learning from PR history) is a genuine differentiator in the code review category.
  4. Pixel Perfect (Figma-to-code visual verification) is unusual and valuable.
  5. Code Review Bench #1 in Precision is real third-party validation against established model labs.
  6. Awesome Reviewers open-source registry (5K+ system prompts) is genuine developer-community top-of-funnel.

Open questions for Idora

Direct read, no rhetorical hedging:

  1. Verifier calibration disclosure forthcoming. Receipts carry pinned prompt and model versions today. A published calibration story (held-out evaluation, agreement rates with senior reviewers, known failure modes) is in scope for the next funding cycle.
  2. Design partner pilots opening but not yet running at scale. Production validation depth comparable to Baz’s customer base is forward-looking, not current.
  3. SOC 2 Type II in scope for the next funding cycle. The receipt model is designed for audit survivability today; external compliance certification follows.
  4. No published external benchmark yet. Code Review Bench targets PR-time review, which is not the surface Idora operates on. A benchmark comparable to Baz’s #1 Precision validation does not yet exist for the post-deploy verification surface; building or contributing to one is in scope.

What this means for Idora

The two products operate at different points in the software production lifecycle. The category risk is mindshare and framing more than direct product overlap.

  1. Idora’s positioning holds. The architectural properties already established (compounding state, cryptographic provenance, CI-execution-layer ingestion regardless of code host, reproducibility) defend the structural distinction without rework.
  2. The complementary deployment frame is the structurally accurate one. Baz is a code review tool that operates at PR or MR time. Idora is a substrate that produces durable proof across releases. A team can run both: Baz at the PR or MR gate, Idora as the post-deploy substrate. The two systems do not compete for the same buyer or workflow position.
  3. The post-divide problem is the entry point. AI writing production code at human-team volume requires verification infrastructure that did not exist as a category before. Audit survivability, spec-as-code at scale, and cross-release persistence are surface manifestations of the same architectural need.
  4. Design partner profile. The structural fit is sharpest with platform or infrastructure engineering teams of 30-200 engineers, shipping daily through GitHub Actions or GitLab CI, maintaining 50+ specifications in repository (markdown specs, OpenAPI files, ADRs, AGENTS.md hierarchies), and already running coding agents (Claude Code, Cursor, internal frameworks) in production code paths. Two buyer postures fit this profile: platform engineering teams facing the post-merge gap (agent re-derivation cost, agent-output drift, inability to answer cross-release questions about their own production system), and teams under regulatory or compliance pressure that requires audit-survivable proof of what was decided and shipped. The profile is concrete enough that a team can self-identify or self-disqualify in one read.
  5. Category formation timing. The post-divide verification category is forming in real time. Convergent positioning across multiple companies (Idora, Baz, and others examined in subsequent Structural Reads) means the architectural shape of the category is being established in 2026.

Net

Baz is a well-built code review system, designed against a pre-divide requirement-source landscape, executing well within the scope it was built for. The team is exit-experienced and well-networked. The product is good at what it was designed for.

Idora is built against the post-divide question, occupying a different point in the software production lifecycle: post-deploy integrity verification with a persistent graph substrate that supports the full requirement-source surface (external systems plus in-repo specs at scale) and produces audit-survivable receipts that compound across releases.

The architectural distinction is structural, reflects different design-time questions answered by different substrate foundations, and is not closeable as a feature addition. The two products are complementary in deployment.

The strategic discipline for Idora is to lead with the post-divide problem itself, surface the substrate-level distinctions where they apply, and continue building toward design partner pilots and category-leadership conversations as the verification infrastructure category forms.


Sources

Baz materials

Idora materials and divide framing

External validation of post-divide framing

  • Andrej Karpathy: agentic engineering reframe and “coding agents basically didn’t work before December and basically work since” (February 2026)
  • Anthropic: Claude Cowork research preview (January 12, 2026); 11 enterprise plugins (January 30, 2026); SaaSpocalypse $1T+ cumulative SaaS market cap impact through April 2026
  • AGENTS.md / Anthropic Skills standard (December 2025) and adoption by Microsoft, OpenAI, Cursor, Vercel, Supabase
  • Spec-driven development production tooling through 2025: GitHub Spec-Kit, Tessl, OpenSpec, BMAD
  • Thoughtworks Tech Radar Volume 33 (November 2025) placed spec-driven development in Assess

