Does Threadbaire work with any model?

Yes. We test the same slice across at least two leading models to ensure comparable outputs and stable pass criteria.

What are AI answer receipts?

Timestamped notes that capture who did what, when, and from which source. They’re included with every run for auditability.

What effort is required to evaluate?

No SDK is required to evaluate. Bring your GPT/Claude accounts; we provide a copy-paste harness. MCP is optional.

Is there a free trial?

No. Evaluation runs as a Paid Pilot (invite-only). Details, inclusions, and timelines are provided on the Pilot page.

What about security and data flow?

Pilots run in your accounts; we do not retain or train on your content. EU hosting/data-residency respected in pilots; DPA available on request.

Do you publish evidence of repeatability?

Yes. We version pass criteria and include a manifest with hashes in the Pilot pack. Internal runs show >90% pass-rate across listed models using our validator.

Which integrations are supported?

Works with GPT and Claude. Optional connectors (Pilot): Jira, Notion, Slack. Air-gapped evaluation is supported.

Can I get the slice format and prompts?

Yes — the slice format, prompts, and JSON-only validator are included in the Pilot pack along with frozen outputs and a manifest.

Interactive demo: same memory, different models

Live method, frozen outputs. Evaluation runs as a Paid Pilot (invite-only; no free trials).

We’ll show one thing in ~60 seconds: the same memory producing comparable answers across two models with receipts you can audit. Methods (slice format, prompts, validator) ship in the Pilot pack.

See the compare Receipts zoom Request Paid Pilot

Pilots run in your account. We never train on your content. Privacy

1 Compare (60s)

Frozen outputs from two leading models. Same inputs → same story.

Models shown: Claude and GPT. We version pass criteria and include a manifest with hashes in the Pilot pack. Last updated: 2025-09-04.

Claude: Frozen output

Captured: 2025-09-03

{
  "summary": "Plan B was paused due to low ROI; since then the custom parser was cut to reduce latency/vendor risk; proceed with pared-down MVP.",
  "timeline": [
    {"ts":"2025-06-03","what":"Plan B paused","who":"E.","why":"ROI too low"},
    {"ts":"2025-06-19","what":"Cut custom parser","who":"R.","why":"Latency + vendor risk"}
  ],
  "next": "Ship pared-down MVP without the custom parser"
}

GPT: Frozen output

Captured: 2025-09-03

{
  "summary": "Plan B halted for poor ROI; later the custom parser was removed for latency/vendor concerns; move forward with an MVP that excludes it.",
  "timeline": [
    {"ts":"2025-06-03","what":"Plan B paused","who":"E.","why":"ROI too low"},
    {"ts":"2025-06-19","what":"Custom parser removed","who":"R.","why":"Latency + vendor risk"}
  ],
  "next": "Launch MVP excluding the custom parser and track cycle time"
}

PASS: same two timeline events (ts/what/who/why) and an action-first “next” aligned with the objective.
Observed pass-rate (internal runs, last 30 days): ≥ 90% using our validator; details in the manifest (Pilot).
Receipts: each run carries who/what/when/source for auditability.

Request Paid Pilot (pack includes slice format, prompts, validator, manifest)

Notes: outputs are representative and frozen; live models evolve over time. Pass criteria are versioned.

2 Receipts (what the model saw)

Timestamped notes capture lineage. Examples shown with synthetic identifiers; full receipts ship in the Pilot pack.

who: E.
what: Plan B paused
when: 2025-06-03
source: internal channel

who: R.
what: Custom parser removed
when: 2025-06-19
source: ticket #4821

Every event in a slice can carry receipts (who/what/when/source) to support audits and handoffs.

Request Paid Pilot (full receipts + manifest)

3 Methods (Pilot only)

The slice format, model prompts, and JSON-only validator live in the Pilot pack.

Slice format

Pilot only

// Portable slice: role · question · window · events[n] + receipts
// Available in Pilot pack (spec + adapters)

Request Paid Pilot

Model prompts & JSON validator

Pilot only

// Enforces exact keys/length for diff-friendly outputs
// Available in Pilot pack (standard + strict variants)

Request Paid Pilot

Under the hood

Pilot only

// Slice adapters · Receipt normalizer · Drift guards
// Keep outputs comparable across model updates

Request Paid Pilot

Works with: GPT, Claude
Optional connectors (Pilot): Jira, Notion, Slack
Air-gapped evaluation: supported
Data flow: runs in your accounts; no training on your content; EU data-residency respected; DPA available

Interactive demo: same memory, different models

1 Compare (60s)

Claude: Frozen output

GPT: Frozen output

2 Receipts (what the model saw)

3 Methods (Pilot only)

Slice format

Model prompts & JSON validator

Under the hood

Ready to evaluate?