AI Agent Capability Audit · 4–6 Weeks · $25–100K
Which of your AI agents are actually doing the work?
A capability audit that evaluates your deployed or candidate agents against your tasks, computes Shapley-based attribution across multi-agent pipelines, and tells you exactly which agents to keep, cut, or replace — with compliance-ready artifacts where regulators care.
When the capability audit pays for itself
Five vendors, no principled way to compare.
Each vendor's pitch deck claims best-in-class. Each vendor's benchmark is the one they win. You need an apples-to-apples score against your tasks, not theirs.
Pipeline outcomes are good, contributions are murky.
Your multi-agent workflow is producing acceptable results — and you have no idea which agent is doing the work. When one of them gets expensive or unreliable, you can't tell what removing it costs.
Risk and compliance want explainability you don't have.
Model risk, MiFID II, SR 11-7, FDA reviewers — they need per-component attribution and verifiable capability claims. Vendor self-attestation isn't enough.
What you get
The method, briefly
Each agent has a latent capability profile — a vector across the dimensions that matter for your tasks. We measure it through observed outcomes, not vendor claims. The measurements form a comparable capability space across vendors, including open-weight alternatives.
For multi-agent pipelines, we apply Shapley value decomposition: a mathematically principled way to attribute marginal contribution per agent. It is the same technique from cooperative game theory (for which Lloyd Shapley shared the 2012 Nobel Memorial Prize) used to allocate value fairly across coalitions; we apply it to your AI pipeline so you can see which agents earn their cost.
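As an illustration of the idea (not our production tooling), here is a minimal sketch of exact Shapley attribution over a three-agent pipeline. The agent names and coalition scores below are hypothetical; in a real engagement the value function comes from measured outcomes on your tasks.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley values: each agent's marginal contribution,
    averaged over all coalitions it could join, with the classic
    |S|! (n-|S|-1)! / n! weighting."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [x for x in agents if x != a]
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                coalition = frozenset(subset)
                phi[a] += weight * (value(coalition | {a}) - value(coalition))
    return phi

# Hypothetical pipeline: task success rate per coalition of agents.
scores = {
    frozenset(): 0.0,
    frozenset({"retriever"}): 0.30,
    frozenset({"drafter"}): 0.20,
    frozenset({"checker"}): 0.05,
    frozenset({"retriever", "drafter"}): 0.70,
    frozenset({"retriever", "checker"}): 0.40,
    frozenset({"drafter", "checker"}): 0.35,
    frozenset({"retriever", "drafter", "checker"}): 0.90,
}

phi = shapley_values(["retriever", "drafter", "checker"], lambda s: scores[s])
```

In this toy example the attributions sum to the full pipeline's score (0.90), and the checker's small Shapley value is exactly the kind of signal that feeds a keep/cut/replace verdict. Exact computation is exponential in the number of agents; larger pipelines use sampled approximations.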
Privacy stays intact throughout. Where cross-vendor comparison would otherwise require exposing prompts or outputs, we use quantized Johnson-Lindenstrauss projections so that vendors compare on shape, not content. Read more in our research →
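To give a flavor of "compare on shape, not content", here is a minimal sketch of a quantized Johnson-Lindenstrauss projection: project each party's vector with a shared random matrix, keep only the sign bits, and estimate angular similarity from bit agreement. Dimensions, seeds, and function names are illustrative, not our actual protocol.

```python
import numpy as np

def qjl_sketch(vec, dim=64, seed=0):
    """Quantized JL sketch: random Gaussian projection to a low
    dimension, then 1-bit sign quantization. The original vector's
    content cannot be reconstructed from the bits alone."""
    rng = np.random.default_rng(seed)  # shared seed = shared projection
    proj = rng.standard_normal((dim, len(vec)))
    return np.sign(proj @ vec)

def sketch_similarity(a, b):
    """Fraction of agreeing sign bits; approximates 1 - angle/pi
    between the original vectors (the SimHash estimate)."""
    return float(np.mean(a == b))

# Two parties sketch locally and exchange only the sign bits.
rng = np.random.default_rng(42)
v = rng.standard_normal(256)
w = v + 0.1 * rng.standard_normal(256)   # a near-duplicate profile
sim = sketch_similarity(qjl_sketch(v, seed=7), qjl_sketch(w, seed=7))
```

Both sides must agree on the projection seed; what crosses the boundary is a short bit string, so vendors can be ranked against each other without exposing prompts, outputs, or embedding weights.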
How the engagement runs
Scoping
Identify the agents in scope, the task dimensions that matter to your business, and the outcome data we'll use as ground truth. Define success criteria before measurement begins.
Data collection & instrumentation
Lightweight instrumentation of the agent pipeline (we don't ask you to re-architect anything). Capture inputs, outputs, costs, latencies, and outcome signals. Build the capability vectors.
Attribution analysis
Compute Shapley decomposition across the multi-agent pipeline. Cross-validate against held-out outcome data. Identify capability gaps, redundancies, and outright failures.
Report & recommendations
Findings deck with per-agent verdict (keep/cut/replace), quantified impact estimates, and compliance-ready artifacts. Read-out sessions for your sponsor, risk team, and procurement.
Scope & pricing
Quick
3–5 agents, single task domain, no pipeline attribution
- Per-agent capability vectors
- Side-by-side comparison against your tasks
- Failure-mode taxonomy
- Keep/cut/replace recommendations
Standard
5–15 agents, 2–3 task domains, basic pipeline attribution
- Everything in Quick
- Shapley attribution across single pipeline
- Multi-domain capability comparison
- Privacy-preserving cross-vendor analysis
- Two read-out sessions
Enterprise
15+ agents, multiple domains, full pipeline attribution, compliance artifacts
- Everything in Standard
- Full Shapley decomposition across all pipelines
- Compliance-ready artifacts (SR 11-7, MiFID II, FDA)
- Implementation support for keep/cut/replace decisions
- 30 days of post-engagement Slack/email support
Frequently asked questions
What is an AI agent capability audit?
A structured evaluation of your AI agents — whether candidates you're considering or systems already in production — that produces per-agent capability vectors across the task dimensions that matter to you. For multi-agent pipelines, we compute Shapley-based attribution so you can see each agent's marginal contribution to outcomes. The deliverable is a report telling you which agents to keep, cut, or replace, and why.
Who should commission a capability audit?
Enterprises evaluating 5–15+ AI agents from different vendors with no principled way to compare them. Financial services firms with model risk requirements (MiFID II, SEC, OCC). Healthcare/pharma operations with explainability needs (FDA). Any organization running multi-agent workflows where it's unclear which agent is creating value and which is freeloading on the others.
How is Shapley-based attribution different from agent-level benchmarks?
Public benchmarks score agents in isolation against synthetic tasks. Shapley attribution measures each agent's marginal contribution to your actual production pipeline against your real outcomes. Two agents that look identical on benchmarks can have wildly different Shapley values in your specific workflow — and that's the number that matters for keep/cut/replace decisions.
Do you require access to our agents' internals?
No. We work from observable inputs, outputs, and outcomes. Where we need to compare embeddings or capability profiles across agents from different vendors, we use privacy-preserving projections (quantized Johnson-Lindenstrauss) so that vendor IP and your sensitive data both remain protected.
What's the difference between this and the Value Thread Audit?
The Value Thread Audit is broader and upstream — it maps your entire data → BI → AI spend to outcomes, agents or no agents. The Capability Audit is narrower and downstream — it dissects multi-agent systems specifically. Many engagements run the value audit first and the capability audit second when warranted.
Can the audit support regulatory or compliance reporting?
Yes. The capability vectors and Shapley attribution diagnostics are designed to map cleanly to model risk frameworks (SR 11-7 for US banking, MiFID II, FDA model explainability requirements). We deliver the audit artifacts in a form your compliance and risk teams can incorporate into their existing reporting.