Governance and Verification: Why the Agent Era Needs Both Layers
Microsoft AGT, AWS AIRI, and BotConduct don't compete. They complete each other.
What just shipped
Microsoft Agent Governance Toolkit (AGT) — released April 2, 2026, open source under the MIT license. Sits between the operator's agent framework and the actions the agent takes. Every tool call, every resource access, every inter-agent message is evaluated against policy before execution. Integrates natively with AWS Bedrock, Azure AI Foundry, Google ADK, LangChain, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, and more. Covers the OWASP Agentic Top 10 risks with sub-millisecond policy enforcement.
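To make the deployer-side pattern concrete, here is a minimal sketch of a policy gate sitting between an agent framework and its tool calls. The `PolicyGate` class, the `Rule` schema, and the tool names are hypothetical illustrations of the pattern, not AGT's actual API.

```python
from dataclasses import dataclass

# Hypothetical policy rule -- illustrative only, not AGT's actual schema.
@dataclass
class Rule:
    tool: str          # tool name this rule governs; "*" matches any tool
    allow: bool        # permit or deny
    max_payload: int   # reject oversized arguments

class PolicyGate:
    """Sits between the agent framework and tool execution:
    every tool call is evaluated against policy before it runs."""

    def __init__(self, rules: list[Rule]):
        self.rules = rules

    def check(self, tool: str, payload: str) -> bool:
        for rule in self.rules:
            if rule.tool in (tool, "*"):
                return rule.allow and len(payload) <= rule.max_payload
        return False  # default-deny: unlisted tools never execute

    def call(self, tool: str, fn, payload: str):
        if not self.check(tool, payload):
            raise PermissionError(f"policy denied tool call: {tool}")
        return fn(payload)

# Usage: the operator configures the policy; the gate enforces it in-process.
gate = PolicyGate([Rule(tool="web_search", allow=True, max_payload=4096)])
result = gate.call("web_search", lambda q: f"results for {q!r}", "agent governance")
```

Note what this sketch makes visible: the rules live inside the operator's process, configured by the operator. That is exactly the property the next section turns on.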
AWS AI Risk Intelligence (AIRI) — from the AWS Generative AI Innovation Center. Automates security, operations, and governance assessments across the agentic lifecycle. Enterprise dashboard, prioritized recommendations, integration with native AWS capabilities. Bureau Veritas is using it to deliver independent AI assessments under European regulatory frameworks.
These are serious, well-funded, technically sophisticated systems. We're glad they exist. Every enterprise deploying AI agents in production should have something like them in their stack.
But read what they do carefully
AGT and AIRI are deployer-side. The operator adopts them, runs them inside their own infrastructure, and configures the policies their agents must respect. The enforcement is the operator governing its own behavior.
That's valuable — it's how you prevent a Vercel-style incident where an agent's actions slip past what the organization intended. It's the right tool for the right job.
But it doesn't answer the question the enterprise buyer actually asks when they're procuring an agent from someone else:
"How do I know this bot won't go rogue when it's out of my direct control?"
The operator can say "we use AGT with a strict policy set." The buyer still has to trust that claim. Which brings us to the layer nobody has been building.
The receiver-side gap
Imagine two parallel layers:
| Layer | Who runs it | What it answers |
|---|---|---|
| Governance (AGT, AIRI, internal policies) | The agent's operator | "Are our agents doing what we said they would?" |
| Verification (BotConduct) | An independent third party | "When the agent ran in production, did it actually behave the way its operator promised?" |
The first layer protects the operator. The second layer protects the receiver — the site being visited, the API being called, the procurement team on the other side of the table, the regulator asking for evidence of post-deployment conduct.
BotConduct sits on the receiving end of agent traffic. Not inside the operator's infrastructure. Not enforcing the operator's policy. Measuring what the agent actually does, cryptographically signed, verifiable without trusting the operator's self-report.
That's the difference between promised conduct (what AGT governs) and observed conduct (what BotConduct records).
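What could an observed-conduct record look like? A minimal sketch: the receiver logs what the agent actually did, and an independent verifier signs the record with its own key, so the evidence never depends on the operator's self-report. The record fields and identifiers below are hypothetical, not BotConduct's actual schema; the signing uses the real `cryptography` package's Ed25519 API.

```python
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical observed-conduct record -- fields are illustrative only.
record = {
    "agent_id": "vendor-voice-agent-01",      # assumed identifier
    "observed_actions": [
        {"t": "2026-04-20T11:02:03Z", "action": "GET /api/orders", "status": 200},
        {"t": "2026-04-20T11:02:04Z", "action": "GET /api/users/42", "status": 403},
    ],
    "policy_claims_checked": ["pii_redaction", "rate_limits"],
}

# The independent verifier (not the operator) holds the signing key.
verifier_key = Ed25519PrivateKey.generate()
payload = json.dumps(record, sort_keys=True).encode()
signature = verifier_key.sign(payload)

# Anyone holding the verifier's public key can check the record later,
# without trusting the operator's self-report.
verifier_key.public_key().verify(signature, payload)  # raises InvalidSignature if tampered
```

The design point is that the key never belongs to the operator: a record the operator could re-sign would just be self-attestation with extra steps.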
Why this matters for the next twelve months
The regulatory timetable is narrow and specific:
- EU AI Act high-risk obligations take effect August 2026.
- Colorado AI Act becomes enforceable June 2026.
- California SB 1001 applies to any bot interacting with California consumers.
- GDPR per-request rights continue to apply across the EU.
Every serious compliance conversation over the next year will need two kinds of evidence. Operators will need to show how they govern their agents — AGT and AIRI produce that. And they'll need to show how their agents actually behaved in production against the frameworks in question — which is a different evidence type. Self-attestation doesn't close that gap. A signed third-party record does.
This is the same distinction that exists in financial services. A bank has internal compliance controls (SOX, internal audit, control matrices). Those are deployer-side. It also has external auditors (KPMG, EY, PwC) who independently verify that the controls actually produced the outcomes claimed. Those are receiver-side. Both are required, by law, because regulators learned long ago that internal governance without external verification is insufficient.
Complementary, not competitive
We want AGT to succeed. We want AIRI to succeed. We want Bureau Veritas to build its European AI assessment practice into something real.
An agent ecosystem where operators govern their agents rigorously and independent verification confirms that governance is working is a safer ecosystem. The operators who adopt AGT and end up with great governance will look better when they're verified externally. The operators who don't will look worse.
BotConduct's layer gets more valuable, not less, as the governance layer matures. When AGT becomes table stakes, buyers will start asking the next question: show me evidence that your governance is producing the behavior you claim. That evidence has to come from outside the operator. It has to be portable across CDNs and infrastructure. It has to be cryptographically signed so it can be cited in audits, procurement reviews, and regulatory filings.
That's what we build.
One concrete example
Imagine an enterprise procurement team evaluating a voice agent vendor. The vendor shows up with an AGT policy bundle: "every external tool call is checked, every PII access is redacted, every call ends with audit logging." Good.
The procurement lead then asks: "When your agent ran against our competitors' deployments last quarter, did it actually respect those policies, or did it drift? Show me the independent record."
Without receiver-side verification, the vendor can only answer with more self-attestation. With it, the vendor can produce a signed trajectory from an independent third party showing exactly how the agent behaved under production-like evaluation — same policies, across long sessions, under adversarial inputs, across frameworks.
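On the buyer's side, checking that evidence reduces to a signature verification against the third party's published public key; nothing from the vendor needs to be trusted. A sketch under the same hypothetical record format as above:

```python
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_trajectory(record: dict, signature: bytes,
                      verifier_pub: Ed25519PublicKey) -> bool:
    """Return True iff the signed trajectory is intact.
    The buyer needs only the verifier's public key -- nothing from the vendor."""
    payload = json.dumps(record, sort_keys=True).encode()
    try:
        verifier_pub.verify(signature, payload)
        return True
    except InvalidSignature:
        return False
```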
That's the conversation that's coming. The tools to govern the agent's behavior shipped three weeks ago. The tool to verify that behavior independently is what we've been building since January.
Moody's rates bonds. FICO rates consumer credit. BotConduct rates how AI agents behave when nobody's watching, with evidence that is signed, independent, and portable across every framework an agent has to answer to.