Research Note · May 2026 · Vol. I №11

The Second Standard

Why continuous internal audit anticipates — but does not substitute for — receiver-side attestation of agent behavior observed at public surfaces.

Filed by the BotConduct Observatory Desk · May 2026


I. The first standard is consolidating

Matthew Prince described it plainly: AI is coming for the meters. Continuous internal audit, measured by AI, faster and without bias. Companies will measure themselves continuously, and the audit firms that consolidated trust in financial reporting will be displaced by infrastructure that does the measuring directly.

This is no longer theoretical. In recent weeks, Cloudflare pointed Mythos and other security-focused LLMs at live code across critical parts of its own infrastructure, and published what it observed: the models’ strengths, their weaknesses, the gaps. The company that articulated the thesis is now executing it on itself. The meter is being installed by the same party that operates the system being measured.

The first standard — continuous self-measurement — is taking shape inside large infrastructure providers, and Cloudflare’s framing is one of the clearest articulations. The framing and the execution are now visible in the same actor.

But this is half the story.

If continuous internal audit becomes the new standard, the natural and immediate consequence is continuous external attestation — signed by a party not involved in the business, verifiable by third parties who depend on those metrics.

A company can measure itself continuously with AI. Its customers, its regulators, its insurers, and its financial counterparties need something different: proof of observed behavior that does not depend on trusting the company itself.

This is not competition for what Prince is building. It is the adjacent layer that his framing anticipates without naming.

The first standard is consolidating.

The second has yet to be named.

II. Why naming matters now

In the past sixty days, the cyber-AI landscape has shifted faster than the categorical frameworks meant to contain it.

LangChain shipped Auth Proxy, a control plane that keeps credentials outside the runtime of agentic systems. Alibaba released Qwen3.7-Max, a frontier model designed for thirty-five hour autonomous execution and a thousand sequential tool calls. Adversa AI published the IICL paper, showing that the very capability that makes models useful in long-horizon agentic tasks — in-context learning — is also a systemic attack surface across seventeen vendor families. Independent academic work demonstrated that browser agents can be fingerprinted from passive UI traces with ninety-six percent accuracy.

This week, the convergence accelerated.

A Chief Security Intelligence Officer at a major security vendor reframed the most important board question of the quarter — not “are we secure?” but “are we faster?” — and characterized the underlying dynamic as a collective race whose outcome depends on the entire ecosystem’s ability to find, patch, and contain. The framing is correct. What the framing implies, but does not yet name, is that “ecosystem ability” requires a record that no single ecosystem participant can produce alone, attested by parties whose epistemic position is independent of the senders being measured.

In the same window, regulatory institutions across multiple jurisdictions began articulating, almost simultaneously, the need for third-party verification of AI systems. The European Commission published draft high-risk classification guidance under the AI Act. Two leading AI labs publicly endorsed legislation requiring third-party audits of frontier AI systems. One US state authorized the country’s first AI verification pilot. Federal-level pre-deployment review remained under active debate. Bilateral guardrail conversations between major jurisdictions advanced in parallel.

Regulation cannot be the auditor it specifies. The institutions writing these rules require an evidence layer they themselves do not produce.

Each of these moves articulates the same underlying transition from a different angle. Identity declaration is decaying as a control mechanism. Behavior is what remains observable. The architecture of trust on the agentic web is being rebuilt in real time, and the parties best positioned to define the new categories are the ones already operating in production.

This note articulates the second standard: a receiver-side attestation layer that does not yet have a settled name, but which the rest of the stack increasingly requires.

III. Determinism and probability

The first epistemic asymmetry to address concerns the difference between sender-side and receiver-side claims.

LangChain Auth Proxy is deterministic. It controls the dispatch of the agent, the credentials it carries, the network destinations it can reach, and the audit trail of every outbound request. The proxy can make administrative claims with certainty because it owns the boundary on which those claims are made. When the proxy attests that an agent dispatched by organization X went to destination Y at time Z, this is verifiable from the proxy’s own logs.

A receiver-side observatory is probabilistic by necessity. It does not control the actors that arrive at the surfaces it monitors. It does not configure them, dispatch them, or limit their behavior. It observes what they do, what they declare about themselves, and what those two things imply when compared across time and across other surfaces.

The output of a receiver-side observatory is not “agent X dispatched by organization Y.” It is “the actor declaring itself as X exhibits behavior compatible with declared identity, across N observations over M time, with confidence level Z, and shows or does not show patterns associated with the family of vulnerabilities currently in the public literature.”
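
The shape of such a claim can be made concrete with a small sketch. It is illustrative only: the `ConductClaim` record, its field names, and the use of a Wilson score lower bound as the confidence measure are assumptions of this example, not a description of any observatory's actual method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConductClaim:
    """Hypothetical record for a receiver-side claim about a declared identity."""
    declared_identity: str
    observations: int    # N: how many observations were scored
    window_days: int     # M: span of time the observations cover
    compatible: int      # observations compatible with the declared identity
    lower_bound: float   # conservative estimate of the compatible rate


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


def make_claim(identity: str, compatible: int, n: int, window_days: int) -> ConductClaim:
    """Build a claim whose confidence shrinks when the evidence is thin."""
    return ConductClaim(identity, n, window_days, compatible,
                        wilson_lower_bound(compatible, n))


claim = make_claim("example-agent/1.0", compatible=95, n=100, window_days=30)
print(f"{claim.declared_identity}: {claim.compatible}/{claim.observations} compatible "
      f"over {claim.window_days} days, lower bound {claim.lower_bound:.2f}")
```

The point of the lower bound is that the same observed rate supports a weaker claim when N is small: 19 compatible observations out of 20 yields a lower bound than 95 out of 100, even though both rates are 95 percent.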

These are not competing regimes, nor are they interchangeable. They are different epistemic operations on different sides of the same exchange, and any framework that tries to reduce one to the other obscures the asymmetry rather than resolving it.

IV. The asymmetry does not close with a protocol

A natural impulse, when faced with two regimes that need to interoperate, is to propose a protocol that translates between them.

This impulse should be resisted.

A deterministic sender-side attestation and a probabilistic receiver-side observation cannot be made equivalent by any technical interface, because they make different kinds of claims about different kinds of evidence. A signed declaration from the sender is a claim about what was dispatched. An observation from the receiver is a claim about what arrived and how it behaved. These can coincide. They can also diverge. The interesting cases are the ones where they diverge — where a properly signed agent exhibits behavior incompatible with its declared identity, or where an unsigned actor exhibits behavior compatible with a known declared identity that it has not claimed.
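
The four combinations described above can be laid out as a small grid. This is a sketch under simplifying assumptions: the `Finding` names and the `compare` function are hypothetical, and the behavioral inputs are shown as booleans even though in practice they would be probabilistic judgments with a confidence attached.

```python
from enum import Enum


class Finding(Enum):
    CONSISTENT = "signed, behavior compatible with declared identity"
    SIGNED_BUT_DIVERGENT = "signed, behavior incompatible with declared identity"
    UNCLAIMED_MATCH = "unsigned, behavior compatible with a known identity it has not claimed"
    UNKNOWN = "unsigned, no match to a known identity"


def compare(signature_valid: bool,
            behavior_matches_declared: bool,
            matches_known_identity: bool) -> Finding:
    """Place one arrival in the sender-claim versus receiver-observation grid."""
    if signature_valid:
        if behavior_matches_declared:
            return Finding.CONSISTENT
        return Finding.SIGNED_BUT_DIVERGENT
    if matches_known_identity:
        return Finding.UNCLAIMED_MATCH
    return Finding.UNKNOWN


# The two divergent cells are the ones a sender-side log alone cannot surface.
print(compare(True, False, False).value)
print(compare(False, False, True).value)
```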

What closes the asymmetry is not a protocol. It is a historical record.

V. The track record closes the gap

What weighs deterministic attestation against probabilistic observation is the accumulated history of conduct by declared identity.

A signed dispatch from organization X means one thing when X has appeared across hundreds of receiver-side surfaces over months, with a consistent behavioral signature compatible with what X declares about itself. It means something else when X is appearing for the first time, or when the behavioral signature of recent X-declared actors diverges from the historical record of X.
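
A minimal sketch of that weighing follows. Everything in it is assumed for illustration: the `TrackRecord` fields, the category labels, and especially the thresholds, which are placeholders rather than calibrated values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrackRecord:
    surfaces_seen: int      # distinct receiver-side surfaces where X has appeared
    days_observed: int      # length of the accumulated history
    historical_rate: float  # share of past observations compatible with X's declaration
    recent_rate: float      # same share, restricted to recent X-declared actors


def weigh_signed_dispatch(record: Optional[TrackRecord],
                          min_surfaces: int = 100,   # placeholder threshold
                          min_days: int = 60,        # placeholder threshold
                          max_drift: float = 0.15) -> str:
    """Return how much context a valid signature from X arrives with."""
    if record is None:
        return "first_appearance"
    if record.historical_rate - record.recent_rate > max_drift:
        return "diverging_from_history"
    if record.surfaces_seen < min_surfaces or record.days_observed < min_days:
        return "thin_history"
    return "established"


print(weigh_signed_dispatch(None))
print(weigh_signed_dispatch(TrackRecord(300, 120, 0.97, 0.96)))
print(weigh_signed_dispatch(TrackRecord(300, 120, 0.97, 0.60)))
```

The signature is identical in all three calls. Only the history behind the declared identity changes what it means.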

This is not a new model. It is the model that credit bureaus apply to financial counterparties, that abuse contact databases apply to network operators, that reputation systems apply to certificate authorities, and that VirusTotal applies to software artifacts. The structure is well understood. What is new is the application of this structure to autonomous systems acting at scale on public digital surfaces.

The record cannot be built by the sender. The sender sees only its own dispatch, not what happens to the agent after it leaves the boundary. The record cannot be built by any single receiver in isolation either. A single property sees the actors that touch it, but cannot distinguish a coordinated multi-property campaign from independent visits, and cannot place the behavior it observes in the context of the same actor’s conduct elsewhere.

The record can only be built by receivers operating collectively, contributing observations, complaints, and confirmations to a registry that aggregates evidence by declared identity over time.
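
As a data structure, such a registry might look like the sketch below. The `Observation` fields and the `Registry` methods are hypothetical names chosen for the example; a real registry would also need authentication of contributors and signed entries, which are omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Observation:
    declared_identity: str
    property_id: str   # the receiver that contributed this entry
    day: int           # days since an arbitrary epoch
    kind: str          # "observation", "complaint", or "confirmation"


class Registry:
    """Aggregates receiver contributions by declared identity over time."""

    def __init__(self) -> None:
        self._by_identity = defaultdict(list)

    def contribute(self, obs: Observation) -> None:
        self._by_identity[obs.declared_identity].append(obs)

    def history(self, identity: str) -> List[Observation]:
        return sorted(self._by_identity.get(identity, []), key=lambda o: o.day)

    def properties_on_day(self, identity: str, day: int) -> Set[str]:
        """Distinct properties that saw this identity on one day."""
        return {o.property_id for o in self._by_identity.get(identity, []) if o.day == day}


registry = Registry()
for prop in ("shop.example", "news.example", "docs.example"):
    registry.contribute(Observation("crawler-x", prop, day=10, kind="observation"))

# Each property saw a single visit; only the pooled record shows three on the same day.
print(sorted(registry.properties_on_day("crawler-x", 10)))
```

Each of the three properties, looking only at its own logs, records one unremarkable visit. The simultaneous appearance across all three is visible only in the aggregated record.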

In one observatory we operate, this record has been accumulating for some weeks across receiver-side surfaces. The dataset is small relative to what the architecture demands at scale, but already sufficient to demonstrate that the record produces categorical distinctions that no individual property could derive from its own logs.

VI. The observatory as evidence layer

Beyond accumulating the track record, a receiver-side observatory has a second function: to serve the research ecosystem that is generating hypotheses about agentic behavior faster than empirical verification can keep up.

When Adversa AI publishes that Involuntary In-Context Learning succeeds across seventeen vendor families in controlled adversarial probing, the next question is whether this vulnerability is being exploited in production traffic, and whether the behavioral signatures of post-IICL hijacking are distinguishable from baseline agentic behavior. When the “Known By Their Actions” researchers demonstrate that fourteen frontier LLMs are fingerprintable from UI traces with ninety-six percent accuracy under laboratory conditions, the next question is whether that fingerprinting holds in the wild, against actors that randomize timing, vary their tool selection, or evolve their behavior across sessions.

These are not the kinds of questions that can be answered from synthetic red teaming alone. They require an empirical observation post, operating receiver-side, on real traffic, across multiple properties, over enough time for behavioral patterns to stabilize and then change.

The receiver-side observatory is not a competitor to the research labs that publish these papers. It is the empirical complement to them. The labs generate hypotheses about what is possible. The observatory verifies which possibilities have moved into production. Both forms of research are legitimate. The agentic ecosystem requires both.

A clarification is warranted here, because the categorical distinction is easily missed. The integrations announced this week by Anthropic — twenty-eight platforms spanning DLP, SASE, identity, SIEM, data security, AI security posture management, and observability infrastructure — provide visibility to enterprise customers over their own use of Claude. This is sender-side enterprise compliance: the customer measures its own consumption of the model, through tools the customer already owns, with evidence supplied by the model provider itself. It is a meaningful and necessary category. It is not the same operation as receiver-side attestation of agent behavior observed at public surfaces. The two share neither the data, the customer, nor the epistemic position. Conflating them obscures the gap that receiver-side observation is meant to occupy. A registry built by receivers, attesting to behavior observed independently of any sender, cannot be a partner of any sender — by construction.

The same distinction applies to regulatory frameworks. The joint statement issued this week by UK regulators directing firms to treat frontier AI cyber risk as a board-level governance obligation defines the obligation. It does not produce the evidence required to discharge it. Regulation specifies what must be measured. Sender-side compliance measures internal usage. Receiver-side attestation provides the verifiable record of what arrived from outside the regulated and sender-side perimeters — the layer that exists between jurisdictions, between vendor ecosystems, and between participants who do not share a sender. When an actor declared from one jurisdiction interacts with a property in another, the receiver does not file with the foreign regulator. The receiver requires signed behavioral evidence of what arrived, independent of any sender, in order to act. That layer is structural, not optional.

There is an architectural convergence worth noting here. Cloudflare recently documented the structure of its own vulnerability discovery harness: eight specialized agents (recon, hunt, validate, gapfill, dedup, trace, feedback, report) orchestrated in parallel rather than consolidated into a single exhaustive agent. The conclusion it offered was that narrow specialized agents in parallel beat one exhaustive agent. A receiver-side observatory applies the same architectural principle from the observational side rather than the offensive one: specialized engines for capture, classification, cross-property correlation, and evidence chain signing, rather than a single monolithic detector.
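
Of the engines listed, evidence chain signing is the easiest to illustrate in a few lines. The sketch below is an assumption-laden stand-in, not a description of any production design: it links records with SHA-256 so that each entry commits to the one before it, and it uses an HMAC with a shared key where a real deployment would more plausibly use a public-key signature that third parties can verify.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first record


def _digest(prev: str, payload: dict) -> str:
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True).encode()
    return hashlib.sha256(body).hexdigest()


def append_record(chain: list, payload: dict, key: bytes) -> None:
    """Append a record that commits to the previous record's digest."""
    prev = chain[-1]["digest"] if chain else GENESIS
    digest = _digest(prev, payload)
    tag = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    chain.append({"prev": prev, "payload": payload, "digest": digest, "tag": tag})


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for rec in chain:
        digest = _digest(prev, rec["payload"])
        expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
        if rec["prev"] != prev or rec["digest"] != digest:
            return False
        if not hmac.compare_digest(rec["tag"], expected):
            return False
        prev = digest
    return True


key = b"demo-key-not-for-production"
chain = []
append_record(chain, {"identity": "crawler-x", "event": "arrived"}, key)
append_record(chain, {"identity": "crawler-x", "event": "ignored robots.txt"}, key)
print(verify_chain(chain, key))
```

Because each digest covers the previous one, altering an early record invalidates every record after it, which is the property that lets the chain serve as evidence rather than as a mutable log.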

The convergence is not coincidental. It is what the agentic environment requires from any system that intends to operate in it seriously, on either side of the exchange.

This is the categorical function that does not yet have a settled name, and which the second standard will need to occupy.

VII. Where this leaves the stack

The architecture of agentic trust is converging on a structure with three distinct layers.

The first layer is sender-side attestation, where systems like LangChain Auth Proxy control the dispatch of agents, the credentials they carry, and the network policy that governs their outbound behavior. This layer makes deterministic claims about what was dispatched and under what conditions.

The second layer is receiver-side attestation, where observatories monitor the actors that arrive at public digital surfaces, characterize their behavior, and contribute their observations to a registry that aggregates evidence by declared identity. This layer makes probabilistic claims about what was observed and what those observations imply when placed in historical context.

The third layer is the registry itself, distributed across receivers, accumulating over time, defensible only by virtue of the duration and breadth of its operation, and irreplaceable by any single party.

The first layer is consolidating, and its commercial form is becoming visible inside infrastructure providers operating at scale. The third layer cannot exist without the second, and the second is the layer that the rest of the stack increasingly requires but does not yet have a settled vocabulary to describe.

This is what we are building. The architecture is now visible enough that the categories around it can name what they are not, even before this layer is named for what it is. Sender-side compliance is not it. Regulatory obligation is not it. Continuous internal audit is not it. Each is necessary in its own layer. None substitutes for the receiver-side record of what arrived, signed by parties who do not share a sender with the actors being attested.

The work of the next note will be to name it.


This research note is published under the BotConduct Standard. Companion documentation, methodology overviews, and verification bundles are available at botconduct.org/research.

Verification: botconduct.org/verify

Public key: botconduct.org/.well-known/bcs-public-key.pem