Research Note · May 2026 · Vol. I №14

Nine documents, one gap

How eight weeks in Spring 2026 made the AI agent observability problem empirical.

Filed by the BotConduct Observatory Desk · May 2026


Abstract

Between April 2 and May 26, 2026, nine independent institutional events documented the same structural condition: AI agent behavior at the surface where it actually acts cannot be reliably attested by the agent’s provider, by the model’s lab, or by the deployer’s perimeter monitoring alone. The nine documents originate from incommensurable actors: a major vendor’s open-source governance toolkit, a US federal memorandum, a network architecture release, a supply chain research report, a national security agency advisory, a financial newspaper investigation, an offensive tooling framework, a coordinated counter-adversary operation, and a coordinated vulnerability disclosure documenting a cross-layer bug that AI-driven code auditing did not find. None coordinated with the others. Eight identify the gap. One proposes a sender-side architecture to address part of it. The remaining part, receiver-side observation signed at the moment of action and reasoned cohort-relative, remains unspecified and is the focus of this note.


1. The nine documents

1.1 Microsoft Agent Governance Toolkit (April 2, 2026). Microsoft released an open-source governance toolkit for autonomous AI agents under MIT license. The toolkit comprises seven packages covering policy enforcement, cryptographic agent identity, execution sandboxing, runtime supervision, and compliance mapping against the EU AI Act, SOC 2, HIPAA, NIST and ISO 42001. The release is the first major vendor commitment to sender-side governance as a distinct architectural category — governance enforced inside the agent and its execution environment, before action reaches any external surface. The toolkit explicitly addresses what the deployer of an agent can attest about that agent’s behavior. It does not address what receivers of agent action — third-party sites, APIs, marketplaces — can independently verify. That asymmetry is the structural condition the remaining eight documents describe from different angles.

1.2 OMB M-26-14 (May 22, 2026). US Office of Management and Budget memorandum requiring federal agencies to implement AI-aware logging across systems incorporating AI components. The memorandum gives CISA ninety days to publish a Reference Architecture defining adequate logging at the federal level. The vocabulary that Reference Architecture adopts will propagate, with the customary delay, into federal procurement language, contractor compliance frameworks, and cyber insurance underwriting standards.

1.3 Cisco AI WAN Architecture (May 2026). Cisco’s reference architecture for enterprise AI deployments treats agent traffic as a distinct network class requiring observability instrumentation separate from traditional application monitoring. The framing acknowledges that AI agent traffic exhibits behavior patterns existing network instrumentation cannot adequately characterize.

1.4 Socket TrapDoor Research (May 2026). Socket documented npm and PyPI packages compromised specifically to inject malicious behavior into AI agent toolchains. The compromise vector is operational, not theoretical. The detection layer that would identify behavioral drift downstream of the compromise does not exist in standardized form.

1.5 NSA Cybersecurity Information Sheet on Model Context Protocol (May 2026). The National Security Agency’s CSI on MCP acknowledges the protocol as a security-relevant integration point in AI agent architectures and identifies attestation of agent behavior across MCP boundaries as an unsolved layer. The advisory does not prescribe a solution. It names the gap.

1.6 Financial Times — “AI guardrails stripped from Meta and Google models in minutes” (May 25, 2026). Jamie John and Chris Cook documented Heretic, a freely available GitHub tool that strips alignment controls from open-weight models using a technique called abliteration. The tool requires no specialized hardware and no deep expertise. The Financial Times and the AI safety group Alice tested it independently. The modified models answered questions on lethal ricin dosing, biological weapons dispersal, and malware that the original systems were trained to refuse. Over 3,500 decensored model variants have been created and collectively downloaded 13 million times. Google’s Gemma 4 was stripped within 90 minutes of its release. In February 2026, Microsoft researchers demonstrated that a single unlabeled training prompt could unalign 15 production models simultaneously. Provider-side attestation of alignment, as a safety mechanism for open-weight models, is failing in production.

1.7 BebopC2 v1.5.0 (May 25, 2026). The first publicly documented Command and Control framework to implement native Model Context Protocol support over HTTP with JSON-RPC. The release explicitly enables external AI agents to interact with sessions, tasks, listeners, and operational workflows through structured APIs. The author, a pentester and red team operator, described the motivation as enabling automation and research. Offensive tooling integrated MCP for agent-driven operations before defensive observation infrastructure standardized any equivalent.

1.8 CrowdStrike Glassworm Takedown (May 26, 2026). At 14:00 UTC on May 26, CrowdStrike’s Counter Adversary Operations team, in coordination with Google and the Shadowserver Foundation, executed a simultaneous strike against the four command-and-control channels of Glassworm — a botnet active since at least early 2025 that targeted software developers through compromised VSCode extensions on OpenVSX, malicious npm and PyPI packages, and over three hundred poisoned GitHub repositories. The operation’s framing, in CrowdStrike’s own words, marks the shift directly: “Adversaries are no longer just targeting products, they’re targeting the developers who build them.” The C2 architecture itself — Solana blockchain memo fields, BitTorrent DHT queries against hardcoded public keys, Base64-encoded paths in Google Calendar event titles, and traditional VPS infrastructure — required all four channels disrupted simultaneously to prevent reconstitution. CrowdStrike’s accompanying statement is, structurally, an acknowledgement of the limits of detection: “Defending against these threats through after-the-fact detection alone is virtually impossible. Malicious packages are installed through dependency updates in seconds, and detections usually happen when the harm is already done.” The receiver-side observation layer that would identify the diverging-developer pattern earlier — across npm, PyPI, OpenVSX, and GitHub simultaneously — is the layer Glassworm operated in for more than a year unobserved.

1.9 BadHost — CVE-2026-48710 (May 26, 2026). Coordinated disclosure through OSTIF.org of a critical authentication bypass in Starlette, the Python ASGI framework underlying FastAPI, vLLM, LiteLLM, and a substantial portion of the Python AI tooling ecosystem. A single character injected into the HTTP Host header bypasses path-based authorization across Starlette’s 325 million weekly downloads and 400,000 GitHub dependents. The target list named in the disclosure is structurally significant: vLLM, LiteLLM, Text Generation Inference, OpenAI-shim proxies, MCP servers, agent harnesses, eval dashboards, and model-management UIs. The vulnerability was discovered by X41 D-Sec during an OSTIF-sponsored audit of vLLM. The most consequential statement in the disclosure is not about the bug itself. It is about how the bug was found. The discoverers note explicitly that “Anthropic’s Claude Mythos found 10,000+ vulnerabilities through Project Glasswing — but not this one. The reason is structural: CVE-2026-48710 is not a bug in one file or one repo. It spans three independent layers — ASGI servers pass the raw Host header, Starlette trusts it for URL construction, and middleware authors assume request.url.path is safe for auth decisions. Each component behaves correctly in isolation. The vulnerability only emerges from the interaction between them, across specifications (HTTP, ASGI, Starlette, MCP). Finding it required manual security research.” OSTIF.org framed the disclosure with vocabulary that names the architectural condition directly: “This bug is a classic ‘responsibility gap’ where if this maintainer didn’t patch, thousands of exposed projects would have to individually secure their projects.”
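
The three-layer interaction the disclosure describes can be illustrated with a toy model. The sketch below is not Starlette’s code and not the actual exploit for CVE-2026-48710. The function names, the /admin prefix, and the choice of "?" as the injected character are all assumptions made for illustration. It shows only the general shape: one layer builds a URL from an unvalidated Host header, another derives the path from that URL, and the router dispatches on the raw path.

```python
from urllib.parse import urlsplit


def build_url(scheme: str, host_header: str, raw_path: str) -> str:
    """Layer 2 (toy): construct a URL by trusting the raw Host header."""
    return f"{scheme}://{host_header}{raw_path}"


def middleware_requires_auth(url: str) -> bool:
    """Layer 3 (toy): decide on auth from the path parsed back out of the URL."""
    return urlsplit(url).path.startswith("/admin")


def is_plausible_host(host_header: str) -> bool:
    """Mitigation sketch (assumption): reject Host values containing URL delimiters."""
    return bool(host_header) and not any(c in host_header for c in "/?#@\\ ")


# The router in this toy model dispatches on the raw path, not the rebuilt URL.
raw_path = "/admin/keys"

benign = build_url("http", "example.com", raw_path)
crafted = build_url("http", "example.com?", raw_path)  # one extra character

print(middleware_requires_auth(benign))   # True: path parses as /admin/keys
print(middleware_requires_auth(crafted))  # False: "?" pushes the path into the query
print(is_plausible_host("example.com?"))  # False: rejected before URL construction
```

In this toy model each function is reasonable in isolation. The mismatch between the path the middleware checks and the path the router serves appears only when the layers are composed, which is the point the disclosure makes.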


2. What the nine documents have in common — and what distinguishes one

The documents originate from incommensurable institutional actors: a major software vendor’s open-source release, a US executive office memorandum, a networking vendor reference architecture, a supply chain security firm’s research, a signals intelligence agency advisory, an international newspaper investigation, an independent offensive security tool, a coordinated counter-adversary operation involving a top-tier threat intelligence vendor and two infrastructure partners, and a coordinated vulnerability disclosure involving four independent security research organizations. They share no coordination. They use different vocabularies. They address different primary audiences.

Eight of the nine identify the structural gap. They describe, in their own language, that AI agent behavior at the point of action cannot be reliably attested by the entities upstream of that action. The provider attests what the agent was configured to do at training or deployment. The deployer attests what crossed a controlled perimeter. Neither attests what arrived at the receiver — the website, the API endpoint, the marketplace, the SaaS interface, the third-party service the agent reached and acted upon.

The Microsoft Agent Governance Toolkit is structurally different. It does not identify the gap; it proposes a partial architecture for the sender side of it. The toolkit instruments the agent and its execution environment so that the deployer can attest, with cryptographic identity and policy enforcement, what its agent was authorized to do. That attestation has real value where the deployer is the receiver’s counterparty — closed enterprise deployments, internal automations, controlled environments where both sides of the interaction belong to the same governance domain.

The architecture does not address — and does not claim to address — what an arbitrary receiver of agent action can independently verify when it has no relationship with the deployer, no access to the deployer’s policy configuration, and no contractual basis for trusting deployer-issued attestations. The receiver has its own observation problem, and it scales cohort-relative across enterprises rather than uniformly. The Microsoft toolkit’s existence makes the receiver-side gap more visible, not less, by formalizing the sender side and leaving the receiver side architecturally vacant.

The BadHost disclosure carries the most explicit articulation of the architectural condition the other documents describe abstractly. The disclosure states that AI-driven code auditing — specifically Anthropic’s Project Glasswing, which has identified over ten thousand vulnerabilities through automated analysis — did not find CVE-2026-48710 because the bug does not exist in any single file or repository. It emerges from the interaction of independently correct layers across HTTP, ASGI, Starlette, and middleware specifications. That is a precise technical description of a category of behavior that sender-side analysis cannot observe by construction.

An ISO 42001 lead auditor, writing publicly the same week about the same architectural question, articulated the distinction from the normative side: “The receiver perspective is what’s missing in almost every current debate.” The audit profession is beginning to name what the architecture profession has not yet standardized.


3. Article 3(23) and the empirical evidence

Article 3(23) of the EU AI Act defines “substantial modification.” A high-risk AI system whose behavior drifts substantively from what was placed on the market becomes, under that definition, a new system requiring renewed conformity assessment. The provision is precise about the legal consequence. It is silent about how the modification gets registered as having occurred.

The May 25 Financial Times reporting on Heretic and abliteration makes the operational consequence of that silence concrete. A model released by Meta or Google with alignment controls in place can reach a deployment environment with those controls suppressed after minutes of effort. Under Article 3(23), the resulting model is a substantively modified system. Under existing observation infrastructure, the modification leaves no standardized record at the surface where the modified agent acts.

The Glassworm takedown one day later reframes the same problem from the supply-chain side. A developer’s machine compromised through a trojanized VSCode extension or a malicious npm postinstall hook becomes, in operational terms, a substantively modified development environment. The resulting code commits, package publications, and downstream artifacts inherit that modification — visible only at the surfaces those artifacts reach. As CrowdStrike framed the broader condition: “Every organization that consumes software inherits the risk of everyone who produces it.”

The BadHost disclosure, published the same day as Glassworm, reframes the problem again from the framework-composition side. A Starlette deployment that is correctly configured in isolation, with middleware that correctly enforces path-based authorization in isolation, becomes a substantively vulnerable system when both are composed under an unmodified HTTP layer. The substantial modification is not in any component. It is in the composition. Under existing observation infrastructure — including AI-driven code auditing at the scale of Project Glasswing — the modification leaves no record until the surface where the composed system acts is itself observed.

The compliance architecture proposed by Nannini, Smith, Maggini, Panai and others in their April 2026 paper maps the regulatory surface for AI providers in unusual detail. What the paper signals, without making it the focus, is that provider-side attestation alone cannot satisfy obligations whose triggering condition is downstream behavior. The provider cannot attest what its model is doing after abliteration has removed the safeguards it embedded. The deployer using the Microsoft Agent Governance Toolkit can attest what its own agent was configured to do, but cannot attest the actions of agents originating from third parties operating under different policy regimes — or no policy regime at all. And the auditor of any individual component cannot attest the behavior that emerges only when that component is composed with others.

Receiver-side observation, signed at the moment of occurrence and reasoned about against a cohort of comparable agents acting on comparable surfaces, is the architectural layer where Article 3(23)’s triggering condition can be empirically established. It is not a substitute for sender-side governance. It is the layer that remains unaddressed when sender-side governance does its work and when it does not.
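
What "signed at the moment of occurrence" could mean in practice can be sketched with a keyed signature over a timestamped observation record. The note does not specify a signing scheme, so everything here is an assumption: the record fields, the demo key, and the use of HMAC-SHA256 from the Python standard library as a stand-in. A receiver that needs third parties to verify its records would more plausibly use an asymmetric signature.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def _canonical(record: dict) -> bytes:
    """Serialize deterministically so the same record always signs the same way."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_observation(key: bytes, observation: dict,
                     observed_at: Optional[float] = None) -> dict:
    """Attach a timestamp and an HMAC-SHA256 signature to an observation."""
    record = dict(observation)  # copy so the caller's dict is not mutated
    record["observed_at"] = time.time() if observed_at is None else observed_at
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_observation(key: bytes, signed: dict) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, _canonical(signed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])


key = b"receiver-demo-key"  # hypothetical key, for illustration only
signed = sign_observation(
    key,
    {"actor": "actor-123", "path": "/mcp", "method": "POST"},  # hypothetical fields
    observed_at=1780000000.0,
)
tampered = {"record": dict(signed["record"], path="/"), "signature": signed["signature"]}

print(verify_observation(key, signed))    # True
print(verify_observation(key, tampered))  # False
```

The sketch covers only the signing half of the sentence above. Cohort-relative reasoning over the signed records is a separate step and is not shown.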


4. What the empirical floor looks like

The receiver-side observation layer is not hypothetical. Over the thirty days ending May 26, 2026, BotConduct Observatory operated a dual-VPS deployment: twenty-two observatory properties spanning seventeen distinct verticals — fashion, healthcare, government, legal, finance, automotive, e-commerce, gaming, real estate, travel, logistics, news, and others — alongside a commerce site on dedicated infrastructure, and the public-facing observation site on independent infrastructure. The deployment enables cross-property correlation within each infrastructure node and cross-infrastructure actor correlation between them. The pattern data is empirical, not modeled.

Across the observation window, 42.5 percent of actors initially classified as legitimate users transitioned to targeted extraction behavior, with a median latency of two days between first observation and pattern transition. Suspicious-crawler-pattern actors transitioned to targeted extraction at 67.7 percent probability, with a median latency of eight days. Two thousand and fifty-six unique actors operated across both infrastructure nodes with consistent behavioral signatures over the observation window. The escalation probability matrix is calibrated against empirical outcomes: predictions issued at eighty percent or higher confidence verified at ninety-three percent accuracy in backtested data.
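
The quantities reported above (a transition rate per initial class, a median latency, and accuracy among high-confidence predictions) can be computed along the following lines. This is a minimal sketch, not the Observatory’s pipeline. The record layout, the class labels, and every value in the sample data are synthetic assumptions chosen so the arithmetic is easy to check by hand; none of them are Observatory data.

```python
from statistics import median

# Synthetic records (assumption): days_to_extraction is None if no transition occurred.
observations = [
    {"actor": "a1", "initial_class": "legitimate_user", "days_to_extraction": 2},
    {"actor": "a2", "initial_class": "legitimate_user", "days_to_extraction": None},
    {"actor": "a3", "initial_class": "legitimate_user", "days_to_extraction": None},
    {"actor": "a4", "initial_class": "legitimate_user", "days_to_extraction": 4},
    {"actor": "c1", "initial_class": "suspicious_crawler", "days_to_extraction": 8},
    {"actor": "c2", "initial_class": "suspicious_crawler", "days_to_extraction": 6},
    {"actor": "c3", "initial_class": "suspicious_crawler", "days_to_extraction": None},
]


def transition_stats(records, initial_class):
    """Return (share of the class that transitioned, median latency in days)."""
    cohort = [r for r in records if r["initial_class"] == initial_class]
    latencies = [r["days_to_extraction"] for r in cohort
                 if r["days_to_extraction"] is not None]
    rate = len(latencies) / len(cohort) if cohort else 0.0
    return rate, (median(latencies) if latencies else None)


def high_confidence_accuracy(predictions, threshold=0.8):
    """Accuracy among (confidence, came_true) pairs at or above the threshold."""
    selected = [came_true for confidence, came_true in predictions
                if confidence >= threshold]
    return sum(selected) / len(selected) if selected else None


# Synthetic backtest pairs (assumption): (predicted confidence, outcome observed).
backtest = [(0.9, True), (0.85, True), (0.95, False), (0.6, False), (0.82, True)]

print(transition_stats(observations, "legitimate_user"))  # (0.5, 3.0)
print(high_confidence_accuracy(backtest))                 # 0.75
```

Restricting the accuracy check to predictions at or above the threshold mirrors how the calibration figure above is stated: it describes the high-confidence subset, not all predictions.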

The observation network also surfaces a finding adjacent to the May 2026 NSA advisory on Model Context Protocol, and now adjacent to the BadHost disclosure as well. Automated actors are probing for MCP endpoints in production, with characteristic request patterns against /mcp and /mcp/<hash> paths originating from infrastructure distinct from generic vulnerability scanners. The MCP reconnaissance is observable empirically, weeks after NSA named MCP as a security-relevant integration point and weeks before the BadHost disclosure identified MCP servers as a primary target class. Receiver-side observation registers what perimeter scanning treats as noise — and what AI-driven code analysis cannot find by construction.
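
A receiver can flag requests of the shape described above with a simple path matcher. The sketch below is illustrative only: the note does not say what form the hash segment takes, so the pattern assumes 8 to 64 hexadecimal characters, and the function name and sample paths are hypothetical. Telling MCP reconnaissance apart from generic scanners, as the paragraph above describes, would need source and timing context that a path pattern alone does not carry.

```python
import re

# Assumption: the <hash> segment is 8-64 hexadecimal characters.
MCP_PATH = re.compile(r"^/mcp(?:/[0-9a-fA-F]{8,64})?/?$")


def is_mcp_probe(request_target: str) -> bool:
    """Return True if the request path looks like /mcp or /mcp/<hash>."""
    path_only = request_target.split("?", 1)[0]  # ignore any query string
    return MCP_PATH.match(path_only) is not None


# Hypothetical request targets, not observed traffic.
samples = ["/mcp", "/mcp/3f2a9c1d4b5e6f70", "/mcp?x=1", "/mcpx", "/api/mcp", "/mcp/not-a-hash"]
for target in samples:
    print(target, is_mcp_probe(target))
```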

A second empirical observation departs from a common industry assumption. Human users and automated agents reach the same maximum loop depth in observed sessions. The distinguishing signal is not depth itself but the temporal and pattern profile of how depth accumulates. This finding contradicts, rather than supports, common detection heuristics built on an assumed depth asymmetry.
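
The difference between depth and the profile of how depth accumulates can be made concrete with two synthetic sessions. This is a sketch under stated assumptions: the event format, the feature names, and both sessions are invented for illustration, and the note does not say which temporal features the Observatory uses or in which direction they differ. The two sessions are deliberately unlabeled as human or agent.

```python
from statistics import median


def depth_profile(events):
    """Summarize a session given (seconds_since_start, depth) pairs sorted by time."""
    max_depth = max(depth for _, depth in events)
    # First moment at which the session reached its maximum depth.
    time_to_max = next(t for t, depth in events if depth == max_depth)
    gaps = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
    return {
        "max_depth": max_depth,
        "time_to_max": time_to_max,
        "median_gap": median(gaps) if gaps else None,
    }


# Synthetic sessions (assumption): same maximum depth, different timing.
session_a = [(0, 1), (40, 2), (95, 3), (180, 4), (300, 5)]
session_b = [(0, 1), (2, 2), (4, 3), (6, 4), (8, 5)]

profile_a = depth_profile(session_a)
profile_b = depth_profile(session_b)

print(profile_a)  # max_depth 5, time_to_max 300, median_gap 70.0
print(profile_b)  # max_depth 5, time_to_max 8, median_gap 2.0
```

A heuristic keyed on max_depth alone cannot separate these two sessions. One that also reads time_to_max or median_gap can.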

These numbers and findings are not predictions of what receiver-side observation could detect. They are the contents of thirty days of continuous observation at a single observatory operator. The structural condition the nine documents describe is not waiting for an architecture to be built. The architecture is operationally feasible, and the empirical content it produces (frequency data, transition probabilities, temporal profiles, cross-infrastructure correlation, and reconnaissance patterns against emerging protocols) is the form Article 3(23)’s triggering condition can take when registered as having occurred. In actuarial language: longitudinal behavioral exposure characterized at the surface where autonomous agents act. The empirical floor is not a substitute for sender-side governance. It is the layer that makes Article 3(23) operationalizable, FAIR-grade risk modeling tractable, and ISO 42001 clause-level evidence available.


5. What the CISA Reference Architecture will and will not address

CISA has ninety days from May 22 to publish the federal logging Reference Architecture required by OMB M-26-14. Federal agencies will be required to comply. Federal contractors will inherit the compliance requirement through procurement. The vocabulary the Reference Architecture adopts will become the operational baseline for AI logging in US public sector deployments, and through downstream propagation, in regulated private sector deployments.

The Reference Architecture will almost certainly address logging at the model interaction layer and at the deployer perimeter. It is now plausible that it will reference or harmonize with the Microsoft Agent Governance Toolkit’s vocabulary for sender-side attestation, given the toolkit’s open license and explicit compliance mappings. Whether it addresses logging at the receiver — the third-party surface where the agent’s action lands — is the open question that will determine whether the M-26-14 framework meaningfully closes the gap the nine documents collectively describe. The BadHost disclosure, arriving four days after the OMB memorandum, sharpens that question: a Reference Architecture that does not include observation of behavior at the surface where independently correct components compose will be insufficient by construction to register the class of conditions BadHost exemplifies.

The M/613 harmonised standards mandate under the EU AI Act, currently in development, will face the same architectural decision. Whichever framework first treats receiver-side observation as a distinct and required layer will set the vocabulary the other adopts.


6. Why the convergence matters

When a major software vendor’s governance toolkit, a US executive office memorandum, a networking vendor, a supply chain research firm, a signals intelligence agency, an international newspaper, an independent red team developer, a coordinated counter-adversary operation involving a top-tier threat intelligence vendor, and a coordinated vulnerability disclosure proving the limits of AI-driven code auditing all describe the same structural problem within an eight-week window — from different institutional positions, in incompatible vocabularies, none coordinated with the others — the problem is no longer a research conjecture. It is an operational condition that AI agent deployments are already encountering and that institutional frameworks are now naming.

OSTIF named the condition explicitly in the BadHost disclosure: a “responsibility gap” where the absence of coordinated observation across composing layers forces each downstream consumer to secure itself independently. That vocabulary — responsibility gap — is precise. It describes what happens when no layer in the architecture is positioned to observe what emerges from the interaction of the others.

Eight of the documents describe the gap from different angles. One offers a sender-side architecture for half of it. The other half — receiver-side behavioral attestation, observed independently of the agent and its deployer, signed at the moment of occurrence, reasoned cohort-relative — remains an architectural decision waiting to be made.

Where that layer is first named explicitly, whether in the CISA Reference Architecture, in the M/613 harmonised standards, in cyber insurance underwriting baselines, or in some convergence of all three, will determine how AI agent observability is operationalized for the rest of the decade.

The Spring 2026 evidence makes it untenable to leave the receiver-side layer implicit. The empirical floor is already operational. BadHost demonstrated, in the most concrete possible terms, that AI-driven code auditing at industrial scale cannot find the class of conditions that emerge from cross-layer composition. The architectural decision is whether to name the layer that can, and whether the institutions that will price, audit, and underwrite AI agent deployments treat longitudinal behavioral exposure as a category in its own right.


Notes

This is the fourth in a series of research notes examining the AI agent observability layer. Earlier notes are available at botconduct.org/research. Subsequent notes will examine the M/613 harmonised standards mandate and the convergence of cyber insurance underwriting requirements with AI-aware logging frameworks.