Second-generation agent evaluation.
Adversarial scenarios.
Behavior under change.

Static compliance checklists miss what matters. BotConduct Training Center evaluates AI agent conduct through progressive evaluation under evolving conditions — measuring how the agent actually behaves, not whether it passes a checklist.


Two generations of agent evaluation

The agent-readiness space is splitting into two categories. Training Center is deliberately in the second.

First generation (insufficient)

Static checklists. A fixed set of pass/fail rules — does the agent identify itself, respect robots.txt, stay under rate limits. Binary outcomes on observable state.

Works for foundational checks. Falls short of what buyers actually need to know: how the agent behaves when the environment around it changes.

Second generation (Training Center)

Evaluation under change. Conditions evolve during evaluation. Behavior is measured as trajectory — not checkpoint state. The specific mechanisms are not publicly disclosed.

Measures what actually matters in production: agent conduct under the conditions that cause real incidents — the specific scenarios tested are proprietary.

Three questions every buyer asks before adopting an AI agent

Procurement officers, CISOs, and General Counsel evaluate every AI agent against the same implicit framework. Training Center is built to produce evidence for each.

01

Does the agent behave well?

Will the bot respect web standards, avoid abusive patterns, and operate as a legitimate actor on the infrastructure it interacts with?

Addressed by Level 1 + Level 2 — behavioral conduct across regulatory and technical standards.
02

Does the agent stay within my scope?

Will the bot follow the operational boundaries I define — even under adversarial prompts, social engineering, or edge-case inputs?

Addressed by vertical cartridges — domain-specific adversarial testing (voice, documents, transactions, and more).
03

Can I trust the agent over time?

Can I verify the bot's identity cryptographically and audit its conduct across deployments, jurisdictions, and years?

Addressed by Level 3 — cryptographic identity, immutable audit trail, independent verification.

What Training Center certifies

Cross-framework conduct certification. One evaluation produces evidence of alignment across the compliance surface that already applies to AI agents and the surface that is emerging under new regulation.

Agent vendors face a fragmented compliance landscape: RFC 9309 for crawling, EU AI Act for identity disclosure, EU DSM Directive for content rights, California SB 1001 for bot declaration, W3C TDMRep for machine-readable reservation, GDPR for data subject rights. Each is addressed separately today — by different auditors, different frameworks, different vendors.

Training Center aggregates these into a single evaluation — run once, cite across all procurement conversations, present to any jurisdiction. Third-party. Independent. Reproducible.

And cross-platform by design. A BotConduct certification is recognized the same way by a site behind Cloudflare, one running DataDome, one with in-house infrastructure, and one with nothing at all. Training Center does not replace or compete with bot-management vendors — it is the independent layer they cite. Like a passport for AI agents: issued once, honored everywhere.

Three levels of progressive certification

Each level tests a set of behaviors that matter to enterprise buyers. Higher levels require stricter compliance — and signal stronger trust.

Level 1

Basic Hygiene

The foundation — does the agent behave like a legitimate actor at all?

  • Identity declaration — agent discloses operator and documentation trail
  • Identity stability — consistent across its operational footprint
  • Infrastructure coherence — stated identity aligned with observed infrastructure
  • Request cadence discipline — sustainable, non-adversarial pacing
  • Respect for declared boundaries — honors stated access restrictions
  • Absence of malicious probing — no reconnaissance of sensitive resources
  • Error response handling — appropriate behavior on 4xx and 5xx responses
  • Rate-limit signal response — adapts to backoff and retry directives
  • Header integrity — requests include well-formed HTTP negotiation headers
  • Public contact surface — operator provides a reachable abuse/contact channel
  • Public documentation — bot behavior is documented at a discoverable URL
  • No browser impersonation — does not masquerade as human traffic
In our observation of production bots, only a minority achieve Level 1 without remediation.
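Several Level 1 dimensions (identity declaration, respect for declared boundaries, cadence discipline) can be made concrete in a few lines. The sketch below is illustrative only: the bot name, documentation URL, and robots.txt rules are invented, and it uses Python's standard-library parser for the RFC 9309 convention.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identity: a descriptive User-Agent naming the operator
# and a discoverable documentation URL ("identity declaration").
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot)"

robots = RobotFileParser()
# Parse an in-memory robots.txt; a production agent would fetch
# https://<host>/robots.txt before its first request to that host.
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
])

# Honor declared boundaries before issuing any request.
assert robots.can_fetch(USER_AGENT, "https://example.com/articles/1")
assert not robots.can_fetch(USER_AGENT, "https://example.com/private/data")

# Sustainable pacing: wait at least this many seconds between requests.
assert robots.crawl_delay(USER_AGENT) == 2
```

A check like this covers only the self-enforced half of Level 1; dimensions such as infrastructure coherence and error handling are observed from the outside, not self-reported.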
Level 2

Dynamic Compliance

Behavior under change — how does the agent respond when conditions around it evolve?

  • Directive adaptation — whether the agent re-evaluates its operational directives as conditions evolve
  • Signal resolution — how the agent resolves when authoritative signals disagree
  • Adaptive pacing — how the agent adjusts to server-side pacing feedback over the session
  • Navigational discipline — how the agent bounds its traversal under complex navigational conditions
  • Identity consistency under stress — identity signature stability across error conditions
  • Rights-signal adaptation — honoring machine-readable rights reservations as they change between requests
  • Scope discipline — how the agent responds to inputs that test declared operational boundaries
  • Cache discipline — how the agent uses cache-aware negotiation under varying server state
Level 2 measures trajectory, not checkpoint state. Agents that pass static checks often fail when conditions change during observation.
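The adaptive-pacing dimension can be sketched with one conventional policy (the function name and constants here are ours, not Training Center's): honor an explicit Retry-After directive when the server sends one, otherwise back off exponentially with jitter.

```python
import random

def next_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Compute a delay (seconds) after a throttled response (429/503).

    An explicit Retry-After directive from the server always wins;
    otherwise fall back to capped exponential backoff with jitter.
    """
    if retry_after is not None:
        return float(retry_after)            # server directive wins
    delay = min(cap, base * (2 ** attempt))  # exponential growth, capped
    return delay * random.uniform(0.5, 1.0)  # jitter avoids synchronized retries
```

For example, `next_delay(3)` yields between 4 and 8 seconds, while `next_delay(0, retry_after=30)` returns exactly 30. The trajectory a second-generation evaluation measures is whether the agent actually applies such a policy across a whole session, not whether the code exists.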
Level 3

Adversarial Conduct

The premium — can the agent maintain conduct when actively probed by other agents?

  • Adversarial resilience — how the agent responds when external signals attempt to coerce out-of-scope behavior
  • Discovery discipline — how the agent treats inputs that could expand operational surface
  • Multi-channel identity coherence — identity consistency across independent verification channels
  • Cryptographic request signing — verifiable signing of production requests (Ed25519)
  • Machine-readable attribution — operator infrastructure published in verifiable format
  • Data-subject rights under stress — privacy-rights handling under demanding conditions
  • Trajectory-audit trail — documented conduct commitment backed by cryptographically signed observations
Level 3 evaluates response to adversarial agents — not compliance with static rules. Agents reaching Level 3 demonstrate conduct integrity even under active coercion attempts.
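The cryptographic request-signing dimension can be sketched with the third-party Python `cryptography` package. The canonical message format below (method, path, and date joined by newlines) is an assumption for illustration; a real deployment would follow an agreed signature scheme.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_request(private_key, method, path, date):
    """Sign a canonical request string; returns a 64-byte Ed25519 signature."""
    message = f"{method}\n{path}\n{date}".encode()
    return private_key.sign(message)

key = Ed25519PrivateKey.generate()
signature = sign_request(key, "GET", "/articles/1", "Tue, 11 Jun 2030 10:00:00 GMT")

# A verifier holding the published public key checks the signature;
# verify() raises InvalidSignature on any mismatch.
key.public_key().verify(signature, b"GET\n/articles/1\nTue, 11 Jun 2030 10:00:00 GMT")
```

Because the public key is published in machine-readable form, any third party can verify that a given request was produced by the certified operator.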

How the evaluation works

STEP 1

Submit your agent

Provide an endpoint or API credentials, or grant access to a test instance. We accept web agents, voice agents, and API-based agents.

STEP 2

Dynamic scenario execution

Your agent runs against a test environment designed to evaluate behavior under evolving conditions. Specific mechanisms are not disclosed publicly to preserve evaluation integrity.

STEP 3

Trajectory analysis

Each conduct dimension receives a verdict with full decision trajectory — not just outcome. The report shows how behavior evolved across the scenario, where coercion succeeded or failed, and the cryptographic signature of the observation.

STEP 4

Signed report

You receive a detailed report with the achieved level, failed criteria, and specific recommendations to reach the next tier.

Regulatory and technical foundation

We did not invent a new standard. We operationalized existing ones.

Each dimension tested by the Training Center derives from established regulatory frameworks and technical standards. The curriculum is the compiled expression of what is already required — or emerging — across jurisdictions and industries.

RFC 9309

IETF standard for robots.txt. Defines the baseline crawling convention respected by the modern web.

EU AI Act — Article 50

Requires AI systems to disclose their nature when interacting with people. Its transparency obligations apply from August 2026.

EU DSM Directive — Article 4

Establishes rights reservation for text and data mining. Requires honoring machine-readable opt-out signals.

California SB 1001

Bot Disclosure Law. Requires bots to identify themselves when attempting to influence a commercial transaction or a vote in an election.

W3C TDMRep

Text and Data Mining Reservation Protocol. Emerging standard for publishers to signal rights reservation to crawlers.

GDPR

Applies to any crawler processing personal data of EU residents. Data-subject rights, retention, and deletion obligations.

Training Center criteria map to these frameworks. Passing an evaluation is evidence of alignment with the compliance surface that already applies — and the one that is coming.
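Of these signals, the TDMRep reservation is simple to check mechanically. A minimal sketch follows, covering only the HTTP response header channel (TDMRep also defines an HTML meta tag and a well-known JSON file, which this ignores):

```python
def tdm_reserved(headers):
    """Return True if a response reserves text-and-data-mining rights.

    Checks the tdm-reservation header; the value "1" means rights are
    reserved. Header lookup is case-insensitive per HTTP semantics.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("tdm-reservation")
    return value is not None and value.strip() == "1"

assert tdm_reserved({"TDM-Reservation": "1"})        # rights reserved
assert not tdm_reserved({"tdm-reservation": "0"})    # explicitly not reserved
assert not tdm_reserved({"Content-Type": "text/html"})  # no signal at all
```

A compliant mining agent would consult this signal (and the associated tdm-policy, if present) before retaining content, and re-check it as it changes between requests, which is exactly the behavior Level 2 observes.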

Pricing

One-time evaluation per tier. Re-test available after remediation. Pricing reflects the depth of testing and the level of certification granted.

Level 1 Basic
$500 / test

  • Level 1 Basic Hygiene evaluation
  • Per-dimension pass/fail verdict
  • Remediation recommendations
  • Public registry listing (if pass)

Request evaluation

Full Certification
$12,000 / test

  • Full Level 1–3 evaluation
  • Ed25519 cryptographic certificate
  • Full forensic report
  • Certified Advanced badge
  • 3 retests + annual renewal

Request evaluation

FAQ

How is this different from SOC 2 or ISO 27001?

Those audit organizational security posture. Training Center tests agent behavior under adversarial scenarios — a completely different dimension. You need both.

Do you work with voice agents, not just web?

Yes. Level 1 and 2 apply to any agent that interacts with external systems. Level 3 is extended through modality-specific cartridges (voice, text, etc.) whose specific dimensions are not publicly disclosed.

Can I see a sample report before committing?

Yes. Contact us at hello@botconduct.org and we will share an anonymized sample evaluation under NDA.

Why should I trust this methodology?

Every dimension evaluated derives from a named regulatory framework (EU AI Act, GDPR, California SB 1001) or technical standard (RFC 9309, W3C TDMRep). We compiled what is already required or emerging into a testable form. See the Regulatory and technical foundation section.

What makes your testing independent?

We operate no AI agent products. We have no commercial relationship with bot operators or infrastructure vendors. Our revenue is certification fees only. We publish our methodology. Our findings are reproducible.

Can we re-test after fixing issues?

Yes. Level 1 includes no retests (it's a single verdict). Professional tier includes 1 retest. Full Certification includes 3 retests plus annual renewal.

One certification. Multiple frameworks. Single source of truth.

Request a scoping call to determine which tier aligns with your agent's deployment profile and the jurisdictions you operate in.

Request scoping call