How do you trust the AI agent visiting your site?

Agents are failing in production. Regulations arrive in 2026. The market expects evidence, not documentation. BotConduct measures real agent behavior — independent, verifiable, and mapped to every framework regulators use.

The reality

Agents are failing in production. The market knows it.

Adoption is nearly universal. Resistance evidence is not.

88%
reported security incidents with AI agents in the past year
14%
deploy agents with full security approval
79%
face significant adoption challenges
75%
of CEOs admit their AI strategy is more performance than substance

The model is the easy part. The hard part is knowing what your agent will do under real adversarial pressure — before you discover it in production. Your runtime gateway will catch some attacks. Your governance documents describe intent. Neither produces the behavioral evidence regulators and auditors are starting to ask for.

This already happened

Every one of these involved an agent operating without verifiable evidence of what it was doing.

Vercel — April 2026

AI agents accelerated a supply chain attack through Context.ai. The CEO publicly disclosed the breach.

Lovable — April 2026

Credentials and user AI chats exposed via IDOR vulnerability. $6.6B valuation. An agent accessed what no agent should have.

Perplexity — 2025

De-listed by Cloudflare for using stealth crawlers that rotated IPs and faked browser identity.

OpenClaw — April 2026

9 CVEs in 4 days. 135,000 instances publicly exposed. 341 malicious skills in the marketplace stealing credentials.

No measurement. No accountability. That's the gap BotConduct fills.

The regulatory reality

Three deadlines. Zero tools for behavioral evidence. All on top of the operational gap above.

August 2, 2026

EU AI Act

High-risk AI systems must demonstrate robustness under adversarial conditions. Article 15 requires evidence, not documentation.

June 2026

Colorado AI Act

Deployers must complete impact assessments including testing for known risks. Self-attestation is explicitly insufficient.

Active

NIST AI RMF

Increasingly required in federal contracts. The Measure function demands adversarial testing with documented results.

Compliance documents describe intent. Regulators ask for evidence of behavior.

Adversarial Evaluation Service

Productized red-teaming for AI agents in production. Three tiers. Same methodology. From $1,500.

Most adversarial evaluation services are designed for Fortune 500 budgets. We built ours for the SaaS company with 50-500 employees that just got a security questionnaire from an enterprise customer and needs verifiable evidence of how their AI agent behaves under adversarial conditions.

Self-service

Automated Assessment

$1,500
Submit your agent. We run 50+ adversarial scenarios automatically. Receive a cryptographically signed behavioral trajectory within 72 hours.
  • 50+ scenarios across 5 attack categories
  • Referenced against OWASP Top 10 Agentic, NIST AI RMF, EU AI Act
  • Signed evidence (Ed25519) ready for procurement conversations
  • Renewable every 90 days
  • No integration required
See pricing →
Enterprise

Enterprise Engagement

Contact
Deep adversarial evaluation for regulated industries. Multi-week engagement. Full report. Boardroom-ready deliverables.
  • Multi-week dedicated engagement
  • Custom adversarial scenarios for your environment
  • Detailed report with executive summary
  • Live walkthrough with CISO and security team
  • Mapping to your specific regulatory framework
  • Ongoing advisory available
Contact for enterprise →

Same methodology across all tiers. The difference is depth, customization, and human attention. Most companies start with Automated to validate the approach, upgrade to Guided when they need to interpret results in their specific context, and only move to Enterprise when their regulatory environment requires it.

Framework mapping

One evaluation. Four frameworks covered.

Each BotConduct evaluation produces evidence that maps directly to regulatory requirements.

Framework               What it requires                                   How BotConduct covers it
NIST AI RMF Measure     Adversarial testing with documented results        BotConduct evaluation: 5 scenarios, signed trajectory
OWASP Top 10 Agentic    Vulnerability identification and mapping           Direct mapping to 4 of 10 risk categories
MITRE ATLAS             AI threat modeling with tactics and techniques     Scenarios mapped to ATLAS tactics and techniques
EU AI Act Art. 15       Robustness evidence under adversarial conditions   Behavioral trajectory, cryptographically signed (Ed25519)
Measured, not modeled

What we found in 149 evaluations.

Conducted across 30 agents, including free stress tests and paid evaluations.

30
agents evaluated
5
adversarial scenarios
149
evaluations
0
false positives

30 agents tested. The data revealed two patterns nobody is talking about:

The ecosystem has one dominant unaddressed vector.
Across all agents, one type of attack succeeds far more often than the others. We disclose which under NDA. Hint: it's not the one you read about in published research.

Role beats governance.
The way an agent's prompt frames its function predicts its adversarial resistance better than any declared governance policy. In our N=30 evaluation, executor-role agents failed cost-induction scenarios at a 74% rate; reviewer-role agents failed at 0% (Fisher's exact test, p < 0.001). Governance score showed no significant correlation with resistance. We can show you exactly where on that spectrum your agent sits.
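For readers who want the test spelled out: Fisher's exact test treats the results as a 2x2 table of role (executor vs. reviewer) against outcome (failed vs. resisted cost induction). Under the null hypothesis that role and outcome are independent, the probability of any single table with cell counts a, b, c, d and total n follows the hypergeometric distribution (this is the standard definition, not BotConduct-specific notation):

P = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}

The reported p-value sums this probability over every table at least as extreme as the observed one; p < 0.001 means a split this lopsided is vanishingly unlikely if role and resistance were unrelated.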

Want to know where your agent stands? Get an evaluation.

How it works

Three steps. One API call to start.

01

Submit your agent

Send your agent's system prompt via API. No SDK, no integration, no deployment changes.

02

We run 5 adversarial scenarios

Five adversarial scenarios test how your agent behaves under pressure. Conditions change during the session to measure resilience, not just initial response. Specific scenarios disclosed under NDA.

03

Receive signed evidence

Behavioral trajectory with Ed25519 signature. Framework mapping included. Evidence you can hand to a regulator.

curl -X POST https://botconduct.org/api/v3/training-center/start \
  -H "Content-Type: application/json" \
  -d '{"bot_name":"MyAgent","operator":"MyCompany","scenarios":["C1","C3"]}'
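When the evidence arrives, anyone can check the signature, no BotConduct account required. A minimal verification sketch using OpenSSL, assuming the bundle ships as a trajectory file plus a detached Ed25519 signature and a published public key (the filenames here are illustrative, not the actual deliverable layout):

# Verify the detached Ed25519 signature over the trajectory
# (-rawin is required for Ed25519; needs OpenSSL 1.1.1 or later)
openssl pkeyutl -verify -pubin -inkey botconduct_pub.pem \
  -rawin -in trajectory.json -sigfile trajectory.sig
# Prints "Signature Verified Successfully" when the evidence is intact

Because verification needs only the public key, a customer's security team or an auditor can validate the evidence independently of whoever forwarded it.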

Request an evaluation →

Research alignment

Aligned with academic research

Our methodology is consistent with frameworks established in recent academic research, including OpenAI's "Practices for Governing Agentic AI Systems" (2023) and Cornell's "Agents of Chaos" (February 2026). We extend these frameworks with adversarial vectors not covered by either, particularly post-corruption state verification, where the agent actively validates false information when audited.

References: Practices for Governing Agentic AI Systems (OpenAI, 2023); Agents of Chaos (Cornell, 2026)