Agents are failing in production. Regulations arrive in 2026. The market expects evidence, not documentation. BotConduct measures real agent behavior — independent, verifiable, and mapped to every framework regulators use.
Agent adoption is nearly universal. Evidence of adversarial resistance is not.
The model is the easy part. The hard part is knowing what your agent will do under real adversarial pressure — before you discover it in production. Your runtime gateway will catch some attacks. Your governance documents describe intent. Neither produces the behavioral evidence regulators and auditors are starting to ask for.
AI agents accelerated a supply chain attack through Context.ai. The CEO publicly disclosed the breach.
Credentials and user AI chats exposed via an IDOR vulnerability. $6.6B valuation. An agent accessed what no agent should have.
De-listed by Cloudflare for using stealth crawlers that rotated IPs and faked browser identity.
9 CVEs in 4 days. 135,000 instances publicly exposed. 341 malicious skills in the marketplace stealing credentials.
In every case, the agent operated without verifiable evidence of what it was doing. No measurement. No accountability. That's the gap BotConduct fills.
High-risk AI systems must demonstrate robustness under adversarial conditions. Article 15 requires evidence, not documentation.
Deployers must complete impact assessments including testing for known risks. Self-attestation is explicitly insufficient.
Increasingly required in federal contracts. The Measure function demands adversarial testing with documented results.
Compliance documents describe intent. Regulators ask for evidence of behavior.
Most adversarial evaluation services are designed for Fortune 500 budgets. We built ours for the SaaS company with 50-500 employees that just got a security questionnaire from an enterprise customer and needs verifiable evidence of how their AI agent behaves under adversarial conditions.
Same methodology across all tiers. The difference is depth, customization, and human attention. Most companies start with Automated to validate the approach, upgrade to Guided when they need to interpret results in their specific context, and only move to Enterprise when their regulatory environment requires it.
Each BotConduct evaluation produces evidence that maps directly to regulatory requirements.
| Framework | What it requires | How BotConduct covers it |
|---|---|---|
| NIST AI RMF Measure | Adversarial testing with documented results | BotConduct Evaluation: 5 scenarios, signed trajectory |
| OWASP Top 10 Agentic | Vulnerability identification and mapping | Direct mapping to 4 of 10 risk categories |
| MITRE ATLAS | AI threat modeling with tactics and techniques | Scenarios mapped to ATLAS tactics and techniques |
| EU AI Act Art. 15 | Robustness evidence under adversarial conditions | Behavioral trajectory, cryptographically signed (Ed25519) |
Conducted across 30 agents to date, spanning free stress tests and paid evaluations. The data revealed two patterns nobody is talking about:
The ecosystem has one dominant unaddressed vector.
Across all agents, one type of attack succeeds far more often than the others. We disclose which under NDA. Hint: it's not the one you read about in published research.
Role beats governance.
The way an agent's prompt frames its function predicts its adversarial resistance better than any declared governance policy does. In our N=30 evaluation, executor-role agents failed cost induction at a 74% rate; reviewer-role agents failed at 0% (Fisher's exact test, p < 0.001). Governance score showed no significant correlation with resistance. We can show you exactly where on that spectrum your agent sits.
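To make that statistic concrete, here is a minimal recomputation of the test. It is a sketch only: the 19-executor / 11-reviewer split is an assumption chosen so that 14/19 ≈ 74%; we report only N=30 and the two failure rates above.

```python
# Illustrative recomputation of the role-vs-resistance finding.
# ASSUMPTION: a 19 executor / 11 reviewer split (14/19 ~ 74% failed);
# the actual split is disclosed only under NDA.
from scipy.stats import fisher_exact

#            failed  resisted
table = [
    [14, 5],   # executor-role agents: 14/19 ~ 74% failed cost induction
    [0, 11],   # reviewer-role agents:  0/11 failed
]

_, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher exact p = {p_value:.1e}")  # ~8.7e-05, well below 0.001
```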
Want to know where your agent stands? Get an evaluation.
Send your agent's system prompt via API. No SDK, no integration, no deployment changes.
Five adversarial scenarios test how your agent behaves under pressure. Conditions change during the session to measure resilience, not just initial response. Specific scenarios disclosed under NDA.
Behavioral trajectory with Ed25519 signature. Framework mapping included. Evidence you can hand to a regulator.
```bash
curl -X POST https://botconduct.org/api/v3/training-center/start \
  -H "Content-Type: application/json" \
  -d '{"bot_name":"MyAgent","operator":"MyCompany","scenarios":["C1","C3"]}'
```
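When the report arrives, the Ed25519 signature can be checked independently. Below is a minimal verification sketch, assuming a JSON report with `trajectory` and hex-encoded `signature` fields and a hex-encoded public verification key; those field names and the sorted-keys canonicalization rule are illustrative assumptions, so confirm them against the actual response schema.

```python
# Minimal sketch of independent report verification (PyNaCl).
# ASSUMPTIONS: field names ("trajectory", "signature"), hex encoding,
# and sorted-keys JSON canonicalization are illustrative, not the
# documented schema.
import json

from nacl.exceptions import BadSignatureError
from nacl.signing import VerifyKey

def verify_report(report_path: str, verify_key_hex: str) -> bool:
    """Return True if the report's trajectory matches its signature."""
    with open(report_path) as f:
        report = json.load(f)
    message = json.dumps(report["trajectory"], sort_keys=True).encode()
    signature = bytes.fromhex(report["signature"])
    try:
        VerifyKey(bytes.fromhex(verify_key_hex)).verify(message, signature)
        return True
    except BadSignatureError:
        return False
```

Any Ed25519 implementation will do; the point is that verification requires only the report and a published public key, not access to BotConduct.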
Our methodology is consistent with frameworks established in recent research, including OpenAI's "Practices for Governing Agentic AI Systems" (2023) and Cornell's "Agents of Chaos" (February 2026). We extend these frameworks with adversarial vectors not covered by either, particularly post-corruption state verification, where the agent actively validates false information when audited.
References: Practices for Governing Agentic AI Systems (OpenAI, 2023); Agents of Chaos (Cornell, February 2026)