Observational dataset on 170+ automated agents that visit botconduct.org. Updated continuously. A partial disclosure of the methodology appears below.
The most visible category: large operators with consistent behavior and well-documented policies. Top performers across a full year of passive observation include Googlebot, Bingbot, YandexBot, Baiduspider, Applebot, and DuckDuckBot, all currently scored at 92 or higher. These operators set the behavioral baseline that newer categories are measured against.
Rapidly growing category. GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended, CCBot (Common Crawl), and Bytespider (ByteDance) all exhibit exemplary behavioral profiles. Their volume has approximately tripled year-over-year. Several maintain dedicated documentation and opt-out endpoints.
The newest and most volatile category. Unlike training crawlers (which index), AI agents act on behalf of a user. ChatGPT-User, Claude-User, and PerplexityBot fall here. Behavioral patterns show higher variance than those of crawlers and tend to diverge from the operator's associated crawler (for example, ClaudeBot scores 100 while Claude-User scores 82, a divergence we observe across AI labs).
Commercial SEO platforms operate large crawler fleets: AhrefsBot, SemrushBot, MJ12bot (Majestic), DotBot (Moz), DataForSEOBot, serpstatbot. Behavior is generally compliant but varies with crawler age: older crawlers tend to ignore newer site signals (Sec-Fetch headers, Crawler Hints). Scores in this category range from 70 to 100.
Specialized crawlers focused on product catalogs and pricing. Keepa, CamelCamelCamel, Jungle Scout's affiliated crawlers, Feedvisor. These exhibit aggressive-but-predictable patterns: high request volume on product URLs, low on non-commerce content. Passive observation indicates variable respect for rate-limit headers.
Platforms that ingest brand mentions at scale. AwarioSmartBot, Brandwatch, Mention, Meltwater. Primary behavioral signature: selective crawling (follow RSS, skim paragraphs with known brand terms). Typically low footprint per site visited.
Both defensive and offensive. Legitimate research operators include Censys, Shodan, BinaryEdge. Unmarked scanners (large cloud-IP sweeps hitting /.env, /wp-admin, /config.json) also fall here — these typically score below 30.
Heritage-preserving crawlers with strong behavioral records. Archive.org (Wayback Machine), IABot (Wikipedia), Google Scholar, Crossref, Semantic Scholar. Almost universally exemplary — their operations are specifically designed to respect every site signal.
Our behavioral engine scores each operator across ten dimensions, including compliance with published site signals (robots.txt, sitemap.xml, rate-limit responses). Specific weights and thresholds are not published. The goal of the Atlas is observational: to document what operators do, not to disclose what our engine measures.
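As one illustration of the kind of signal such an engine could weigh, the sketch below measures robots.txt compliance over a set of observed request paths with Python's `urllib.robotparser`. The bot name, the robots.txt content, and the compliance metric are all hypothetical; the Atlas's actual dimensions and weights are unpublished.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as a site might serve it.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def complies(user_agent: str, requested_paths: list[str]) -> float:
    """Fraction of observed requests that robots.txt permits."""
    allowed = sum(parser.can_fetch(user_agent, p) for p in requested_paths)
    return allowed / len(requested_paths)

# ExampleBot hit one disallowed path out of two observed requests.
print(complies("ExampleBot", ["/index.html", "/private/data"]))  # 0.5
```

A real engine would compute this per time window from access-log telemetry rather than from a single request list.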
| Bot name | Operator | Score | Category |
|---|---|---|---|
| GPTBot | OpenAI | 100 | AI training |
| ChatGPT-User | OpenAI | 100 | AI agent |
| ClaudeBot | Anthropic | 100 | AI training |
| Bingbot | Microsoft | 100 | Search |
| Bytespider | ByteDance | 100 | AI training |
| Baiduspider | Baidu | 100 | Search |
| YandexBot | Yandex | 100 | Search |
| facebookexternalhit | Meta | 100 | Social preview |
| redditbot | Reddit | 100 | Social index |
| PerplexityBot | Perplexity | 100 | AI agent |
| AwarioSmartBot | Awario | 100 | Social listening |
| Googlebot | Google | 92 | Search |
| Applebot | Apple | 92 | Search + Apple Intelligence |
| AhrefsBot | Ahrefs | 90 | SEO |
| serpstatbot | Serpstat | 97 | SEO |
| TwitterBot | X Corp | 97 | Social preview |
| LinkedInBot | LinkedIn | 97 | Social preview |
A divergence occurs when two bots from the same operator exhibit materially different conduct, as with ClaudeBot (100) versus Claude-User (82) noted above.
Passive observation: continuous scoring based on access-log telemetry. No cooperation from operators required. Updated hourly.
Active test: operators can voluntarily submit their bots to our adversarial test environment. Results are calibrated against passive observation. Operators with persistent divergence between the two are flagged.
Classification: the ten-dimension rubric is proprietary and evolves continuously. Categories of signals are documented above; specific weights and thresholds are not.
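The calibration step between passive and active scoring can be sketched as a simple comparison. The threshold and the score dictionaries below are illustrative assumptions; the real flagging criteria are not disclosed.

```python
# Illustrative threshold, in score points; the real value is unpublished.
DIVERGENCE_THRESHOLD = 10

def flag_divergent(passive: dict[str, int], active: dict[str, int]) -> list[str]:
    """Bots whose active-test score differs from their passive-observation
    score by more than the threshold."""
    return [
        bot for bot in passive.keys() & active.keys()
        if abs(passive[bot] - active[bot]) > DIVERGENCE_THRESHOLD
    ]

# Hypothetical scores for two bots under both methodologies.
passive_scores = {"ExampleBot": 100, "OtherBot": 95}
active_scores = {"ExampleBot": 82, "OtherBot": 93}
print(flag_divergent(passive_scores, active_scores))  # ['ExampleBot']
```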
Aggregate counts are accessible at /api/registry without authentication. Full per-bot records require an API key; get one in 30 seconds.
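A minimal client for the unauthenticated aggregate endpoint might look like the sketch below. The JSON field names are assumptions, not the documented schema, so the network call is left as a comment and a sample body is parsed offline.

```python
import json

REGISTRY_URL = "https://botconduct.org/api/registry"  # no auth for aggregates

def parse_registry(body: str) -> int:
    """Extract the aggregate bot count; 'total_bots' is an assumed field name."""
    return json.loads(body)["total_bots"]

# In practice the body would come from e.g.:
#   urllib.request.urlopen(REGISTRY_URL).read()
sample_body = '{"total_bots": 170, "categories": {"search": 6}}'
print(parse_registry(sample_body))  # 170
```

Per-bot records would additionally require the API key, presumably sent as a request header.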
Atlas version 1.0 — published April 2026. Part of the Bot Conduct Standard research program.
Feedback or corrections: hello@botconduct.org · @botconduct