Observational dataset on 170+ automated agents that visit botconduct.org. Updated continuously. A partial disclosure of the methodology appears below.
The most visible category: large operators with consistent behavior and well-documented policies. Top performers across a full year of passive observation include Googlebot, Bingbot, YandexBot, Baiduspider, Applebot, and DuckDuckBot, all currently scored at 92 or higher. These operators set the behavioral baseline that newer categories are measured against.
Rapidly growing category. GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended, CCBot (Common Crawl), and Bytespider (ByteDance) all exhibit exemplary behavioral profiles. Their volume has approximately tripled year-over-year. Several maintain dedicated documentation and opt-out endpoints.
The newest and most volatile category. Unlike training crawlers (which index), AI agents act on behalf of a user. ChatGPT-User, Claude-User, and PerplexityBot fall here. Behavioral patterns show higher variance than those of crawlers and tend to diverge from the operator's associated crawler (for example, ClaudeBot scores 100 while Claude-User scores 82, a divergence we observe across AI labs).
Commercial SEO platforms operate large crawler fleets: AhrefsBot, SemrushBot, MJ12bot (Majestic), DotBot (Moz), DataForSEOBot, serpstatbot. Behavior is generally compliant but varies with crawler age: older crawlers tend to ignore newer site signals (Sec-Fetch headers, Crawler Hints). Scores in this category range from 70 to 100.
Specialized crawlers focused on product catalogs and pricing. Keepa, CamelCamelCamel, Jungle Scout's affiliated crawlers, Feedvisor. These exhibit aggressive-but-predictable patterns: high request volume on product URLs, low on non-commerce content. Passive observation indicates variable respect for rate-limit headers.
Platforms that ingest brand mentions at scale. AwarioSmartBot, Brandwatch, Mention, Meltwater. Primary behavioral signature: selective crawling (follow RSS, skim paragraphs with known brand terms). Typically low footprint per site visited.
Both defensive and offensive. Legitimate research operators include Censys, Shodan, BinaryEdge. Unmarked scanners (large cloud-IP sweeps hitting /.env, /wp-admin, /config.json) also fall here — these typically score below 30.
Heritage-preserving crawlers with strong behavioral records. Archive.org (Wayback Machine), IABot (Wikipedia), Google Scholar, Crossref, Semantic Scholar. Almost universally exemplary — their operations are specifically designed to respect every site signal.
Our behavioral engine scores each operator across ten dimensions, including compliance with published site signals (robots.txt, sitemap.xml, rate-limit responses). Specific weights and thresholds are not published. The goal of the Atlas is observational: to document what operators do, not to disclose what our engine measures.
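As one illustration of the kind of signal such an engine could weigh, the sketch below measures robots.txt compliance over a set of observed request paths with Python's `urllib.robotparser`. The bot name, the robots.txt content, and the compliance metric are all hypothetical; the Atlas's actual dimensions and weights are unpublished.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as a site might serve it.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def complies(user_agent: str, requested_paths: list[str]) -> float:
    """Fraction of observed requests that robots.txt permits."""
    allowed = sum(parser.can_fetch(user_agent, p) for p in requested_paths)
    return allowed / len(requested_paths)

# ExampleBot hit one disallowed path out of two observed requests.
print(complies("ExampleBot", ["/index.html", "/private/data"]))  # 0.5
```

A real engine would compute this per time window from access-log telemetry rather than from a single request list.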
| Bot name | Operator | Score | Category |
|---|---|---|---|
| GPTBot | OpenAI | 100 | AI training |
| ChatGPT-User | OpenAI | 100 | AI agent |
| ClaudeBot | Anthropic | 100 | AI training |
| Bingbot | Microsoft | 100 | Search |
| Bytespider | ByteDance | 100 | AI training |
| Baiduspider | Baidu | 100 | Search |
| YandexBot | Yandex | 100 | Search |
| facebookexternalhit | Meta | 100 | Social preview |
| redditbot | Reddit | 100 | Social index |
| PerplexityBot | Perplexity | 100 | AI agent |
| AwarioSmartBot | Awario | 100 | Social listening |
| Googlebot | Google | 92 | Search |
| Applebot | Apple | 92 | Search + Apple Intelligence |
| AhrefsBot | Ahrefs | 90 | SEO |
| serpstatbot | Serpstat | 97 | SEO |
| TwitterBot | X Corp | 97 | Social preview |
| LinkedInBot | LinkedIn | 97 | Social preview |
A divergence occurs when two bots from the same operator exhibit materially different conduct, as with ClaudeBot (100) versus Claude-User (82) noted above.
Passive observation: continuous scoring based on access-log telemetry. No cooperation from operators required. Updated hourly.
Active test: operators can voluntarily submit their bots to our adversarial test environment. Results are calibrated against passive observation. Operators with persistent divergence between the two are flagged.
Classification: the ten-dimension rubric is proprietary and evolves continuously. Categories of signals are documented above; specific weights and thresholds are not.
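The calibration step between passive and active scoring can be sketched as a simple comparison. The threshold and the score dictionaries below are illustrative assumptions; the real flagging criteria are not disclosed.

```python
# Illustrative threshold, in score points; the real value is unpublished.
DIVERGENCE_THRESHOLD = 10

def flag_divergent(passive: dict[str, int], active: dict[str, int]) -> list[str]:
    """Bots whose active-test score differs from their passive-observation
    score by more than the threshold."""
    return [
        bot for bot in passive.keys() & active.keys()
        if abs(passive[bot] - active[bot]) > DIVERGENCE_THRESHOLD
    ]

# Hypothetical scores for two bots under both methodologies.
passive_scores = {"ExampleBot": 100, "OtherBot": 95}
active_scores = {"ExampleBot": 82, "OtherBot": 93}
print(flag_divergent(passive_scores, active_scores))  # ['ExampleBot']
```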
Aggregate counts are accessible at /api/registry without authentication. Full per-bot records require an API key; get one in 30 seconds.
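A minimal client for the unauthenticated aggregate endpoint might look like the sketch below. The JSON field names are assumptions, not the documented schema, so the network call is left as a comment and a sample body is parsed offline.

```python
import json

REGISTRY_URL = "https://botconduct.org/api/registry"  # no auth for aggregates

def parse_registry(body: str) -> int:
    """Extract the aggregate bot count; 'total_bots' is an assumed field name."""
    return json.loads(body)["total_bots"]

# In practice the body would come from e.g.:
#   urllib.request.urlopen(REGISTRY_URL).read()
sample_body = '{"total_bots": 170, "categories": {"search": 6}}'
print(parse_registry(sample_body))  # 170
```

Per-bot records would additionally require the API key, presumably sent as a request header.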
Atlas version 1.0 — published April 2026. Part of the Bot Conduct Standard research program.
Feedback or corrections: hello@botconduct.org · @botconduct