Bot Behavior Atlas

Observational dataset on 170+ automated agents that visit botconduct.org. Updated continuously. Methodology is partially disclosed below.

Current state: 170+ distinct bots observed · 40 hostile · 50 acceptable · 33 exemplary
Update cadence: weekly summary, continuous passive observation
Data sources: our network's access logs + honeypot telemetry
Last updated: April 2026

Categories of observed agents

Search & index crawlers

The most visible category — large operators with consistent behavior and well-documented policies. Top performers across a full year of passive observation include Googlebot, Bingbot, YandexBot, Baiduspider, Applebot, DuckDuckBot. All currently scored at 92 or higher. These operators set the behavioral baseline that newer categories are measured against.

AI model training crawlers

Rapidly growing category. GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended, CCBot (Common Crawl), and Bytespider (ByteDance) all exhibit exemplary behavioral profiles. Their volume has approximately tripled year-over-year. Several maintain dedicated documentation and opt-out endpoints.
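Opt-out behavior in this category is typically expressed through robots.txt. A minimal sketch of how such an opt-out is evaluated, using Python's standard robotparser; the robots.txt content below is a hypothetical example, not any real site's policy:

```python
# Sketch: checking whether a training crawler is opted out via robots.txt.
# The ROBOTS_TXT content is a made-up example for illustration.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_fetch(user_agent: str, path: str) -> bool:
    """Return True if the parsed robots.txt permits this agent to fetch the path."""
    return parser.can_fetch(user_agent, path)

print(may_fetch("GPTBot", "/private/page"))  # blocked for GPTBot
print(may_fetch("GPTBot", "/public/page"))   # allowed
```

A well-behaved training crawler re-checks this policy regularly, since opt-outs can be added at any time.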

Autonomous AI agents (emerging)

The newest and most volatile category. Unlike training crawlers (which index), AI agents act on behalf of a user; ChatGPT-User, Claude-User, and PerplexityBot fall here. Behavioral patterns show higher variance than crawlers and tend to diverge from their associated crawlers (for example, ClaudeBot scores 100 while Claude-User scores 82, a divergence pattern common across AI labs).

SEO & marketing intelligence

Commercial SEO platforms operate large crawler fleets: AhrefsBot, SemrushBot, MJ12bot (Majestic), DotBot (Moz), DataForSEOBot, serpstatbot. Behavior is generally compliant but varies with crawler age; older crawlers tend to ignore newer site signals (Sec-Fetch headers, Crawler Hints). Scores in this category range from 70 to 100.

E-commerce intelligence

Specialized crawlers focused on product catalogs and pricing. Keepa, CamelCamelCamel, Jungle Scout's affiliated crawlers, Feedvisor. These exhibit aggressive-but-predictable patterns: high request volume on product URLs, low on non-commerce content. Passive observation indicates variable respect for rate-limit headers.
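The "respect for rate-limit headers" signal can be sketched as a backoff rule. This is an illustrative sketch assuming the standard Retry-After header (RFC 9110 semantics); the 5-second fallback delay is an arbitrary placeholder, not a measured value:

```python
# Sketch: how long a well-behaved crawler should wait after a rate-limit
# response. Retry-After may be delta-seconds or an HTTP-date; the default
# fallback of 5.0s is an illustrative assumption.
import email.utils
import time

def retry_delay(status: int, headers: dict, default: float = 5.0) -> float:
    """Seconds to wait before retrying, based on status and response headers."""
    if status not in (429, 503):
        return 0.0                       # not rate-limited: no mandatory wait
    value = headers.get("Retry-After")
    if value is None:
        return default                   # header absent: use fallback delay
    if value.isdigit():
        return float(value)              # delta-seconds form, e.g. "12"
    when = email.utils.parsedate_to_datetime(value)   # HTTP-date form
    return max(0.0, when.timestamp() - time.time())

print(retry_delay(429, {"Retry-After": "12"}))  # 12.0
print(retry_delay(200, {}))                     # 0.0
```

Crawlers that skip this wait entirely are what passive observation flags as "variable respect" for rate-limit headers.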

Social listening & brand monitoring

Platforms that ingest brand mentions at scale. AwarioSmartBot, Brandwatch, Mention, Meltwater. Primary behavioral signature: selective crawling (follow RSS, skim paragraphs with known brand terms). Typically low footprint per site visited.

Security scanners

Both defensive and offensive. Legitimate research operators include Censys, Shodan, BinaryEdge. Unmarked scanners (large cloud-IP sweeps hitting /.env, /wp-admin, /config.json) also fall here — these typically score below 30.
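Sweeps of that kind can be flagged mechanically from access logs. A minimal sketch using the probe paths named above; the log entries, IPs, and per-IP counting approach are illustrative assumptions:

```python
# Sketch: counting hits on known probe paths per client IP.
# PROBE_PATHS comes from the scanner paths mentioned in the text;
# the sample log tuples are made-up examples.
PROBE_PATHS = {"/.env", "/wp-admin", "/config.json"}

def probe_hits(requests: list[tuple[str, str]]) -> dict[str, int]:
    """Count probe-path hits per client IP from (ip, path) pairs."""
    counts: dict[str, int] = {}
    for ip, path in requests:
        if path in PROBE_PATHS:
            counts[ip] = counts.get(ip, 0) + 1
    return counts

log = [
    ("203.0.113.9", "/.env"),
    ("203.0.113.9", "/config.json"),
    ("198.51.100.4", "/products/42"),
]
print(probe_hits(log))  # {'203.0.113.9': 2}
```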

Archival & academic

Heritage-preserving crawlers with strong behavioral records. Archive.org (Wayback Machine), IABot (Wikipedia), Google Scholar, Crossref, Semantic Scholar. Almost universally exemplary — their operations are specifically designed to respect every site signal.

Observable dimensions

Our behavioral engine scores each operator across ten dimensions.

Specific weights and thresholds are not published. The goal of the Atlas is observational — to document what operators do, not to disclose what our engine measures.

Sample: top-tier operators (score ≥ 90)

Bot name        Operator    Score  Category
GPTBot          OpenAI        100  AI training
ChatGPT-User    OpenAI        100  AI agent
ClaudeBot       Anthropic     100  AI training
Bingbot         Microsoft     100  Search
Bytespider      ByteDance     100  AI training
Baiduspider     Baidu         100  Search
YandexBot       Yandex        100  Search
Facebook        Meta          100  Social preview
redditbot       Reddit        100  Social index
PerplexityBot   Perplexity    100  AI agent
AwarioSmartBot  Awario        100  Social listening
Googlebot       Google         92  Search
Applebot        Apple          92  Search + Apple Intelligence
AhrefsBot       Ahrefs         90  SEO
serpstatbot     Serpstat       97  SEO
TwitterBot      X Corp         97  Social preview
LinkedInBot     LinkedIn       97  Social preview

Notable divergences

A divergence occurs when two bots from the same operator exhibit materially different conduct. The clearest observed case is the crawler-versus-agent gap: ClaudeBot scores 100 while Claude-User scores 82, and similar gaps recur across AI labs.

Methodology notes

Passive observation: continuous scoring based on access-log telemetry. No cooperation from operators required. Updated hourly.
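Passive observation of this kind reduces, at its simplest, to per-agent aggregation over access-log lines. The log format assumed below (common log format with a trailing quoted user-agent field) and the sample lines are illustrative; they are not our actual pipeline:

```python
# Sketch: counting requests per declared user agent from access-log lines.
# Assumes the user agent is the final quoted field on each line, as in
# Apache/nginx combined log format. Sample lines are made up.
import re
from collections import Counter

UA_RE = re.compile(r'"([^"]*)"\s*$')  # last quoted field = user agent

def requests_per_agent(lines: list[str]) -> Counter:
    """Tally request counts keyed by the declared user-agent string."""
    counts: Counter = Counter()
    for line in lines:
        m = UA_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    '1.2.3.4 - - [01/Apr/2026:00:00:01] "GET / HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '1.2.3.4 - - [01/Apr/2026:00:00:02] "GET /a HTTP/1.1" 200 128 "-" "GPTBot/1.0"',
    '5.6.7.8 - - [01/Apr/2026:00:00:03] "GET / HTTP/1.1" 200 512 "-" "AhrefsBot/7.0"',
]
print(requests_per_agent(sample))
```

Real scoring layers additional signals (timing, paths, signal compliance) on top of counts like these; declared user agents are also verified, since they are trivially spoofable.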

Active test: operators can voluntarily submit their bots to our adversarial test environment. Results are calibrated against passive observation. Operators with persistent divergence between the two are flagged.

Classification: the ten-dimension rubric is proprietary and evolves continuously. Categories of signals are documented above; specific weights and thresholds are not.

For operators: if your bot appears in this Atlas and you'd like to claim the entry, or you want to discuss the observed behavior privately before any public reference, reach out. We share specific telemetry under NDA on request. hello@botconduct.org

Data export

Aggregate counts are accessible at /api/registry without authentication. Full per-bot records require an API key; get one in 30 seconds.
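A minimal sketch of calling the unauthenticated endpoint. The /api/registry path comes from the text above; the response schema shown in `sample` is an assumption, since the actual payload shape is not documented here:

```python
# Sketch: fetching aggregate counts from the unauthenticated endpoint.
# The JSON field names below are assumed, not documented.
import json
from urllib.request import urlopen

def fetch_registry_counts(base: str = "https://botconduct.org") -> dict:
    """GET the aggregate registry counts (no API key required per the text)."""
    with urlopen(f"{base}/api/registry") as resp:
        return json.load(resp)

# Assumed payload shape, parsed offline so no live request is needed here:
sample = json.loads('{"observed": 170, "hostile": 40, "acceptable": 50, "exemplary": 33}')
print(sample["observed"])
```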

Atlas version 1.0 — published April 2026. Part of the Bot Conduct Standard research program.
Feedback or corrections: hello@botconduct.org · @botconduct