Receiver-side observations on automated actor activity across multiple industry verticals. First edition of the monthly behavioral briefing.
May 2026
Receiver-Side Observations on Automated Actor Activity
Published: May 2026
Period covered: April–May 2026
An independent observatory. We don't sell the products we classify.
\newpage
Executive Summary
Eighty-three percent of the automated traffic reaching enterprise web surfaces in our multi-vertical observatory operates in the gap between identity verification and transaction monitoring — a layer that no product in the current defensive stack is designed to observe.
This is not a prediction. It is an observation from receiver-side behavioral data collected across financial services, healthcare, e-commerce, government, and adjacent verticals during the April–May 2026 observation window.
Three findings define the current landscape:
1. The Extraction Majority. The 83% is not random noise. Approximately two-thirds of observed automated traffic (~67%) exhibits targeted extraction patterns: precision navigation to high-value paths, session structures designed to avoid rate limits, and behavioral profiles inconsistent with any declared purpose. Another ~16% operates as stealth harvesters. For every session classified as legitimate or respectful, the observatory recorded approximately 14 exhibiting extraction or stealth patterns. If your defensive stack does not include behavioral observation at the navigation layer, this traffic is invisible to you.
2. Cross-Vertical Actor Persistence. Approximately 5% of detected automated actors were observed operating across two or more industry verticals within a 30-day window, and the share was significantly higher among persistent actors. A WAF deployed on a single property sees one visit from an unknown actor. The observatory, correlating across verticals, sees the same actor active on multiple monitored properties across different industries within the same week. Single-property defenses cannot detect multi-property intent.
3. Infrastructure Origin Shift. Chinese cloud infrastructure providers now collectively source more automated behavioral traffic to Western enterprise surfaces than any single US-based provider. Privacy relay infrastructure has entered the top ten sources. The attribution assumptions embedded in most security stacks are outdated.
\newpage
Central Thesis
The defensive stack built for the human-to-application era — identity, WAF, CDN, DLP, SIEM — was not designed for an era where AI agents act autonomously on behalf of authenticated humans. The behavioral layer between authentication and outcome is unmonitored in the majority of enterprise deployments. This briefing documents what we observe in that gap.
Identity verification confirms who the human is. It does not observe what their agent does after authentication.
Content delivery networks optimize and cache. They classify requests by signature, not by behavioral evolution across sessions.
Web application firewalls match patterns against known attack signatures. The majority of extraction-pattern traffic observed in this briefing carries no signature that would trigger a rule.
Data loss prevention watches what leaves. It does not observe the behavioral sequence that precedes extraction.
SIEM aggregates events from each of these layers. If no layer generates an event for behavioral navigation patterns, the SIEM has nothing to aggregate.
The gap is structural, not operational. No amount of tuning the existing stack closes it. It requires a new layer of observation.
\newpage
Methodology Note
Observation period: April–May 2026
Approach: Receiver-side behavioral observation across multiple industry verticals. All classification is based on observed behavior, not declared identity.
Sample: Multi-vertical observatory covering financial services, e-commerce, media, healthcare, government, and adjacent verticals. Observations span thousands of sessions from thousands of unique actors.
Classification method: Behavioral pattern analysis based on:
- Navigation sequences and page-access patterns
- Session coherence and temporal consistency
- Evasion signal detection (TLS fingerprinting anomalies, robots.txt compliance, User-Agent consistency)
- Cross-session longitudinal correlation across observation windows
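To make these inputs concrete, the sketch below maps per-session behavioral signals to the five labels used in the spotlight case later in this briefing. It is a toy illustration under stated assumptions, not the observatory's classifier: the field names, the precedence order, and the thresholds (`max_polite_rate`, `targeting_threshold`) are all hypothetical. The input type deliberately has no User-Agent field, mirroring the principle that classification rests on observed behavior rather than declared identity.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    RESPECTFUL_CRAWLER = "RESPECTFUL_CRAWLER"
    LEGITIMATE_USER = "LEGITIMATE_USER"
    TARGETED_EXTRACTION = "TARGETED_EXTRACTION"
    STEALTH_HARVESTER = "STEALTH_HARVESTER"
    AI_AGENT_EXPLORING = "AI_AGENT_EXPLORING"


@dataclass
class SessionSignals:
    """Observed behavior for one session (hypothetical fields).

    There is intentionally no User-Agent field: declared identity is not an input.
    """
    fetched_robots_txt: bool
    robots_violations: int    # requests to paths disallowed by robots.txt
    tls_consistent: bool      # TLS fingerprint consistent with the claimed client
    high_value_ratio: float   # share of requests to high-value paths, 0.0 to 1.0
    requests_per_minute: float
    discovery_probes: int     # requests to API or agent discovery paths


def classify(s: SessionSignals,
             max_polite_rate: float = 60.0,
             targeting_threshold: float = 0.5) -> Label:
    """Toy precedence rules; thresholds are illustrative, not calibrated."""
    evasive = (not s.tls_consistent) or s.robots_violations > 0
    targeted = s.high_value_ratio >= targeting_threshold
    if evasive and targeted:
        return Label.STEALTH_HARVESTER
    if s.discovery_probes > 0:
        return Label.AI_AGENT_EXPLORING
    if targeted:
        return Label.TARGETED_EXTRACTION
    if s.fetched_robots_txt and not evasive and s.requests_per_minute <= max_polite_rate:
        return Label.RESPECTFUL_CRAWLER
    return Label.LEGITIMATE_USER


# Usage: a session that reads robots.txt, stays slow, and browses broadly.
print(classify(SessionSignals(True, 0, True, 0.1, 5.0, 0)))  # Label.RESPECTFUL_CRAWLER
```

Ordering matters in rules like these: checking evasion before targeting means a session that both ignores robots.txt and concentrates on high-value paths is labeled a stealth harvester rather than plain targeted extraction.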
Limitations:
- The observatory reflects a receiver-side perspective only. We observe what arrives; we do not observe the actor's origin infrastructure beyond what is disclosed or inferable from network metadata.
- Declared intent (User-Agent strings, robots.txt compliance) is recorded but not trusted as the basis for classification. Behavioral signals take precedence.
- The sample may over-represent sectors where automated extraction activity is more prevalent. Findings should not be extrapolated to all web traffic without qualification.
- The observatory operates across sites under monitoring agreements.
\newpage
Key Findings
Finding 1: The Extraction Majority
Approximately 83% of observed automated traffic falls into two behavioral categories: targeted extraction (~67%) and stealth harvesting (~16%). These actors navigate with precision, avoid common detection signals, and operate predominantly from cloud infrastructure.
By contrast, only approximately 5% of observed traffic exhibits patterns consistent with legitimate automated use — combining respectful crawling behavior with declared legitimate user patterns.
The ratio is stark: for every session classified as legitimate or respectful, the observatory recorded approximately 14 sessions exhibiting extraction or stealth patterns.
Identity verification confirms who requested access. It says nothing about what happens next. The extraction majority does not need to defeat authentication — it operates after authentication, or on surfaces where authentication is not required. This is a class of activity that identity solutions were never designed to observe.
Content delivery and application firewall layers classify by signature. The behavioral patterns documented here — precision path targeting, cross-session escalation, TLS evasion — carry no signature in any widely deployed rule set. The traffic passes through because it was never defined as something to block.
A small but notable segment (~1.3%) of traffic exhibited patterns consistent with content training — systematic, broad-coverage access patterns that differ from targeted extraction in their breadth and pacing. An emerging category of AI agent exploration traffic (~0.3%) was also observed, characterized by conversational-style navigation patterns distinct from traditional automated access.
So what: If your stack does not include a behavioral observation layer between identity and transaction, the extraction and stealth activity that made up 83% of automated traffic in our sample goes unmonitored on your surfaces. Not blocked, not allowed: simply unseen.
Finding 2: Cross-Vertical Actor Persistence
Approximately 5% of detected actors were observed operating across two or more industry verticals within a 30-day window. Among persistent actors (those with five or more sessions), cross-vertical operation was significantly more common.
These actors exhibit consistent behavioral fingerprints — TLS characteristics, navigation patterns, timing signatures — that enable correlation across otherwise unrelated properties.
What this means in practice: A WAF deployed on a single property sees one visit from an unknown actor. The observatory, correlating across verticals, sees the same actor active on multiple monitored properties across different industries — within the same week.
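A minimal sketch of the correlation step, assuming each session has already been reduced to a stable behavioral fingerprint key (how that key is built from TLS, navigation, and timing features is outside the sketch). It groups sessions by fingerprint, slides a 30-day window over each actor's history, and reports actors seen in two or more verticals within one window, flagging those with five or more sessions as persistent, matching the definitions used in this finding. The `Session` type and `cross_vertical_actors` function are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Session:
    fingerprint: str   # hypothetical stable actor key (TLS + navigation + timing)
    vertical: str      # industry vertical of the monitored property
    property_id: str
    start: datetime


def cross_vertical_actors(sessions, window=timedelta(days=30),
                          min_verticals=2, persistent_threshold=5):
    """Map fingerprint -> (verticals, is_persistent) for actors seen in at least
    `min_verticals` verticals within any single `window`."""
    by_actor = defaultdict(list)
    for s in sessions:
        by_actor[s.fingerprint].append(s)

    results = {}
    for fp, history in by_actor.items():
        history.sort(key=lambda s: s.start)
        best = set()
        left = 0
        # Sliding time window; recomputing the set each step is O(n^2) in the
        # worst case, which is acceptable for a sketch.
        for right in range(len(history)):
            while history[right].start - history[left].start > window:
                left += 1
            verticals = {s.vertical for s in history[left:right + 1]}
            if len(verticals) > len(best):
                best = verticals
        if len(best) >= min_verticals:
            persistent = len(history) >= persistent_threshold
            results[fp] = (frozenset(best), persistent)
    return results


t0 = datetime(2026, 4, 1)
demo = [
    Session("fp-A", "finance", "bank-1", t0),
    Session("fp-A", "healthcare", "clinic-1", t0 + timedelta(days=3)),
    Session("fp-B", "retail", "shop-1", t0),
    Session("fp-B", "government", "agency-1", t0 + timedelta(days=45)),
]
# fp-A qualifies (two verticals, 3 days apart); fp-B does not (45 days apart).
print(cross_vertical_actors(demo))
```

The key design point is that the window is evaluated per actor across all properties at once; a single property holding only its own sessions can never see the second vertical.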
Observed behavioral patterns are not static. Automated actors adapt, escalate, and shift patterns across sessions, and point-in-time assessment cannot capture that behavior.
So what: Risk assessment at the individual property level systematically understates exposure. If your security vendor cannot show you cross-property behavioral correlation, you are seeing isolated events, not the pattern.
Finding 3: Infrastructure Origin Shift — Asia-Pacific Cloud Providers
Chinese cloud providers (Tencent Cloud and Alibaba Cloud combined) now source more observed automated behavioral traffic to Western enterprise surfaces than AWS alone. This represents a measurable shift from patterns observed in prior periods, where US-based cloud providers dominated automated traffic origin.
Top infrastructure providers by observed automated session volume:
| Rank | Provider | Region |
|---|---|---|
| 1 | Tencent Cloud | Asia-Pacific |
| 2 | AWS | North America |
| 3 | Cloudflare | Global (CDN) |
| 4 | Microsoft Azure | North America |
| 5 | Google Cloud | North America |
| 6 | Hetzner | Europe |
| 7 | Alibaba Cloud | Asia-Pacific |
| 8 | OVH | Europe |
| 9 | DigitalOcean | North America |
| 10 | Apple iCloud Private Relay | Global (Privacy) |
The presence of Apple's iCloud Private Relay in the top ten is notable: privacy-preserving relay infrastructure is now a measurable source of automated behavioral traffic, complicating IP-based attribution approaches.
Note on sample composition: This distribution may reflect both genuine geographic trends in automated traffic and the multi-vertical composition of the observatory sample. Cross-verification with other public observatories is recommended before drawing geopolitical conclusions.
So what: Infrastructure origin is shifting faster than defensive assumptions. If your threat model assumes US-cloud-dominant automated traffic, you are calibrated for the previous era. And if your attribution strategy depends on IP reputation, the rise of privacy relay infrastructure renders it increasingly unreliable.
Finding 4: Evasion Sophistication
Among actors classified as stealth harvesters, the most prevalent evasion signal was a missing TLS ALPN extension, a technical indicator present in over 80% of stealth sessions. This suggests widespread use of HTTP client libraries that do not fully replicate browser TLS negotiation behavior.
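Checking for a missing ALPN extension is mechanical once the raw ClientHello is available. The sketch below walks the ClientHello layout and lists the extension type codes; ALPN is extension type 16 (RFC 7301). It assumes access to the raw handshake bytes (for example from a packet capture or a TLS-terminating proxy that exposes them) and that the ClientHello fits in a single TLS record, and it does only minimal bounds checking. `build_client_hello` is a hypothetical test helper that produces a minimal record for demonstration only.

```python
import struct

ALPN_EXTENSION_TYPE = 0x0010  # application_layer_protocol_negotiation, RFC 7301


def client_hello_extensions(record: bytes) -> list:
    """Return extension type codes from one TLS record carrying a ClientHello.

    Assumes the ClientHello fits in a single record; malformed or truncated
    input raises an exception rather than being handled gracefully.
    """
    if len(record) < 6 or record[0] != 0x16:      # 0x16 = handshake record
        raise ValueError("not a TLS handshake record")
    pos = 5                                        # skip 5-byte record header
    if record[pos] != 0x01:                        # 0x01 = ClientHello
        raise ValueError("not a ClientHello")
    pos += 4                                       # handshake type + 3-byte length
    pos += 2 + 32                                  # legacy_version + random
    pos += 1 + record[pos]                         # session_id
    (cipher_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + cipher_len                          # cipher_suites
    pos += 1 + record[pos]                         # compression_methods
    if pos >= len(record):
        return []                                  # no extensions block present
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    types = []
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        types.append(ext_type)
        pos += 4 + ext_len
    return types


def missing_alpn(record: bytes) -> bool:
    return ALPN_EXTENSION_TYPE not in client_hello_extensions(record)


def build_client_hello(extensions) -> bytes:
    """Hypothetical test helper: a minimal ClientHello record for demonstration."""
    ext = b"".join(struct.pack("!HH", t, len(d)) + d for t, d in extensions)
    body = (b"\x03\x03" + bytes(32)                # legacy_version, zeroed random
            + b"\x00"                              # empty session_id
            + b"\x00\x02\x13\x01"                  # one cipher suite
            + b"\x01\x00"                          # one compression method (null)
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


alpn = (ALPN_EXTENSION_TYPE, b"\x00\x0c\x02h2\x08http/1.1")  # offers h2, http/1.1
print(missing_alpn(build_client_hello([(0x0000, b""), alpn])))  # False
print(missing_alpn(build_client_hello([(0x0000, b"")])))         # True
```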
Top evasion signals observed:
- Missing TLS ALPN extension — The TLS handshake omits the Application-Layer Protocol Negotiation extension that all modern browsers include. Present in the vast majority of stealth harvester sessions.
- Robots.txt non-compliance — Actors that fetched and then ignored robots.txt directives, or never requested robots.txt at all before accessing restricted paths.
- Browser/TLS mismatch — Actors presenting browser User-Agent strings (Chrome, Firefox) while exhibiting TLS fingerprints inconsistent with any known browser version.
- Precision path targeting — Actors navigating directly to high-value paths (pricing pages, API endpoints, data feeds) without the referral chain a human session would exhibit.
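The robots.txt signal can be checked with Python's standard `urllib.robotparser` module. The sketch below replays an actor's time-ordered request paths against a robots.txt body and reports the disallowed paths that were fetched, plus whether robots.txt itself was requested before the first of them, which separates "fetched and then ignored" from "never requested". The function name `robots_check` and its return shape are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def robots_check(robots_txt: str, user_agent: str, paths: list) -> dict:
    """Replay time-ordered request paths against a robots.txt body.

    Returns the disallowed paths that were fetched and whether /robots.txt was
    requested before the first of them (None when there were no violations).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    seen_robots = False
    read_first = None
    violations = []
    for path in paths:
        if path == "/robots.txt":
            seen_robots = True
            continue
        if not parser.can_fetch(user_agent, path):
            if not violations:
                read_first = seen_robots  # "fetched and ignored" vs "never asked"
            violations.append(path)
    return {"violations": violations, "read_robots_first": read_first}


rules = "User-agent: *\nDisallow: /private/\n"
print(robots_check(rules, "ExampleBot/1.0",
                   ["/robots.txt", "/pricing", "/private/export"]))
# {'violations': ['/private/export'], 'read_robots_first': True}
```

The standard parser uses simple prefix matching on paths and does not implement the `*` and `$` wildcard extensions some sites rely on, so a production check would likely need a fuller parser.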
The evasion spectrum reveals an important structural gap. Application-layer defenses inspect what the actor declares (User-Agent, headers, cookies). Transport-layer observation reveals what the actor actually is (TLS fingerprint, protocol behavior, connection characteristics). Most deployed stacks inspect one layer or the other. The extraction majority exploits the gap between them.
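The declared-versus-actual comparison reduces to checking whether the transport-layer fingerprint belongs to the set observed for the browser the User-Agent claims. A minimal sketch, assuming fingerprints (for example JA3- or JA4-style hashes) are computed upstream and that a reference set per browser family is maintained elsewhere; the fingerprint values below are placeholders, not real fingerprints, and the substring-based family detection is deliberately crude.

```python
# Placeholder fingerprints per browser family (hypothetical values). In practice
# this set would be built from fingerprints observed for genuine browser builds.
KNOWN_BROWSER_FINGERPRINTS = {
    "chrome": {"fp-chrome-1", "fp-chrome-2"},
    "firefox": {"fp-firefox-1"},
}


def declared_family(user_agent: str):
    """Crude browser-family detection from the User-Agent (what is declared)."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"          # also matches Chromium-based browsers such as Edge
    return None


def browser_tls_mismatch(user_agent: str, tls_fingerprint: str) -> bool:
    """True when the UA claims a modeled browser but the TLS fingerprint
    (what is observed) is not among that browser's known fingerprints."""
    family = declared_family(user_agent)
    if family is None:
        return False             # no browser claim to contradict
    return tls_fingerprint not in KNOWN_BROWSER_FINGERPRINTS[family]


chrome_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36"
print(browser_tls_mismatch(chrome_ua, "fp-chrome-1"))       # False
print(browser_tls_mismatch(chrome_ua, "fp-python-client"))  # True
```

Returning False for clients that make no browser claim keeps this signal narrow: it detects misrepresentation of identity, not automation as such.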
So what: The signal is already available. Over 80% of stealth-class sessions exhibit TLS characteristics inconsistent with their declared browser identity. This signal does not require cooperation from the sender — any receiver can observe it. The question is whether your stack is configured to look.
Declared vs. Undeclared Automated Traffic
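Eight widely recognized AI crawler identifiers are used to declare or control AI access to web properties: Anthropic-AI, Bytespider, CCBot, ChatGPT-User, ClaudeBot, Google-Extended, GPTBot, and PerplexityBot.¹ Most are sent in the User-Agent header of requests; Google-Extended is a robots.txt control token only, and Google does not send it as a separate HTTP User-Agent.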
In the observatory data, sessions from declared AI crawlers accounted for approximately 4% of total observed traffic during the period. The remaining approximately 96% of automated traffic — including the 83% classified as extraction or stealth patterns — operated without any AI-specific declaration.
Among the declared AI crawlers observed, the behavioral profile was notable: the majority of declared AI sessions were classified as targeted extraction or stealth harvesting, not as respectful crawling. Fewer than 4% of declared AI crawler sessions exhibited behavior consistent with respectful crawling (robots.txt compliance, moderate rate, consistent identification).
What this means: Blocking or managing traffic based solely on declared AI crawler User-Agent strings addresses approximately 4% of automated activity. The vast majority of extraction-pattern traffic carries no declaration that identifies it as AI-operated. Robots.txt directives aimed at specific crawlers — while useful for the compliant minority — do not reach the behavioral majority.
Implication for defenders: AI crawler management policies that rely exclusively on User-Agent identification are addressing the visible fraction. Behavioral observation is required to see the rest.
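The visible fraction is straightforward to measure. The sketch below matches User-Agent strings against the crawler identifiers listed above and reports the share of sessions carrying any of them; everything else is left to behavioral observation. Case-insensitive substring matching is an assumption of this sketch, not any vendor's documented rule. Google-Extended is kept for completeness, but because Google documents it as a robots.txt control token rather than an HTTP User-Agent, it is not expected to match request headers.

```python
# Identifiers from the list above, lowercased. Google-Extended is a robots.txt
# control token; Google does not send it as an HTTP User-Agent, so it won't match.
DECLARED_AI_TOKENS = (
    "anthropic-ai", "bytespider", "ccbot", "chatgpt-user",
    "claudebot", "google-extended", "gptbot", "perplexitybot",
)


def declared_ai_token(user_agent: str):
    """Return the first declared AI crawler token found in the UA, else None."""
    ua = user_agent.lower()
    for token in DECLARED_AI_TOKENS:
        if token in ua:
            return token
    return None


def declared_share(user_agents: list) -> float:
    """Fraction of sessions whose User-Agent carries a declared AI crawler token."""
    if not user_agents:
        return 0.0
    hits = sum(1 for ua in user_agents if declared_ai_token(ua) is not None)
    return hits / len(user_agents)


# Illustrative User-Agent strings, not captured traffic.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.2)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0 Safari/537.36",
    "python-requests/2.31.0",
    "Mozilla/5.0 (Macintosh) Safari/605.1.15",
]
print(declared_share(sample))  # 0.25
```

Everything this function returns None for is, by construction, outside the reach of User-Agent-based policy, which is the gap the paragraph above describes.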
¹ Listed alphabetically. Presence in this list does not imply ranking by volume or risk. Other declared crawlers exist but were not consistently observed during the period.
\newpage
Behavioral Pattern Spotlight: "The Evolving Actor"
Real observed case (anonymized). Multi-vertical observation, tracked over a 26-day window via TLS behavioral fingerprint.
Observation
A single actor, identified by consistent TLS characteristics across sessions, exhibited five distinct behavioral phases across more than one hundred sessions spanning 26 days.
Phase 1: Respectful Crawling (Early Sessions)
- Read robots.txt, low request volume, consistent User-Agent
- Behavioral classification: RESPECTFUL_CRAWLER
Phase 2: Legitimate Transition
- Shifted to patterns consistent with legitimate browsing — varied page access, moderate depth
- Behavioral classification: LEGITIMATE_USER
Phase 3: Targeted Escalation
- Began targeted extraction of specific content paths, increasing session depth
- Behavioral classification: TARGETED_EXTRACTION
Phase 4: Evasion
- Robots.txt ignored, request patterns designed to avoid rate limits, behavioral adaptation across sessions
- Peak session: thousands of requests
- Behavioral classification: STEALTH_HARVESTER
Phase 5: Exploration
- Probed API discovery endpoints and agent-specific paths
- Behavioral classification: AI_AGENT_EXPLORING
Analysis
Across more than one hundred sessions in a multi-week window, this actor traversed the full spectrum from compliant to adversarial behavior. The TLS fingerprint remained consistent throughout, enabling correlation that User-Agent analysis alone could not provide. Each phase established behavioral credibility before escalating. At peak, individual sessions contained thousands of requests.
Implication
Point-in-time assessment would classify this actor differently depending on when observation occurred. Only longitudinal tracking across the full 26-day window reveals the progression as a single actor's deliberate evolution.
This is what patience looks like in automated adversaries. An actor that begins respectfully and escalates over weeks is invisible to any system that classifies sessions in isolation. If your vendor cannot show you how an actor's behavior evolved across sessions, you are seeing snapshots, not the picture.
Note: This is a real observed case, not a composite.
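The longitudinal view in this case amounts to ordering one fingerprint's sessions in time and collapsing consecutive sessions with the same label into phases. A minimal sketch, assuming per-session labels already exist; `phase_timeline` is a hypothetical name, and the session counts in the usage example are invented for illustration, not taken from this case.

```python
from itertools import groupby


def phase_timeline(labels):
    """Collapse chronological per-session labels for one actor into phases.

    Returns (label, first_session_index, session_count) tuples in order.
    """
    phases = []
    index = 0
    for label, run in groupby(labels):
        count = sum(1 for _ in run)
        phases.append((label, index, count))
        index += count
    return phases


# Invented counts for illustration only.
history = (["RESPECTFUL_CRAWLER"] * 3 + ["LEGITIMATE_USER"] * 2
           + ["TARGETED_EXTRACTION"] * 2 + ["STEALTH_HARVESTER"])
for label, first, count in phase_timeline(history):
    print(f"sessions {first}-{first + count - 1}: {label}")
# A point-in-time view sees only one of these rows; the timeline shows all four.
```

Real label sequences are noisier than this; a practical version would likely smooth out single-session blips (for example by requiring a minimum run length) before reporting phases.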
\newpage
Framework Mapping
The following table maps observatory findings to established risk and security frameworks.
| Observation | NIST AI RMF | OWASP Agentic Top 10 | MITRE ATLAS | EU AI Act |
|---|---|---|---|---|
| 83% extraction traffic | GOVERN 1.2: Risk identification for AI-enabled threats | A01: Excessive Agency | AML.T0044: Full ML model theft via behavioral extraction | Art. 15: Accuracy, robustness, and cybersecurity |
| Cross-vertical persistence | MAP 1.1: Context mapping across organizational boundaries | A06: Excessive Autonomy | AML.T0056: Data scraping at scale | Art. 9: Risk management systems |
| Evasion sophistication | MEASURE 2.6: Tracking identified risks over time | A09: Misinformation (identity misrepresentation) | AML.T0043: Crafting adversarial data to evade detection | Art. 15(5): Resilience against unauthorized third parties |
| Behavioral escalation | MANAGE 2.4: Mechanisms for residual risk management | A10: Unbounded Consumption | AML.T0010: ML supply chain compromise | Art. 14: Human oversight requirements |
Note on framework applicability: These mappings reflect observatory observations correlated to framework categories. They are intended to support risk discussion, not to assert specific compliance requirements. Organizations should evaluate applicability based on their regulatory context and risk posture.
The Regulatory Gap
Current AI governance frameworks focus on two sides of the AI deployment chain: developers and operators.
The EU AI Act (particularly Articles 9, 15, and 50) establishes obligations for providers of AI systems and for deployers who integrate those systems. Article 15 mandates accuracy, robustness, and cybersecurity measures, but these obligations attach to the entity that builds or deploys the AI system, not to the entity whose web surface receives the AI system’s traffic.
The NIST AI Risk Management Framework maps risk across Govern, Map, Measure, and Manage functions — again oriented toward the organization developing or operating the AI system. The receiver of AI-generated traffic is not a defined role in the framework.
The OWASP Top 10 for Agentic Applications identifies risks such as Excessive Agency (A01), Tool Misuse (A03), and Unbounded Consumption (A10) — each describing failure modes from the operator’s perspective.
What none of these frameworks address is the receiver’s perspective: the enterprise whose web surface is accessed by AI agents it did not deploy, authorize, or configure. This entity has no contractual relationship with the agent, no visibility into its mandate, and — under current frameworks — no defined role in the governance chain.
The observatory data in this briefing documents what receiver-side organizations experience: automated traffic at scale, operating without declaration, exhibiting behavioral patterns that existing defensive tools were not designed to classify. The frameworks that govern AI development and deployment do not yet extend to protecting the organizations that receive AI-driven traffic.
This is not a criticism of existing frameworks — they address real and important governance needs. It is an observation that a structural gap exists between what frameworks cover (developer and operator obligations) and what receivers experience (unclassified automated traffic at scale). Closing this gap will likely require either extension of existing frameworks or development of complementary receiver-side standards.
\newpage
Five Recommendations for Defenders
1. Monitor the behavioral layer, not just the identity layer.
Authentication confirms who. Behavioral observation reveals what they do after access is granted. The 83% extraction rate documented in this briefing operates in the gap between these two layers. If you monitor identity and transactions but not behavior, you are monitoring the entrance and the exit while ignoring what happens inside the building.
2. Assume cross-site correlation exists for your adversaries.
If an actor extracts data from your property and three others in the same vertical, a single-property defense will never see the pattern. Single-property visibility cannot detect multi-property intent. If your defensive stack lacks cross-property correlation, your risk assessment is based on incomplete data.
3. Treat robots.txt compliance as a weak signal, not a classification.
In our observatory, actors that initially complied with robots.txt subsequently violated it in a significant share of cases. Compliance at time T does not predict compliance at time T+n. Any classification system that treats initial compliance as a durable signal is vulnerable to the escalation pattern documented in this briefing.
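One way to quantify this is the share of initially compliant actors that later violate robots.txt. The sketch assumes each actor's history has already been reduced to a chronological list of per-session booleans (True when the session respected robots.txt); `compliance_decayed` and `decay_share` are hypothetical names, and the example histories are illustrative rather than observatory data.

```python
def compliance_decayed(history):
    """history: chronological booleans, True when a session respected robots.txt.
    True if the actor complied in its first session and violated in a later one."""
    return bool(history) and history[0] and not all(history)


def decay_share(actors: dict) -> float:
    """Share of initially compliant actors whose compliance later lapsed."""
    initially = [h for h in actors.values() if h and h[0]]
    if not initially:
        return 0.0
    return sum(compliance_decayed(h) for h in initially) / len(initially)


# Illustrative histories, not observatory data.
actors = {
    "actor-a": [True, True, False],   # complied, then violated
    "actor-b": [True, True, True],    # consistently compliant
    "actor-c": [False, True],         # excluded: not compliant at first
}
print(decay_share(actors))  # 0.5
```

Because the denominator is only actors that started out compliant, this figure directly measures how often time-T compliance failed to hold at time T+n.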
4. Evaluate TLS behavioral fingerprinting as a detection supplement.
Over 80% of stealth-class automated sessions in our observatory exhibited TLS characteristics inconsistent with the browser they declared. This signal is available to any receiver without requiring cooperation from the sender. It is not a silver bullet — sophisticated actors can and do replicate browser TLS behavior — but it remains one of the highest-signal detection supplements available today.
5. Request longitudinal behavioral data from your security vendors.
Point-in-time classification misses escalation patterns that unfold over days or weeks. If your vendor cannot show you how an actor's behavior evolved across sessions, you are seeing a snapshot, not the picture. Ask for session-over-session behavioral trend data. If it does not exist, the classification is inherently incomplete.
\newpage
Implications for Risk Management
For CISOs
- Identity verification confirms who the human is. It does not observe what their agent does after authentication. The extraction and stealth traffic behind the observed 83% figure operates in this behavioral gap. If your stack does not include behavioral observation, this traffic is invisible to you.
- Signature-based detection and rate-limiting address known patterns. The extraction majority operates with behavioral patterns — precise navigation, TLS evasion, cross-session persistence — that carry no known signature. These are not zero-day exploits. They are normal-looking requests that, in aggregate, constitute systematic extraction.
- Cross-vertical correlation is not a feature of any widely deployed security product. The 5% of actors observed across multiple verticals represent a detection gap that requires receiver-side behavioral observation to address.
For Boards and Executive Leadership
- The observed cross-vertical persistence pattern suggests organized intelligence operations that affect multiple business units simultaneously. Risk assessment at the property level understates exposure.
- The infrastructure origin shift toward Asia-Pacific cloud providers may have implications for data sovereignty, regulatory reporting, and geopolitical risk assessment.
- The 14:1 ratio of extraction-to-legitimate automated traffic is a measure of the current state. It may worsen as AI agent deployment accelerates.
For Audit and Compliance Teams
- EU AI Act Article 15 requires accuracy and robustness measures for high-risk AI systems. The observed behavioral escalation pattern (compliant to adversarial over multi-week observation windows) suggests that static compliance assessments at deployment may not reflect operational reality.
- The emergence of AI agent exploration traffic (~0.3% of observed sessions) signals that autonomous AI agents are beginning to interact with enterprise web surfaces. Regulatory frameworks are evolving to address this; observational baselines established now will be valuable for future compliance requirements.
About BotConduct
BotConduct operates an independent behavioral observatory for automated actor activity on enterprise web surfaces. The observatory characterizes automated actors by what they do, not what they declare.
Observations are receiver-side, longitudinal, and cross-correlated across industry verticals.
BotConduct does not sell the products it classifies.
Authored by: BotConduct Observatory Team
Contact: hello@botconduct.org
Web: botconduct.org
Next edition: June 2026
How to cite: BotConduct Behavioral Briefing, May 2026. BotConduct Observatory. https://botconduct.org/briefings/may-2026
This briefing is provided for informational purposes. Findings reflect observatory data for the stated period and should not be interpreted as comprehensive threat intelligence. The findings in this briefing are descriptive, not predictive. Observatory data reflects what has been observed; future behavior may differ. This briefing should not be used as the sole basis for security or risk decisions. BotConduct is not a regulatory body and does not provide legal or compliance advice.
Copyright 2026 BotConduct. All rights reserved.