Will IT turn the AI bot battle into a money maker? (And is that even a good idea?)
Enterprise IT has been waging a difficult battle against generative AI (genAI) bots and crawlers, and banning some crawlers has simply not worked. That’s led to both data leakage and painful out-of-pocket bandwidth costs.
Cloudflare is one of the few companies that has had some success in blocking these unwelcome visitors, although some have complained that by blocking most of the crawlers, Cloudflare is also filtering out some legitimate traffic.
Now, Cloudflare is exploring a different tactic: If you can’t beat them, charge them. Its plan is to give site owners mechanisms for charging those crawlers for access.
“We’re excited to help dust off a mostly forgotten piece of the web: HTTP response code 402,” the company said in a blog post. “Each time an AI crawler requests content, they either present payment intent via request headers for successful access (HTTP response code 200), or receive a 402 Payment Required response with pricing. Cloudflare acts as the Merchant of Record for pay per crawl and also provides the underlying technical infrastructure.”
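The 402 flow Cloudflare describes boils down to a simple exchange: a crawler either declares payment intent in its request headers and gets a 200, or gets a 402 back with the price. A minimal sketch of that exchange (the header names and the price are illustrative assumptions, not Cloudflare’s actual API):

```python
# Sketch of a pay-per-crawl style exchange. Header names here are
# hypothetical; Cloudflare has not published them in this form.

PRICE_USD = "0.01"  # per-request price, set by the site owner

def handle_crawl(headers: dict) -> tuple[int, dict, str]:
    """Return (status, response_headers, body) for an AI crawler request."""
    intent = headers.get("payment-intent-max-price")  # hypothetical header
    if intent is not None and float(intent) >= float(PRICE_USD):
        # Crawler declared willingness to pay at least the asking price:
        # serve the content and echo the price charged.
        return 200, {"payment-charged": PRICE_USD}, "<html>content</html>"
    # No (sufficient) payment intent: refuse with 402 and advertise pricing.
    return 402, {"payment-required-price": PRICE_USD}, "Payment Required"

status, hdrs, _ = handle_crawl({})
# -> 402, with the asking price in the response headers
status2, _, body2 = handle_crawl({"payment-intent-max-price": "0.05"})
# -> 200, content served
```

In this model the merchant-of-record role Cloudflare describes would sit behind `handle_crawl`, settling the declared payment; the site owner only sets the price.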
Cloudflare deserves credit for getting creative with the problem, but its initial description doesn’t go into a lot of details about the pricing particulars, other than saying it’s up to the site owner.
As tempting as this offer might be — and for a smaller business, it could prove irresistible — enterprise IT should think seriously about whether this arrangement is in a company’s long-term interest.
Setting a price
The key question is whether there’s any reasonable/realistic price that makes long-term sense.
Cloudflare did not say whether it would require a revenue share with enterprises. But given the effort it’s putting into this venture, some kind of split seems likely; had it decided not to take a cut, it presumably would have said so.
Setting that calculation aside, how much money makes it worthwhile to the company?
For starters, there is the bandwidth issue, which is what started this whole controversy. These spiders tend to spend a lot of time grabbing everything they can, creating a massive bandwidth bill for Cloudflare customers.
But there’s no practical way to determine that figure ahead of time. Will a genAI firm agree to cover the costs of added bandwidth weeks or months ahead? And as I’ve noted before, isolating the added bandwidth costs generated by any one spider is all but impossible.
Even if you could somehow resolve the bandwidth payment issue, that’s only a small part of a larger compensation issue. Shouldn’t companies also be reimbursed for the use of their content? How is that possibly calculated?
Given that a genAI agent will grab once and use many times, what dollar amount is appropriate?
Then there are the compliance and cybersecurity issues. If an agent grabs sensitive customer data (PII, payment details, purchase histories, etc.), how do you offset the financial losses if that data gets out into the wild? Fines? Lawsuits? What if your security credentials are grabbed? What if someone improperly locked down a decryption key and the agent grabs it?
Will it even work?
Another of the big challenges with genAI agents/crawlers/spiders is that they tend to be quite good at obfuscating themselves and working around barriers. If they can get around most “do not enter” text barriers, how much harder will they work to sidestep “pay here” hurdles?
The strategic question is simple: Is it in a company’s long-term interest to hand over almost all of its data to a genAI agent? And if it’s not, then the payment arrangement is at best a distraction.
Assume this: Once your data is incorporated into a large language model, it’s gone. This is a forever decision, and that’s why you shouldn’t let a quick, small cash grab make it for you.
Cloudflare posed some excellent strategic questions. In its blog, it wrote that companies “might want to charge different rates for different paths or content types. How do you introduce dynamic pricing based not only upon demand, but also how many users your AI application has? How do you introduce granular licenses at internet scale, whether for training, inference, search, or something entirely new?”
The company added another critical possibility: “The true potential of pay per crawl may emerge in an agentic world. What if an agentic paywall could operate entirely programmatically? Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho — and then giving that agent a budget to spend to acquire the best and most relevant content. By anchoring our first solution on HTTP response code 402, we enable a future where intelligent agents can programmatically negotiate access to digital resources.”
Interesting. But does your enterprise want to permanently support those efforts?
Also, I hate to be cynical, but why would your team trust an agent’s representations about its intended use of the data? What possible motivation would the agent have to tell the truth?