Cloudflare and the Billion Requests

The Email

March 2022 Analytics Snapshot

We’ve aggregated data from 26 of your Cloudflare sites during the month of March. Cloudflare served 788.57 GB of data, and mitigated 2.04k firewall events.

I was like “wow, that’s a lot more data than usu… wait, over a BILLION REQUESTS FROM JAPAN!?!?!?”

For comparison, February 2022’s stats had 183.7 GB of data and 214 million requests from Japan. Ok, that’s still a load of requests, but 4x as much is weird.

Also, A. BILLION. requests.

What’s changed?

I know which domain’s getting the traffic, but as for what’s changed about it - I’m honestly not sure. I certainly haven’t done anything! While pondering the issue and writing this, I went back looking for what data I could find.

I keep all my emails because I’m like that, so I went back and grabbed a few stats emails. The domain was added to my Cloudflare account on 2020-11-28, and it took me a week or two to get it running again to do the redirection thing. Here are the stats over time for my whole account (I don’t get domain-level stats; I can’t afford to pay for them!).

Most of the traffic has typically come to/from this blog or a photo-laden travel blog.

Month         | Total Data | Requests (JP) | Requests (USA) | Requests (China)
July 2020     | 141.6 GB   | Not listed    | 333,588        | 16,127
August 2020   | 138.37 GB  | Not listed    | 207,701        | Not listed
December 2020 | 14 GB      | Not listed    | 187,179        | Not listed
December 2021 | 22.85 GB   | Not listed    | 1,222,551      | 3,531,231
January 2022  | 64.49 GB   | 56,851,426    | 1,640,193      | 986,683
February 2022 | 183.7 GB   | 214,536,904   | 1,015,310      | 1,033,670
March 2022    | 788.57 GB  | 1,018,691,300 | 1,049,801      | 1,137,785

In July/August, the top source of data was Australia, because… reasons?
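To put numbers on the jump, here is a quick sketch of the arithmetic using the February and March figures from the table above. The function names are made up for illustration, and the per-second figure is only a monthly average over March's 31 days, not something Cloudflare reports.

```python
# Figures copied from the monthly stats table above.
FEB_2022 = {"gb": 183.7, "jp_requests": 214_536_904}
MAR_2022 = {"gb": 788.57, "jp_requests": 1_018_691_300}


def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def average_per_second(total_requests: int, days: int) -> float:
    """Spread a monthly request count evenly over every second of the month."""
    return total_requests / (days * 24 * 60 * 60)


data_growth = growth_factor(FEB_2022["gb"], MAR_2022["gb"])
request_growth = growth_factor(FEB_2022["jp_requests"], MAR_2022["jp_requests"])
march_rate = average_per_second(MAR_2022["jp_requests"], days=31)

print(f"Data served:   {data_growth:.1f}x February")
print(f"JP requests:   {request_growth:.1f}x February")
print(f"March average: {march_rate:.0f} requests/second from Japan")
```

That works out to roughly 4.3x the data and 4.7x the requests, or about 380 requests a second from Japan averaged across the whole month.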

Of course, because I didn’t think to look while the data was available in the UI, the good bit’s cut off!

The view from Cloudflare

The domain

Oh, the domain? It used to host a public mirror. Melbourne IT was one of the first big registrars/hosting companies in Australia, and they needed a local mirror, so they also made it publicly available.

I inherited the domain while at my previous job, when they were going to terminate the hosting and just let the domain lapse because ~8TB of fast disk is a lot to keep running.

I didn’t have the space/bandwidth to support a full mirror, and there was loads of traffic still going there, so I set up a little Flask app that returns HTTP 301 (Moved Permanently) responses pointing to other mirrors. That should work for any sensible client, but … I guess it doesn’t for some! There have always been a lot of requests, most of which I rate-limited by dropping them (or by trying other HTTP status codes), but they just. keep. coming. back.
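For a flavour of what that kind of redirector looks like, here is a minimal sketch. It is not the actual app: it uses a plain WSGI callable from the standard library instead of Flask so it runs without dependencies, and the mirror URLs and the `pick_mirror` helper are hypothetical.

```python
import zlib
from urllib.parse import quote
from wsgiref.simple_server import make_server

# Hypothetical mirror base URLs; the post does not say which mirrors are used.
MIRRORS = [
    "https://mirror-a.example.org",
    "https://mirror-b.example.org",
]


def pick_mirror(path: str) -> str:
    """Pick a mirror deterministically, so a given path always goes to the same one."""
    index = zlib.crc32(path.encode("utf-8")) % len(MIRRORS)
    return MIRRORS[index]


def redirect_app(environ, start_response):
    """WSGI app that answers every request with a 301 to the same path on a mirror."""
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")

    # PATH_INFO arrives percent-decoded, so re-encode it for the Location header.
    location = pick_mirror(path) + quote(path)
    if query:
        location += "?" + query

    headers = [("Location", location), ("Content-Length", "0")]
    start_response("301 Moved Permanently", headers)
    return [b""]


if __name__ == "__main__":
    # Local test server only; a real deployment would sit behind a proper WSGI server.
    with make_server("127.0.0.1", 8000, redirect_app) as server:
        server.serve_forever()
```

Running it and requesting `http://127.0.0.1:8000/some/file` should return a 301 whose `Location` header points at the same path on one of the placeholder mirrors. A well-behaved client follows that and stops asking; the ones in this story apparently don't.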

Splunking the data

Of course I want to use Splunk to analyse the data, so I spun up some Python scripting to pull it. It’s in yaleman/cloudflare-stats if you’re interested.

tl;dr: query all the zones, then query the GraphQL endpoint for analytics. Do it every ~24 hours, because that’s all the history I get if I want hourly stats.

It was a pretty chill start to the year, and then… March got spicy!

Daily stats, Bytes and Requests

Thankfully, most of the requests are cached on Cloudflare’s network! Only about 10-100 MB a day is actually served from my web server, which sits behind a Cloudflare Tunnel so it isn’t exposed to the internet directly!

Here’s a sample. Normally the cache percentages are lower, because people ask for a lot of random file paths…

Date | Requests | Bytes (total) | Cached Bytes | Bytes per Request (avg) | Cached %

Peaks of nearly 90 million requests on each of the 6th and 9th are pretty spectacular.
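The last two columns of the daily table are derived from the first three, and the daily peak is easier to picture as a per-second rate. Here is a small sketch of that arithmetic. The `DailyStats` class is hypothetical and the example row uses made-up round numbers, not real data from the table.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class DailyStats:
    """One day's totals, mirroring the raw columns of the daily table."""

    date: str
    requests: int
    bytes_total: int
    cached_bytes: int

    @property
    def bytes_per_request(self) -> float:
        """Average response size; 0 when there were no requests."""
        return self.bytes_total / self.requests if self.requests else 0.0

    @property
    def cached_percent(self) -> float:
        """Share of bytes served from cache, as a percentage."""
        return 100.0 * self.cached_bytes / self.bytes_total if self.bytes_total else 0.0


# Made-up round numbers purely to show the calculation.
example = DailyStats(
    date="example-day",
    requests=1_000_000,
    bytes_total=500_000_000,
    cached_bytes=450_000_000,
)
print(f"{example.bytes_per_request:.0f} bytes/request, {example.cached_percent:.0f}% cached")

# The "nearly 90 million requests" peak days, as an average rate over the day.
peak_rate = 90_000_000 / SECONDS_PER_DAY
print(f"Peak day: about {peak_rate:.0f} requests/second")
```

Ninety million requests in a day averages out to a bit over a thousand requests every second, all day long.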

All in all, a bit of an achievement, I guess?

I’d really love to know what’s changed in Japan - did someone shut down some mirrors in early March?

#cloudflare #python #linode #wow #flask