Our idea of relaying incoming tracking data through our CDN seems to be unrealistic. It works near-perfectly until there’s even a minor network hiccup on either end (the CDN network or Clicky’s home network). We track so much data that when the network is unavailable for even 10-20 seconds, data queues up insanely fast, which starts an inevitable downward spiral.
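To see why a short hiccup snowballs, here is a back-of-the-envelope sketch. The rates below are assumptions for illustration, not Clicky’s actual traffic figures: during the outage, events pile up in a queue, and afterward the relay must handle live traffic plus the backlog, so with thin headroom the recovery takes far longer than the hiccup itself.

```python
# All numbers here are assumed for illustration only.
INCOMING_RATE = 5_000      # events/second arriving (assumed)
DRAIN_CAPACITY = 5_500     # events/second the relay can forward (assumed)
OUTAGE_SECONDS = 15        # a "minor hiccup"

backlog = INCOMING_RATE * OUTAGE_SECONDS   # events queued during the outage
spare = DRAIN_CAPACITY - INCOMING_RATE     # headroom once the link is back
recovery_seconds = backlog / spare         # time needed to drain the queue

print(f"backlog after outage: {backlog} events")
print(f"recovery time: {recovery_seconds:.0f} seconds")
```

With only ~10% headroom in this toy scenario, a 15-second hiccup takes about 150 seconds to drain, and a second hiccup during that recovery window compounds the backlog further.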
That’s what happened today, several times. We tried many different tricks to make it work better, but nothing did. There’s most likely a bit of missing data from your stats, and we apologize for that.
The CDN has been temporarily disabled so that all outgoing and incoming traffic points to our “real” servers again as soon as possible. (DNS time-to-live values were only 15 minutes, a requirement of the auto-failover that Dyn DNS gives us, so the switch should propagate very quickly.)
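The reason a 15-minute TTL makes this fast: a resolver may only serve a cached DNS answer until the TTL expires, so once the records are switched back, every client is re-pointed within at most one TTL window. A toy sketch of that caching rule (the hostname is hypothetical; only the 15-minute TTL comes from the post):

```python
# Toy model of standard DNS resolver caching: a cached answer may be
# reused only until its TTL expires, which bounds how long clients can
# keep hitting the old endpoints after a record change.
TTL_SECONDS = 15 * 60  # the 15-minute TTL mentioned above

class CachedRecord:
    def __init__(self, hostname, fetched_at):
        self.hostname = hostname
        self.fetched_at = fetched_at

    def is_fresh(self, now):
        # The resolver must re-query once the TTL has elapsed.
        return now - self.fetched_at < TTL_SECONDS

record = CachedRecord("cdn.example.com", fetched_at=0)  # hypothetical host
print(record.is_fresh(now=600))    # 10 minutes in: still cached
print(record.is_fresh(now=1000))   # past 15 minutes: must re-resolve
```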
We’re going to re-enable the CDN for our static files and tracking code within the next day or two, because that part worked perfectly. It was the incoming traffic data that was the problem, and that will now bypass the CDN entirely and go straight to headquarters. The outgoing data is what matters most anyway, so this isn’t the biggest deal in the world. Logging incoming data through the CDN would have been a nice addition to the arsenal, but life goes on.
Please don’t assume we didn’t test the CDN before deployment, because of course we did. But it wasn’t until the full brunt of the entire world was sending incoming data to these servers that problems started surfacing. That scale was impossible to reproduce in testing, so the problems resulting from it were not predictable until the system was live.
Update: The CDN is live again as of ~2AM PST on March 10. It is now only handling outgoing data. Things are working smoothly right now. If you notice anything strange, please let us know!