As we said, nothing exciting for a while. We've been working behind the scenes massively improving our infrastructure: updating some problem servers for greater reliability, and some old servers so they're much, much faster. We're not quite done, but here's the story so far:
In each of our tracking servers, we doubled the RAM and added much faster drives to store the incoming traffic data. Initially there were a few problems, but they were resolved.
As an update to that story, the problems we mentioned were related to the file system we were using, Ext3. The upgrades we initially made did help with performance, but load on the servers was still much higher than we expected. After many hours of research, we discovered that this file system, which is the default for almost any Linux installation, isn't well suited to storing, updating, and deleting thousands of tiny files 24/7. It turns out the file system of our dreams is called ReiserFS. Article after article said the same thing: this file system is amazing at dealing with thousands of tiny files – use it if that's what you're doing. So we did.
We reformatted the drives that store our incoming traffic data to ReiserFS and the results were stunning. Load plummeted to levels we haven’t seen for well over a year. So this was actually the biggest bottleneck of our existing setup, but that isn’t to say our RAM and hard drive upgrades were fruitless. Before we discovered ReiserFS, the hardware upgrades still made a significant difference – just not as big as we thought they would, which is why we kept researching. Once we added ReiserFS into the equation, the results were what we were hoping for.
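For the curious, the migration itself is straightforward. Here's a rough sketch of what reformatting a data partition to ReiserFS looks like on a Debian box – the device name and mount point below are just examples, and of course the data has to be copied off first, since formatting destroys it:

```shell
# Install the ReiserFS userland tools (Debian)
apt-get install reiserfsprogs

# Format the data partition as ReiserFS (destroys existing data!)
mkfs.reiserfs /dev/sdb1

# Mount it; noatime skips access-time updates, which cuts write
# load further when touching thousands of files a second
mount -o noatime /dev/sdb1 /var/tracking

# And the matching /etc/fstab entry so it survives a reboot:
# /dev/sdb1  /var/tracking  reiserfs  noatime  0  2
```

ReiserFS's "tail packing" – storing many small files directly inside its tree rather than giving each one a full block – is a big part of why it handles tiny files so well, and it's on by default.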
We also made a couple of very major efficiency improvements to the code that logs incoming traffic. The tracking servers are currently in a state of bliss and thanking us kindly for helping them work more efficiently.
Software to Hardware RAID migration
In the last six or so servers we built, we used Linux's built-in software RAID to mirror a pair of drives. Software RAID has served us well in the past, but it doesn't seem to be as reliable under extremely heavy read/write loads. About once a month we had a RAID failure, which would almost always leave one of our biggest database tables on that server corrupted. We'd then have to take the server offline and repair the corrupted table (or tables), which is a slow process to say the least.
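For reference, repairing a corrupted MySQL table can be done from SQL or offline against the raw index file – the table and path names below are placeholders, not our actual schema:

```shell
# From the MySQL prompt (locks the table while it runs):
#   mysql> REPAIR TABLE visitors;

# Or offline with mysqld stopped, using myisamchk against
# the table's .MYI index file:
myisamchk --recover /var/lib/mysql/tracking/visitors.MYI
```

Either way, on a multi-gigabyte table this can take hours – which is exactly why a monthly RAID failure was unacceptable.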
A Redundant Array of Independent Disks is supposed to prevent exactly this type of thing. A drive popping offline should be no problem: you replace it or re-add it to the array, the mirror rebuilds, and nothing noticeable happens from the end user's perspective. But that wasn't the case with our Linux software RAID servers.
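With Linux software RAID (mdadm), the recovery that's supposed to "just work" looks roughly like this – array and device names are examples:

```shell
# Check array health: a degraded mirror shows [U_] instead of [UU]
cat /proc/mdstat

# Mark the dead drive as failed and pull it from the array
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1

# After swapping the disk (or if the drive just dropped offline),
# add it back; the mirror rebuilds in the background
mdadm /dev/md0 --add /dev/sdb1
```

In theory the database keeps serving from the surviving drive the whole time. In practice, under our write load, the failures kept taking tables down with them.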
The main reason we went with software RAID was cost savings. Hardware RAID isn't that expensive, but it adds about 15% to the cost of each server we build. So: no more software RAID. Every server that had this setup has been migrated to hardware RAID. All of our older servers already use hardware RAID, and they've never had a single problem.
Upgrades to old servers
As I just mentioned, none of our older servers have ever had any problems. On the other hand, they're all a bit slow, as they're not using drives meant for high performance. The database servers most affected by this were db2, db3, db5, db6, and db7.
We've migrated db2, db3, and db5 to much faster drives. If any of your sites are on these servers, you should notice very significant speed improvements when viewing your stats. We haven't yet migrated db6 or db7. We currently have only one spare server ready to take on the data from another, and since db7 seems to be slightly slower than db6, it will get the upgrade first – most likely this coming weekend.
Next week, I will be at our data center again building some new servers – hopefully for the last time in a while! At that point, db6 will be moved to new hardware. db12 will also be moving, as it's on slower drives too. db12 is much newer than the others, so it has less data, which means its speed is still acceptable – but only for now. Its performance will slowly degrade over time as well, so we're just going to move it now.
Once that is completed… we’ll be done!!!
Well that was fun!
Actually, not really. This is the type of work that is the opposite of fun. I've built so many new servers and installed Debian Linux so many times in the last month that it's probably some kind of world record. But that's OK – all of this needed to be done, Clicky is much better because of it, and we hope you've noticed the improvements.
Now, we can get back to working on the software, which is what we really live for. Look for some great new features soon!