Just call this the week from hell. Actually, it was the month from hell. Server crashes plus configuration problems on a new server this week brought just about everything to a halt.
The trash bin in the back of our data center is littered with the remnants of servers that didn’t work as promised and hard drives corrupted by viruses and worms because an antivirus program also didn’t work as it should have.
Part of the problem started with a planned move from our data center in Loudoun County to a new one at Virginia Tech’s Corporate Research Park in Blacksburg. The idea was to have the server farm closer at hand. This meant buying new equipment and, in true Murphy’s law fashion, some of the equipment didn’t perform the way I expect new servers to.
Since the first of April, I’ve had three of Sun’s new Coolthreads Sun Fire servers bite the dust, two Dells running Linux suffer kernel failures and one Windows 2003 server reset itself and destroy everything from the last 10 days.
Capitol Hill Blue, which runs on multiple servers, went offline three times in the last three weeks, crippled by a bug in a new content management system. The same bug corrupted the backups without our knowledge.
We thought we were on the homestretch Friday moving the last of our servers, the one containing blogs for Fred First, Colleen Redman and others. But Fred’s blog crashed before the move and wouldn’t reboot on the new server. I finally traced the bug late Friday and finished up the move at 12:50 a.m. today.
So far (fingers crossed) everything is running fine. I’ve suffered more hardware and software problems in the last three weeks than in the last 11-and-a-half years of running and hosting web sites.
Lessons learned:
- Sun Servers ain’t what they used to be. I guess I shouldn’t be surprised. David St. Lawrence, an escapee from the corporate drudge of Sun, said it ain’t the company it used to be either.
- Abacus, a server co-location company located in San Diego and Germany, is a ripoff. I had hoped to locate a mirror site there but their tech people failed to respond in a timely manner when we needed assistance and it took them three days to fix a minor problem. That’s a shame. Abacus used to be a good company. Now they are just sham artists in it for the quick buck.
- Backups don’t work when the file corruption that brings down a server is also on the backup files.
- Linux is good for running Tivo and small web sites but it doesn’t have the power for large, full-scale operations that demand multi-threading, heavy processing needs and high traffic.
- I need some sleep.
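
On the backup lesson above: one way to avoid copying corrupted files into a backup is to record checksums while the files are known to be good, then compare against that record before each backup run. What follows is only a rough sketch, not what we run. The function names (`sha256_of`, `build_manifest`, `changed_files`) are made up for illustration, and it assumes the files are not supposed to change between checks. It can't tell a legitimate edit from corruption; it only flags what differs.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the known-good manifest that are now different or missing."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

If `changed_files` comes back non-empty for data that should be static, that is a reason to stop and look before overwriting the last good backup.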
P.S. Hope it’s all in the past and all that could go wrong, has. 🙂
I agree with Dusty. Surely, you’ve paid your dues for this decade! It sure is nice when things are going smoothly, especially after rough seas. I don’t realize how much I miss the sounding board of the blog until it’s not there for me. Thanks for your blood, sweat and tears toward the good of your server residents.
Sean:
Don’t know how you measure volume but 400,000 page requests a day ain’t that much traffic. Capitol Hill Blue gets more requests than that an hour and averages 12.5 million page requests a day. Those are page requests, not hits. I know how to read a weblog and so do my advertisers. Even that is peanuts when compared with really high volume news sites like Washingtonpost.com or MSNBC, which get 12 million page requests an hour.
In my opinion, real volume on the Internet is measured in millions, not thousands or even hundreds of thousands. I’ve read the reports that Google runs on a large-scale cluster of Linux servers but we’re talking about a specialized application with many, many servers. That’s hardly an out-of-the-box use of RedHat.
Doug