Netuality

Taming the big bad websites

Archive for the ‘Datacenter’ Category

What to do when the meteor strikes


There’s nothing quite like a good Single Point of Failure (SPOF) during a holiday dinner.

says John Farmer on his blog, and I couldn’t agree more. Start with a meteor-strike scenario for a change: just imagine a giant rock crushing your measly, SPOF-ridden infrastructure in one unlucky data center. While you wait for that black swan to appear, learn to keep calm and react properly using the tips from his triple post on incidents, outages and systems maintenance:

Simple problems can easily become large complicated problems after a few bad decisions made in haste. Take a breath before continuing. This is especially important with a page at 3AM or if a panicky client is in your office. Tell the client you’ll handle the problem and run through your normal procedure.

[...]

Remember the prime directive – your job is to restore service as quickly as possible. You are not there to debug interesting problems with your service.

Recommended reading!

Written by Adrian

May 11th, 2010 at 6:23 pm

Posted in Datacenter


Benchmarking the cloud: not simple


Understanding the impact of using virtualized servers instead of physical ones is perhaps one of the most complex issues when migrating from a traditional configuration to a cloud-based setup, especially because virtualized servers are all created equal… but only on paper.

A Rackspace-funded “report” tries to pin down the performance differences between Rackspace Cloud Servers and Amazon EC2. The only conclusion I can draw from their so-called report is that Cloud Server disk throughput is better than EC2’s. Since the “CPU test” is a kernel compile, which also stresses the disk, I don’t think we can reliably conclude anything else from it.

An intrepid commenter ran a CPU-only test (Geekbench) and found that EC2 performs slightly better than Rackspace in terms of raw processor performance. The same commenter, affiliated with Cloud Harmony, mentions that a simple hdparm test shows Rackspace’s disks delivering more than twice the throughput of EC2’s, at least in terms of buffered reads. Last but not least, don’t forget that for better disk performance Amazon recommends EBS instead of the VM’s local disk.
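If you want a rough feel for sequential read throughput without installing hdparm, here is a minimal Python sketch of the same idea: write a scratch file, then time buffered sequential reads through it. Note the caveat in the comments: unlike `hdparm -t`, which reads the raw device, this goes through the OS page cache, so treat the number as an optimistic upper bound.

```python
import os
import tempfile
import time

CHUNK = 1 << 20          # 1 MiB per read
SIZE = 64 * CHUNK        # 64 MiB scratch file -- small enough for a quick run

# Write a scratch file full of random data and flush it to disk.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    for _ in range(SIZE // CHUNK):
        f.write(os.urandom(CHUNK))
    f.flush()
    os.fsync(f.fileno())

# Time sequential reads. Caveat: the file we just wrote is likely still in
# the page cache, so this measures cache speed as much as disk speed.
start = time.perf_counter()
read = 0
with open(path, "rb", buffering=0) as f:
    while chunk := f.read(CHUNK):
        read += len(chunk)
elapsed = time.perf_counter() - start
os.unlink(path)

print(f"read {read / CHUNK:.0f} MiB in {elapsed:.3f} s "
      f"-> {read / CHUNK / elapsed:.0f} MiB/s")
```

On a real benchmark run you would drop the page cache between the write and the read (or read a raw device, as hdparm does) and repeat the measurement several times.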

We cannot reliably make an informed cloud vendor choice from generic VM benchmarks alone. Ideally, you should benchmark your own application on each cloud infrastructure and choose the one that gives you the best user-facing performance, because at the end of the day that is what matters most. Sadly, today this means experimenting with sometimes wildly different APIs and provisioning models.
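The “benchmark your own app” advice can be sketched as a small probe you point at the same application deployed on each candidate cloud. The helper name and the example URLs below are hypothetical; the one real design choice is reporting the median rather than the mean, since a few outlier runs are common on shared virtualized hosts.

```python
import statistics
import time

def time_request(fn, runs=20):
    """Time a user-facing operation `fn` repeatedly; return the median latency.

    The median is less noisy than the mean on virtualized hosts, where
    occasional slow outliers would otherwise skew the result.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # e.g. lambda: urllib.request.urlopen(APP_URL).read()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Hypothetical usage: run the same probe against your app on each cloud.
# ec2_median = time_request(lambda: urlopen("http://app-on-ec2.example/").read())
# rackspace_median = time_request(lambda: urlopen("http://app-on-rs.example/").read())
```

The point is that the probed operation is *your* workload end to end, not a kernel compile on someone else’s VM.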

Written by Adrian

January 18th, 2010 at 10:02 am

Posted in Datacenter


How big is your meat cloud? The golden number for servers


I just went through a recent thread on Slashdot discussing “how many admins per user computer” or, more specifically, how many desktops per admin. While the client desktop subject is totally uninteresting, I found, amid the comment noise, a few interesting tidbits about meat-cloud sizes in different server environments.

On the low, non-automated end there were figures such as “1 admin per 70 Linux boxes or 30 Windows machines” (are Windows servers really twice as difficult to manage as Linux servers?), confirmed by another commenter working for a government facility. Of course, it depends on how many different hardware brands and software services you have to manage…

Another sysadmin, allegedly with 12 years of experience, commented that the larger the organization, the bigger the ratio: from 50 servers per sysadmin in small organizations to 250 in corporations (though his company-revenue “definitions” are a bit weird). An insightful comment mentions Facebook’s Jeff Rothschild, according to whom Facebook runs roughly 130 servers per admin or (an interesting metric) one million or more users per engineer.

Of course, in specific cases this number can go much higher, especially when you deal with quasi-identical hardware and software configurations running in a very large cluster. On the extreme end there’s the Microsoft container data center in Chicago, which supposedly has a total of 30 employees supporting some 300,000 servers. That’s 10,000 servers per employee! At this point I suspect they basically only swap faulty hardware and wire in new capacity when needed; everything else should be fully automated.
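For a quick sanity check, here are the servers-per-admin figures quoted above laid out side by side; the labels are my own shorthand for the environments described in the post.

```python
# Servers-per-admin ratios quoted in the post, sorted for comparison.
ratios = {
    "manual Windows shop": 30,
    "small organization": 50,
    "manual Linux shop": 70,
    "Facebook (reported)": 130,
    "large corporation": 250,
    "Microsoft Chicago container DC": 300_000 / 30,
}

for env, r in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{env:32s} {r:8.0f} servers/admin")
```

The spread is two orders of magnitude, which is the whole point: homogeneity plus automation, not headcount, drives the ratio.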

Written by Adrian

January 5th, 2010 at 7:16 pm

Posted in Datacenter
