Netuality

Taming the big, bad, nasty websites

Archive for the ‘Datacenter’ tag

What to do when the meteor strikes


There’s nothing quite like a good Single Point of Failure (SPOF) during a holiday dinner.

says John Farmer on his blog, and I couldn’t agree more. Start with a meteor strike scenario for a change: just imagine a giant rock crushing your measly SPOF-ridden infrastructure in one unlucky data center. While waiting for the black swan to appear, learn to keep calm and react normally using the tips from a triple post about incidents, outages and systems maintenance:

Simple problems can easily become large complicated problems after a few bad decisions made in haste. Take a breath before continuing. This is especially important with a page at 3AM or if a panicky client is in your office. Tell the client you’ll handle the problem and run through your normal procedure.

[...]

Remember the prime directive – your job is to restore service as quickly as possible. You are not there to debug interesting problems with your service.

Recommended reading!

Written by Adrian

May 11th, 2010 at 6:23 pm

Posted in Datacenter


Linkdump: using Hbase, CAP visuals, Farmville and more


Two great posts from my colleagues about why Adobe is using HBase: part 1 and part 2. As I’ve experienced all these firsthand, I guarantee this is solid, relevant information. Both articles are highly recommended reads.

Speaking of HBase, there’s a rumor on the street that they are taking HBASE-1295 (multi data center replication) very seriously and we’ll be seeing a new feature announcement relatively soon. Looking forward to it!

An older but still interesting presentation on how RIPE NCC is using Hadoop and HBase to store and search through IP addresses for Europe, Middle East and Russia can be found here:

It looks like Farmville is still in the MySQL+memcache phase, according to the High Scalability blog. And they use PHP. When will they start looking into NoSQL? Hopefully soon enough to have a good crop.

Nathan’s visual guide to NoSQL systems, while perhaps not entirely correct, is a nice attempt to put all these projects on the same map. I would love to see a “patched” version of the visual guide taking into account all the information left in the comments…

Oh, and Twitter is using Protocol Buffers to store information on Hadoop. And they’re going to open source their implementation.

Written by Adrian

March 17th, 2010 at 1:20 pm

How big is your meat cloud? The golden number for servers


Just went through a recent thread on Slashdot discussing “how many admins per user computer” or, to be more specific, how many desktops per admin. While the client desktop subject is totally uninteresting, I found in the comment noise a few interesting tidbits about the meat cloud size in different server environments.

On the low, non-automated end there were figures such as “1 admin per 70 Linux boxes or 30 Windows machines” (are Windows servers really twice as difficult to manage as Linux servers?) – confirmed by another commenter working for a Government facility. Of course, it depends on how many different hardware brands and software services you have to manage…

Another sysadmin, allegedly with 12 years of experience, commented that the larger the organization, the bigger the ratio, going from 50 servers per sysadmin in small organizations to 250 in corporations (but his company revenue “definitions” are a bit weird). An insightful comment mentions Facebook’s Jeff Rothschild, according to whom Facebook has roughly 130 servers per admin or (interesting metric) 1 million or more users per engineer.

Of course, in specific cases this number can go way higher, especially when you have to deal with quasi-identical hardware and software configurations running in a very large cluster. On the extreme scale there’s the Microsoft container data center in Chicago, which supposedly has a total of 30 employees supporting some 300,000 servers. That’s 10,000 servers/employee! At this point I suspect they basically only replace faulty hardware and wire new capacity when needed; everything else should be fully automated.
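For anyone who wants to play with these ratios, here is a minimal back-of-the-envelope sketch in Python. The function names and the environment labels are my own shorthand (assumptions, not anything from the Slashdot thread), and the figures are simply the ones quoted above, taken at face value.

```python
import math


def servers_per_admin(servers: int, admins: int) -> float:
    """Average number of servers each admin looks after."""
    if admins <= 0:
        raise ValueError("admins must be a positive number")
    return servers / admins


def admins_needed(servers: int, ratio: int) -> int:
    """Smallest whole number of admins covering `servers` at `ratio` servers per admin."""
    if ratio <= 0:
        raise ValueError("ratio must be a positive number")
    return math.ceil(servers / ratio)


# Servers-per-admin figures quoted in the post (labels are informal shorthand).
QUOTED_RATIOS = {
    "small organization": 50,
    "non-automated Linux shop": 70,
    "Facebook (per Jeff Rothschild)": 130,
    "large corporation": 250,
}

if __name__ == "__main__":
    chicago_servers, chicago_staff = 300_000, 30
    print("Chicago container data center:",
          servers_per_admin(chicago_servers, chicago_staff), "servers/employee")
    # How many admins would the same fleet need at the more ordinary ratios?
    for label, ratio in QUOTED_RATIOS.items():
        print(f"{label}: {admins_needed(chicago_servers, ratio)} admins")
```

Running the same 300,000-server fleet through the more ordinary ratios shows how far automation has to go before 30 people are enough.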

Written by Adrian

January 5th, 2010 at 7:16 pm

Posted in Datacenter
