Netuality

Taming the big, bad, nasty websites

Archive for the ‘Yahoo’ tag

Linkdump: Coop, HBase performance and a bit of Warcraft


Riptano is to Cassandra what Cloudera is to Hadoop or Percona to MySQL. Mmmkey?

A great, insightful post from Pingdom (as usual) lets us peek behind the doors of the largest websites in the world, simply by reading selected posts from their respective developer blogs.

Yahoo has cut data-center cooling costs from 50 cents to only one cent per dollar spent on power. This was achieved at its newest data center, the Yahoo Computing Coop in Lockport, New York.

The data center operates with no chillers, and will require water for only a handful of days each year. Yahoo projects that the new facility will operate at a Power Usage Effectiveness (PUE) of 1.1, placing it among the most efficient in the industry. [...]

If it looks like a chicken coop, it’s because some of the design principles were adapted from … well, chicken coops. “Tyson Foods has done research involving facilities with the heat source in the center of the facility, looking at how to evacuate the hot air,” said Noteboom. “We applied a lot of similar thought to our data center.”

The Lockport site is ideal for fresh air cooling, with a climate that allows Yahoo to operate for nearly the entire year without using air conditioning for its servers.
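PUE is total facility power divided by the power delivered to IT equipment, so the projected PUE of 1.1 means roughly 10% of extra power for cooling, power distribution and lighting on top of the servers themselves. A minimal sketch of that arithmetic; the kilowatt figures are made-up illustrative numbers, not Yahoo's.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_fraction(pue_value: float) -> float:
    """Non-IT overhead (cooling, power distribution, lighting) per unit of IT power."""
    return pue_value - 1.0


# Hypothetical facility: 1,000 kW of servers plus 100 kW of overhead.
ratio = pue(1100.0, 1000.0)
print(f"PUE = {ratio:.2f}, overhead = {overhead_fraction(ratio):.0%} of IT load")
```

A PUE of 1.0 would mean every watt goes to the servers; a traditional chiller-based facility sits well above that.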

The High Scalability blog dissects a paper describing Dapper, Google’s tracing system, which instruments all the components of a software system in order to understand its behavior. Immensely interesting:

As you might expect, Google has produced an elegant and well-thought-out tracing system. In many ways it is similar to other tracing systems, but it has that unique Google twist. A tree structure, probabilistically unique keys, sampling, emphasising common infrastructure insertion points, technically minded data exploration tools, a global system perspective, MapReduce integration, sensitivity to index size, enforcement of system wide invariants, an open API—all seem very Googlish.
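To make the “tree structure, probabilistically unique keys, sampling” part a bit more concrete, here is a toy sketch loosely modelled on that description. The `Span` class, its field names and the helper functions are hypothetical, not Dapper’s actual API. Every span carries its trace’s random 64-bit trace id, records its parent’s span id so the spans form a tree, and the sampling decision is made once at the root and inherited by all children. A real deployment would sample only a small fraction of traces; the demo samples everything so the output is predictable.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed unit of work in a trace (hypothetical, Dapper-inspired)."""
    name: str
    trace_id: int              # shared by every span in the same trace
    span_id: int               # random 64-bit id: unique with high probability
    parent_id: Optional[int]   # None for the root span
    sampled: bool              # decided once at the root, inherited below
    children: List["Span"] = field(default_factory=list)


def new_id(rng: random.Random) -> int:
    # Probabilistically unique keys: 64 random bits, no central coordinator.
    return rng.getrandbits(64)


def start_trace(name: str, rng: random.Random, sample_rate: float) -> Span:
    trace_id = new_id(rng)
    return Span(name, trace_id, new_id(rng), None, rng.random() < sample_rate)


def start_child(parent: Span, name: str, rng: random.Random) -> Span:
    # Children reuse the trace id and the parent's sampling decision.
    child = Span(name, parent.trace_id, new_id(rng), parent.span_id, parent.sampled)
    parent.children.append(child)
    return child


def walk(span: Span, depth: int = 0) -> List[str]:
    """Flatten the span tree into indented lines, e.g. for display."""
    lines = ["  " * depth + span.name]
    for c in span.children:
        lines.extend(walk(c, depth + 1))
    return lines


rng = random.Random(42)
root = start_trace("frontend.search", rng, sample_rate=1.0)  # sample everything in the demo
backend = start_child(root, "backend.query", rng)
start_child(backend, "storage.read", rng)
start_child(root, "ads.lookup", rng)
print("\n".join(walk(root)))
```

Deciding on sampling at the root keeps traces whole: either every span of a request is recorded or none is, which is what makes the tree reconstructable later.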

On my favorite blog :) HStack.org, Andrei wrote a great post about real-life performance testing of HBase:

The numbers are the tip of the iceberg; things become really interesting once we start looking under the hood, and interpreting the results.

When investigating performance issues you have to assume that “everybody lies”. It is crucial that you don’t stop at a simple capacity or latency result; you need to investigate every layer: the performance tool, your code, their code, third-party libraries, the OS and the hardware. Here’s how we went about it:

The first potential liar is your test, then your test tool – they could both have bugs so you need to double-check.
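One small way to avoid stopping at “a simple capacity or latency result” is to look at the latency distribution instead of a single average. A minimal sketch using the nearest-rank percentile definition; the sample latencies are invented for illustration and are not from Andrei’s HBase tests.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    return {
        "mean": mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
    }


# Hypothetical run: 98 fast reads plus two slow outliers (e.g. a GC pause).
latencies = [2.0] * 98 + [250.0, 400.0]
print(summarize(latencies))
```

Here the mean (about 8.5 ms) looks harmless while the p99 is 250 ms: the average “lies”, and the outliers are exactly where you start digging through the layers.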

But the most interesting distributed system of the week is World of Warcraft. Ars Technica describes a tour of the Blizzard campus and here’s a peek at the best NOC screen ever:

For the hooorde!

Written by Adrian

April 27th, 2010 at 10:35 pm

Posted in Linkdump


January 13 linkdump: KDD, EC2 congested, Coherence, Zimbra


Call to arms for the annual ACM KDD Conference. KDD stands for Knowledge Discovery and Data Mining, so if you’re looking for some hardcore use cases and new algorithms to apply, this is definitely the place to be (Washington, July 25-28):

KDD-2010 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition.

There’s a rumor on the street that Amazon EC2 is over-subscribed. From the trenches, it appears that their scalability is … well, duh … not infinite, and their elasticity is a tiny bit rigid:

Anyone that uses virtualized computing, whether it is in the cloud or in their own private setup (VMWare for example), knows you take a performance hit. These performance hits can be considerable, but on the whole, are tolerable and can be built into an architecture from the start.

The problems that we are starting to see from Amazon are more than just the overhead of a virtualized environment. They are deep rooted scalability problems at their end that need to be addressed sooner rather than later.
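On a virtualized Linux guest, one rough indicator of how much CPU the hypervisor is handing to other tenants is the “steal” counter in `/proc/stat` (the eighth value on the aggregate `cpu` line, after user, nice, system, idle, iowait, irq and softirq). A minimal sketch that computes the steal percentage between two snapshots; the sample lines are made up, and on a real instance you would read `/proc/stat` twice, a few seconds apart.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into integer tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def steal_percent(before: str, after: str) -> float:
    """Share of CPU time stolen by the hypervisor between two /proc/stat snapshots."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    if len(b) < 8 or len(a) < 8:
        raise ValueError("no steal column on this kernel")
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest/guest_nice are already counted in user/nice, so sum only the first 8.
    deltas = [a[i] - b[i] for i in range(8)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total > 0 else 0.0


# Made-up snapshots from a hypothetical busy guest.
t0 = "cpu  1000 0 500 8000 100 0 50 350 0 0"
t1 = "cpu  1400 0 700 8600 100 0 100 1100 0 0"
print(f"steal: {steal_percent(t0, t1):.1f}%")
```

A persistently high steal value suggests the physical host is busy with other tenants, i.e. the kind of problem “at their end” that no amount of tuning inside your own instance will fix.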

My Adobe colleague Ricky Ho has posted some notes on Oracle’s Coherence (formerly Tangosol), a feature-rich distributed Java cache. A great read, especially if you want a technical intro to the product (code snippets and everything).

The acquisition of the day: Zimbra is being bought by VMWare. It seems Yahoo is selling Zimbra at a loss. Analysts wonder what exactly VMWare is planning to do; well, they’re probably moving up the stack and working on their own cloud ecosystem and related services. “VMWare Applications”, soon?

Under the terms of the agreement, Yahoo can continue to use Zimbra technology in its communications services. VMWare’s interest in Zimbra is a bit of a mystery since VMWare focuses on selling virtualization technology; in the release, VMWare offers somewhat of an explanation saying that the purchase furthers its “mission of taking complexity out of the datacenter, desktop, application development and core IT services”.

Written by Adrian

January 13th, 2010 at 8:23 pm

Posted in Linkdump
