As seen today on Google Reader. A strip is a strip is a strip is a strip:

Taming the big, bad, nasty websites

Riptano is to Cassandra what Cloudera is to Hadoop or Percona to MySQL. Mmmkey?
A great, insightful post from Pingdom (as usual) lets us take a peek behind the doors of the largest web sites in the world, just by reading selected posts from their respective developer blogs.
Yahoo cut its data-center cooling costs from 50 cents to only one cent for every dollar spent on power. This was achieved at their most recent Yahoo Computing Coop data center, built in Lockport, New York.
The data center operates with no chillers, and will require water for only a handful of days each year. Yahoo projects that the new facility will operate at a Power Usage Effectiveness (PUE) of 1.1, placing it among the most efficient in the industry. [...]
If it looks like a chicken coop, it’s because some of the design principles were adapted from …. well, chicken coops. “Tyson Foods has done research involving facilities with the heat source in the center of the facility, looking at how to evacuate the hot air,” said Noteboom. “We applied a lot of similar thought to our data center.”
The Lockport site is ideal for fresh air cooling, with a climate that allows Yahoo to operate for nearly the entire year without using air conditioning for its servers.
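For anyone wondering what a PUE of 1.1 actually means: PUE is total facility power divided by the power that reaches the IT equipment, so everything above 1.0 is overhead (cooling, power distribution losses, lighting and so on). Here is a minimal sketch of that arithmetic. The 1,000 kW IT load is a made-up number for illustration, not a figure from Yahoo, and the function names are my own.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(pue_value: float, it_equipment_kw: float) -> float:
    """Power spent on everything that is not IT equipment, for a given PUE."""
    return (pue_value - 1.0) * it_equipment_kw


# Hypothetical IT load, chosen only to make the numbers easy to read.
IT_LOAD_KW = 1000.0

if __name__ == "__main__":
    for label, value in (("PUE 1.1", 1.1), ("PUE 2.0", 2.0)):
        extra = overhead_kw(value, IT_LOAD_KW)
        print(f"{label}: about {extra:.0f} kW of overhead per {IT_LOAD_KW:.0f} kW of IT load")
```

Note that the overhead in a PUE figure covers more than cooling, so it does not map one-to-one onto the "one cent per dollar" cooling number quoted above.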
High Scalability blog dissects a paper describing Dapper, Google’s tracing system used to instrument all the components of a software system in order to understand its behavior. Immensely interesting:
As you might expect Google has produced an elegant and well thought out tracing system. In many ways it is similar to other tracing systems, but it has that unique Google twist. A tree structure, probabilistically unique keys, sampling, emphasising common infrastructure insertion points, technically minded data exploration tools, a global system perspective, MapReduce integration, sensitivity to index size, enforcement of system wide invariants, an open API—all seem very Googlish.
On my favorite blog, HStack.org, Andrei wrote a great post about real-life performance testing of HBase:
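To make three of those ideas concrete (the tree structure, the probabilistically unique keys and the sampling), here is a toy sketch in Python. It is not Dapper's API, which is not public in this form. The `Tracer` and `Span` names, the method names and the sample rate are all assumptions of mine. The only ideas taken from the description are that spans form a tree under one trace id, that ids are random rather than centrally allocated, and that the decision to sample is made once per trace.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    trace_id: int             # shared by every span in the same trace
    span_id: int              # random 64-bit id, unique with high probability
    parent_id: Optional[int]  # None for the root span
    children: List["Span"] = field(default_factory=list)


class Tracer:
    """Hypothetical tracer: samples whole traces and builds a span tree."""

    def __init__(self, sample_rate: float, seed: Optional[int] = None):
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)

    def _new_id(self) -> int:
        # No coordination needed: collisions are merely very unlikely.
        return self.rng.getrandbits(64)

    def start_trace(self, name: str) -> Optional[Span]:
        # The sampling decision is taken once, at the root of the trace.
        if self.rng.random() >= self.sample_rate:
            return None
        return Span(name, self._new_id(), self._new_id(), None)

    def start_child(self, parent: Optional[Span], name: str) -> Optional[Span]:
        # An unsampled trace stays unsampled all the way down.
        if parent is None:
            return None
        child = Span(name, parent.trace_id, self._new_id(), parent.span_id)
        parent.children.append(child)
        return child


def render(span: Span, depth: int = 0) -> List[str]:
    """Flatten the span tree into indented lines, one per span."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    tracer = Tracer(sample_rate=1.0, seed=42)  # sample everything for the demo
    root = tracer.start_trace("frontend.request")
    backend = tracer.start_child(root, "backend.lookup")
    tracer.start_child(backend, "storage.read")
    print("\n".join(render(root)))
```

With a low sample rate most calls to `start_trace` return `None` and the instrumentation below them does nothing, which is the point of sampling: the overhead is only paid on a small fraction of requests.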
The numbers are the tip of the iceberg; things become really interesting once we start looking under the hood, and interpreting the results.
When investigating performance issues you have to assume that “everybody lies”. It is crucial that you don’t stop at a simple capacity or latency result; you need to investigate every layer: the performance tool, your code, their code, third-party libraries, the OS and the hardware. Here’s how we went about it:
The first potential liar is your test, then your test tool – they could both have bugs so you need to double-check.
But the most interesting distributed system of the week is World of Warcraft. Ars Technica describes a tour of the Blizzard campus and here’s a peek at the best NOC screen ever:

For the hooorde!
From the GigaOm analysis:
Fortunately, for them, it seems unlikely that Google will take to the courts to enforce its new intellectual property. A big reason is that “map” and “reduce” functions have been part of parallel programming for decades, and vendors with deep pockets certainly could make arguments that Google didn’t invent MapReduce at all.
Should Hadoop come under fire, any defendants (or interveners like Yahoo and/or IBM) could have strong technical arguments over whether the open-source Hadoop even is an infringement. Then there is the question of money: Google has been making plenty of it without the patent, so why risk the legal and monetary consequences of losing any hypothetical lawsuit? Plus, Google supports Hadoop, which lets university students learn webscale programming (so they can become future Googlers) without getting access to Google’s proprietary MapReduce language.
[...]
A Google spokeswoman emailed this in response to our questions about why Google sought the patent, and whether or not Google would seek to enforce its patent rights, attributing it to Michelle Lee, Deputy General Counsel:
“Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops. While we do not comment about the use of this or any part of our portfolio, we feel that our behavior to date has been inline with our corporate values and priorities.”
From Ars Technica:
Hadoop isn’t the only open source project that uses MapReduce technology. As some readers may know, I’ve recently been experimenting with CouchDB, an open source database system that allows developers to perform queries with map and reduce functions. Another place where I’ve seen MapReduce is Nokia’s QtConcurrent framework, an extremely elegant parallel programming library for Qt desktop applications.
It’s unclear what Google’s patent will mean for all of these MapReduce adopters. Fortunately, Google does not have a history of aggressive patent enforcement. It’s certainly possible that the company obtained the patent for “defensive” purposes. Like virtually all major software companies, Google is frequently the target of patent lawsuits. Many companies in technical fields attempt to collect as many broad patents as they can so that they will have ammunition with which to retaliate when they are faced with patent infringement lawsuits.
Google’s MapReduce patent raises some troubling questions for software like Hadoop, but it looks unlikely that Google will assert the patent in the near future; Google itself uses Hadoop for its Code University program.
Even if Google takes the unlikely course of action and does decide to target Hadoop users with patent litigation, the company would face significant resistance from the open source project’s deep-pocketed backers—including IBM, which holds the industry’s largest patent arsenal.
Another dimension of this issue is the patent’s validity. On one hand, it’s unclear if taking age-old principles of functional software development and applying them to a cluster constitutes a patentable innovation.
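For readers who have never seen the "age-old principles" in question, here is a word count written with plain map and reduce steps in Python, on a single machine. It is only a sketch of the programming model: it is neither Google's MapReduce nor Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own. What the real systems add is running these steps across a cluster, with fault tolerance.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would do between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(counts: List[int]) -> int:
    """Reduce: fold all values for one key into a single result."""
    return reduce(lambda total, n: total + n, counts, 0)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return {word: reduce_phase(counts) for word, counts in shuffle(pairs).items()}


if __name__ == "__main__":
    docs = ["a strip is a strip", "is a strip"]
    for word, count in sorted(word_count(docs).items()):
        print(word, count)
```

Each `map_phase` call looks at one document only and each `reduce_phase` call looks at one key only, which is what makes the model easy to spread over many machines.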
Still nothing from the big analysts, Gartner and the gang…
Yes, language propaganda again. Ain’t it fun?
Here comes a nice quote from the latest Steve Yegge post (read it entirely if you have the time, it’s both fun and educational – at least for me). So, there:
I made the famously, horribly, career-shatteringly bad mistake of trying to use Ruby at Google, for this project. And I became, very quickly, I mean almost overnight, the Most Hated Person At Google. And, uh, and I’d have arguments with people about it, and they’d be like Nooooooo, WHAT IF… And ultimately, you know, ultimately they actually convinced me that they were right, in the sense that there actually were a few things. There were some taxes that I was imposing on the systems people, where they were gonna have to have some maintenance issues that they wouldn’t have. [...] But, you know, Google’s all about getting stuff done.
[...]
Is it allowed at Google to use Lisp and other languages?
No. No, it’s not OK. At Google you can use C++, Java, Python, JavaScript… I actually found a legal loophole and used server-side JavaScript for a project.
Mmmmm … key ?
According to O’Reilly Radar, sales of Java books have declined by almost 50% over the last 4 years. C# sells more books every year and will probably catch up with Java in 2008. JavaScript is on the rise (due to AJAX, for sure) and PHP is, surprisingly, in decline (although the job statistics indicate quite the contrary).

In 2007, more Ruby books were sold than Python books. In their article they qualify Ruby as a “mid-major programming language” and Python as a “mid-minor programming language”. However, after the announcement of Google App Engine, the number of Python downloads from ActiveState tripled in May. This should become visible in the book sales statistics pretty soon.