Netuality

Taming the big, bad, nasty websites

Archive for the ‘Google’ tag

Linkdump: Coop, HBase performance and a bit of Warcraft

one comment

Riptano is to Cassandra what Cloudera is to Hadoop or Percona to MySQL. Mmmkey?

A great, insightful post from Pingdom (as usual) allows us to take a peek behind the doors at largest web sites in the world, just by reading selected stuff from their respective developer blogs.

Yahoo decreased data-center cooling costs compared to power costs from 50 cents/dollar to only one cent/dollar. This is obtained on their most recent Yahoo Computing Coop data-center built in Lockport, New York.

The data center operates with no chillers, and will require water for only a handful of days each year. Yahoo projects that the new facility will operate at a Power Usage Effectiveness (PUE) of 1.1, placing it among the most efficient in the industry. [...]

If it looks like a chicken coop, it’s because some of the design principles were adapted from …. well, chicken coops. “Tyson Foods has done research involving facilities with the heat source in the center of the facility, looking at how to evacuate the hot air,” said Noteboom. “We applied a lot of similar thought to our data center.”

The Lockport site is ideal for fresh air cooling, with a climate that allows Yahoo to operate for nearly the entire year without using air conditioning for its servers.

High Scalability blog dissects a paper describing Dapper, Google’s tracing system used to instrument all the components of a software system in order to understand its behavior. Immensely interesting:

As you might expect Google has produced and elegant and well thought out tracing system. In many ways it is similar to other tracing systems, but it has that unique Google twist. A tree structure, probabilistically unique keys, sampling, emphasising common infrastructure insertion points, technically minded data exploration tools, a global system perspective, MapReduce integration, sensitivity to index size, enforcement of system wide invariants, an open API—all seem very Googlish.

On my favorite blog :) HStack.org Andrei wrote a great post about real-life performance testing of HBase:

The numbers are the tip of the iceberg; things become really interesting once we start looking under the hood, and interpreting the results.

When investigating performance issues you have to assume that “everybody lies”. It is crucial that you don’t stop at a simple capacity or latency result; you need to investigate every layer: the performance tool, your code, their code, third-party libraries, the OS and the hardware. Here’s how we went about it:

The first potential liar is your test, then your test tool – they could both have bugs so you need to double-check.

But the most interesting distributed system of the week is World of Warcraft. Ars Technica describes a tour of the Blizzard campus and here’s a peek at the best NOC screen ever:

For the hooorde!

Written by Adrian

April 27th, 2010 at 10:35 pm

Posted in Linkdump

Tagged with , , , ,

Google’s Map/Reduce patent and impact on Hadoop: none expected

leave a comment

From the GigaOm analysis:

Fortunately, for them, it seems unlikely that Google will take to the courts to enforce its new intellectual property. A big reason is that “map” and “reduce” functions have been part of parallel programming for decades, and vendors with deep pockets certainly could make arguments that Google didn’t invent MapReduce at all.

Should Hadoop come under fire, any defendants (or interveners like Yahoo and/or IBM) could have strong technical arguments over whether the open-source Hadoop even is an infringement. Then there is the question of money: Google has been making plenty of it without the patent, so why risk the legal and monetary consequences of losing any hypothetical lawsuit? Plus, Google supports Hadoop, which lets university students learn webscale programming (so they can become future Googlers) without getting access to Google’s proprietary MapReduce language.

[...]

A Google spokeswoman emailed this in response to our questions about why Google sought the patent, and whether or not Google would seek to enforce its patent rights, attributing it to Michelle Lee, Deputy General Counsel:

“Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops. While we do not comment about the use of this or any part of our portfolio, we feel that our behavior to date has been inline with our corporate values and priorities.”

From Ars Technica:

Hadoop isn’t the only open source project that uses MapReduce technology. As some readers may know, I’ve recently been experimenting with CouchDB, an open source database system that allows developers to perform queries with map and reduce functions. Another place where I’ve seen MapReduce is Nokia’s QtConcurrent framework, an extremely elegant parallel programming library for Qt desktop applications.

It’s unclear what Google’s patent will mean for all of these MapReduce adopters. Fortunately, Google does not have a history of aggressive patent enforcement. It’s certainly possible that the company obtained the patent for “defensive” purposes. Like virtually all major software companies, Google is frequently the target of patent lawsuits. Many companies in technical fields attempt to collect as many broad patents as they can so that they will have ammunition with which to retaliate when they are faced with patent infringement lawsuits.

Google’s MapReduce patent raises some troubling questions for software like Hadoop, but it looks unlikely that Google will assert the patent in the near future; Google itself uses Hadoop for its Code University program.

Even if Google takes the unlikely course of action and does decide to target Hadoop users with patent litigation, the company would face significant resistance from the open source project’s deep-pocketed backers—including IBM, which holds the industry’s largest patent arsenal.

Another dimension of this issue is the patent’s validity. On one hand, it’s unclear if taking age-old principles of functional software development and applying them to a cluster constitutes a patentable innovation.

Still nothing from the big analysts, Gartner and the gang…

Written by Adrian

January 22nd, 2010 at 7:39 pm

Posted in Articles

Tagged with , , , ,

Google: sorry, but Lisp/Ruby/Erlang not on the menu

7 comments

Yes, language propaganda again. Ain’t it fun ?

Here comes a nice quote from the latest Steve Yegge post (read it entirely if you have the time, it’s both fun and educational – at least for me). So, there:

I made the famously, horribly, career-shatteringly bad mistake of trying to use Ruby at Google, for this project. And I became, very quickly, I mean almost overnight, the Most Hated Person At Google. And, uh, and I’d have arguments with people about it, and they’d be like Nooooooo, WHAT IF… And ultimately, you know, ultimately they actually convinced me that they were right, in the sense that there actually were a few things. There were some taxes that I was imposing on the systems people, where they were gonna have to have some maintenance issues that they wouldn’t have. [...] But, you know, Google’s all about getting stuff done.

[...]

Is it allowed at Google to use Lisp and other languages?

No. No, it’s not OK. At Google you can use C++, Java, Python, JavaScript… I actually found a legal loophole and used server-side JavaScript for a project.

Mmmmm … key ?

Written by Adrian

May 29th, 2008 at 12:35 am

Posted in Tools

Tagged with , , , ,

Java going down, Python way up, and more …

8 comments

According to O’Reilly Radar, sales of Java books have declined in the last 4 years by almost 50%. C# is selling more books from year to year and will probably level up with Java in 2008. Javascript is on the rise (due to AJAX, for sure) and PHP is on a surprising decrease path (although the job statistics indicate quite the contrary).

According to O’Reilly Radar, sales of Java books have declined in the last 4 years by almost 50%

In 2007, the number of sold Ruby books was larger than the number of Python books. In their article they qualify Ruby as being a “mid-major programming language” and Python as “mid-minor programming language”. However, after the announcement of Google App Engine the number of Python downloads from ActiveState has tripled in May. This should become visible in the book sales statistics, pretty soon.

Written by Adrian

May 24th, 2008 at 5:36 pm

Posted in Tools

Tagged with , , , , ,

memcached vs tugela vs memcachedb

2 comments

This presentation was planned for an older Wurbe event, but as this never quite happened in the last 4 months I am publishing it now, before it becomes totally obsolete.

My original contribution here is a comparison between the original memcached server from Danga and the tugela fork from the MediaWiki programmers. I’ve also tried memcachedb but the pre 1.0 version (from Google Code) in November 2007 was quite unstable and unpredictible.

In a nutshell, these memcache versions are using BerkeleyDB instead of memory slab allocator. There are 2 direct consequences:

  • when the memory is large enough for the whole cache, database-backed servers will be slower (my tests shown 10-15% which might be tolerable – or not – for your app)
  • when you’ve got lots of data to cache and your server’s memory is low, relying on bdb is significantly better than letting the swap mechanism to do its job (from my benchmarks, the difference can go up to 10 times faster especially under very high concurrency conditions)

Tugela will prove especially useful when running it on virtualized servers with very low memory.

My tests were performed with the “Tummy” Python client and Stackless for the multithreaded version. In one of the following weeks I’ll update the benchmarks for memcachedb 1.0.x – and I promise never ever to wait 4 months for a presentation, again …

Written by Adrian

March 17th, 2008 at 12:38 am