<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Netuality</title>
	<atom:link href="http://www.netuality.ro/feed" rel="self" type="application/rss+xml" />
	<link>http://www.netuality.ro</link>
	<description>Taming the big, bad, nasty websites</description>
	<lastBuildDate>Sun, 13 Jun 2010 07:44:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Comic strips and contextual advertising</title>
		<link>http://www.netuality.ro/comic-strips-and-contextual-advertising/uncategorized/20100613</link>
		<comments>http://www.netuality.ro/comic-strips-and-contextual-advertising/uncategorized/20100613#comments</comments>
		<pubDate>Sun, 13 Jun 2010 07:44:22 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=229</guid>
		<description><![CDATA[As seen today on Google Reader. A strip is a strip is a strip is a strip:

]]></description>
			<content:encoded><![CDATA[<p>As seen today on Google Reader. A strip is a strip is a strip is a strip:</p>
<p style="text-align: center;"><img class="size-full wp-image-230 aligncenter" title="dilbert_strips" src="http://www.netuality.ro/wp-content/uploads/2010/06/dilbert_strips.jpg" alt="" width="500" height="337" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/comic-strips-and-contextual-advertising/uncategorized/20100613/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What to do when the meteor strikes</title>
		<link>http://www.netuality.ro/what-to-do-when-the-meteor-strikes/datacenter/20100511</link>
		<comments>http://www.netuality.ro/what-to-do-when-the-meteor-strikes/datacenter/20100511#comments</comments>
		<pubDate>Tue, 11 May 2010 15:23:26 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Datacenter]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[incident]]></category>
		<category><![CDATA[outage]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=223</guid>
		<description><![CDATA[There&#8217;s nothing quite like a good Single Point of Failure (SPOF) during a  holiday dinner.
says John Farmer on his blog, and I couldn&#8217;t agree more. Start with a meteor strike scenario for a change, just imagine a giant rock crushing your measly SPOF-ridden infrastructure in one unlucky data center. Waiting for the black swan [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>There&#8217;s nothing quite like a good Single Point of Failure (SPOF) during a  holiday dinner.</p></blockquote>
<p>says <a href="http://farmhead.blogspot.com/" target="_blank">John Farmer on his blog</a>, and I couldn&#8217;t agree more. <a href="http://farmhead.blogspot.com/2010/05/planning-for-saas-infrastructure.html" target="_blank">Start with a meteor strike scenario</a> for a change: just imagine a giant rock crushing your measly SPOF-ridden infrastructure in one unlucky data center. While waiting for the black swan to appear, learn to keep calm and react normally using the tips from a triple post about <a href="http://farmhead.blogspot.com/2010/04/tips-for-handling-service-incidents.html" target="_blank">incidents</a>, <a href="http://farmhead.blogspot.com/2010/04/tips-for-handling-service-outages.html" target="_blank">outages</a> and <a href="http://farmhead.blogspot.com/2010/04/tips-and-tricks-for-system-maintenance.html" target="_blank">systems maintenance</a>:</p>
<blockquote><p>Simple problems can easily become large complicated problems after a few  bad decisions made in haste. Take a breath before continuing. This is  especially important with a page at 3AM or if a panicky client is in  your office. Tell the client you’ll handle the problem and run through  your normal procedure.</p>
<p>[...]</p>
<p>Remember the prime directive – your job is to restore service as quickly  as possible. You are not there to debug interesting problems with your  service.</p></blockquote>
<p>Recommended reading!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/what-to-do-when-the-meteor-strikes/datacenter/20100511/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linkdump: leaner meaner MySQL, gulping from the data buffet and lessons learned at Reddit</title>
		<link>http://www.netuality.ro/linkdump-leaner-meaner-mysql-gulping-from-the-data-buffet-and-lessons-learned-at-reddit/linkdump/20100510</link>
		<comments>http://www.netuality.ro/linkdump-leaner-meaner-mysql-gulping-from-the-data-buffet-and-lessons-learned-at-reddit/linkdump/20100510#comments</comments>
		<pubDate>Mon, 10 May 2010 17:58:12 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=216</guid>
		<description><![CDATA[The Percona guys are pleading for a MySQL strongly optimized for a single type of storage engine:
We could save a lot of CPU cycles by having storage format same as  processing format.  We could tune Optimizer to handle Innodb specifics  well.  We could get rid of SQL level table locks and [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.mysqlperformanceblog.com/2010/05/08/the-doom-of-multiple-storage-engines/" target="_blank">Percona guys are pleading for</a> a MySQL strongly optimized for a single type of storage engine:</p>
<blockquote><p>We could save a lot of CPU cycles by having storage format same as  processing format.  We could tune Optimizer to handle Innodb specifics  well.  We could get rid of SQL level table locks and using Innodb  internal data dictionary instead of Innodb files.  We would use Innodb  transactional log for replication (which could be extended a bit for  this purpose).   Finally backup can be done in truly hot way without  nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql”  database any more.   Single Storage Engine server would be also a lot  easier to test and operate.</p>
<p>This also would not mean one has to give up flexibility completely,  for example one can imagine having Innodb tables which do not log the  changes, hence being faster for update operations.</p></blockquote>
<p>Looks like <a href="http://flowingdata.com/2010/04/28/twitter-data-buffet-is-back-in-business/" target="_blank">Twitter data buffet is back in business</a>. Some of the data is free. Enjoy it in moderation: too much data can make you slow.</p>
<p><a href="http://www.reddit.com/" target="_blank">Reddit</a>&#8217;s Steve Huffman gives a talk at Web Apps Miami 2010. Self-healing, separation of services, be stateless and cache like crazy, redundancy and yes, a little bit of Hadoop (<em>Amazon&#8217;s Hadoop</em> is <a href="http://aws.amazon.com/elasticmapreduce/" target="_blank">Elastic Map Reduce</a>). Read the full transcript <a href="http://carsonified.com/blog/dev/steve-huffman-on-lessons-learned-at-reddit/" target="_blank">on Carsonified</a>:</p>
<p style="text-align: center;"><a href="http://www.netuality.ro/linkdump-leaner-meaner-mysql-gulping-from-the-data-buffet-and-lessons-learned-at-reddit/linkdump/20100510"><em>Click here to view the embedded video.</em></a></p>
<blockquote><p>We’ve actually been using Hadoop, Amazon’s Hadoop implementation to  compute awards. If we need to do a complicated query like that, we store  the data, we dump our database, or at the right time we store it in a  way that will make those joins possible down the road. That being said;  we’ve tried to avoid doing joins as much as possible, and when the data  comes in we store it in the way we’re going to need it. That’s worked  much better than trying to do it at run time.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-leaner-meaner-mysql-gulping-from-the-data-buffet-and-lessons-learned-at-reddit/linkdump/20100510/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>And now for something a little different: graphviz candy</title>
		<link>http://www.netuality.ro/and-now-for-something-a-little-different-graphviz-candy/presentations/20100506</link>
		<comments>http://www.netuality.ro/and-now-for-something-a-little-different-graphviz-candy/presentations/20100506#comments</comments>
		<pubDate>Thu, 06 May 2010 13:37:15 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Presentations]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=212</guid>
		<description><![CDATA[No self-respecting geek can resist either of these incredible  temptations:

Correct something wrong on &#8220;the internets&#8221;
Produce a little bit of graphviz candy  (what a fantastic tool)



Imagine my total happiness discovering that I can exercise both of  them simultaneously. I can only thank Digg&#8217;s John Quinn for creating the  opportunity in his otherwise [...]]]></description>
			<content:encoded><![CDATA[<p>No self-respecting geek can resist either of these incredible  temptations:</p>
<ul>
<li>Correct something wrong on &#8220;the internets&#8221;</li>
<li>Produce a little bit of <a href="http://www.graphviz.org/" target="_blank">graphviz</a> candy  (what a fantastic tool)</li>
</ul>
<p><span id="more-212"></span></p>
<p>Imagine my total happiness discovering that I can exercise both of  them simultaneously. I can only thank Digg&#8217;s John Quinn for creating the  opportunity in his <a href="http://bit.ly/b3zzjo" target="_blank">otherwise very  interesting presentation</a>, by including this directed graph on one of  the slides:</p>
<p style="text-align: center;"><img class="size-full wp-image-213 aligncenter" title="original" src="http://www.netuality.ro/wp-content/uploads/2010/05/original.jpg" alt="" width="600" height="388" /></p>
<p>You would get the impression that besides Cassandra the other &#8220;NoSQL&#8221;  solutions have a single major website using them. I cannot speak for all the products whose logos are embedded in the graph nodes, but  for HBase this is somewhat incorrect. Starting from the <a href="http://bit.ly/5CdRlm" target="_blank">Powered By</a> page for HBase I thought it&#8217;d be a good  idea to fill in the blanks and also add a few high-traffic sites (I&#8217;ve  only selected websites which are in the Alexa top 1000). Here&#8217;s the new  (and supposedly, more accurate) graph:</p>
<p style="text-align: center;"><img class="size-full wp-image-214 aligncenter" title="more_examples" src="http://www.netuality.ro/wp-content/uploads/2010/05/more_examples.jpg" alt="" width="600" height="881" /></p>
<p>While it looks like Cassandra got a few higher-profile clients lately  (even if for relatively mundane tasks such as persistent caching) this  does not mean HBase is only used by StumbleUpon. I think we&#8217;ll see more  and more success stories for both of these solutions and perhaps for  some un-announced future products as well. The domain is still in its  infancy, let&#8217;s not imply &#8220;consolidation&#8221; just yet.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/and-now-for-something-a-little-different-graphviz-candy/presentations/20100506/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linkdump: Coop, HBase performance and a bit of Warcraft</title>
		<link>http://www.netuality.ro/linkdump-coop-hbase-performance-and-a-bit-of-warcraft/linkdump/20100427</link>
		<comments>http://www.netuality.ro/linkdump-coop-hbase-performance-and-a-bit-of-warcraft/linkdump/20100427#comments</comments>
		<pubDate>Tue, 27 Apr 2010 19:35:11 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[blogs]]></category>
		<category><![CDATA[Coop]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Warcraft]]></category>
		<category><![CDATA[Yahoo]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=207</guid>
		<description><![CDATA[Riptano is to Cassandra what Cloudera is to Hadoop or Percona to MySQL. Mmmkey?
A great, insightful post from Pingdom (as usual) allows us to take a peek behind the doors at the largest web sites in the world, just by reading selected stuff from their respective developer blogs.
Yahoo decreased data-center cooling costs compared to power costs [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://spyced.blogspot.com/2010/04/and-now-for-something-completely.html" target="_blank">Riptano</a> is to Cassandra what Cloudera is to Hadoop or Percona to MySQL. Mmmkey?</p>
<p>A great, insightful post from Pingdom (as usual) allows us to <a href="http://royal.pingdom.com/2010/04/14/peeking-behind-the-scenes-of-the-worlds-largest-sites/" target="_blank">take a peek</a> behind the doors at the largest web sites in the world, just by reading selected stuff from their respective developer blogs.</p>
<p>Yahoo decreased data-center cooling costs compared to power costs from 50 cents/dollar to only one cent/dollar. This was achieved at their most recent <a href="http://www.datacenterknowledge.com/archives/2010/04/26/yahoo-computing-coop-the-shape-of-things-to-come/" target="_blank">Yahoo Computing Coop</a> data-center built in Lockport, New York.</p>
<blockquote><p>The data center operates with no chillers, and will require water for  only a handful of days each year. Yahoo projects that the new facility  will operate at a Power Usage Effectiveness (PUE) of 1.1, placing it  among the most efficient in the industry. [...]</p>
<p>If it looks like a chicken coop, it’s because some of the design  principles were adapted from …. well, chicken coops. “Tyson Foods has  done research  involving facilities with the heat source in the center  of the  facility, looking at how to evacuate the hot air,” said  Noteboom. “We applied a lot of similar  thought to our data center.”</p>
<p>The Lockport site is ideal for fresh air cooling, with a climate that  allows Yahoo to operate for nearly the entire year without using air  conditioning for its servers.</p></blockquote>
<p>The High Scalability blog <a href="http://highscalability.com/blog/2010/4/27/paper-dapper-googles-large-scale-distributed-systems-tracing.html" target="_blank">dissects</a> a paper describing <strong>Dapper</strong>, Google&#8217;s tracing system used to instrument all the components of a software system in order to understand its behavior. Immensely interesting:</p>
<blockquote><p>As you might expect Google has produced and elegant and well thought out  tracing system. In many ways it is similar to other tracing systems,  but it has that unique Google twist. A tree structure, probabilistically  unique keys, sampling, emphasising common infrastructure insertion  points, technically minded data exploration tools, a global system  perspective, MapReduce integration, sensitivity to index size,  enforcement of system wide invariants, an open API—all seem very  Googlish.</p></blockquote>
<p>On my favorite blog <img src='http://www.netuality.ro/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  <a href="http://hstack.org/" target="_blank">HStack.org</a> Andrei wrote a <a href="http://hstack.org/hbase-performance-testing/" target="_blank">great post about real-life performance testing of HBase</a>:</p>
<blockquote><p>The numbers are the tip of the iceberg; things become <strong>really  interesting</strong> once we start looking under the hood, and  interpreting the results.</p>
<p>When investigating performance issues you have to assume that  “everybody lies”. It is crucial that you don’t stop at a simple capacity  or latency result; you need to investigate every layer: the performance  tool, your code, their code, third-party libraries, the OS and the  hardware. Here’s how we went about it:</p>
<p>The first potential liar is your test, then your test tool – they  could both have bugs so you need to double-check.</p></blockquote>
<p>But the most interesting distributed system of the week is World of Warcraft. Ars Technica <a href="http://arstechnica.com/gaming/news/2010/04/earning-your-sword-a-picture-tour-of-blizzards-offices.ars" target="_blank">describes a tour of the Blizzard campus</a> and here&#8217;s a peek at the best NOC screen ever:</p>
<p style="text-align: center;"><img class="size-full wp-image-208 aligncenter" title="wowactivity" src="http://www.netuality.ro/wp-content/uploads/2010/04/wowactivity.jpg" alt="" width="500" height="378" /></p>
<p style="text-align: left;">For the hooorde!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-coop-hbase-performance-and-a-bit-of-warcraft/linkdump/20100427/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Linkdump: Twitter, Twitter, CAP and &#8230; iPad</title>
		<link>http://www.netuality.ro/linkdump-twitter-twitter-cap-and-ipad/linkdump/20100421</link>
		<comments>http://www.netuality.ro/linkdump-twitter-twitter-cap-and-ipad/linkdump/20100421#comments</comments>
		<pubDate>Wed, 21 Apr 2010 20:24:17 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[iPad]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=194</guid>
		<description><![CDATA[Well, not all Twitter runs on Cassandra :) Alex Payne explains how they built Hawkwind, a distributed search system written in Scala. Take a look at slide 18, where you can clearly see that they use HBase as a backend:

Also from the great guys at Twitter: gizzard. Interesting and appropriate name for a database [...]]]></description>
			<content:encoded><![CDATA[<p>Well, not all Twitter runs on Cassandra <img src='http://www.netuality.ro/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Alex Payne explains how they built Hawkwind, a distributed search system written in Scala. Take a look at slide 18, where you can clearly see that they use HBase as a backend:</p>
<p style="text-align: center;"><object width="425" height="348"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=phillyetepayne-100411190225-phpapp01"/><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=phillyetepayne-100411190225-phpapp01"  type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="348"></embed></object></p>
<p>Also from the great guys at Twitter: <a href="http://github.com/twitter/gizzard" target="_blank">gizzard</a>. Interesting and appropriate name for a database sharding framework. Gizzard uses range-based partitioning and a replication tree, and it can rely on a wide range of data stores: RDBMSes, Lucene or Redis &#8211; you name it. But I wonder about the operational overhead when you have a really large gizzard cluster.</p>
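<p>To make the range-based partitioning idea concrete, here is a minimal Python sketch of the general technique. It is not Gizzard&#8217;s actual API: the shard names, the range boundaries and the <code>shard_for</code> helper are all hypothetical, chosen only for illustration.</p>

```python
import bisect

# Hypothetical shard table: (inclusive lower bound of the key range, shard name).
# Entries must be sorted by lower bound; the values are illustrative only.
SHARD_RANGES = [
    (0, "shard_a"),
    (1000, "shard_b"),
    (5000, "shard_c"),
]

LOWER_BOUNDS = [lower for lower, _name in SHARD_RANGES]


def shard_for(key):
    """Return the name of the shard whose key range contains the given key."""
    if LOWER_BOUNDS[0] > key:
        raise ValueError("key is below the lowest configured range")
    # bisect_right returns the position of the first bound greater than key,
    # so the owning range is the entry just before that position.
    index = bisect.bisect_right(LOWER_BOUNDS, key) - 1
    return SHARD_RANGES[index][1]
```

<p>Each lookup is a binary search over the sorted lower bounds, so routing stays cheap even with many ranges. Splitting a hot range only means adding one more entry to the table (and moving the data, which is where the operational overhead comes in).</p>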
<p>Michael Stonebraker has a <a href="http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext" target="_blank">short essay on CAP</a> published in the ACM blogs. He identifies a series of use cases where the CAP theorem simply does not apply and cannot be appealed to for guidance:</p>
<blockquote><p>Obviously, one should write software that can deal with load spikes  without failing; for example, by shedding load or operating in a  degraded mode. Also, good monitoring software will help identify such  problems early, since the real solution is to add more capacity. Lastly,  self-reconfiguring software that can absorb additional resources  quickly is obviously a good idea.</p>
<p>In summary, one should not throw out the C so quickly,  since there are real error scenarios where CAP does not apply and it  seems like a bad tradeoff in many of the other situations.</p></blockquote>
<p>Great <a href="http://nosql.mypopescu.com/post/535298743/nosql-eu-first-day#nosqleu-pres-1" target="_blank">nosqlEu coverage</a> on Alex Popescu&#8217;s blog MyNoSQL. <a href="http://nosql.mypopescu.com/tagged/NoSQL_event" target="_blank">Read it</a> to get all the presentations, tons of links and Twitter quotes.</p>
<p>Because every self-respecting blog should mention some info about the newly released iPad, here&#8217;s mine. According to O&#8217;Reilly Radar, <a href="http://radar.oreilly.com/2010/04/ipad-falls-short-on-cloud-inte.html" target="_blank">the iPad is not ready for cloud integration</a>:</p>
<blockquote><p>I am hoping for a future where all I need to supply a device with is  my identity, and everything else falls into place. This doesn&#8217;t even  have to be me trusting in a third-party cloud: there&#8217;s no reason similar  mechanisms couldn&#8217;t be used privately in a home network setting.</p>
<p>I think the iPad is an amazing piece of hardware, and the most  pleasant web browsing experience available. It is still very much a 1.0  device though, and its best days certainly lie ahead of it. I hope part  of that improvement is a simple story for synchronization and cloud  access.</p></blockquote>
<p>Guess I&#8217;ll be waiting for the release of iPad Pro:</p>
<p style="text-align: center;"><a href="http://www.netuality.ro/wp-content/uploads/2010/04/ipadpro.jpg"><img class="size-full wp-image-203 aligncenter" title="ipadpro" src="http://www.netuality.ro/wp-content/uploads/2010/04/ipadpro.jpg" alt="" width="400" height="565" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-twitter-twitter-cap-and-ipad/linkdump/20100421/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linkdump: using Hbase, CAP visuals, Farmville and more</title>
		<link>http://www.netuality.ro/linkdump-using-hbase-cap-visuals-farmville-and-more/uncategorized/20100317</link>
		<comments>http://www.netuality.ro/linkdump-using-hbase-cap-visuals-farmville-and-more/uncategorized/20100317#comments</comments>
		<pubDate>Wed, 17 Mar 2010 10:20:25 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[CAP]]></category>
		<category><![CDATA[Datacenter]]></category>
		<category><![CDATA[Farmville]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[replication]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=187</guid>
		<description><![CDATA[Two great posts from my colleagues about why Adobe is using HBase: part 1 and part 2. As I&#8217;ve experienced all these firsthand, I guarantee this is solid, relevant information. Both articles are highly recommended reads.
Speaking of HBase, there&#8217;s a rumor on the street that they are taking HBASE-1295 (multi data center replication) very seriously and [...]]]></description>
			<content:encoded><![CDATA[<p>Two great posts from my colleagues about why Adobe is using HBase: <a href="http://hstack.org/why-were-using-hbase-part-1/" target="_blank">part 1</a> and <a href="http://hstack.org/why-were-using-hbase-part-2/" target="_blank">part 2</a>. As I&#8217;ve experienced all these firsthand, I guarantee this is solid, relevant information. Both articles are highly recommended reads.</p>
<p>Speaking of HBase, there&#8217;s a <a href="http://blog.sematext.com/2010/02/28/hbase-digest-february-2010/" target="_blank">rumor on the street</a> that they are taking HBASE-1295 (<a href="https://issues.apache.org/jira/browse/HBASE-1295" target="_blank">multi data center replication</a>) very seriously and we&#8217;ll be seeing a new feature announcement relatively soon. Looking forward to it!</p>
<p>An older but still interesting presentation on how RIPE NCC is using Hadoop and HBase to store and search through IP addresses for Europe, Middle East and Russia can be found <a href="http://www.scribd.com/doc/24334444/Scaling-Out-With-Hadoop-And-HBase" target="_blank">here</a>:</p>
<p style="text-align: center;"><img class="size-full wp-image-188 aligncenter" title="ripe_ncc" src="http://www.netuality.ro/wp-content/uploads/2010/03/ripe_ncc.jpg" alt="" width="490" height="370" /></p>
<p>It looks like Farmville is <a href="http://highscalability.com/blog/2010/3/10/how-farmville-scales-the-follow-up.html" target="_blank">still in the MySQL+memcache phase</a>, according to the High Scalability blog. And they use PHP. When will they start looking into NoSQL? Hopefully soon enough to have a good crop.</p>
<p>Nathan&#8217;s <a href="http://blog.nahurst.com/visual-guide-to-nosql-systems" target="_blank">visual guide to NoSQL systems</a>, while perhaps not entirely correct, is a nice attempt to put all these projects on the same map. I would love to see a &#8220;patched&#8221; version of the visual guide taking into account all the information left in the comments&#8230;</p>
<p>Oh and Twitter is <a href="http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709" target="_blank">using Protocol Buffers to store information on Hadoop</a>. And they&#8217;re going to open-source their implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-using-hbase-cap-visuals-farmville-and-more/uncategorized/20100317/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linkdump: Cassandra lovers, blowing the circuit breaker and Oracle clouds</title>
		<link>http://www.netuality.ro/linkdump-cassandra-lovers-blowing-the-circuit-breaker-and-oracle-clouds/linkdump/20100304</link>
		<comments>http://www.netuality.ro/linkdump-cassandra-lovers-blowing-the-circuit-breaker-and-oracle-clouds/linkdump/20100304#comments</comments>
		<pubDate>Thu, 04 Mar 2010 18:31:13 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=181</guid>
		<description><![CDATA[Good points (as always) on Alexandru&#8217;s blog discussing the SQL scalability isn&#8217;t for everyone topic.
NoSQL as RDBMS are just tools for our job and there is nothing about the  death of one of the other. But as we’ve learned over years, every new  programming language is the death of all its precursors, every [...]]]></description>
			<content:encoded><![CDATA[<p>Good points (as always) on Alexandru&#8217;s blog discussing the <a href="http://nosql.mypopescu.com/post/424164220/sql-is-scalable-sql-scalability-isnt-for-everyone" target="_blank">SQL scalability isn&#8217;t for everyone</a> topic.</p>
<blockquote><p>NoSQL as RDBMS are just tools for our job and there is nothing about the  death of one of the other. But as we’ve learned over years, every new  programming language is the death of all its precursors, every new  programming paradigm is the death of everything that existed before and  so on. The part that some seem to be missing or ignoring deliberately is  that in most of these cases this death have never really happened.</p></blockquote>
<p>For large-scale performance testing of a production environment check out how <span style="text-decoration: line-through;">Facebook</span> MySpace <a href="http://highscalability.com/blog/2010/3/4/how-myspace-tested-their-live-site-with-1-million-concurrent.html" target="_blank">simulated 1 million concurrent users</a> with a huge EC2 cluster, described on the High Scalability blog. While the article is a guest post from a company selling &#8220;cloud testing&#8221; solutions and has a bit of &#8220;sales juice&#8221; in it, it&#8217;s still a very good read:</p>
<p style="text-align: center;"><img class="aligncenter" title="Large-scale testing using EC2" src="http://farm3.static.flickr.com/2776/4405976247_0fd13b6f26.jpg?__SQUARESPACE_CACHEVERSION=1267718646170" alt="Large-scale testing using EC2" width="500" height="342" /></p>
<p>Someone is <a href="https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/" target="_blank">in love with Cassandra</a> after only 4 months. Hoping Cassandra doesn&#8217;t get too fat after the wedding:</p>
<blockquote><p>Traditional sharding and replication with databases like MySQL and   PostgreSQL have been shown to work even on the largest scale websites —   but come at a large operational cost. Setting up replication for MySQL   can be done quickly, but there are many issues you need to be aware of,   such as slave replication lag. Sharding can be done once you reach  write  throughput limits, but you are almost always stuck writing your  own  sharding layer to fit how your data is created and operationally,  it  takes a lot of time to set everything up correctly. We skipped that  step  all together and added a couple hooks to make our data aggregation   service siphon to both PostgreSQL and Cassandra for the initial   integration.</p></blockquote>
<p><a href="http://www.anders.com/cms/282/Distributed.Data/Hadoop/Hbase/Hive" target="_blank">Distributed data war stories</a> from Anders @ bandwidth.com, HBase and Hadoop on commodity hardware:</p>
<blockquote><p>As mentioned before, the commodity machines I used were very basic but I  was able to insert conservatively about 500 records per second with  this setup. I kept blowing the circuit breaker at the office as well  forcing me to spread the machines across several power circuits but it  proved that the system was at least fault tolerant!</p></blockquote>
<p><a href="http://www.thebitsource.com/software-engineering/python/sourceforgenet-chooses-python-turbogears-and-mongodb-to-redesign-their-web-site/" target="_blank">SourceForge chooses Python, TurboGears and &#8230; MongoDB</a> for a new version of their website. Looks like Mongo is becoming quite mainstream.</p>
<p>Don&#8217;t believe the rumors, <a href="http://blogs.forrester.com/appdev/2010/03/oracle-has-a-cloud-strategy-after-all.html" target="_blank">Oracle is into cloud computing after all</a> &#8211; at least according to Forrester. Well, as long as the clouds are private. And as long as you can live with &#8220;coming soon&#8221; tooling. And it&#8217;s not like they really have a clear long-term strategy for cloud computing:</p>
<blockquote><p>I believe that cloud is a revolution for Oracle, IBM, SAP, and the other big  vendors with direct sales forces (despite what they say). Cloud computing has the  potential to undermine the account-management practices and pricing models these big companies are  founded on. I think it will take years for each of the big vendors to adapt to cloud computing. Oracle is just beginning this journey; I think other  vendors are further down the track.</p></blockquote>
<p>The igvita blog hits NoSQL in the groin by <a href="http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/" target="_blank">showing a simple way of having a schema-free data store</a> &#8230; in MySQL. It&#8217;s a sort of proxy that translates schemas into denormalized data placed in distinct tables:</p>
<blockquote><p>Instead of defining columns on a table, each attribute has its own table  (new tables are created on the fly), which means that we can add and  remove attributes at will. In turn, performing a select simply means  joining all of the tables on that individual key. To the client this is  completely transparent, and while the proxy server does the actual work,  this functionality could be easily extracted into a proper MySQL engine  &#8211; I’m just surprised that no one has done so already.</p></blockquote>
<p>It&#8217;s an interesting idea, but I&#8217;m not sure how effective it will be in practice, as joins are among the most time-consuming operations in the database world. I&#8217;m pretty sure that replacing a primary-key lookup on a 10-column table with a join across 10 tables will add significant overhead.</p>
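<p>To make that concern concrete, here is a minimal sketch of the attribute-per-table layout, written against Python&#8217;s built-in sqlite3 module instead of MySQL and the proxy from the igvita post. Everything in it is an assumption made for illustration: the <code>entities</code> table, the <code>attr_</code> prefix and the <code>open_store</code>, <code>put</code> and <code>get</code> helpers are hypothetical names, not part of the original proxy.</p>

```python
import sqlite3

PREFIX = "attr_"  # hypothetical naming convention for per-attribute tables


def open_store(path=":memory:"):
    """Open a database with one row per known record key."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS entities (id INTEGER PRIMARY KEY)")
    return conn


def put(conn, key, attrs):
    """Write each attribute to its own table, creating tables on the fly."""
    conn.execute("INSERT OR IGNORE INTO entities (id) VALUES (?)", (key,))
    for name, value in attrs.items():
        # Table names cannot be bound as SQL parameters, so validate them.
        if not name.isidentifier():
            raise ValueError(f"unsafe attribute name: {name!r}")
        table = PREFIX + name
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, value)"
        )
        conn.execute(
            f"INSERT OR REPLACE INTO {table} (id, value) VALUES (?, ?)",
            (key, value),
        )
    conn.commit()


def attribute_tables(conn):
    """List every per-attribute table currently in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(name for (name,) in rows if name.startswith(PREFIX))


def get(conn, key):
    """Rebuild one record by joining all attribute tables on the key."""
    tables = attribute_tables(conn)
    columns = ", ".join(["e.id"] + [f"{t}.value" for t in tables])
    joins = " ".join(f"LEFT JOIN {t} ON {t}.id = e.id" for t in tables)
    sql = f"SELECT {columns} FROM entities e {joins} WHERE e.id = ?"
    row = conn.execute(sql, (key,)).fetchone()
    if row is None:
        return None
    # Attributes the record never set come back as NULL and are dropped.
    return {t[len(PREFIX):]: v for t, v in zip(tables, row[1:]) if v is not None}


if __name__ == "__main__":
    store = open_store()
    put(store, 1, {"name": "Ada", "city": "London"})
    print(get(store, 1))
```

<p>Note that <code>get</code> issues one join per attribute table that exists in the store, so a record with ten attributes costs a ten-way join even for a single-key read. That is the overhead in question, and it is worth measuring before adopting a layout like this.</p>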
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-cassandra-lovers-blowing-the-circuit-breaker-and-oracle-clouds/linkdump/20100304/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Linkdump: Cassandra @Twitter, Forrester not grokking NoSQL</title>
		<link>http://www.netuality.ro/linkdump-cassandra-twitter-forrester-not-grokking-nosql/linkdump/20100224</link>
		<comments>http://www.netuality.ro/linkdump-cassandra-twitter-forrester-not-grokking-nosql/linkdump/20100224#comments</comments>
		<pubDate>Wed, 24 Feb 2010 20:18:27 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Forrester]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=174</guid>
		<description><![CDATA[Seven signs you need to accept NoSQL in your life according to the High Scalability blog. I especially like sign #6 &#8220;Maintaining a completely separate object caching system on top  of an already beefy table storage system&#8221;. There are companies making serious bucks from selling exactly this type of caching system. I find that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://highscalability.com/blog/2010/2/16/seven-signs-you-may-need-a-nosql-database.html" target="_blank">Seven signs you need to accept NoSQL in your life</a> according to the High Scalability blog. I especially like sign #6 &#8220;<strong>Maintaining a completely separate object caching system on top  of an already beefy table storage system</strong>&#8221;. There are companies making serious bucks from selling exactly this type of caching system. I find that a bit ironic, don&#8217;t you?</p>
<p><a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king" target="_blank">Twitter has just decided to adopt Cassandra</a> as their main storage. I roughly estimated the status table to have more than 9 billion rows &#8211; a good table size to start thinking about the benefits of NoSQL. I would have been interested in seeing a comparison with other existing solutions and the rationale for their choice. According to some sources, Ryan King rejected HBase because if a region server is down, writes are blocked for the affected data until that data is redistributed &#8211; unlike Cassandra&#8217;s &#8220;write never fail&#8221; policy. According to other sources, this will be solved in a future version of HBase, but I think Twitter needed a solution sooner rather than later. I hope for two things:</p>
<ul>
<li>That the Twitter dudes will blog about their migration experience</li>
<li>That I&#8217;ll be able to access and search through all my older tweets, fer&#8217; God sake!</li>
</ul>
<p><a href="http://blogs.forrester.com/appdev/2010/02/nosql.html" target="_blank">Forrester Research thinks</a> that NoSQL and Elastic Caching Platforms are very similar. So similar that &#8220;<strong>NoSQL Wants To Be Elastic Caching When It Grows Up</strong>&#8221;. According to Forrester &#8220;<em>Ultimately, the real difference between NoSQL and elastic caching  now may be in-memory versus persistent storage on disk.</em>&#8221; Yeah sure: transactions, durability, indexing, security model &#8211; who needs this crap anyway?</p>
<p>Oh and let&#8217;s not forget about <a href="http://groups.google.com/group/google-appengine-downtime-notify/browse_thread/thread/b4ed491a8b9ccce2" target="_blank">today&#8217;s GAE unscheduled downtime</a>. Looking forward to the post mortem; there will surely be a thing or two to learn&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-cassandra-twitter-forrester-not-grokking-nosql/linkdump/20100224/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>January 30 linkdump: cloud, cloud, cloud</title>
		<link>http://www.netuality.ro/january-30-linkdump-cloud-cloud-cloud/linkdump/20100130</link>
		<comments>http://www.netuality.ro/january-30-linkdump-cloud-cloud-cloud/linkdump/20100130#comments</comments>
		<pubDate>Sat, 30 Jan 2010 20:44:10 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Cloudkick]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[private cloud]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=172</guid>
		<description><![CDATA[Yes there is such a thing as cloud management services and Cloudkick has a business model around them:
The San Francisco company’s existing features — including a dashboard  with an overview of your cloud infrastructure, email alerts, and graphs  that you help you visualize data like bandwidth requirements — will  always be free, [...]]]></description>
			<content:encoded><![CDATA[<p>Yes there is such a thing as cloud management services and <a href="http://venturebeat.com/2010/01/25/cloudkick/" target="_blank">Cloudkick has a business model around them</a>:</p>
<blockquote><p>The San Francisco company’s existing features — including a dashboard  with an overview of your cloud infrastructure, email alerts, and graphs  that you help you visualize data like bandwidth requirements — will  always be free, said co-founder and chief executive Alex Polvi. But  Cloudkick wants to charge for features on top of the basic service, such  as SMS alerts when your app has problems and a change-log tool where  sysadmins can communicate with each other, which Polvi described as  “Twitter for servers.”</p></blockquote>
<p>Great <a href="http://gojko.net/2010/01/25/designing-applications-for-cloud-deployment/" target="_blank">article on designing applications for the cloud</a> from Gojko Adzic, who has spent the last two years on projects deployed on the Amazon cloud:</p>
<blockquote><p>A very healthy way to look at this is that all your cloud applications  will run on a bunch of cheap web servers. It’s healthy because planning  for that in advance will help you keep your mental health when glitches  occur, and it will also force you to design for machine failure upfront  making the system more resilient.</p></blockquote>
<p><a href="http://www.royans.net/arch/private-clouds-not-the-future" target="_blank">Royans&#8217; blog comments</a> on James Hamilton&#8217;s critical post about <a href="http://perspectives.mvdirona.com/2010/01/17/PrivateCloudsAreNotTheFuture.aspx" target="_blank">private clouds not being the future</a>:</p>
<blockquote><p>Though I believe in most of his comments, I’m not convinced with the  generalization of the conclusions. In particular, what is the maximum  number of servers one need to own, beyond which outsourcing will become a  liability. I suspect this is not a very high number today, but will  grow over time.</p></blockquote>
<p>And a good detailed article about <a href="http://www.royans.net/arch/hive-facebook" target="_blank">Hive used at Facebook</a>:</p>
<blockquote><p>Facebook has a production Hive cluster which is primarily used for log  summarization, including aggregation of impressions, click counts and  statistics around user engagement. They have a separate cluster for “Ad  hoc analysis” which is free for all/most Facebook employees to use. And  over time they figured out how to use it for spam detection, ad  optimization and a host of other undocumented stuff.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/january-30-linkdump-cloud-cloud-cloud/linkdump/20100130/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>January 23 linkdump: grids, BuddyPoke and the state of Internet</title>
		<link>http://www.netuality.ro/january-23-linkdump-grids-buddypoke-and-the-state-of-internet/linkdump/20100123</link>
		<comments>http://www.netuality.ro/january-23-linkdump-grids-buddypoke-and-the-state-of-internet/linkdump/20100123#comments</comments>
		<pubDate>Sat, 23 Jan 2010 10:20:49 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[grid]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=170</guid>
		<description><![CDATA[On Enterprise Storage a few experts look at grid computing and the future of cloud computing.
Can cloud computing  succeed where grid failed and find widespread acceptance in enterprise  data centers? And is there still room for grid computing in the brave  new world of cloud computing? We asked some grid computing pioneers [...]]]></description>
			<content:encoded><![CDATA[<p>On Enterprise Storage a few experts look at <a href="http://www.enterprisestorageforum.com/outsourcing/features/article.php/3859956" target="_blank">grid computing and the future of cloud computing</a>.</p>
<blockquote><p>Can cloud computing  succeed where grid failed and find widespread acceptance in enterprise  data centers? And is there still room for grid computing in the brave  new world of cloud computing? We asked some grid computing pioneers for  their views on the issue.</p>
<p>[...]</p>
<p>And when it comes to  IaaS [infrastructure as a service], I think in five years something like  80 to 90 percent of the computation we are doing could be cloud-based.</p></blockquote>
<p><a href="http://www.buddypoke.com/" target="_blank">BuddyPoke</a> cofounder Dave Westwood <a href="http://highscalability.com/blog/2010/1/22/how-buddypoke-scales-on-facebook-using-google-app-engine.html" target="_blank">explains on the High Scalability</a> blog how they achieved viral scale, Facebook viral scale to be more specific. BuddyPoke is today entirely hosted on GAE (Google App Engine) and they share some great insights and lessons learned.</p>
<blockquote><p>On the surface BuddyPoke seems simple, but under hood there&#8217;s some  intricate strategy going on. Minimizing costs while making it scale and  perform is not obvious. Who does what, when, why and how takes some  puzzling out. It&#8217;s certainly an approach a growing class of apps will  find themselves using in the future.</p></blockquote>
<p>Jinesh Varia from Amazon wrote a great <a href="http://jineshvaria.s3.amazonaws.com/public/cloudbestpractices-jvaria.pdf" target="_blank">Architecting for the Cloud: Best Practices [PDF]</a> paper:</p>
<blockquote><p>This paper is targeted towards cloud architects who are gearing up to move an enterprise-class application from a fixed physical environment to a virtualized cloud environment. The focus of this paper is to highlight concepts, principles and best practices in creating new cloud applications or migrating existing applications to the cloud.</p>
<p>The AWS cloud offers highly reliable pay-as-you-go infrastructure services. The AWS-specific tactics highlighted in the paper will help design cloud applications using these services. As a researcher, it is advised that you play with these commercial services, learn from the work of others, build on the top, enhance and further invent cloud computing.</p></blockquote>
<p>The Pingdom guys have another fantastic post on their blog about the <a href="http://bit.ly/7OZhhX" target="_blank">state of the Internet in 2009</a>:</p>
<blockquote>
<ul>
<li><strong>90 trillion</strong> – The number of emails sent on the Internet  in 2009.</li>
<li><strong>92%</strong> – Peak spam levels late in the year.</li>
<li><strong>13.9%</strong> – The growth of Apache websites in 2009.</li>
<li><strong>-22.1%</strong> – The growth of IIS websites in 2009.</li>
</ul>
</blockquote>
<p>These and more interesting statistics <a href="http://bit.ly/7OZhhX" target="_blank">in their blog post</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/january-23-linkdump-grids-buddypoke-and-the-state-of-internet/linkdump/20100123/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Map/Reduce patent and impact on Hadoop: none expected</title>
		<link>http://www.netuality.ro/googles-mapreduce-patent-and-impact-on-hadoop-none-expected/articles/20100122</link>
		<comments>http://www.netuality.ro/googles-mapreduce-patent-and-impact-on-hadoop-none-expected/articles/20100122#comments</comments>
		<pubDate>Fri, 22 Jan 2010 16:39:02 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Map/Reduce]]></category>
		<category><![CDATA[patent]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=161</guid>
		<description><![CDATA[From the GigaOm analysis:
Fortunately, for them, it seems unlikely that Google will take to the courts to enforce its new intellectual property. A big reason is that “map” and “reduce” functions have been part of parallel programming for decades, and vendors with deep pockets certainly could make arguments that Google didn’t invent MapReduce at all.
Should [...]]]></description>
			<content:encoded><![CDATA[<p>From the <a href="http://bit.ly/4HKsLc" target="_blank">GigaOm analysis</a>:</p>
<blockquote><p>Fortunately, for them, it seems unlikely that Google will take to the courts to enforce its new intellectual property. A big reason is that “map” and “reduce” functions have been part of parallel programming for decades, and vendors with deep pockets certainly could make arguments that Google didn’t invent MapReduce at all.</p>
<p>Should Hadoop come under fire, any defendants (or interveners like Yahoo and/or IBM) could have strong technical arguments over whether the open-source Hadoop even is an infringement. Then there is the question of money: Google has been making plenty of it without the patent, so why risk the legal and monetary consequences of losing any hypothetical lawsuit? Plus, Google supports Hadoop, which lets university students learn webscale programming (so they can become future Googlers) without getting access to Google’s proprietary MapReduce language.</p>
<p>[...]</p>
<p>A Google spokeswoman emailed this in response to our questions about why Google sought the patent, and whether or not Google would seek to enforce its patent rights, attributing it to Michelle Lee, Deputy General Counsel:</p>
<p>“Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops. While we do not comment about the use of this or any part of our portfolio, we feel that our behavior to date has been inline with our corporate values and priorities.”</p></blockquote>
<p>From <a href="http://bit.ly/67HA0e" target="_blank">Ars Technica</a>:</p>
<blockquote><p>Hadoop isn&#8217;t the only open source project that uses MapReduce technology. As some readers may know, I&#8217;ve recently been experimenting with CouchDB, an open source database system that allows developers to perform queries with map and reduce functions. Another place where I&#8217;ve seen MapReduce is Nokia&#8217;s QtConcurrent framework, an extremely elegant parallel programming library for Qt desktop applications.</p>
<p>It&#8217;s unclear what Google&#8217;s patent will mean for all of these MapReduce adopters. Fortunately, Google does not have a history of aggressive patent enforcement. It&#8217;s certainly possible that the company obtained the patent for &#8220;defensive&#8221; purposes. Like virtually all major software companies, Google is frequently the target of patent lawsuits. Many companies in technical fields attempt to collect as many broad patents as they can so that they will have ammunition with which to retaliate when they are faced with patent infringement lawsuits.</p>
<p>Google&#8217;s MapReduce patent raises some troubling questions for software like Hadoop, but it looks unlikely that Google will assert the patent in the near future; Google itself uses Hadoop for its Code University program.</p>
<p>Even if Google takes the unlikely course of action and does decide to target Hadoop users with patent litigation, the company would face significant resistance from the open source project&#8217;s deep-pocketed backers—including IBM, which holds the industry&#8217;s largest patent arsenal.</p>
<p>Another dimension of this issue is the patent&#8217;s validity. On one hand, it&#8217;s unclear if taking age-old principles of functional software development and applying them to a cluster constitutes a patentable innovation.</p></blockquote>
<p>Still nothing from the big analysts, Gartner and the gang&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/googles-mapreduce-patent-and-impact-on-hadoop-none-expected/articles/20100122/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
