<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Netuality</title>
	<atom:link href="http://www.netuality.ro/feed" rel="self" type="application/rss+xml" />
	<link>http://www.netuality.ro</link>
	<description>Taming the big, bad, nasty websites</description>
	<lastBuildDate>Thu, 04 Mar 2010 22:56:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Linkdump: Cassandra lovers, blowing the circuit breaker and Oracle clouds</title>
		<link>http://www.netuality.ro/linkdump-cassandra-lovers-blowing-the-circuit-breaker-and-oracle-clouds/linkdump/20100304</link>
		<comments>http://www.netuality.ro/linkdump-cassandra-lovers-blowing-the-circuit-breaker-and-oracle-clouds/linkdump/20100304#comments</comments>
		<pubDate>Thu, 04 Mar 2010 18:31:13 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[HBase]]></category>
		<category><![CDATA[testing]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=181</guid>
		<description><![CDATA[Good points (as always) on Alexandru&#8217;s blog discussing the SQL scalability isn&#8217;t for everyone topic.
NoSQL as RDBMS are just tools for our job and there is nothing about the  death of one of the other. But as we’ve learned over years, every new  programming language is the death of all its precursors, every [...]]]></description>
			<content:encoded><![CDATA[<p>Good points (as always) on Alexandru&#8217;s blog discussing the <a href="http://nosql.mypopescu.com/post/424164220/sql-is-scalable-sql-scalability-isnt-for-everyone" target="_blank">SQL scalability isn&#8217;t for everyone</a> topic.</p>
<blockquote><p>NoSQL as RDBMS are just tools for our job and there is nothing about the  death of one of the other. But as we’ve learned over years, every new  programming language is the death of all its precursors, every new  programming paradigm is the death of everything that existed before and  so on. The part that some seem to be missing or ignoring deliberately is  that in most of these cases this death have never really happened.</p></blockquote>
<p>For large-scale performance testing of a production environment, check out how <span style="text-decoration: line-through;">Facebook</span> MySpace <a href="http://highscalability.com/blog/2010/3/4/how-myspace-tested-their-live-site-with-1-million-concurrent.html" target="_blank">simulated 1 million concurrent users</a> with a huge EC2 cluster, described on the High Scalability blog. While the article is a guest post from a company selling &#8220;cloud testing&#8221; solutions and has a bit of &#8220;sales juice&#8221; in it, it&#8217;s still a very good read:</p>
<p style="text-align: center;"><img class="aligncenter" title="Large-scale testing using EC2" src="http://farm3.static.flickr.com/2776/4405976247_0fd13b6f26.jpg?__SQUARESPACE_CACHEVERSION=1267718646170" alt="Large-scale testing using EC2" width="500" height="342" /></p>
<p>Someone is <a href="https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/" target="_blank">in love with Cassandra</a> after only 4 months. Hoping Cassandra doesn&#8217;t get too fat after the wedding:</p>
<blockquote><p>Traditional sharding and replication with databases like MySQL and   PostgreSQL have been shown to work even on the largest scale websites —   but come at a large operational cost. Setting up replication for MySQL   can be done quickly, but there are many issues you need to be aware of,   such as slave replication lag. Sharding can be done once you reach  write  throughput limits, but you are almost always stuck writing your  own  sharding layer to fit how your data is created and operationally,  it  takes a lot of time to set everything up correctly. We skipped that  step  all together and added a couple hooks to make our data aggregation   service siphon to both PostgreSQL and Cassandra for the initial   integration.</p></blockquote>
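<p>The &#8220;siphon to both&#8221; step in the quote can be pictured as a write hook that fans each record out to the existing store and to the new one. Below is a minimal Python sketch under stated assumptions: <code>DictStore</code>, <code>DownStore</code> and <code>SiphonWriter</code> are hypothetical names, in-memory dicts stand in for the PostgreSQL and Cassandra clients, and this is not Cloudkick&#8217;s actual code.</p>

```python
from typing import Dict, List, Protocol


class Store(Protocol):
    """Anything with a put(key, value) method (hypothetical interface)."""
    def put(self, key: str, value: dict) -> None: ...


class DictStore:
    """In-memory stand-in for a real database client (PostgreSQL, Cassandra...)."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self.rows[key] = value


class DownStore:
    """Shadow store that is unavailable, to show the best-effort behaviour."""
    def put(self, key: str, value: dict) -> None:
        raise ConnectionError("cassandra unreachable")


class SiphonWriter:
    """Write hook sending every record to a primary store and a shadow store.

    The primary write must succeed (its errors propagate); shadow failures are
    recorded but never break the existing code path during the integration.
    """
    def __init__(self, primary: Store, shadow: Store) -> None:
        self.primary = primary
        self.shadow = shadow
        self.shadow_errors: List[str] = []

    def put(self, key: str, value: dict) -> None:
        self.primary.put(key, value)        # existing path stays authoritative
        try:
            self.shadow.put(key, value)     # new store, best effort
        except Exception as exc:            # remember the failure, keep serving
            self.shadow_errors.append(f"{key}: {exc}")


pg = DictStore("postgresql")
cass = DictStore("cassandra")
writer = SiphonWriter(pg, cass)
writer.put("metric:cpu:host1", {"value": 0.42})

degraded = SiphonWriter(pg, DownStore())
degraded.put("metric:cpu:host2", {"value": 0.17})
```

<p>Treating the new store as best-effort keeps the old path authoritative while the new one is being validated; whether failed shadow writes get retried or merely logged is a design choice the quote does not cover.</p>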
<p><a href="http://www.anders.com/cms/282/Distributed.Data/Hadoop/Hbase/Hive" target="_blank">Distributed data war stories</a> from Anders at bandwidth.com, running HBase and Hadoop on commodity hardware:</p>
<blockquote><p>As mentioned before, the commodity machines I used were very basic but I  was able to insert conservatively about 500 records per second with  this setup. I kept blowing the circuit breaker at the office as well  forcing me to spread the machines across several power circuits but it  proved that the system was at least fault tolerant!</p></blockquote>
<p><a href="http://www.thebitsource.com/software-engineering/python/sourceforgenet-chooses-python-turbogears-and-mongodb-to-redesign-their-web-site/" target="_blank">SourceForge chooses Python, TurboGears and &#8230; MongoDB</a> for a new version of their website. Looks like Mongo is becoming quite mainstream.</p>
<p>Don&#8217;t believe the rumors, <a href="http://blogs.forrester.com/appdev/2010/03/oracle-has-a-cloud-strategy-after-all.html" target="_blank">Oracle is into cloud computing after all</a> &#8211; at least according to Forrester. Well, as long as the clouds are private. And as long as you can live with &#8220;coming soon&#8221; tooling. And it&#8217;s not like they really have a clear long-term strategy for cloud computing:</p>
<blockquote><p>I believe that cloud is a revolution for Oracle, IBM, SAP, and the other big  vendors with direct sales forces (despite what they say). Cloud computing has the  potential to undermine the account-management practices and pricing models these big companies are  founded on. I think it will take years for each of the big vendors to adapt to cloud computing. Oracle is just beginning this journey; I think other  vendors are further down the track.</p></blockquote>
<p>The igvita blog hits NoSQL in the groin by <a href="http://www.igvita.com/2010/03/01/schema-free-mysql-vs-nosql/" target="_blank">showing a simple way of having a schema-free data store</a> &#8230; in MySQL. It&#8217;s a sort of proxy that splits each record into its attributes and stores every attribute in a distinct table:</p>
<blockquote><p>Instead of defining columns on a table, each attribute has its own table  (new tables are created on the fly), which means that we can add and  remove attributes at will. In turn, performing a select simply means  joining all of the tables on that individual key. To the client this is  completely transparent, and while the proxy server does the actual work,  this functionality could be easily extracted into a proper MySQL engine  &#8211; I’m just surprised that no one has done so already.</p></blockquote>
<p>It&#8217;s an interesting idea, but I&#8217;m not sure how effective it will be in practice, since joins are among the most expensive operations in the database world. I&#8217;m pretty sure that replacing a primary-key lookup on a 10-column table with a join across 10 tables adds significant overhead.</p>
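<p>To make that overhead concrete, here is a rough Python sketch of the one-table-per-attribute layout, assuming SQLite (from the standard library) as a stand-in for MySQL; the <code>AttributeTableStore</code> class and the <code>attr_</code> table prefix are hypothetical, not igvita&#8217;s proxy code. Fetching <em>n</em> attributes of a single record costs <em>n</em> joins.</p>

```python
import sqlite3
from typing import Dict, List, Optional


class AttributeTableStore:
    """Schema-free records stored as one table per attribute, joined on the record key.

    Sketch only: SQLite stands in for MySQL, and attribute names must be plain
    identifiers because they are interpolated into table names.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS ids (id TEXT PRIMARY KEY)")

    def _table(self, attr: str) -> str:
        if not attr.isidentifier():
            raise ValueError(f"unsupported attribute name: {attr!r}")
        name = f"attr_{attr}"
        # Attribute tables are created on the fly, so a new field never needs ALTER TABLE.
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (id TEXT PRIMARY KEY, value TEXT)")
        return name

    def put(self, key: str, record: Dict[str, str]) -> None:
        self.conn.execute("INSERT OR IGNORE INTO ids (id) VALUES (?)", (key,))
        for attr, value in record.items():
            table = self._table(attr)
            self.conn.execute(f"INSERT OR REPLACE INTO {table} (id, value) VALUES (?, ?)", (key, value))

    def get(self, key: str, attrs: List[str]) -> Optional[Dict[str, Optional[str]]]:
        # One LEFT JOIN per requested attribute; attributes never set come back as None.
        tables = [self._table(a) for a in attrs]
        cols = ", ".join(["ids.id"] + [f"t{i}.value" for i in range(len(tables))])
        joins = " ".join(f"LEFT JOIN {t} AS t{i} ON t{i}.id = ids.id" for i, t in enumerate(tables))
        row = self.conn.execute(f"SELECT {cols} FROM ids {joins} WHERE ids.id = ?", (key,)).fetchone()
        return None if row is None else dict(zip(attrs, row[1:]))


store = AttributeTableStore(sqlite3.connect(":memory:"))
store.put("user:1", {"name": "ana", "city": "Bucharest"})
store.put("user:2", {"name": "ion", "email": "ion@example.com"})  # new attribute, new table
print(store.get("user:2", ["name", "city", "email"]))
# {'name': 'ion', 'city': None, 'email': 'ion@example.com'}
```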
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-cassandra-lovers-blowing-the-circuit-breaker-and-oracle-clouds/linkdump/20100304/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Linkdump: Cassandra @Twitter, Forrester not grokking NoSQL</title>
		<link>http://www.netuality.ro/linkdump-cassandra-twitter-forrester-not-grokking-nosql/linkdump/20100224</link>
		<comments>http://www.netuality.ro/linkdump-cassandra-twitter-forrester-not-grokking-nosql/linkdump/20100224#comments</comments>
		<pubDate>Wed, 24 Feb 2010 20:18:27 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[Forrester]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=174</guid>
		<description><![CDATA[Seven signs you need to accept NoSQL in your life, according to the High Scalability blog. I especially like sign #6: &#8220;Maintaining a completely separate object caching system on top of an already beefy table storage system&#8221;. There are companies making serious bucks selling exactly this kind of caching system. I find that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://highscalability.com/blog/2010/2/16/seven-signs-you-may-need-a-nosql-database.html" target="_blank">Seven signs you need to accept NoSQL in your life</a> according to the High Scalability blog. I especially like sign #6 &#8220;<strong>Maintaining a completely separate object caching system on top  of an already beefy table storage system</strong>&#8220;. There are companies making serious bucks from selling exactly this type of caching systems. I find that a bit ironic, don&#8217;t you?</p>
<p><a href="http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king" target="_blank">Twitter has just decided to adopt Cassandra</a> as their main storage. By my rough estimate, the status table has more than 9 billion rows &#8211; a good size at which to start thinking about the benefits of NoSQL. I would have liked to see a comparison with other existing solutions and the rationale behind their choice. According to some sources, Ryan King rejected HBase because if a region server is down, writes to the affected data are blocked until the data is redistributed &#8211; unlike Cassandra&#8217;s &#8220;writes never fail&#8221; policy. According to other sources, this will be solved in a future version of HBase, but I think Twitter needed a solution sooner rather than later. I hope for two things:</p>
<ul>
<li>That the Twitter dudes will blog about their migration experience</li>
<li>That I&#8217;ll be able to access and search through all my older tweets, for God&#8217;s sake!</li>
</ul>
<p><a href="http://blogs.forrester.com/appdev/2010/02/nosql.html" target="_blank">Forrester Research thinks</a> that NoSQL and Elastic Caching Platforms are very similar. So similar that &#8220;<strong>NoSQL Wants To Be Elastic Caching When It Grows Up</strong>&#8221;. According to Forrester, &#8220;<em>Ultimately, the real difference between NoSQL and elastic caching now may be in-memory versus persistent storage on disk.</em>&#8221; Yeah sure: transactions, durability, indexing, security model &#8211; who needs this crap anyway?</p>
<p>Oh, and let&#8217;s not forget about <a href="http://groups.google.com/group/google-appengine-downtime-notify/browse_thread/thread/b4ed491a8b9ccce2" target="_blank">today&#8217;s GAE unscheduled downtime</a>. Looking forward to the post-mortem; there will surely be a thing or two to learn&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/linkdump-cassandra-twitter-forrester-not-grokking-nosql/linkdump/20100224/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>January 30 linkdump: cloud, cloud, cloud</title>
		<link>http://www.netuality.ro/january-30-linkdump-cloud-cloud-cloud/linkdump/20100130</link>
		<comments>http://www.netuality.ro/january-30-linkdump-cloud-cloud-cloud/linkdump/20100130#comments</comments>
		<pubDate>Sat, 30 Jan 2010 20:44:10 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Cloudkick]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[private cloud]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=172</guid>
		<description><![CDATA[Yes, there is such a thing as cloud management services, and Cloudkick has a business model around them:
The San Francisco company’s existing features — including a dashboard  with an overview of your cloud infrastructure, email alerts, and graphs  that you help you visualize data like bandwidth requirements — will  always be free, [...]]]></description>
			<content:encoded><![CDATA[<p>Yes there is such a thing as cloud management services and <a href="http://venturebeat.com/2010/01/25/cloudkick/" target="_blank">Cloudkick has a business model around them</a>:</p>
<blockquote><p>The San Francisco company’s existing features — including a dashboard  with an overview of your cloud infrastructure, email alerts, and graphs  that you help you visualize data like bandwidth requirements — will  always be free, said co-founder and chief executive Alex Polvi. But  Cloudkick wants to charge for features on top of the basic service, such  as SMS alerts when your app has problems and a change-log tool where  sysadmins can communicate with each other, which Polvi described as  “Twitter for servers.”</p></blockquote>
<p>Great <a href="http://gojko.net/2010/01/25/designing-applications-for-cloud-deployment/" target="_blank">article on designing applications for the cloud</a> from Gojko Adzic, who has spent the last two years on projects deployed on the Amazon cloud:</p>
<blockquote><p>A very healthy way to look at this is that all your cloud applications  will run on a bunch of cheap web servers. It’s healthy because planning  for that in advance will help you keep your mental health when glitches  occur, and it will also force you to design for machine failure upfront  making the system more resilient.</p></blockquote>
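<p>One small, concrete way to &#8220;design for machine failure upfront&#8221; is to never depend on a single instance when calling a service. A minimal Python sketch, assuming a hypothetical <code>request</code> callable that raises <code>ConnectionError</code> when a host is down; real clients would add timeouts and backoff.</p>

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(hosts: Sequence[str], request: Callable[[str], T],
                       shuffle: bool = True,
                       rng: Optional[random.Random] = None) -> T:
    """Try each host in turn and return the first successful result.

    `request` is a hypothetical callable that performs the call against one
    host and raises ConnectionError when that machine is unreachable.
    """
    order: List[str] = list(hosts)
    if shuffle:                               # spread load across the replicas
        (rng or random.Random()).shuffle(order)
    errors = []
    for host in order:
        try:
            return request(host)
        except ConnectionError as exc:        # this machine is gone, try the next one
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all hosts failed: " + "; ".join(errors))


down = {"web-1", "web-3"}                     # pretend two cheap instances just died


def fake_request(host: str) -> str:
    if host in down:
        raise ConnectionError("unreachable")
    return f"200 OK from {host}"


reply = call_with_failover(["web-1", "web-2", "web-3"], fake_request)
print(reply)  # 200 OK from web-2
```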
<p><a href="http://www.royans.net/arch/private-clouds-not-the-future" target="_blank">Royans&#8217; blog post</a> responds to James Hamilton&#8217;s critical take on <a href="http://perspectives.mvdirona.com/2010/01/17/PrivateCloudsAreNotTheFuture.aspx" target="_blank">private clouds not being the future</a>:</p>
<blockquote><p>Though I believe in most of his comments, I’m not convinced with the  generalization of the conclusions. In particular, what is the maximum  number of servers one need to own, beyond which outsourcing will become a  liability. I suspect this is not a very high number today, but will  grow over time.</p></blockquote>
<p>And a good, detailed article about <a href="http://www.royans.net/arch/hive-facebook" target="_blank">how Hive is used at Facebook</a>:</p>
<blockquote><p>Facebook has a production Hive cluster which is primarily used for log  summarization, including aggregation of impressions, click counts and  statistics around user engagement. They have a separate cluster for “Ad  hoc analysis” which is free for all/most Facebook employees to use. And  over time they figured out how to use it for spam detection, ad  optimization and a host of other undocumented stuff.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/january-30-linkdump-cloud-cloud-cloud/linkdump/20100130/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>January 23 linkdump: grids, BuddyPoke and the state of Internet</title>
		<link>http://www.netuality.ro/january-23-linkdump-grids-buddypoke-and-the-state-of-internet/linkdump/20100123</link>
		<comments>http://www.netuality.ro/january-23-linkdump-grids-buddypoke-and-the-state-of-internet/linkdump/20100123#comments</comments>
		<pubDate>Sat, 23 Jan 2010 10:20:49 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Facebook]]></category>
		<category><![CDATA[grid]]></category>
		<category><![CDATA[Internet]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=170</guid>
		<description><![CDATA[On Enterprise Storage Forum, a few experts look at grid computing and the future of cloud computing.
Can cloud computing  succeed where grid failed and find widespread acceptance in enterprise  data centers? And is there still room for grid computing in the brave  new world of cloud computing? We asked some grid computing pioneers [...]]]></description>
			<content:encoded><![CDATA[<p>On Enterprise Storage a few experts look at <a href="http://www.enterprisestorageforum.com/outsourcing/features/article.php/3859956" target="_blank">grid computing and the future of cloud computing</a>.</p>
<blockquote><p>Can cloud computing  succeed where grid failed and find widespread acceptance in enterprise  data centers? And is there still room for grid computing in the brave  new world of cloud computing? We asked some grid computing pioneers for  their views on the issue.</p>
<p>[...]</p>
<p>And when it comes to  IaaS [infrastructure as a service], I think in five years something like  80 to 90 percent of the computation we are doing could be cloud-based.</p></blockquote>
<p><a href="http://www.buddypoke.com/" target="_blank">BuddyPoke</a> cofounder Dave Westwood <a href="http://highscalability.com/blog/2010/1/22/how-buddypoke-scales-on-facebook-using-google-app-engine.html" target="_blank">explains on the High Scalability</a> blog how they achieved viral scale &#8211; Facebook viral scale, to be more specific. BuddyPoke is today hosted entirely on GAE (Google App Engine), and they share some great insights and lessons learned:</p>
<blockquote><p>On the surface BuddyPoke seems simple, but under hood there&#8217;s some  intricate strategy going on. Minimizing costs while making it scale and  perform is not obvious. Who does what, when, why and how takes some  puzzling out. It&#8217;s certainly an approach a growing class of apps will  find themselves using in the future.</p></blockquote>
<p>Jinesh Varia from Amazon wrote a great paper, <a href="http://jineshvaria.s3.amazonaws.com/public/cloudbestpractices-jvaria.pdf" target="_blank">Architecting for the Cloud: Best Practices [PDF]</a>:</p>
<blockquote><p>This paper is targeted towards cloud architects who are gearing up to move an enterprise-class application from a fixed physical environment to a virtualized cloud environment. The focus of this paper is to highlight concepts, principles and best practices in creating new cloud applications or migrating existing applications to the cloud.</p>
<p>The AWS cloud offers highly reliable pay-as-you-go infrastructure services. The AWS-specific tactics highlighted in the paper will help design cloud applications using these services. As a researcher, it is advised that you play with these commercial services, learn from the work of others, build on the top, enhance and further invent cloud computing.</p></blockquote>
<p>The Pingdom guys have another fantastic post on their blog about the <a href="http://bit.ly/7OZhhX" target="_blank">state of the Internet in 2009</a>:</p>
<blockquote>
<ul>
<li><strong>90 trillion</strong> – The number of emails sent on the Internet in 2009.</li>
<li><strong>92%</strong> – Peak spam levels late in the year.</li>
<li><strong>13.9%</strong> – The growth of Apache websites in 2009.</li>
<li><strong>-22.1%</strong> – The growth of IIS websites in 2009.</li>
</ul>
</blockquote>
<p>These and many more interesting statistics are <a href="http://bit.ly/7OZhhX" target="_blank">in their blog post</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/january-23-linkdump-grids-buddypoke-and-the-state-of-internet/linkdump/20100123/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Map/Reduce patent and impact on Hadoop: none expected</title>
		<link>http://www.netuality.ro/googles-mapreduce-patent-and-impact-on-hadoop-none-expected/articles/20100122</link>
		<comments>http://www.netuality.ro/googles-mapreduce-patent-and-impact-on-hadoop-none-expected/articles/20100122#comments</comments>
		<pubDate>Fri, 22 Jan 2010 16:39:02 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Map/Reduce]]></category>
		<category><![CDATA[patent]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=161</guid>
		<description><![CDATA[From the GigaOm analysis:
Fortunately, for them, it seems unlikely that Google will take to the courts to enforce its new intellectual property. A big reason is that “map” and “reduce” functions have been part of parallel programming for decades, and vendors with deep pockets certainly could make arguments that Google didn’t invent MapReduce at all.
Should [...]]]></description>
			<content:encoded><![CDATA[<p>From the <a href="http://bit.ly/4HKsLc" target="_blank">GigaOm analysis</a>:</p>
<blockquote><p>Fortunately, for them, it seems unlikely that Google will take to the courts to enforce its new intellectual property. A big reason is that “map” and “reduce” functions have been part of parallel programming for decades, and vendors with deep pockets certainly could make arguments that Google didn’t invent MapReduce at all.</p>
<p>Should Hadoop come under fire, any defendants (or interveners like Yahoo and/or IBM) could have strong technical arguments over whether the open-source Hadoop even is an infringement. Then there is the question of money: Google has been making plenty of it without the patent, so why risk the legal and monetary consequences of losing any hypothetical lawsuit? Plus, Google supports Hadoop, which lets university students learn webscale programming (so they can become future Googlers) without getting access to Google’s proprietary MapReduce language.</p>
<p>[...]</p>
<p>A Google spokeswoman emailed this in response to our questions about why Google sought the patent, and whether or not Google would seek to enforce its patent rights, attributing it to Michelle Lee, Deputy General Counsel:</p>
<p>“Like other responsible, innovative companies, Google files patent applications on a variety of technologies it develops. While we do not comment about the use of this or any part of our portfolio, we feel that our behavior to date has been inline with our corporate values and priorities.”</p></blockquote>
<p>From <a href="http://bit.ly/67HA0e" target="_blank">Ars Technica</a>:</p>
<blockquote><p>Hadoop isn&#8217;t the only open source project that uses MapReduce technology. As some readers may know, I&#8217;ve recently been experimenting with CouchDB, an open source database system that allows developers to perform queries with map and reduce functions. Another place where I&#8217;ve seen MapReduce is Nokia&#8217;s QtConcurrent framework, an extremely elegant parallel programming library for Qt desktop applications.</p>
<p>It&#8217;s unclear what Google&#8217;s patent will mean for all of these MapReduce adopters. Fortunately, Google does not have a history of aggressive patent enforcement. It&#8217;s certainly possible that the company obtained the patent for &#8220;defensive&#8221; purposes. Like virtually all major software companies, Google is frequently the target of patent lawsuits. Many companies in technical fields attempt to collect as many broad patents as they can so that they will have ammunition with which to retaliate when they are faced with patent infringement lawsuits.</p>
<p>Google&#8217;s MapReduce patent raises some troubling questions for software like Hadoop, but it looks unlikely that Google will assert the patent in the near future; Google itself uses Hadoop for its Code University program.</p>
<p>Even if Google takes the unlikely course of action and does decide to target Hadoop users with patent litigation, the company would face significant resistance from the open source project&#8217;s deep-pocketed backers—including IBM, which holds the industry&#8217;s largest patent arsenal.</p>
<p>Another dimension of this issue is the patent&#8217;s validity. On one hand, it&#8217;s unclear if taking age-old principles of functional software development and applying them to a cluster constitutes a patentable innovation.</p></blockquote>
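<p>The &#8220;age-old principles of functional software development&#8221; argument is easy to see in code: map and reduce are ordinary higher-order functions. The sketch below is a single-process Python word count using the built-in <code>map</code> and <code>functools.reduce</code>; it is not Google&#8217;s implementation or Hadoop&#8217;s API, both of which add partitioning, shuffling and fault tolerance across a cluster.</p>

```python
from collections import Counter
from functools import reduce
from typing import Iterable


def map_phase(document: str) -> Counter:
    """Map: turn one document into partial word counts."""
    return Counter(document.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce: merge two partial counts (associative and commutative)."""
    return left + right


def word_count(documents: Iterable[str]) -> Counter:
    # The classic functional primitives; a cluster framework runs these
    # same two steps on many machines instead of in one loop.
    return reduce(reduce_phase, map(map_phase, documents), Counter())


counts = word_count(["map and reduce", "reduce the map", "decades of map"])
print(counts.most_common(2))  # [('map', 3), ('reduce', 2)]
```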
<p>Still nothing from the big analysts, Gartner and the gang&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/googles-mapreduce-patent-and-impact-on-hadoop-none-expected/articles/20100122/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Benchmarking the cloud: not simple</title>
		<link>http://www.netuality.ro/benchmarking-the-cloud-not-simple/datacenter/20100118</link>
		<comments>http://www.netuality.ro/benchmarking-the-cloud-not-simple/datacenter/20100118#comments</comments>
		<pubDate>Mon, 18 Jan 2010 07:02:31 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Datacenter]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[EBS]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Rackspace]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=155</guid>
		<description><![CDATA[Understanding the impact of using virtualized servers instead of real ones is perhaps one of the most complex issues when migrating from a traditional configuration to a cloud-based setup, especially because virtualized servers are created equal &#8230; but only on paper.
A Rackspace-funded &#8220;report&#8221; tries to find out the performance differences between Rackspace Cloud Servers and [...]]]></description>
			<content:encoded><![CDATA[<p>Understanding the impact of using virtualized servers instead of real ones is perhaps one of the most complex issues when migrating from a traditional configuration to a cloud-based setup. Especially because virtualized servers are created equal &#8230; but only on paper.</p>
<p>A Rackspace-funded &#8220;report&#8221; tries to find out <a href="http://www.thebitsource.com/2010/01/11/rackspace-cloud-servers-versus-amazon-ec2-performance-analysis/" target="_blank">the performance differences</a> between Rackspace Cloud Servers and Amazon EC2. I guess the only conclusion we can draw from their so-called report is that the disk throughput of Cloud Servers is better than EC2&#8217;s. Since the &#8220;CPU test&#8221; is a kernel compile, which also stresses the disk, I don&#8217;t think we can reliably draw any conclusion about CPU performance from it.</p>
<p style="text-align: center;"><img class="size-full wp-image-156 aligncenter" title="rackspace_amazon_benchmark" src="http://www.netuality.ro/wp-content/uploads/2010/01/rackspace_amazon_benchmark.gif" alt="" width="600" height="275" /></p>
<p>An <a href="http://www.thebitsource.com/2010/01/11/rackspace-cloud-servers-versus-amazon-ec2-performance-analysis/#IDComment52135232" target="_blank">intrepid commenter</a> ran a CPU-only test (Geekbench) and found that <a href="http://browse.geekbench.ca/geekbench2/view/203592" target="_blank">EC2</a> performs slightly better than <a href="http://browse.geekbench.ca/geekbench2/view/187589" target="_blank">Rackspace</a> in terms of raw processor performance. The same commenter, affiliated with <a href="http://cloudharmony.com/status" target="_blank">Cloud Harmony</a>, mentions that a simple hdparm test shows the Rackspace disk delivering more than twice the throughput of the EC2 disk, at least for buffered reads. Last but not least, don&#8217;t forget that for better disk performance Amazon recommends <a href="http://blog.rightscale.com/2008/08/20/amazon-ebs-explained/" target="_blank">EBS</a> instead of the VM disk.</p>
<p>We cannot make an informed choice of cloud vendor using VM benchmarks alone. Ideally, you should benchmark your own app on each cloud infrastructure and choose the one that gives you the best user-facing performance, because at the end of the day this is what matters most. Sadly, today this means experimenting with sometimes wildly different APIs and provisioning models.</p>
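<p>As a starting point for benchmarking your own app, a minimal Python timing harness might look like the sketch below. The <code>benchmark</code> function and its <code>operation</code> argument are hypothetical placeholders for one real user-facing request; running the same script on instances from each provider gives comparable latency percentiles, with the usual caveat that results on shared virtualized hardware vary from run to run.</p>

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(operation: Callable[[], object], runs: int = 200,
              warmup: int = 10) -> Dict[str, float]:
    """Time a user-facing operation and report latency statistics in milliseconds.

    `operation` is a placeholder for one real request against your own app
    (e.g. rendering a page); run the same script on each provider's instances.
    """
    for _ in range(warmup):                   # let caches settle before measuring
        operation()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)   # 99 cut points: p1 .. p99
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98],
            "mean": statistics.fmean(samples)}


result = benchmark(lambda: sum(i * i for i in range(10_000)), runs=50)
print({name: round(ms, 3) for name, ms in result.items()})
```

<p>Percentiles are reported instead of a single average because the tail (p95, p99) is usually what users notice on a noisy, shared cloud host.</p>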
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/benchmarking-the-cloud-not-simple/datacenter/20100118/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>January 13 linkdump: KDD, EC2 congested, Coherence, Zimbra</title>
		<link>http://www.netuality.ro/january-13-linkdump-kdd-ec2-congested-coherence-zimbra/linkdump/20100113</link>
		<comments>http://www.netuality.ro/january-13-linkdump-kdd-ec2-congested-coherence-zimbra/linkdump/20100113#comments</comments>
		<pubDate>Wed, 13 Jan 2010 17:23:08 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Coherence]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[KDD]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Yahoo]]></category>
		<category><![CDATA[Zimbra]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=152</guid>
		<description><![CDATA[Call to arms for the annual ACM KDD Conference. KDD stands for Knowledge Discovery and Data Mining, so if you&#8217;re looking for some hardcore use cases and new algorithms to apply, this is definitely the place to be (Washington, DC, July 25-28):
KDD-2010 will feature keynote presentations, oral paper presentations, 			poster sessions, workshops, tutorials, panels, exhibits, demonstrations, [...]]]></description>
			<content:encoded><![CDATA[<p>Call to arms for the annual <a href="http://www.kdd2010.com/" target="_blank">ACM KDD Conference</a>. KDD stands for Knowledge Discovery and Data Mining, so if you&#8217;re looking for some hardcore use cases and new algorithms to apply, this is definitely the place to be (Washington, July 25-28):</p>
<blockquote><p>KDD-2010 will feature keynote presentations, oral paper presentations, 			poster sessions, workshops, tutorials, panels, exhibits, demonstrations, 			and the KDD Cup competition.</p></blockquote>
<p>There&#8217;s a rumor on the street that Amazon EC2 is over-subscribed. <a href="http://alan.blog-city.com/has_amazon_ec2_become_over_subscribed.htm#" target="_blank">From the trenches</a> it appears that their scalability is &#8230; well, duh &#8230; not infinite and elasticity is a tiny bit rigid:</p>
<blockquote><p>Anyone that uses virtualized computing, whether it is in the cloud or in their own private setup (VMWare for example) knows you take a performance hit. These performance hits can be considerable, but on the whole, are tolerable and can be built into an architecture from the start.</p>
<p>The problems that we are starting to see from Amazon, are more than just the overhead of a virtualized environment. They are deep rooted scalability problems at their end that need to be addressed sooner rather than later.</p></blockquote>
<p>My Adobe colleague <a href="http://horicky.blogspot.com" target="_blank">Ricky Ho</a> has <a href="http://horicky.blogspot.com/2010/01/notes-on-oracle-coherence.html" target="_blank">posted some notes on Oracle&#8217;s Coherence</a> (formerly Tangosol), a feature-rich distributed Java cache. A great read, especially if you want a technical intro to the product (code snippets and everything).</p>
<p>The acquisition of the day is <a href="http://paidcontent.org/article/419-confirmed-yahoo-sells-zimbra-to-vmware/" target="_blank">Zimbra being bought by VMWare</a>. Yahoo is selling Zimbra at a loss, it seems. Analysts wonder what exactly VMWare is planning to do; they&#8217;re probably moving up the stack and working on their own cloud ecosystem and related services. &#8220;VMWare Applications&#8221;, soon?</p>
<blockquote><p>Under the terms of the agreement, Yahoo can continue to use Zimbra technology in its communications services. VMWare’s interest in Zimbra is a bit of a mystery since VMWare focuses on selling virtualization technology; in the release, VMWare offers somewhat of an explanation saying that the purchase furthers its “mission of taking complexity out of the datacenter, desktop, application development and core IT services”</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/january-13-linkdump-kdd-ec2-congested-coherence-zimbra/linkdump/20100113/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>January 12 linkdump: Reddit on Hadoop on steroids, Hadoop lessons learned</title>
		<link>http://www.netuality.ro/january-12-linkdump-reddit-on-hadoop-on-steroids-hadoop-lessons-learned/linkdump/20100112</link>
		<comments>http://www.netuality.ro/january-12-linkdump-reddit-on-hadoop-on-steroids-hadoop-lessons-learned/linkdump/20100112#comments</comments>
		<pubDate>Tue, 12 Jan 2010 18:25:23 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Linkdump]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[myNoSQL]]></category>
		<category><![CDATA[ReadPath]]></category>
		<category><![CDATA[Reddit]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=148</guid>
		<description><![CDATA[Great Hadoop story, and a great read too, from Lau Jensen on Best In Class blog:
Hadoop opens a world of fun with the promise of some heavy lifting and in order to feed the beast I’ve written a Reddit-scraper in just 30 lines of Clojure.
[...]
Now that we’re sitting with almost unlimited insight into the posts [...]]]></description>
			<content:encoded><![CDATA[<p>Great Hadoop story, and a great read too, from Lau Jensen on <a href="http://www.bestinclass.dk/index.php/2010/01/hadoop-feeding-reddit-to-hadoop/" target="_blank">Best In Class blog</a>:</p>
<blockquote><p>Hadoop opens a world of fun with the promise of some heavy lifting and in order to feed the beast I’ve written a Reddit-scraper in just 30 lines of Clojure.</p>
<p>[...]</p>
<p>Now that we’re sitting with almost unlimited insight into the posts which make Redditors tick, we can think of many stats that would be fun to compute. Since this is a tutorial I’ll go with the simplest version, ie. something like calculating total number of upvotes per domain/author, but for a future experiment it would be fun to pull out the top authors/posts and also scrape the URLs they link, categorizing them after content length, keywords, number of graphical elements etc, just to get the recipe for a succesful post.</p></blockquote>
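<p>The tutorial itself does this in Clojure on Hadoop. For readers who haven&#8217;t written a Map/Reduce job yet, here is the shape of the &#8220;total upvotes per domain&#8221; computation as a minimal plain-Python sketch, with the map, shuffle and reduce phases spelled out. The record layout (domain, author, upvotes) and the sample posts are made up for illustration; they are not Lau&#8217;s actual schema or data.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical scraped post record: (domain, author, upvotes).
Post = Tuple[str, str, int]

def map_phase(posts: Iterable[Post]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit one (domain, upvotes) pair per post."""
    for domain, _author, ups in posts:
        yield domain, ups

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the upvotes collected for each domain."""
    return {key: sum(values) for key, values in groups.items()}

# Made-up sample data, for illustration only.
posts = [
    ("imgur.com", "alice", 120),
    ("github.com", "bob", 45),
    ("imgur.com", "carol", 30),
]
totals = reduce_phase(shuffle(map_phase(posts)))
print(totals)  # {'imgur.com': 150, 'github.com': 45}
```

<p>Keying the mapper on the author field instead of the domain gives the per-author totals; on a real cluster, Hadoop performs the shuffle step for you.</p>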
<p>Alex Popescu has <a href="http://nosql.mypopescu.com/post/330657421/lessons-learned-from-using-hadoop-and-hbase-in" target="_blank">a few notes and questions</a> about <a href="http://www.readpath.com/" target="_blank">ReadPath</a>&#8217;s <a href="http://blog.readpath.com/2009/12/28/hadoop-and-hbase-in-production/" target="_blank">usage of Hadoop</a> in production:</p>
<blockquote><p>If you thought using NoSQL solutions would automatically address and solve backup and restore policies, you were wrong. [...]</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/january-12-linkdump-reddit-on-hadoop-on-steroids-hadoop-lessons-learned/linkdump/20100112/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>M/R vs DBMS benchmark paper rebutted</title>
		<link>http://www.netuality.ro/mr-vs-dbms-benchmark-paper-rebutted/articles/20100107</link>
		<comments>http://www.netuality.ro/mr-vs-dbms-benchmark-paper-rebutted/articles/20100107#comments</comments>
		<pubDate>Thu, 07 Jan 2010 06:53:37 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[ACM]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Map/Reduce]]></category>
		<category><![CDATA[RDBMS]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=146</guid>
		<description><![CDATA[In a recent ACM article, Jeffrey Dean and Sanjay Ghemawat discuss some pitfalls in the Hadoop vs DBMS comparison benchmarks that I&#8217;ve mentioned in one of my previous posts. They clarify three misconceptions about M/R that appear in the benchmark paper:

MapReduce cannot use indexes and implies a full scan of all [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://cacm.acm.org/browse-by-subject/data-storage-and-retrieval/55744-mapreduce-a-flexible-data-processing-tool/fulltext" target="_blank">recent ACM article</a>, Jeffrey Dean and Sanjay Ghemawat discuss some pitfalls in the <a href="http://www.netuality.ro/hadoop_map_reduce_vs_dbms_benchmarks/articles/20100103" target="_blank">Hadoop vs DBMS comparison benchmarks</a> that I&#8217;ve mentioned in one of my previous posts. They clarify three misconceptions about M/R that appear in the benchmark paper:</p>
<ul>
<li>MapReduce cannot use indexes and implies a full scan of all input data;</li>
<li>MapReduce inputs and outputs are always simple files in a file system;</li>
<li>MapReduce requires the use of inefficient textual data formats.</li>
</ul>
<p>They also emphasize some of Hadoop&#8217;s strong points that the benchmark paper does not cover.</p>
<p>The biggest drawback, the lack of indexes, is partially compensated in certain use cases by range queries, and is typically solved with an external indexing service such as Lucene/SOLR or even a dedicated RDBMS. These indexes can be sharded vertically and horizontally, so that queries are answered from the pre-built indexes instead of by scanning the whole data set, which is what the authors of the comparison paper assume M/R must do.</p>
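<p>To illustrate the difference, here is a toy sketch, assuming a plain in-memory inverted index rather than Lucene/SOLR (none of their APIs are used): the index is split across a hypothetical NUM_SHARDS shards by hashing each term, so a lookup probes a single shard, while the index-less path has to read every record.</p>

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Set

NUM_SHARDS = 4  # hypothetical number of index shards

def shard_for(term: str) -> int:
    """Pick a shard deterministically from a stable hash of the term."""
    return zlib.crc32(term.encode("utf-8")) % NUM_SHARDS

def build_index(records: Dict[int, str]) -> List[Dict[str, Set[int]]]:
    """Build a term -> record-id inverted index, split across shards by term."""
    shards: List[Dict[str, Set[int]]] = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for rec_id, text in records.items():
        for term in text.lower().split():
            shards[shard_for(term)][term].add(rec_id)
    return shards

def lookup(shards: List[Dict[str, Set[int]]], term: str) -> Set[int]:
    """Answer a query by probing one shard instead of reading every record."""
    term = term.lower()
    return set(shards[shard_for(term)].get(term, set()))

def full_scan(records: Dict[int, str], term: str) -> Set[int]:
    """The index-less path: every record is read and checked."""
    term = term.lower()
    return {rid for rid, text in records.items() if term in text.lower().split()}

# Made-up sample records, for illustration only.
records = {
    1: "hadoop cluster failure",
    2: "oracle exadata benchmark",
    3: "hadoop benchmark results",
}
shards = build_index(records)
print(sorted(lookup(shards, "benchmark")))  # [2, 3]
assert lookup(shards, "hadoop") == full_scan(records, "hadoop")
```

<p>Building the index is itself a full pass over the data, so this pays off when the same data set is queried many times.</p>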
<p>Some performance assumptions are also discussed in the second part of the article. While the benchmark results were not challenged per se, here&#8217;s Jeffrey and Sanjay&#8217;s conclusion:</p>
<p><em>&#8220;In our experience, MapReduce is a highly effective and efficient tool for large-scale fault-tolerant data analysis.</em></p>
<p><em>[...]</em></p>
<p><em>MapReduce provides many significant advantages over parallel databases. First and foremost, it provides fine-grain fault tolerance for large jobs; failure in the middle of a multi-hour execution does not require restarting the job from scratch. Second, MapReduce is very useful for handling data processing and data loading in a heterogenous system with many different storage systems. Third, MapReduce provides a good framework for the execution of more complicated functions than are supported directly in SQL.&#8221;</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/mr-vs-dbms-benchmark-paper-rebutted/articles/20100107/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How big is your meat cloud? The golden number for servers</title>
		<link>http://www.netuality.ro/how-big-is-your-meat-cloud-the-golden-number-for-servers/datacenter/20100105</link>
		<comments>http://www.netuality.ro/how-big-is-your-meat-cloud-the-golden-number-for-servers/datacenter/20100105#comments</comments>
		<pubDate>Tue, 05 Jan 2010 16:16:42 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Datacenter]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[servers]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=134</guid>
		<description><![CDATA[Just went through a recent thread on Slashdot discussing &#8220;how many admins per user computer&#8221; or how many desktops per admin to be more specific. While the client desktop subject is totally uninteresting, I found in the comment noise a few interesting tidbits about the meat cloud size in different server environments.
On the low non-automated [...]]]></description>
			<content:encoded><![CDATA[<p>Just went through a <a href="http://ask.slashdot.org/story/09/12/30/148224/How-Many-Admins-Per-UserComputer-Have-You-Seen?art_pos=3" target="_blank">recent thread on Slashdot</a> discussing &#8220;how many admins per user computer&#8221; or how many desktops per admin to be more specific. While the client desktop subject is totally uninteresting, I found in the comment noise a few interesting tidbits about the meat cloud size in different <em>server</em> environments.</p>
<p>On the low, non-automated end there were figures <a href="http://ask.slashdot.org/comments.pl?sid=1493436&amp;cid=30594440" target="_blank">such as</a> &#8220;1 admin per 70 Linux boxes or 30 Windows machines&#8221; (are Windows servers really twice as difficult to manage as Linux servers?) &#8211; confirmed by another commenter working for a <a href="http://ask.slashdot.org/comments.pl?sid=1493436&amp;cid=30594496" target="_blank">Government facility</a>. Of course, it depends on how many different hardware brands and software services you have to manage&#8230;</p>
<p>Another sysadmin, allegedly with 12 years of experience, <a href="http://ask.slashdot.org/comments.pl?sid=1493436&amp;cid=30594832" target="_blank">commented</a> that the larger the organization, the bigger the ratio: from 50 servers per sysadmin in small organizations to 250 in corporations (though his revenue-based &#8220;definitions&#8221; of company size are a bit weird). An insightful comment <a href="http://ask.slashdot.org/comments.pl?sid=1493436&amp;cid=30596052" target="_blank">mentions</a> Facebook&#8217;s Jeff Rothschild, according to whom Facebook has roughly 130 servers per admin or (an interesting metric) 1 million or more users per engineer.</p>
<p>Of course, in specific cases this number can go way higher, especially when you deal with quasi-identical hardware and software configurations running in a very large cluster. At the extreme end there&#8217;s the <a href="http://www.datacenterknowledge.com/inside-microsofts-chicago-data-center/" target="_blank">Microsoft container data center in Chicago</a>, which supposedly has a total of 30 employees supporting some 300,000 servers. That&#8217;s 10,000 servers per employee! At this scale I suspect they basically only replace faulty hardware and wire in new capacity when needed; everything else must be fully automated.</p>
<p style="text-align: center;"><img class="alignnone size-full wp-image-141" title="golden_number_sysadmin" src="http://www.netuality.ro/wp-content/uploads/2010/01/golden_number_sysadmin1.gif" alt="" width="472" height="277" /></p>
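<p>For a quick back-of-the-envelope comparison, the ratios quoted above can be turned into headcount for a fleet of a given size. A small Python sketch; the 1,000-server fleet is an arbitrary, hypothetical example, and the rounding up reflects that you can&#8217;t hire a fraction of an admin.</p>

```python
import math

# Servers per admin, as quoted in the Slashdot thread and the Chicago data center article.
RATIOS = {
    "Linux, non-automated": 70,
    "Windows, non-automated": 30,
    "small organization": 50,
    "corporation": 250,
    "Facebook": 130,
    "Microsoft Chicago container DC": 300_000 // 30,  # 300,000 servers / 30 staff
}

def admins_needed(servers: int, servers_per_admin: int) -> int:
    """Round up: a partial admin still means hiring a whole person."""
    return math.ceil(servers / servers_per_admin)

FLEET = 1_000  # hypothetical fleet size

for label, ratio in RATIOS.items():
    print(f"{label:32} {ratio:6} servers/admin -> {admins_needed(FLEET, ratio):3} admins")
```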
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/how-big-is-your-meat-cloud-the-golden-number-for-servers/datacenter/20100105/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop Map/Reduce versus DBMS, benchmarks</title>
		<link>http://www.netuality.ro/hadoop_map_reduce_vs_dbms_benchmarks/articles/20100103</link>
		<comments>http://www.netuality.ro/hadoop_map_reduce_vs_dbms_benchmarks/articles/20100103#comments</comments>
		<pubDate>Sun, 03 Jan 2010 19:40:13 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Map/Reduce]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Vertica]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/?p=123</guid>
		<description><![CDATA[Here’s a recent benchmark published at SIGMOD ’09 by a team of researchers and students from Brown, M.I.T. and Wisconsin-Madison universities. The details of their setup are here and this is the paper (PDF).
They ran a few simple tasks such as loading, „grepping” (as described in the original M/R paper), aggregation, selection and join on a [...]]]></description>
			<content:encoded><![CDATA[<p>Here’s a recent benchmark published at SIGMOD ’09 by a team of researchers and students from Brown, M.I.T. and Wisconsin-Madison universities. The details of their setup are <a href="http://database.cs.brown.edu/projects/mapreduce-vs-dbms/" target="_blank">here</a> and <a href="http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf" target="_blank">this is the paper</a> (PDF).</p>
<p>They ran a few simple tasks such as loading, “grepping” (as described in the original M/R paper), aggregation, selection and join on a total of 1TB of data. On the same 100-node Red Hat cluster they compared <a href="http://en.wikipedia.org/wiki/Vertica" target="_blank">Vertica</a> (a well-known MPP database), “plain” Hadoop with custom-coded Map/Reduce tasks and an unnamed DBMS-X (probably Oracle Exadata, which is mentioned in the article).</p>
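<p>For reference, the &#8220;grep&#8221; task from the original M/R paper is about the simplest possible job: the mapper emits a record only if it matches a pattern, and the reducer passes matches through unchanged. A minimal plain-Python sketch (not Hadoop code); the XYZ pattern and the key/value records are hypothetical placeholders, not the benchmark&#8217;s data set.</p>

```python
import re
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (key, value) pair; hypothetical layout

PATTERN = re.compile("XYZ")  # hypothetical search pattern

def grep_map(records: Iterable[Record]) -> Iterator[Record]:
    """Mapper: emit a record only when its value contains the pattern."""
    for key, value in records:
        if PATTERN.search(value):
            yield key, value

def identity_reduce(matches: Iterable[Record]) -> List[Record]:
    """Reducer: pass matches through unchanged; grep needs no aggregation."""
    return list(matches)

# Made-up sample records, for illustration only.
records = [
    ("k001", "abcXYZdef"),
    ("k002", "nothing to see here"),
    ("k003", "XYZ at the start"),
]
print(identity_reduce(grep_map(records)))
# [('k001', 'abcXYZdef'), ('k003', 'XYZ at the start')]
```

<p>Without an index, every record passes through the mapper, so this is a full scan of the input.</p>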
<p>The final results show Vertica and DBMS-X being (not at all astonishing!) 2 and 3 times faster, respectively, than the brute-force M/R approach. They also mention that Hadoop was surprisingly easy to install and run, while installing DBMS-X was relatively complex and had to be followed by tuning. The parallel databases used space more efficiently thanks to compression, while Hadoop needed at least 3 times the space because of its redundancy (replication) mechanism. A good point for Hadoop was its failure model, which allows quick recovery from faults and keeps long-running jobs going uninterrupted.</p>
<p><img class="size-full wp-image-128  alignnone" title="grep_benchmark" src="http://www.netuality.ro/wp-content/uploads/2010/01/grep_benchmark.gif" alt="" width="600" height="249" /></p>
<p>The authors recommend parallel DBMSs over “brute force” models: “<em>[…] we are wary of devoting huge computational clusters and “brute force” approaches to computation when sophisticated software could do the same processing with far less hardware and consume far less energy, or in less time, thereby obviating the need for a sophisticated fault tolerance model. A multithousand-node cluster of the sort Google, Microsoft, and Yahoo! run uses huge amounts of energy, and as our results show, for many data processing tasks a parallel DBMS can often achieve the same performance using far fewer nodes. As such, the desirable approach is to use high-performance algorithms with modest parallelism rather than brute force approaches on much larger clusters.</em>”</p>
<p>What do you think, dear reader? I would be curious to see the same benchmark replicated on other NoSQL systems. Also, 1TB seems too small a data set for most web-scale apps today.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/hadoop_map_reduce_vs_dbms_benchmarks/articles/20100103/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google: sorry, but Lisp/Ruby/Erlang not on the menu</title>
		<link>http://www.netuality.ro/google-sorry-but-lisprubyerlang-not-on-the-menu/tools/20080529</link>
		<comments>http://www.netuality.ro/google-sorry-but-lisprubyerlang-not-on-the-menu/tools/20080529#comments</comments>
		<pubDate>Wed, 28 May 2008 21:35:00 +0000</pubDate>
		<dc:creator>Adrian</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Erlang]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://www.netuality.ro/google-sorry-but-lisprubyerlang-not-on-the-menu/tools/20080529</guid>
		<description><![CDATA[Yes, language propaganda again. Ain&#8217;t it fun?
Here comes a nice quote from the latest Steve Yegge post (read it entirely if you have the time, it&#8217;s both fun and educational &#8211; at least for me). So, there:
I made the famously, horribly, career-shatteringly bad mistake of trying to use Ruby at Google, for this project. [...]]]></description>
			<content:encoded><![CDATA[<p>Yes, language propaganda again. Ain&#8217;t it fun?</p>
<p>Here comes a nice quote from the <a href="http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html" target="_blank">latest Steve Yegge post</a> (read it entirely if you have the time, it&#8217;s both fun and educational &#8211; at least for me). So, there:</p>
<p><em>I made the famously, horribly, career-shatteringly bad mistake of trying to use Ruby at Google, for this project. And I became, very quickly, I mean almost overnight, the Most Hated Person At Google. And, uh, and I&#8217;d have arguments with people about it, and they&#8217;d be like Nooooooo, WHAT IF&#8230; And ultimately, you know, ultimately they actually convinced me that they were right, in the sense that there actually <strong><em>were</em> a few things</strong>. There were some taxes that I was imposing on the systems people, where they were gonna have to have some maintenance issues that they wouldn&#8217;t have. [...] But, you know, <strong>Google&#8217;s all about getting stuff done</strong>.</em></p>
<p><em>[...]</em></p>
<p><em><strong>Is it allowed at Google to use Lisp and other languages?</strong></em></p>
<p><em>No. No, it&#8217;s not OK. <strong>At Google you can use C++, Java, Python, JavaScript</strong>&#8230; I actually found a legal loophole and used server-side JavaScript for a project.</em></p>
<p>Mmmmm &#8230; key ?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.netuality.ro/google-sorry-but-lisprubyerlang-not-on-the-menu/tools/20080529/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
