Netuality

Taming the big, bad, nasty websites

Archive for the ‘Linkdump’ Category

January 30 linkdump: cloud, cloud, cloud

Yes, there is such a thing as cloud management services, and Cloudkick has built a business model around them:

The San Francisco company’s existing features (including a dashboard with an overview of your cloud infrastructure, email alerts, and graphs that help you visualize data like bandwidth requirements) will always be free, said co-founder and chief executive Alex Polvi. But Cloudkick wants to charge for features on top of the basic service, such as SMS alerts when your app has problems and a change-log tool where sysadmins can communicate with each other, which Polvi described as “Twitter for servers.”

Great article on designing applications for the cloud from Gojko Adzic, who spent the last two years on projects deployed on the Amazon cloud:

A very healthy way to look at this is that all your cloud applications will run on a bunch of cheap web servers. It’s healthy because planning for that in advance will help you keep your mental health when glitches occur, and it will also force you to design for machine failure upfront making the system more resilient.
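To make "design for machine failure upfront" a bit more concrete, here is a minimal Python sketch of failing over between unreliable servers. Everything in it (`ServerDown`, `call_with_failover`, the two stand-in servers) is a hypothetical illustration, not code from Adzic's article.

```python
class ServerDown(Exception):
    """Raised by a hypothetical backend call when a machine is unavailable."""


def call_with_failover(servers, request, attempts_per_server=2):
    """Try each server in turn, moving on to the next one when a call fails.

    `servers` is a list of callables standing in for cheap, unreliable
    machines. Each takes a request and returns a response or raises ServerDown.
    """
    last_error = None
    for server in servers:
        for _ in range(attempts_per_server):
            try:
                return server(request)
            except ServerDown as exc:
                last_error = exc  # remember the failure and keep going
    raise RuntimeError("all servers failed") from last_error


def flaky_server(request):
    # Stand-in for a machine that has disappeared.
    raise ServerDown("machine lost")


def healthy_server(request):
    # Stand-in for a machine that is still up.
    return f"handled {request}"


if __name__ == "__main__":
    print(call_with_failover([flaky_server, healthy_server], "GET /"))
```

The point of the sketch is only that the caller assumes any single machine can vanish and treats that as a normal code path rather than an exceptional one.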

Royans’ blog comments on James Hamilton’s critical post about private clouds not being the future:

Though I believe in most of his comments, I’m not convinced with the generalization of the conclusions. In particular, what is the maximum number of servers one need to own, beyond which outsourcing will become a liability. I suspect this is not a very high number today, but will grow over time.

And a good detailed article about Hive used at Facebook:

Facebook has a production Hive cluster which is primarily used for log summarization, including aggregation of impressions, click counts and statistics around user engagement. They have a separate cluster for “Ad hoc analysis” which is free for all/most Facebook employees to use. And over time they figured out how to use it for spam detection, ad optimization and a host of other undocumented stuff.
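For readers who have not seen this kind of job, log summarization boils down to grouping events and counting them. The sketch below shows the idea in plain Python rather than in Hive; the record layout, the `LOG_LINES` sample and the `summarize` function are assumptions made up for illustration, not Facebook's schema or queries.

```python
from collections import Counter

# Hypothetical log records as (ad_id, event_type) pairs. Values are made up.
LOG_LINES = [
    ("ad-1", "impression"),
    ("ad-1", "impression"),
    ("ad-1", "click"),
    ("ad-2", "impression"),
]


def summarize(log_lines):
    """Count impressions and clicks per ad, then derive a click-through rate."""
    counts = Counter(log_lines)  # keys are (ad_id, event_type) pairs
    summary = {}
    for ad_id in sorted({ad for ad, _ in log_lines}):
        impressions = counts[(ad_id, "impression")]
        clicks = counts[(ad_id, "click")]
        # Guard against division by zero for ads that were never shown.
        ctr = clicks / impressions if impressions else 0.0
        summary[ad_id] = {"impressions": impressions, "clicks": clicks, "ctr": ctr}
    return summary


if __name__ == "__main__":
    for ad_id, stats in summarize(LOG_LINES).items():
        print(ad_id, stats)
```

On a real cluster the same grouping would be expressed as a query over far larger logs; the in-memory version only illustrates the shape of the computation.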

Written by Adrian

January 30th, 2010 at 11:44 pm

Posted in Linkdump

January 23 linkdump: grids, BuddyPoke and the state of the Internet

On Enterprise Storage, a few experts look at grid computing and the future of cloud computing.

Can cloud computing succeed where grid failed and find widespread acceptance in enterprise data centers? And is there still room for grid computing in the brave new world of cloud computing? We asked some grid computing pioneers for their views on the issue.

[...]

And when it comes to IaaS [infrastructure as a service], I think in five years something like 80 to 90 percent of the computation we are doing could be cloud-based.

BuddyPoke cofounder Dave Westwood explains on the High Scalability blog how they achieved viral scale, Facebook viral scale to be more specific. BuddyPoke is today hosted entirely on GAE (Google App Engine), and they share some great insights and lessons learned.

On the surface BuddyPoke seems simple, but under hood there’s some intricate strategy going on. Minimizing costs while making it scale and perform is not obvious. Who does what, when, why and how takes some puzzling out. It’s certainly an approach a growing class of apps will find themselves using in the future.

Jinesh Varia from Amazon wrote a great Architecting for the Cloud: Best Practices [PDF] paper:

This paper is targeted towards cloud architects who are gearing up to move an enterprise-class application from a fixed physical environment to a virtualized cloud environment. The focus of this paper is to highlight concepts, principles and best practices in creating new cloud applications or migrating existing applications to the cloud.

The AWS cloud offers highly reliable pay-as-you-go infrastructure services. The AWS-specific tactics highlighted in the paper will help design cloud applications using these services. As a researcher, it is advised that you play with these commercial services, learn from the work of others, build on the top, enhance and further invent cloud computing.

The Pingdom guys have another fantastic post on their blog about the state of the Internet in 2009:

  • 90 trillion – The number of emails sent on the Internet in 2009.
  • 92% – Peak spam levels late in the year.
  • 13.9% – The growth of Apache websites in 2009.
  • -22.1% – The growth of IIS websites in 2009.

You can find these and more interesting statistics in their blog post.

Written by Adrian

January 23rd, 2010 at 1:20 pm

Posted in Linkdump

January 13 linkdump: KDD, EC2 congested, Coherence, Zimbra

Call to arms for the annual ACM KDD Conference. KDD stands for Knowledge Discovery and Data Mining, so if you’re looking for some hardcore use cases and new algorithms to apply, this is definitely the place to be (Washington, July 25-28):

KDD-2010 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition.

There’s a rumor on the street that Amazon EC2 is over-subscribed. From the trenches it appears that their scalability is … well, duh … not infinite and elasticity is a tiny bit rigid:

Anyone that uses virtualized computing, whether it is in the cloud or in their own private setup (VMWare for example) knows you take a performance hit. These performance hits can be considerable, but on the whole, are tolerable and can be built into an architecture from the start.

The problems that we are starting to see from Amazon, are more than just the overhead of a virtualized environment. They are deep rooted scalability problems at their end that need to be addressed sooner rather than later.

My Adobe colleague Ricky Ho has posted some notes on Oracle’s Coherence (formerly Tangosol), a distributed Java cache rich in features. A great read especially if you want a technical intro to the product (code snippets and everything).

The acquisition of the day is Zimbra being bought by VMWare. Yahoo is selling Zimbra at a loss, it seems. Analysts wonder what exactly VMWare is planning to do; they’re probably going up the stack and working on providing their own cloud ecosystem and related services. “VMWare Applications”, soon?

Under the terms of the agreement, Yahoo can continue to use Zimbra technology in its communications services. VMWare’s interest in Zimbra is a bit of a mystery since VMWare focuses on selling virtualization technology; in the release, VMWare offers somewhat of an explanation saying that the purchase furthers its “mission of taking complexity out of the datacenter, desktop, application development and core IT services”

Written by Adrian

January 13th, 2010 at 8:23 pm

Posted in Linkdump

January 12 linkdump: Reddit on Hadoop on steroids, Hadoop lessons learned

Great Hadoop story, and a great read too, from Lau Jensen on the Best In Class blog:

Hadoop opens a world of fun with the promise of some heavy lifting and in order to feed the beast I’ve written a Reddit-scraper in just 30 lines of Clojure.

[...]

Now that we’re sitting with almost unlimited insight into the posts which make Redditors tick, we can think of many stats that would be fun to compute. Since this is a tutorial I’ll go with the simplest version, i.e. something like calculating total number of upvotes per domain/author, but for a future experiment it would be fun to pull out the top authors/posts and also scrape the URLs they link, categorizing them after content length, keywords, number of graphical elements etc, just to get the recipe for a successful post.
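The "total upvotes per domain/author" statistic mentioned in the quote is a classic map-then-reduce job. Jensen's tutorial does it in Clojure on Hadoop; the sketch below only mirrors the idea in plain Python, and the `POSTS` sample, the field names and the helper functions are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical scraped posts. Field names and values are invented.
POSTS = [
    {"domain": "example.com", "author": "alice", "upvotes": 10},
    {"domain": "example.com", "author": "bob", "upvotes": 5},
    {"domain": "example.org", "author": "alice", "upvotes": 7},
]


def map_post(post, key):
    """Map step: emit one (key, upvotes) pair per post."""
    yield post[key], post["upvotes"]


def reduce_counts(pairs):
    """Reduce step: sum the upvotes emitted for each key."""
    totals = defaultdict(int)
    for k, upvotes in pairs:
        totals[k] += upvotes
    return dict(totals)


def total_upvotes(posts, key):
    """Run the map step over every post, then reduce the emitted pairs."""
    pairs = (pair for post in posts for pair in map_post(post, key))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(total_upvotes(POSTS, "domain"))
    print(total_upvotes(POSTS, "author"))
```

Passing `"domain"` or `"author"` as the key switches between the two groupings; on Hadoop the shuffle phase between map and reduce would do the grouping that the dictionary does here.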

Alex Popescu has a few notes and questions about ReadPath’s usage of Hadoop in production:

If you thought using NoSQL solutions would automatically address and solve backup and restore policies, you were wrong. [...]

Written by Adrian

January 12th, 2010 at 9:25 pm

Posted in Linkdump
