Netuality

Taming the big, bad, nasty websites

Archive for the ‘HBase’ tag

And now for something a little different: graphviz candy


No self-respecting geek can resist either of these incredible temptations:

  • Correct something wrong on “the internets”
  • Produce a little bit of graphviz candy (what a fantastic tool)


Written by Adrian

May 6th, 2010 at 4:37 pm

Posted in Presentations


Linkdump: using HBase, CAP visuals, Farmville and more


Two great posts from my colleagues about why Adobe is using HBase: part 1 and part 2. As I’ve experienced all these firsthand, I guarantee this is solid, relevant information. Both articles are highly recommended reads.

Speaking of HBase, rumor on the street has it that they are taking HBASE-1295 (multi-data-center replication) very seriously and that we'll be seeing a new feature announcement relatively soon. Looking forward to it!

An older but still interesting presentation on how RIPE NCC is using Hadoop and HBase to store and search through IP addresses for Europe, the Middle East and Russia can be found here.

It looks like Farmville is still in its MySQL+memcache phase, according to the High Scalability blog. And they use PHP. When will they start looking into NoSQL? Hopefully soon enough to have a good crop.

Nathan’s visual guide to NoSQL systems, while perhaps not entirely accurate, is a nice attempt to put all these projects on the same map. I would love to see a “patched” version of the visual guide that takes into account all the information left in the comments…

Oh, and Twitter is using Protocol Buffers to store information on Hadoop. And they’re going to open-source their implementation.

Written by Adrian

March 17th, 2010 at 1:20 pm

Linkdump: Cassandra lovers, blowing the circuit breaker and Oracle clouds


Good points (as always) on Alexandru’s blog, discussing the “SQL scalability isn’t for everyone” topic.

NoSQL and RDBMS are just tools for the job, and there is nothing here about the death of one or the other. But as we’ve learned over the years, every new programming language is the death of all its precursors, every new programming paradigm is the death of everything that existed before, and so on. The part some seem to be missing, or deliberately ignoring, is that in most of these cases this death has never actually happened.

For large-scale performance testing of a production environment, check out how MySpace simulated 1 million concurrent users with a huge EC2 cluster, described on the High Scalability blog. While the article is a guest post from a company selling “cloud testing” solutions and has a bit of “sales juice” in it, it’s still a very good read:

Large-scale testing using EC2

Someone is in love with Cassandra after only 4 months. Hoping Cassandra doesn’t get too fat after the wedding:

Traditional sharding and replication with databases like MySQL and PostgreSQL have been shown to work even on the largest scale websites — but come at a large operational cost. Setting up replication for MySQL can be done quickly, but there are many issues you need to be aware of, such as slave replication lag. Sharding can be done once you reach write throughput limits, but you are almost always stuck writing your own sharding layer to fit how your data is created and operationally, it takes a lot of time to set everything up correctly. We skipped that step all together and added a couple hooks to make our data aggregation service siphon to both PostgreSQL and Cassandra for the initial integration.
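The hand-rolled sharding layer the quote mentions often starts as little more than a stable hash on the key. A minimal sketch of that idea (shard names and function are mine, purely illustrative; real layers also handle resharding, replicas and failover, which is where the operational cost comes from):

```python
# Minimal hash-based sharding sketch: route each record key to one of
# N database shards. The hash must be stable across processes, so we
# use md5 rather than Python's randomized built-in hash().
import hashlib

SHARDS = ["db01", "db02", "db03", "db04"]  # hypothetical shard hosts

def shard_for(key: str) -> str:
    # Same key always lands on the same shard.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("user:42"))  # deterministic: repeated calls agree
```

Note the modulo trick is exactly why resharding hurts: changing `len(SHARDS)` remaps almost every key, which is one reason these layers end up bespoke.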

Distributed data war stories from Anders @ bandwidth.com, HBase and Hadoop on commodity hardware:

As mentioned before, the commodity machines I used were very basic but I was able to insert conservatively about 500 records per second with this setup. I kept blowing the circuit breaker at the office as well forcing me to spread the machines across several power circuits but it proved that the system was at least fault tolerant!

SourceForge chooses Python, TurboGears and … MongoDB for a new version of their website. Looks like Mongo is becoming quite mainstream.

Don’t believe the rumors: Oracle is into cloud computing after all, at least according to Forrester. Well, as long as the clouds are private. And as long as you can live with “coming soon” tooling. And it’s not like they really have a clear long-term strategy for cloud computing:

I believe that cloud is a revolution for Oracle, IBM, SAP, and the other big vendors with direct sales forces (despite what they say). Cloud computing has the potential to undermine the account-management practices and pricing models these big companies are founded on. I think it will take years for each of the big vendors to adapt to cloud computing. Oracle is just beginning this journey; I think other vendors are further down the track.

The igvita blog hits NoSQL in the groin by showing a simple way to get a schema-free data store… in MySQL. It’s a sort of proxy that translates a schema-free interface into denormalized data stored in distinct tables:

Instead of defining columns on a table, each attribute has its own table (new tables are created on the fly), which means that we can add and remove attributes at will. In turn, performing a select simply means joining all of the tables on that individual key. To the client this is completely transparent, and while the proxy server does the actual work, this functionality could be easily extracted into a proper MySQL engine – I’m just surprised that no one has done so already.

While it’s an interesting idea, I’m not sure how effective this will be in practice, as joins are among the most expensive operations in the database world. I’m pretty sure that replacing a primary-key lookup on a single 10-column table with a join across 10 tables will add significant overhead.
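The attribute-per-table idea is easy to sketch. Here is a toy version with SQLite standing in for the MySQL proxy (table naming scheme and function names are mine, not from the igvita post): each attribute gets its own `(key, value)` table created on the fly, and a read reassembles the “row” by joining every attribute table on the key.

```python
# Toy "one table per attribute" store, per the quoted description.
# Writes create attr_<name> tables lazily; reads join them on the key.
# Attribute names go straight into SQL here, so this sketch is not
# injection-safe; a real proxy would whitelist them.
import sqlite3

conn = sqlite3.connect(":memory:")

def set_attr(key, attr, value):
    # Create the attribute's table on first use, then upsert the value.
    conn.execute(f"CREATE TABLE IF NOT EXISTS attr_{attr} "
                 "(k TEXT PRIMARY KEY, v TEXT)")
    conn.execute(f"INSERT OR REPLACE INTO attr_{attr} VALUES (?, ?)",
                 (key, value))

def get_record(key, attrs):
    # Rebuild the logical row: one join per requested attribute.
    first = attrs[0]
    cols = ", ".join(f"attr_{a}.v" for a in attrs)
    sql = f"SELECT {cols} FROM attr_{first}"
    for a in attrs[1:]:
        sql += f" JOIN attr_{a} ON attr_{a}.k = attr_{first}.k"
    sql += f" WHERE attr_{first}.k = ?"
    row = conn.execute(sql, (key,)).fetchone()
    return dict(zip(attrs, row)) if row else None

set_attr("user:1", "name", "Ada")
set_attr("user:1", "city", "London")
print(get_record("user:1", ["name", "city"]))
# → {'name': 'Ada', 'city': 'London'}
```

Fetching ten attributes means a nine-way join where a plain table would do a single primary-key lookup, which is exactly the overhead I’d worry about.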

Written by Adrian

March 4th, 2010 at 9:31 pm

Posted in Linkdump
