Netuality

Taming the big, bad, nasty websites

Archive for the ‘Twitter’ tag

Linkdump: leaner meaner MySQL, gulping from the data buffet and lessons learned at Reddit


The Percona guys are pleading for a MySQL strongly optimized for a single type of storage engine:

We could save a lot of CPU cycles by having storage format same as processing format. We could tune Optimizer to handle Innodb specifics well. We could get rid of SQL level table locks and using Innodb internal data dictionary instead of Innodb files. We would use Innodb transactional log for replication (which could be extended a bit for this purpose). Finally backup can be done in truly hot way without nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql” database any more. Single Storage Engine server would be also a lot easier to test and operate.

This also would not mean one has to give up flexibility completely, for example one can imagine having Innodb tables which do not log the changes, hence being faster for update operations.

Looks like the Twitter data buffet is back in business. Some of the data is free. Enjoy it in moderation: too much data can make you slow.

Reddit’s Steve Huffman gave a talk at Web Apps Miami 2010. The themes: self-healing, separation of services, being stateless and caching like crazy, redundancy and yes, a little bit of Hadoop (Amazon’s Hadoop is Elastic MapReduce). Read the full transcript on Carsonified:

We’ve actually been using Hadoop, Amazon’s Hadoop implementation to compute awards. If we need to do a complicated query like that, we store the data, we dump our database, or at the right time we store it in a way that will make those joins possible down the road. That being said; we’ve tried to avoid doing joins as much as possible, and when the data comes in we store it in the way we’re going to need it. That’s worked much better than trying to do it at run time.
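To illustrate the "store it the way you're going to need it" idea from the quote, here is a minimal Python sketch. It is not Reddit's code: the `VoteStore` class, its fields and the vote tuple layout are all hypothetical, chosen only to show a read-side scan being replaced by a value maintained at write time.

```python
from collections import defaultdict


class VoteStore:
    """Toy store that keeps a per-author score up to date at write time."""

    def __init__(self):
        # Raw events as they arrive: (voter, author, direction).
        self.votes = []
        # Denormalized view, maintained on every write (hypothetical layout).
        self.score_by_author = defaultdict(int)

    def record_vote(self, voter, author, direction):
        if direction not in (1, -1):
            raise ValueError("direction must be 1 or -1")
        self.votes.append((voter, author, direction))
        # Pay a small cost now so that reads need no scan or join later.
        self.score_by_author[author] += direction

    def score(self, author):
        # Read path: a single lookup.
        return self.score_by_author.get(author, 0)

    def score_by_scan(self, author):
        # What the read would cost without the precomputed view.
        return sum(d for _, a, d in self.votes if a == author)


store = VoteStore()
store.record_vote("alice", "bob", 1)
store.record_vote("carol", "bob", 1)
store.record_vote("dave", "bob", -1)
print(store.score("bob"), store.score_by_scan("bob"))
```

Both calls return the same number; the difference is that `score` stays cheap no matter how many raw votes pile up, which is the trade the quote describes.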

Written by Adrian

May 10th, 2010 at 8:58 pm

Posted in Linkdump


Linkdump: Twitter, Twitter, CAP and … iPad


Well, not all of Twitter runs on Cassandra :) Alex Payne explains how they built Hawkwind, a distributed search system written in Scala. Take a look at slide 18 of his deck, where you can clearly see that they use HBase as the backend.



Also from the great guys at Twitter: gizzard. An interesting and appropriate name for a database sharding framework. Gizzard uses range-based partitioning and a replication tree, and it can sit on top of a wide range of data stores: RDBMSes, Lucene or Redis, you name it. But I wonder about the operational overhead when you have a really large gizzard cluster.
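For readers who have not met range-based partitioning before, here is a rough Python sketch of the idea. It is not Gizzard's API (Gizzard itself is a Scala framework); `RangeRouter`, `ReplicatingShard` and `DictShard` are hypothetical names, and the in-memory dict stands in for whatever data store sits behind a shard.

```python
import bisect


class DictShard:
    """Stand-in for a real backend (RDBMS, Lucene, Redis...)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data[key]  # raises KeyError if missing


class ReplicatingShard:
    """One node of a replication tree: fans writes out to all children."""

    def __init__(self, children):
        self.children = list(children)

    def put(self, key, value):
        for child in self.children:
            child.put(key, value)

    def get(self, key):
        # Try each replica in turn; the first one holding the key wins.
        for child in self.children:
            try:
                return child.get(key)
            except KeyError:
                continue
        raise KeyError(key)


class RangeRouter:
    """Shard i owns keys in [lower_bounds[i], lower_bounds[i + 1])."""

    def __init__(self, lower_bounds, shards):
        if len(lower_bounds) != len(shards):
            raise ValueError("need exactly one shard per lower bound")
        if list(lower_bounds) != sorted(lower_bounds):
            raise ValueError("lower bounds must be sorted ascending")
        self.bounds = list(lower_bounds)
        self.shards = list(shards)

    def shard_for(self, key):
        # Index of the last lower bound that is <= key.
        idx = bisect.bisect_right(self.bounds, key) - 1
        if idx < 0:
            raise KeyError(f"no shard covers key {key!r}")
        return self.shards[idx]


replicas = [DictShard(), DictShard()]
router = RangeRouter([0, 1000], [DictShard(), ReplicatingShard(replicas)])
router.shard_for(1500).put(1500, "tweet")
print(router.shard_for(1500).get(1500))
```

Because a replicating node exposes the same `put`/`get` interface as a leaf, replicas can be nested to any depth, which is roughly what makes it a tree. Rebalancing ranges and handling failed replicas are left out here, and those are exactly the parts where the operational overhead would show up.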

Michael Stonebraker has a short essay on CAP published in the ACM blogs. He identifies a series of use cases where the CAP theorem simply does not apply and cannot be appealed to for guidance:

Obviously, one should write software that can deal with load spikes without failing; for example, by shedding load or operating in a degraded mode. Also, good monitoring software will help identify such problems early, since the real solution is to add more capacity. Lastly, self-reconfiguring software that can absorb additional resources quickly is obviously a good idea.

In summary, one should not throw out the C so quickly, since there are real error scenarios where CAP does not apply and it seems like a bad tradeoff in many of the other situations.
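The "shedding load" remark in the quote can be sketched in a few lines of Python. This is only an illustration under simple assumptions: `LoadShedder` and `handle` are hypothetical names, and the policy (reject anything beyond a fixed number of in-flight requests) is the simplest one I could think of, not something taken from the essay.

```python
import threading


class LoadShedder:
    """Admits at most max_in_flight concurrent requests, rejects the rest."""

    def __init__(self, max_in_flight):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_acquire(self):
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                return False  # shed: the caller should fail fast
            self.in_flight += 1
            return True

    def release(self):
        with self.lock:
            self.in_flight -= 1


def handle(shedder, work):
    """Run work() if there is capacity; return (accepted, result)."""
    if not shedder.try_acquire():
        return (False, None)  # e.g. answer with an HTTP 503 here
    try:
        return (True, work())
    finally:
        shedder.release()


shedder = LoadShedder(max_in_flight=2)
print(handle(shedder, lambda: "ok"))
```

Rejecting early keeps the requests that are admitted fast, instead of letting every request slow down until the whole service falls over.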

Great nosqlEu coverage on Alex Popescu’s blog MyNoSQL. Read it to get all the presentations, tons of links and Twitter quotes.

Because every self-respecting blog should mention the newly released iPad, here’s mine. According to O’Reilly Radar, the iPad is not ready for cloud integration:

I am hoping for a future where all I need to supply a device with is my identity, and everything else falls into place. This doesn’t even have to be me trusting in a third-party cloud: there’s no reason similar mechanisms couldn’t be used privately in a home network setting.

I think the iPad is an amazing piece of hardware, and the most pleasant web browsing experience available. It is still very much a 1.0 device though, and its best days certainly lie ahead of it. I hope part of that improvement is a simple story for synchronization and cloud access.

Guess I’ll be waiting for the release of the iPad Pro.

Written by Adrian

April 21st, 2010 at 11:24 pm

Posted in Linkdump


Linkdump: Cassandra @Twitter, Forrester not grokking NoSQL


Seven signs you need to accept NoSQL in your life, according to the High Scalability blog. I especially like sign #6: “Maintaining a completely separate object caching system on top of an already beefy table storage system”. There are companies making serious bucks from selling exactly this type of caching system. I find that a bit ironic, don’t you?

Twitter has just decided to adopt Cassandra as their main storage. I roughly estimate the status table at more than 9 billion rows, a good size at which to start thinking about the benefits of NoSQL. I would have liked to see a comparison with other existing solutions and a rationale for their choice. According to some sources, Ryan King rejected HBase because when a region server goes down, writes are blocked for the affected data until that data is redistributed, unlike Cassandra’s “write never fail” policy. According to other sources, this will be solved in a future version of HBase, but I think Twitter needed a solution sooner rather than later. I hope for two things:

  • That the Twitter dudes will blog about their migration experience
  • That I’ll be able to access and search through all my older tweets, fer’ God sake!

Forrester Research thinks that NoSQL and Elastic Caching Platforms are very similar. So similar that “NoSQL Wants To Be Elastic Caching When It Grows Up”. According to Forrester, “Ultimately, the real difference between NoSQL and elastic caching now may be in-memory versus persistent storage on disk.” Yeah sure: transactions, durability, indexing, security model – who needs this crap anyway?

Oh, and let’s not forget about today’s unscheduled GAE downtime. Looking forward to the post mortem; there will surely be a thing or two to learn…

Written by Adrian

February 24th, 2010 at 11:18 pm

Posted in Linkdump
