Netuality (http://www.netuality.ro): Taming the big bad websites

    Twitter summary 22-07-2013
    Adrian, Mon, 22 Jul 2013 11:00:00 +0000
    http://www.netuality.ro/rezumat-twitter-22-07-2013/linkdump/20130722/
  • My next HTC phone is HTC None. Or HTC None Max, for the infinite battery.
  • 3D printing will explode in 2014, thanks to the expiration of key patents (via @Pocket) http://t.co/xNenVO6sq7
  • RT @mirceap: It's back! Comedians In Cars Getting Coffee by Jerry Seinfeld – "Kids Need Bullying" with Chris Rock http://t.co/9P2THnXRbO
  • 1 gram per kilo. Come on Dubai, you can do better.
  • RT @stevesilberman: 15 years after a bogus #autism scare, a plague of measles descends on a generation. http://t.co/1JrGWby8OE
  • Running both Apache and lighty, because you cannot have too many http servers…
  • Playing with my new cheap Synology storage. It's got postgres running on an ARM v5 CPU (Feroceon :) ) And Python 2.7.3…
  • RT @lrz: http://t.co/mNpRyGL3ZN should redirect to http://t.co/MBWpDtOmRt, it’s better maintained
  • Cheers To You Google And Microsoft: Ballmer And Page Walked Into A Bar… http://t.co/uFOoDQBoCa
  • RT @verge: VLC for iOS returns to App Store, plays all your video formats for free http://t.co/JtejcbZFWF
  • Smartphone sales by platform. Found at http://t.co/wBUXoVDZLD http://t.co/WoU31HGDW7
  • RT @horatzica: Google Play Books Now Available In Romania http://t.co/rmoMVNxgBm via @AndroidPolice
  • In a packet loss situation, blame PRISM.
  • If you look hard enough into the metadata, the metadata will look back at you
  • RT @verge: Google Maps arrives on iPad with 2.0 update http://t.co/2l222ddyjP
  • RT @joshuatopolsky: Photo: Vroom. http://t.co/IVlODmyMIq

    Twitter summary 15-07-2013
    Adrian, Tue, 16 Jul 2013 11:00:00 +0000
    http://www.netuality.ro/rezumat-twitter-15-07-2013-2/linkdump/20130716/
  • Ok folks, we've all read the latest from Paul Graham. You can stop re-tweeting and go back to your normal activities.
  • RT @timpratt: Any book you pick up, really, best to assume it was secretly written by J.K. Rowling. Buy accordingly.
  • RT @gabiagu: Myth busted. What swords really sound like when they are unsheathed. http://t.co/RWuRD67N5p (via @SerbanCarjan )
  • RT @timbray: It’s not generally a good idea to store important information in a browser tab.
  • RT @codinghorror: "The first problem is that 10x more people think that they are 10xers than actually are" http://t.co/ZiRpsqTwlI
  • RT @Moochava: Yearly reminder: unless you're over 60, you weren't promised flying cars. You were promised an oppressive cyberpunk dystopia.…
  • RT @mariusgheorghe: The real Iron Throne http://t.co/Wmwhah9D3Q #got . Looks bad ass and …uncomfortable.
  • RT @NeinQuarterly: A gentle reminder that ö is the German emoticon used to express shock at your ignorance of dialectical materialism.
  • Travelling through Europe? This is crucial information: http://t.co/ODcMWdE2Lz
  • RT @johnallsopp: "In five to 10 years coders wonʼt even be needed". people who cant' develop. Every day. For the last 60 years.
  • "It's time for rich people to gentrify Android." http://t.co/CRLDg1LI9O
  • RT @prymal81: Galaxy Note III photo just leaked – http://t.co/yLFhk3zIXW

    Links of the week: moving away from the Oracles
    Adrian, Fri, 08 Jun 2012 20:17:23 +0000
    http://www.netuality.ro/links-of-the-week-moving-away-from-the-oracles/linkdump/20120608/

    Siddharth Annand from LinkedIn explains in an interview how to move away from the Oracles to the wonderful world of open-source and NoSQL. Now I’m waiting for them to open-source Databus and Espresso. And “Oracle is not web scale” – sorry, but I had to quote this.

    It seems that HAProxy development has picked up again, with at least one release every 10 days recently. I love this software; I’ve used it and will use it again. And to quote the author: “Users should really upgrade, as I don’t want to waste time trying to spot stupid bugs in configs that are notoriously broken.” Amen, bro! Always Be Upgrading.

    Not related to scalability or anything, but the news of the week is the Flame “Middle Eastern” malware:

    These are the real pros at work. One can only wonder how many pieces of software like Flame are harbored by our innocent laptops.

    In order to avoid ending on such a nasty note, here’s an autotune clip which I’m sure you’ll greatly enjoy:


    Joke of the day: Amazon Cloud Drive app …
    Adrian, Sat, 05 May 2012 21:24:17 +0000
    http://www.netuality.ro/joke-of-the-day-amazon-cloud-drive-app/tools/20120505/

    … which, according to VentureBeat and various media outlets, is a Dropbox competitor that “allows you to seamlessly drag and drop files from your computer to the cloud with little work”. Not!

    Guess what, it’s about file synchronization. File. Synchronization. Fully-fledged two-way file syncing from all my laptops and devices to the cloud and back, not just “uploading a file to the cloud with little work”.

    I guess the new SkyDrive and Google Drive tools are putting a lot of pressure on Amazon, but if this app is the answer, then it’s a bad one. Come on, Amazon, you can do better!
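    What separates real two-way sync from "uploading with little work" is the per-file decision of which side changed since the last sync. Below is a minimal sketch of that three-way comparison, not how Dropbox or Amazon actually implement it. It assumes each side can report a content hash per path and that the client remembers the hashes from the last successful sync. `plan_sync` and the action names are hypothetical, and renames and conflict resolution are left out.

```python
from typing import Optional


def plan_sync(local: dict[str, str], remote: dict[str, str],
              base: dict[str, str]) -> dict[str, str]:
    """Decide per path what a two-way sync should do.

    Each dict maps a file path to a content hash; `base` holds the hashes
    recorded at the end of the last successful sync (hypothetical model).
    """
    plan: dict[str, str] = {}
    for path in sorted(set(local) | set(remote) | set(base)):
        loc: Optional[str] = local.get(path)
        rem: Optional[str] = remote.get(path)
        old: Optional[str] = base.get(path)
        if loc == rem:
            continue  # both sides agree (or both deleted it): nothing to do
        if rem == old:
            # Only the local side changed since the last sync.
            plan[path] = "upload" if loc is not None else "delete-remote"
        elif loc == old:
            # Only the remote side changed since the last sync.
            plan[path] = "download" if rem is not None else "delete-local"
        else:
            plan[path] = "conflict"  # both sides changed, differently
    return plan


base = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h3", "d.txt": "h4"}
local = {"a.txt": "h1-new", "b.txt": "h2", "c.txt": "h3-mine", "d.txt": "h4"}
remote = {"a.txt": "h1", "b.txt": "h2-new", "c.txt": "h3-theirs", "e.txt": "h5"}
print(plan_sync(local, remote, base))
```

    The remembered `base` state is the key design choice here. Without it, a file missing on one side could mean either "deleted there" or "created here", and the client could not tell them apart.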

     

    Linkdump: nodevincing the boss, and probabilistic data structures
    Adrian, Fri, 04 May 2012 20:23:27 +0000
    http://www.netuality.ro/linkdump-nodevincing-the-boss-and-probabilistic-data-structures/linkdump/20120504/

    Great pragmatic considerations in Felix Geisendörfer’s “Node.js: convincing the boss” guide:

    • Not for CPU-heavy apps
    • There’s no Django for it (yet?), so don’t expect mind-blowing productivity
    • Don’t do it for the nerdy buzzword bingo …
    • … do it for single-page JS apps that are fed JSON by the server
    • Do it if real-time is important for your app
    Oh, and the convincing part? Just build a cool prototype and find a local community where you could hire smart developers. I could say the same about almost any other technology, and bosses seem to be rather predictable in their reasoning…
    Ilya Katsov hits again with an excellent post about probabilistic data structures for web analytics and data mining. If you’re at least a bit familiar with Bloom filters and the LogLog counter, you’re going to love this article.
    But don’t stop there: Ilya’s blog is highly recommended reading.
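    For readers who haven't met a Bloom filter yet, here is a minimal sketch of one, not the code from Ilya's article. It assumes an m-bit array and k bit positions per item, derived from one SHA-256 digest by double hashing, which is a common trick. The sizes below are arbitrary. The filter can return false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k bit positions per item over an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 4) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, packed 8 per byte

    def _positions(self, item: str):
        # Double hashing: derive k indexes from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: probably added (false positives possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter(m=2048, k=3)
for url in ("/home", "/about", "/blog"):
    seen.add(url)
print(seen.might_contain("/blog"))  # True
```

    The filter never stores the URLs themselves, only bits, which is why it is so compact. The price is a false-positive rate that grows as more items are added for a fixed m and k.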

    Cloudy with a chance of MongoDb
    Adrian, Mon, 07 Nov 2011 13:48:07 +0000
    http://www.netuality.ro/cloudy-with-a-chance-of-mongodb/scalability/20111107/

    An interesting – and anonymous – article popped up this weekend on Hacker News about why you shouldn’t use MongoDB (copied over at MyNoSQL for posterity). There’s a nasty tone in that letter, which probably stems from some painful first-hand experience with the thingie. One can understand why the authors want to stay anonymous: to draw attention to Mongo’s issues and not to their project or team. The rebuttals that came up right away are pretty funny, though, especially this one containing interesting advice:

    • RTFM/“use the latest version”
    • “Sometimes losing data doesn’t matter”
    • “Software has bugs”
    • “We experienced database corruption so keep at least 3 copies of everything you consider important”
    • “Backup often”, which according to the Mongo docs means one of three recommended backup strategies:
    1. LVM/EBS snapshot
    2. Shutdown and backup
    3. Backup through a replicated slave (note that said replication can stop without apparent reason)

    Easy peasy, right?

    I’ve used all sorts of data stores and worked with data sets of tens or hundreds of millions of rows, intensive writes, you name it. I’ve seen actual data corruption once, maybe twice, in 12 years. But I have to confess that I haven’t seriously used MongoDB until now. Maybe I should.

    Comic strips and contextual advertising
    Adrian, Sun, 13 Jun 2010 07:44:22 +0000
    http://www.netuality.ro/comic-strips-and-contextual-advertising/andeverythingelse/20100613/

    As seen today on Google Reader. A strip is a strip is a strip is a strip:

    What to do when the meteor strikes
    Adrian, Tue, 11 May 2010 15:23:26 +0000
    http://www.netuality.ro/what-to-do-when-the-meteor-strikes/datacenter/20100511/

    There’s nothing quite like a good Single Point of Failure (SPOF) during a holiday dinner.

    says John Farmer on his blog, and I couldn’t agree more. Start with a meteor strike scenario for a change: just imagine a giant rock crushing your measly SPOF-ridden infrastructure in one unlucky data center. While waiting for the black swan to appear, learn to keep calm and react normally, using the tips from a three-part post about incidents, outages and systems maintenance:

    Simple problems can easily become large complicated problems after a few bad decisions made in haste. Take a breath before continuing. This is especially important with a page at 3AM or if a panicky client is in your office. Tell the client you’ll handle the problem and run through your normal procedure.

    [...]

    Remember the prime directive – your job is to restore service as quickly as possible. You are not there to debug interesting problems with your service.

    Recommended reading!

    Linkdump: leaner meaner MySQL, gulping from the data buffet and lessons learned at Reddit
    Adrian, Mon, 10 May 2010 17:58:12 +0000
    http://www.netuality.ro/linkdump-leaner-meaner-mysql-gulping-from-the-data-buffet-and-lessons-learned-at-reddit/linkdump/20100510/

    The Percona guys are pleading for a MySQL build strongly optimized for a single storage engine:

    We could save a lot of CPU cycles by having storage format same as processing format. We could tune Optimizer to handle Innodb specifics well. We could get rid of SQL level table locks and using Innodb internal data dictionary instead of Innodb files. We would use Innodb transactional log for replication (which could be extended a bit for this purpose). Finally backup can be done in truly hot way without nasty “FLUSH TABLE WITH READLOCK” and hoping nobody is touching “mysql” database any more. Single Storage Engine server would be also a lot easier to test and operate.

    This also would not mean one has to give up flexibility completely, for example one can imagine having Innodb tables which do not log the changes, hence being faster for update operations.

    Looks like the Twitter data buffet is back in business. Some of the data is free. Enjoy it in moderation: too much data can make you slow.

    Reddit’s Steve Huffman gives a talk at Web Apps Miami 2010: self-healing, separation of services, statelessness, caching like crazy, redundancy and yes, a little bit of Hadoop (via Amazon’s hosted Hadoop service, Elastic MapReduce). Read the full transcript on Carsonified:


    We’ve actually been using Hadoop, Amazon’s Hadoop implementation to compute awards. If we need to do a complicated query like that, we store the data, we dump our database, or at the right time we store it in a way that will make those joins possible down the road. That being said; we’ve tried to avoid doing joins as much as possible, and when the data comes in we store it in the way we’re going to need it. That’s worked much better than trying to do it at run time.
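    The "store it in the way we're going to need it" idea is write-time denormalization. Instead of aggregating votes with a join at read time, each incoming vote updates a precomputed per-link score. The sketch below is minimal and not Reddit's code. The in-memory dicts are hypothetical stand-ins for whatever store a real site uses.

```python
from collections import defaultdict

# Normalized "source of truth": raw vote events, as they arrive.
votes: list[tuple[str, str, int]] = []  # (user, link_id, +1 or -1)

# Denormalized views, maintained at write time so reads need no join/aggregation.
link_score: dict[str, int] = defaultdict(int)
user_votes: dict[str, dict[str, int]] = defaultdict(dict)


def record_vote(user: str, link_id: str, direction: int) -> None:
    """Append the raw event and update every precomputed view in one place."""
    previous = user_votes[user].get(link_id, 0)
    votes.append((user, link_id, direction))
    user_votes[user][link_id] = direction
    # Apply only the delta so a changed vote is not double counted.
    link_score[link_id] += direction - previous


def score(link_id: str) -> int:
    # Read path: a single lookup, no scan or join over `votes`.
    return link_score[link_id]


record_vote("alice", "l1", 1)
record_vote("bob", "l1", 1)
record_vote("alice", "l1", -1)  # alice changes her mind
print(score("l1"))  # 0
```

    The write path does more work, and every new read pattern needs its own precomputed view. That is the trade-off the quote describes in exchange for cheap reads.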

    And now for something a little different: graphviz candy
    Adrian, Thu, 06 May 2010 13:37:15 +0000
    http://www.netuality.ro/and-now-for-something-a-little-different-graphviz-candy/presentations/20100506/

    No self-respecting geek can resist either of these incredible temptations:

    • Correct something wrong on “the internets”
    • Produce a little bit of graphviz candy (what a fantastic tool)

    Imagine my total happiness discovering that I can exercise both of them simultaneously. I can only thank Digg’s John Quinn for creating the opportunity in his otherwise very interesting presentation, by including this directed graph on one of the slides:

    You would get the impression that, besides Cassandra, the other “NoSQL” solutions each have a single major website using them. I cannot speak for all the products whose logos are embedded in the graph nodes, but for HBase this is somewhat incorrect. Starting from the Powered By page for HBase, I thought it’d be a good idea to fill in the blanks and also add a few high-traffic sites (I’ve only selected websites in the Alexa top 1000). Here’s the new (and hopefully more accurate) graph:

    While it looks like Cassandra got a few higher-profile clients lately (even if for relatively mundane tasks such as persistent caching), this does not mean HBase is only used by StumbleUpon. I think we’ll see more and more success stories for both of these solutions, and perhaps for some as-yet-unannounced products as well. The domain is still in its infancy; let’s not imply “consolidation” just yet.
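    For anyone who wants to draw a similar graph: Graphviz reads a plain-text DOT description. Below is a minimal sketch that emits DOT from a product-to-websites mapping. The HBase → StumbleUpon edge comes from the text above. The "ExampleSite" entries are hypothetical placeholders and do not reproduce either graph.

```python
def to_dot(usage: dict[str, list[str]]) -> str:
    """Render a product -> websites mapping as a Graphviz digraph in DOT syntax."""
    lines = ["digraph nosql_usage {", "  rankdir=LR;"]
    for product, sites in sorted(usage.items()):
        lines.append(f'  "{product}" [shape=box];')  # products drawn as boxes
        for site in sites:
            lines.append(f'  "{product}" -> "{site}";')
    lines.append("}")
    return "\n".join(lines)


usage = {
    "HBase": ["StumbleUpon", "ExampleSiteA"],  # ExampleSite* are placeholders
    "Cassandra": ["ExampleSiteB"],
}
print(to_dot(usage))
```

    Save the output as `nosql.dot` and render it with `dot -Tpng nosql.dot -o nosql.png`.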

    Linkdump: Coop, HBase performance and a bit of Warcraft
    Adrian, Tue, 27 Apr 2010 19:35:11 +0000
    http://www.netuality.ro/linkdump-coop-hbase-performance-and-a-bit-of-warcraft/linkdump/20100427/

    Riptano is to Cassandra what Cloudera is to Hadoop or Percona to MySQL. Mmmkey?

    A great, insightful post from Pingdom (as usual) lets us peek behind the doors of the largest web sites in the world, just by reading selected posts from their respective developer blogs.

    Yahoo cut data-center cooling costs from 50 cents to just one cent per dollar spent on power. This was achieved at their most recent data center, the Yahoo Computing Coop, built in Lockport, New York.

    The data center operates with no chillers, and will require water for only a handful of days each year. Yahoo projects that the new facility will operate at a Power Usage Effectiveness (PUE) of 1.1, placing it among the most efficient in the industry. [...]

    If it looks like a chicken coop, it’s because some of the design principles were adapted from …. well, chicken coops. “Tyson Foods has done research involving facilities with the heat source in the center of the facility, looking at how to evacuate the hot air,” said Noteboom. “We applied a lot of similar thought to our data center.”

    The Lockport site is ideal for fresh air cooling, with a climate that allows Yahoo to operate for nearly the entire year without using air conditioning for its servers.
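    Power Usage Effectiveness is total facility power divided by the power delivered to IT equipment. A PUE of 1.1 therefore means roughly 10% overhead (cooling, power distribution, lighting) on top of the servers themselves. Below is a tiny sketch of the arithmetic. The kW figures are made-up illustration values, not Yahoo's.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(pue_value: float, it_equipment_kw: float) -> float:
    """Non-IT power (cooling, distribution losses, ...) implied by a given PUE."""
    return (pue_value - 1.0) * it_equipment_kw


# Hypothetical 1,000 kW IT load: at PUE 1.1 about 100 kW goes to overhead,
# versus 1,000 kW for a facility running at PUE 2.0.
print(round(overhead_kw(1.1, 1000), 1))
print(overhead_kw(2.0, 1000))
print(pue(1100, 1000))
```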

    High Scalability blog dissects a paper describing Dapper, Google’s tracing system used to instrument all the components of a software system in order to understand its behavior. Immensely interesting:

    As you might expect Google has produced and elegant and well thought out tracing system. In many ways it is similar to other tracing systems, but it has that unique Google twist. A tree structure, probabilistically unique keys, sampling, emphasising common infrastructure insertion points, technically minded data exploration tools, a global system perspective, MapReduce integration, sensitivity to index size, enforcement of system wide invariants, an open API—all seem very Googlish.
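    To make "a tree structure, probabilistically unique keys, sampling" concrete, here is a minimal sketch of Dapper-style span bookkeeping. It assumes random 64-bit IDs and a sampling decision made once at the root and inherited by every child. The `Span` class and the 1/1024 rate are illustrative choices, not Dapper's actual API.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

SAMPLE_RATE = 1 / 1024  # illustrative value only


def new_id() -> int:
    # Random 64-bit IDs are "probabilistically unique": collisions are vanishingly rare.
    return random.getrandbits(64)


@dataclass
class Span:
    name: str
    trace_id: int
    parent_id: Optional[int]  # None marks the root of the trace tree
    sampled: bool
    span_id: int = field(default_factory=new_id)

    def child(self, name: str) -> "Span":
        # Children share the trace ID and inherit the root's sampling decision,
        # so a trace is recorded either as a whole tree or not at all.
        return Span(name, self.trace_id, self.span_id, self.sampled)


def start_trace(name: str, rate: float = SAMPLE_RATE) -> Span:
    return Span(name, new_id(), None, random.random() < rate)


root = start_trace("frontend.search", rate=1.0)  # force sampling for the demo
rpc = root.child("backend.lookup")
print(rpc.trace_id == root.trace_id, rpc.parent_id == root.span_id, rpc.sampled)
```

    Deciding on sampling once per trace, rather than per span, keeps overhead low without leaving half-recorded trees.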

    On my favorite blog :) HStack.org, Andrei wrote a great post about real-life performance testing of HBase:

    The numbers are the tip of the iceberg; things become really interesting once we start looking under the hood, and interpreting the results.

    When investigating performance issues you have to assume that “everybody lies”. It is crucial that you don’t stop at a simple capacity or latency result; you need to investigate every layer: the performance tool, your code, their code, third-party libraries, the OS and the hardware. Here’s how we went about it:

    The first potential liar is your test, then your test tool – they could both have bugs so you need to double-check.
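    In the spirit of "everybody lies", one cheap check on a test tool is to recompute its headline numbers from the raw samples. Report percentiles rather than a single average, and verify that the claimed throughput matches the measured wall-clock time. Below is a minimal sketch, not HStack's tooling. The latency samples and the `claimed_ops_per_sec` values are hypothetical.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in (0, 100]) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]  # clamp so tiny p still picks the minimum


def throughput_matches(ops: int, wall_seconds: float, claimed_ops_per_sec: float,
                       tolerance: float = 0.05) -> bool:
    """Cross-check a tool's reported throughput against ops / elapsed time."""
    measured = ops / wall_seconds
    return abs(measured - claimed_ops_per_sec) <= tolerance * claimed_ops_per_sec


latencies_ms = [2.1, 2.3, 2.2, 2.4, 2.2, 2.3, 2.1, 2.5, 2.2, 48.0]  # one slow outlier
mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.2f} p50={percentile(latencies_ms, 50)} p99={percentile(latencies_ms, 99)}")
print(throughput_matches(ops=100_000, wall_seconds=50.0, claimed_ops_per_sec=2_500))
```

    The mean (6.83 ms) is about three times the median, yet it still hides the 48 ms outlier that p99 exposes. The throughput check flags a tool that claims 2,500 ops/s when 100,000 ops in 50 s is only 2,000 ops/s.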

    But the most interesting distributed system of the week is World of Warcraft. Ars Technica describes a tour of the Blizzard campus and here’s a peek at the best NOC screen ever:

    For the hooorde!

    Linkdump: Twitter, Twitter, CAP and … iPad
    Adrian, Wed, 21 Apr 2010 20:24:17 +0000
    http://www.netuality.ro/linkdump-twitter-twitter-cap-and-ipad/linkdump/20100421/

    Well, not all of Twitter runs on Cassandra :) Alex Payne explains how they built Hawkwind, a distributed search system written in Scala. Take a look at slide 18, where you can clearly see that they use HBase as the backend:



    Also from the great guys at Twitter: Gizzard. An interesting and appropriate name for a database sharding framework. Gizzard uses range-based partitioning and replication trees, and can sit on top of a wide range of data stores: RDBMSes, Lucene or Redis, you name it. But I wonder about the operational overhead when you have a really large Gizzard cluster.
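    Range-based partitioning comes down to a sorted list of range boundaries plus a binary search per lookup. Gizzard additionally maps each range to a replication tree of backends. Below is a minimal sketch of both ideas, assuming integer keys and hypothetical backend names. It is not Gizzard's actual (Scala) API.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    """A replication-tree node: either a leaf backend or a fan-out to children."""
    name: str
    children: list["Node"] = field(default_factory=list)

    def write_targets(self) -> list[str]:
        # A write reaches every leaf under this node (a simple replicating node).
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.write_targets()]


class RangeRouter:
    def __init__(self, starts: list[int], trees: list[Node]) -> None:
        # starts[i] is the inclusive lower bound of range i.
        if starts != sorted(starts) or len(starts) != len(trees):
            raise ValueError("starts must be sorted and match trees one-to-one")
        self.starts, self.trees = starts, trees

    def lookup(self, key: int) -> Node:
        i = bisect.bisect_right(self.starts, key) - 1  # last range starting <= key
        if i < 0:
            raise KeyError(f"key {key} is below the first range")
        return self.trees[i]


router = RangeRouter(
    starts=[0, 1_000_000],
    trees=[Node("replica", [Node("mysql-a"), Node("mysql-b")]), Node("redis-c")],
)
print(router.lookup(42).write_targets())         # ['mysql-a', 'mysql-b']
print(router.lookup(5_000_000).write_targets())  # ['redis-c']
```

    Moving a range to new hardware only means editing the boundary table. That flexibility is also where the operational overhead mentioned above comes from: someone has to manage that table.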

    Michael Stonebraker has a short essay on CAP published in the ACM blogs. He identifies a series of use cases where the CAP theorem simply does not apply and cannot be appealed to for guidance:

    Obviously, one should write software that can deal with load spikes without failing; for example, by shedding load or operating in a degraded mode. Also, good monitoring software will help identify such problems early, since the real solution is to add more capacity. Lastly, self-reconfiguring software that can absorb additional resources quickly is obviously a good idea.

    In summary, one should not throw out the C so quickly, since there are real error scenarios where CAP does not apply and it seems like a bad tradeoff in many of the other situations.
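    Stonebraker's first remedy, shedding load instead of failing, can be as simple as a bounded admission queue that rejects new work once it is full. Below is a minimal sketch, assuming a single-process service and an arbitrary queue size. `LoadShedder` is a hypothetical name, and a real service would also tell the rejected client, e.g. with an HTTP 503.

```python
import queue
from typing import Optional


class LoadShedder:
    """Admit work into a bounded queue; reject (shed) it when the queue is full."""

    def __init__(self, max_pending: int) -> None:
        self.pending: queue.Queue = queue.Queue(maxsize=max_pending)
        self.shed = 0

    def submit(self, request: str) -> bool:
        try:
            self.pending.put_nowait(request)
            return True
        except queue.Full:
            # Degrade gracefully: drop this request instead of letting latency
            # and memory grow without bound for everyone.
            self.shed += 1
            return False

    def process_one(self) -> Optional[str]:
        try:
            return self.pending.get_nowait()
        except queue.Empty:
            return None


svc = LoadShedder(max_pending=3)
accepted = [svc.submit(f"req-{i}") for i in range(5)]
print(accepted, svc.shed)  # [True, True, True, False, False] 2
```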

    Great nosqlEu coverage on Alex Popescu’s blog MyNoSQL. Read it to get all the presentations, tons of links and Twitter quotes.

    Because every self-respecting blog should mention the newly released iPad, here’s my contribution. According to O’Reilly Radar, the iPad is not ready for cloud integration:

    I am hoping for a future where all I need to supply a device with is my identity, and everything else falls into place. This doesn’t even have to be me trusting in a third-party cloud: there’s no reason similar mechanisms couldn’t be used privately in a home network setting.

    I think the iPad is an amazing piece of hardware, and the most pleasant web browsing experience available. It is still very much a 1.0 device though, and its best days certainly lie ahead of it. I hope part of that improvement is a simple story for synchronization and cloud access.

    Guess I’ll be waiting for the release of iPad Pro:

    Linkdump: using HBase, CAP visuals, FarmVille and more
    Adrian, Wed, 17 Mar 2010 10:20:25 +0000
    http://www.netuality.ro/linkdump-using-hbase-cap-visuals-farmville-and-more/scalability/20100317/

    Two great posts from my colleagues about why Adobe is using HBase: part 1 and part 2. As I’ve experienced all of this firsthand, I can vouch that this is solid, relevant information. Both articles are highly recommended reads.

    Speaking of HBase, there’s a rumor on the street that they are taking HBASE-1295 (multi-data-center replication) very seriously and we’ll be seeing a new feature announcement relatively soon. Looking forward to it!

    An older but still interesting presentation on how RIPE NCC is using Hadoop and HBase to store and search through IP addresses for Europe, the Middle East and Russia can be found here:

    It looks like FarmVille is still in the MySQL+memcache phase, according to the High Scalability blog. And they use PHP. When will they start looking into NoSQL? Hopefully soon enough to have a good crop.
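    The "MySQL+memcache phase" usually means the cache-aside pattern: read from the cache, fall back to the database on a miss, and invalidate the cached entry on writes. Below is a minimal sketch, not FarmVille's code. Plain dicts stand in for memcached and MySQL; a real deployment would use a memcached client and SQL queries. TTLs are omitted for brevity.

```python
class CacheAside:
    def __init__(self, db: dict, cache: dict) -> None:
        self.db = db          # stand-in for MySQL
        self.cache = cache    # stand-in for memcached
        self.db_reads = 0

    def get(self, key: str):
        if key in self.cache:          # cache hit: no database round trip
            return self.cache[key]
        self.db_reads += 1             # cache miss: read through to the DB...
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value    # ...and populate the cache for next time
        return value

    def put(self, key: str, value) -> None:
        self.db[key] = value           # write to the source of truth first
        self.cache.pop(key, None)      # then invalidate, so readers refetch


farm = CacheAside(db={"farm:42": "3 cows, 12 pumpkins"}, cache={})
farm.get("farm:42")
farm.get("farm:42")
print(farm.db_reads)  # 1
farm.put("farm:42", "3 cows, 13 pumpkins")
print(farm.get("farm:42"), farm.db_reads)  # 3 cows, 13 pumpkins 2
```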

    Nathan’s visual guide to NoSQL systems, while perhaps not entirely correct, is a nice attempt to put all these projects on the same map. I would love to see a “patched” version of the visual guide taking into account all the information left in the comments…

    Oh, and Twitter is using Protocol Buffers to store information on Hadoop. And they’re going to open-source their implementation.

    Linkdump: Cassandra lovers, blowing the circuit breaker and Oracle clouds
    Adrian, Thu, 04 Mar 2010 18:31:13 +0000
    http://www.netuality.ro/linkdump-cassandra-lovers-blowing-the-circuit-breaker-and-oracle-clouds/linkdump/20100304/

    Good points (as always) on Alexandru’s blog, discussing the “SQL scalability isn’t for everyone” topic.

    NoSQL as RDBMS are just tools for our job and there is nothing about the death of one of the other. But as we’ve learned over years, every new programming language is the death of all its precursors, every new programming paradigm is the death of everything that existed before and so on. The part that some seem to be missing or ignoring deliberately is that in most of these cases this death have never really happened.

    For large-scale performance testing of a production environment, check out how MySpace simulated 1 million concurrent users with a huge EC2 cluster, as described on the High Scalability blog. The article is a guest post from a company selling “cloud testing” solutions and has a bit of “sales juice” in it, but it’s still a very good read:

    Large-scale testing using EC2

    Someone is in love with Cassandra after only 4 months. Hoping Cassandra doesn’t get too fat after the wedding:

    Traditional sharding and replication with databases like MySQL and PostgreSQL have been shown to work even on the largest scale websites — but come at a large operational cost. Setting up replication for MySQL can be done quickly, but there are many issues you need to be aware of, such as slave replication lag. Sharding can be done once you reach write throughput limits, but you are almost always stuck writing your own sharding layer to fit how your data is created and operationally, it takes a lot of time to set everything up correctly. We skipped that step all together and added a couple hooks to make our data aggregation service siphon to both PostgreSQL and Cassandra for the initial integration.
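    The "couple of hooks" approach has one aggregation service siphon every write to both the old and the new store during a migration. It can be sketched as a fan-out writer. The `Sink`, `MemorySink` and `Siphon` names below are hypothetical, with in-memory stand-ins for the PostgreSQL and Cassandra clients. Keeping the old store authoritative and only logging failures in the new one is this sketch's design choice. The quoted post does not say how its hooks handle errors.

```python
import logging
from typing import Protocol

log = logging.getLogger("siphon")


class Sink(Protocol):
    def write(self, key: str, value: dict) -> None: ...


class MemorySink:
    """Hypothetical stand-in for a real PostgreSQL or Cassandra client."""

    def __init__(self, fail: bool = False) -> None:
        self.rows: dict[str, dict] = {}
        self.fail = fail

    def write(self, key: str, value: dict) -> None:
        if self.fail:
            raise ConnectionError("sink unavailable")
        self.rows[key] = value


class Siphon:
    def __init__(self, primary: Sink, shadow: Sink) -> None:
        self.primary, self.shadow = primary, shadow

    def write(self, key: str, value: dict) -> None:
        self.primary.write(key, value)      # errors propagate: primary is authoritative
        try:
            self.shadow.write(key, value)   # best effort while the new store is evaluated
        except Exception:
            log.exception("shadow write failed for %s", key)


postgres, cassandra = MemorySink(), MemorySink()
siphon = Siphon(primary=postgres, shadow=cassandra)
siphon.write("event:1", {"clicks": 3})
print(postgres.rows == cassandra.rows)  # True
```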

    Distributed data war stories from Anders @ bandwidth.com, HBase and Hadoop on commodity hardware:

    As mentioned before, the commodity machines I used were very basic but I was able to insert conservatively about 500 records per second with this setup. I kept blowing the circuit breaker at the office as well forcing me to spread the machines across several power circuits but it proved that the system was at least fault tolerant!

    SourceForge chooses Python, TurboGears and … MongoDB for a new version of their website. Looks like Mongo is becoming quite mainstream.

    Don’t believe the rumors, Oracle is into cloud computing after all – at least according to Forrester. Well, as long as the clouds are private. And as long as you can live with “coming soon” tooling. And it’s not like they really have a clear long-term strategy for cloud computing:

    I believe that cloud is a revolution for Oracle, IBM, SAP, and the other big vendors with direct sales forces (despite what they say). Cloud computing has the potential to undermine the account-management practices and pricing models these big companies are founded on. I think it will take years for each of the big vendors to adapt to cloud computing. Oracle is just beginning this journey; I think other vendors are further down the track.

    The igvita blog hits NoSQL in the groin by showing a simple way of having a schema-free data store … in MySQL. It’s a sort of proxy that splits each record’s attributes across distinct tables:

    Instead of defining columns on a table, each attribute has its own table (new tables are created on the fly), which means that we can add and remove attributes at will. In turn, performing a select simply means joining all of the tables on that individual key. To the client this is completely transparent, and while the proxy server does the actual work, this functionality could be easily extracted into a proper MySQL engine – I’m just surprised that no one has done so already.

    It’s an interesting idea, but I’m not sure how effective it will be in practice, as joins are among the most time-consuming operations in the database world. I’m pretty sure that replacing a primary-key get on a 10-column table with joins across 10 tables adds significant overhead.
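    The join cost becomes visible once you lay the data out this way. Below is a minimal sketch of the one-table-per-attribute layout, using Python's built-in `sqlite3` as a stand-in for MySQL. The `attr_*` table names and the `put`/`get` helpers are hypothetical; the igvita proxy works differently in detail. Reading a record back takes one join per extra attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def ensure_attr_table(attr: str) -> str:
    # Each attribute gets its own (id, val) table, created on the fly.
    if not attr.isidentifier():
        raise ValueError(f"unsafe attribute name: {attr!r}")
    table = f"attr_{attr}"
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (id TEXT PRIMARY KEY, val)")
    return table


def put(record_id: str, record: dict) -> None:
    for attr, value in record.items():
        table = ensure_attr_table(attr)
        conn.execute(f"INSERT OR REPLACE INTO {table} (id, val) VALUES (?, ?)",
                     (record_id, value))


def get(record_id: str, attrs: list[str]) -> dict:
    # Reassembling a record = joining all attribute tables on the key.
    # The first attribute's table drives the query, so it must be present.
    if not attrs:
        raise ValueError("need at least one attribute")
    tables = [ensure_attr_table(a) for a in attrs]
    first = tables[0]
    select = ", ".join(f"{t}.val" for t in tables)
    joins = " ".join(f"LEFT JOIN {t} ON {t}.id = {first}.id" for t in tables[1:])
    row = conn.execute(f"SELECT {select} FROM {first} {joins} WHERE {first}.id = ?",
                       (record_id,)).fetchone()
    return dict(zip(attrs, row)) if row else {}


put("user:1", {"name": "Ada", "lang": "en"})
put("user:1", {"city": "London"})  # new attribute: new table, no ALTER TABLE
print(get("user:1", ["name", "lang", "city"]))
```

    Adding an attribute never touches existing tables, which is the appeal. But a 10-attribute record costs 9 joins to read back, which is the overhead discussed above.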

    Linkdump: Cassandra @Twitter, Forrester not grokking NoSQL
    Adrian, Wed, 24 Feb 2010 20:18:27 +0000
    http://www.netuality.ro/linkdump-cassandra-twitter-forrester-not-grokking-nosql/linkdump/20100224/

    Seven signs you need to accept NoSQL in your life, according to the High Scalability blog. I especially like sign #6: “Maintaining a completely separate object caching system on top of an already beefy table storage system”. There are companies making serious bucks selling exactly this type of caching system. I find that a bit ironic, don’t you?

    Twitter has just decided to adopt Cassandra as their main storage. By my rough estimate, the status table has more than 9 billion rows – a good table size to start thinking about the benefits of NoSQL. I would have been interested in seeing a comparison with other existing solutions and the rationale behind their choice. According to some sources, Ryan King rejected HBase because if a region server is down, writes are blocked for the affected data until the data is redistributed, unlike Cassandra’s “writes never fail” policy. According to other sources, this will be solved in a future version of HBase, but I think Twitter needed a solution sooner rather than later. I hope for two things:

    • That the Twitter dudes will blog about their migration experience
    • That I’ll be able to access and search through all my older tweets, fer’ God sake!

    Forrester Research thinks that NoSQL and Elastic Caching Platforms are very similar. So similar that “NoSQL Wants To Be Elastic Caching When It Grows Up”. According to Forrester, “Ultimately, the real difference between NoSQL and elastic caching now may be in-memory versus persistent storage on disk.” Yeah, sure: transactions, durability, indexing, security model – who needs this crap anyway?

    Oh, and let’s not forget about today’s unscheduled GAE downtime. Looking forward to the post-mortem; there will surely be a thing or two to learn…
