Netuality

Taming the big, bad, nasty websites

Archive for the ‘Ruby’ tag

Google: sorry, but Lisp/Ruby/Erlang not on the menu

7 comments

Yes, language propaganda again. Ain’t it fun ?

Here comes a nice quote from the latest Steve Yegge post (read it entirely if you have the time, it’s both fun and educational – at least for me). So, there:

I made the famously, horribly, career-shatteringly bad mistake of trying to use Ruby at Google, for this project. And I became, very quickly, I mean almost overnight, the Most Hated Person At Google. And, uh, and I’d have arguments with people about it, and they’d be like Nooooooo, WHAT IF… And ultimately, you know, ultimately they actually convinced me that they were right, in the sense that there actually were a few things. There were some taxes that I was imposing on the systems people, where they were gonna have to have some maintenance issues that they wouldn’t have. [...] But, you know, Google’s all about getting stuff done.

[...]

Is it allowed at Google to use Lisp and other languages?

No. No, it’s not OK. At Google you can use C++, Java, Python, JavaScript… I actually found a legal loophole and used server-side JavaScript for a project.

Mmmmm … key ?

Written by Adrian

May 29th, 2008 at 12:35 am

Posted in Tools

Tagged with , , , ,

Java going down, Python way up, and more …

8 comments

According to O’Reilly Radar, sales of Java books have declined in the last 4 years by almost 50%. C# is selling more books from year to year and will probably level up with Java in 2008. Javascript is on the rise (due to AJAX, for sure) and PHP is on a surprising decrease path (although the job statistics indicate quite the contrary).

According to O’Reilly Radar, sales of Java books have declined in the last 4 years by almost 50%

In 2007, the number of sold Ruby books was larger than the number of Python books. In their article they qualify Ruby as being a “mid-major programming language” and Python as “mid-minor programming language”. However, after the announcement of Google App Engine the number of Python downloads from ActiveState has tripled in May. This should become visible in the book sales statistics, pretty soon.

Written by Adrian

May 24th, 2008 at 5:36 pm

Posted in Tools

Tagged with , , , , ,

Monitor everything on your Linux servers – with SNMP and Cacti

5 comments

Two free open-source tools are running the show for network and server-activity monitoring. The oldest and quite popular among network and system administrators is Nagios. Nagios does not only do monitoring, but also event traps, escalation and notification. The younger challenger is called Cacti. Unlike Nagios, it’s written in a scripting language [PHP] so no compiling is necessary – it just runs out of the box1. Cacti’s problem is that – at its current version – is missing lots of real-time features such as monitoring and notification. All these features are scheduled to be integrated in future versions of the product, but as with any open-source roadmap nothing is guaranteed, Anyway, this article is focusing on Cacti integration because it’s what I am currently using.

Cacti is built upon an open-source graphing tool called MRTG and a communication protocol SNMP. SNMP is not exactly a developer’s cup of tea, being more of a network administrator’s tool2. However, a monitoring server comes extremely handy in performance measurement and tuning, especially for complex performance behavior which can only be benchmarked long-term : such as large caches impact on a web application, or performance of long-running operations.

But is that specific variable you need to monitor, available with SNMP out of the box ? There is a strong chance it is. SNMP being an extensible protocol, lots of organization have recorded their own MIBs and respective implementations. Basically, a MIB is a group of unique identifiers called OIDs. An OID is a sequence of numbers separated by dots, for instance ‘.1.3.6.1.4.1.2021.11′; each number has a special meaning in a standard object tree – this example, the meaning of ‘.1.3.6.1.4.1.2021.11′ is ‘.iso.org.dod.internet.private.enterprises.ucdavis.systemStats’. Even you can have your own MIB in the ‘.iso.org.dod.internet.private.enterprises’ tree, by applying on this page at IANA.

Most probably you don’t really need your own MIB, no matter how ‘exotic’ your monitoring is, because:

a) it’s already there, in the huge list of existing MIBs and implementations

and

b) you are not bound to the existing official MIBs, in fact you can create your own MIB as long as you replicate it in the snmp configuration on all the servers that you want to monitor.

To take a look at existing MIBs, free tools are available on the net, IMHO the best one being MibBrowser. This multiplatform [Java] MIB browser has a free version which should be more than enough for our basic task. The screen capture shown here depicts a “Get Subtree” operation on the ‘.1.3.6.1.4.1.2021.11′ MIB; the result is a list of single value MIBs, such for instance ‘.1.3.6.1.4.1.2021.11.11.0′ which has the alias ’ssCpuIdle.0′ and value 97 [meaning that the CPU is 97% idle]. You can see the alias by loading the corresponding MIB file [select File/Load MIB then choose 'UCD-SNMP-MIB.txt' from the list of predefined MIBs].

From command line, in order to display existing MIB values, you can use snmpwalk:

snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.111111.1

3 and the result is:

.1.3.6.1.4.1.2021.11 OID (.iso.org.dod.internet.private.enterprises.ucdavis.systemStats)
snmpwalk -v 1 -c sncq localhost .1.3.6.1.4.1.2021.11
UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 4
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 2
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 4
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 1
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 2
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 1
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 96
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 17096084
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 24079
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 6778580
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 599169454
UCD-SNMP-MIB::ssCpuRawKernel.0 = Counter32: 6778580
UCD-SNMP-MIB::ssIORawSent.0 = Counter32: 998257634
UCD-SNMP-MIB::ssIORawReceived.0 = Counter32: 799700984
UCD-SNMP-MIB::ssRawInterrupts.0 = Counter32: 711143737
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 1163331309
UCD-SNMP-MIB::ssRawSwapIn.0 = Counter32: 23015
UCD-SNMP-MIB::ssRawSwapOut.0 = Counter32: 13730

Each of this values has its own significance, like for instance ’ssCpuIdle.0′ which announces that the CPU is 96% idle.
In order to retrieve just a single value of the list, use its alias as a parameter to the snmpget command, for instance

snmpget -Os -c [community_name] -v 1 [hostname] UCD-SNMP-MIB::ssCpuIdle.0

Sometimes, you want to monitor something which you do not seem to find in the list of MIBs. Say, for instance, the performance of a MySQL database that your’re pounding pretty hard with your webapp4. The easiest way of doing this is to pass through a script – snmp implementations can take the result of any script and expose it through the protocol, line by line.

Supposing you want to keep track of the values obtained with the following script:

#!/bin/sh
/usr/bin/mysqladmin -uroot status | /usr/bin/awk '{printf("%f\n%d\n%d\n",$4/
10,$6/1000,$9)}'

The mysqladmin command and a bit of simple awk magic display the following three values, each on a separate line:

  • number of opened connections / 10
  • number of queries / 1000
  • number of slow queries

It is interesting to not that, while the first value is instantaneous gauge-like, the following two are incremental, growing and growing as long as new queries and new slow queries are recorded. Will keep this in mind for later, when we will track these values.

But for now, let’s see how these three values are exposed through snmp. The first step is to tell the SNMP daemon that the script has an associated MIB. This is done in the configuration file, usually located at /etc/snmp/snmp.d. The following line attaches the script [for example /home/user/myscript.sh] execution to a certain OID:

exec .1.3.6.1.4.1.111111.1 MySQLParameters /home/user/myscript.sh

the ‘.1.3.6.1.4.1.111111.1′ OID is a branch of ‘.1.3.6.1.4.1′ [meaning '.iso.org.dod.internet.private.enterprises']. We tried to make it look ‘legitimate’ but obviously you can use here any sequence you want to.

After restarting the daemon, let’s interrogate Mibbrowser for the freshly created OID, see the following image snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.111111.1 ; the result is:

enterprises.111111.1.1.1 = INTEGER: 1
enterprises.111111.1.2.1 = STRING: "MySQLParameters"
enterprises.111111.1.3.1 = STRING: "/etc/snmp/mysql_params.sh"
enterprises.111111.1.100.1 = INTEGER: 0
enterprises.111111.1.101.1 = STRING: "0.900000"
enterprises.111111.1.101.2 = STRING: "18551"
enterprises.111111.1.101.3 = STRING: "108"
enterprises.111111.1.102.1 = INTEGER: 0
enterprises.111111.1.103.1 = ""

Great ! Now we have the proof that it really works and our specific values extracted with a custom script are visible through SNMP. Let’s go back to Cacti and see how we can make some nice charts out of them5.

Cacti has this nice feature of defining ‘templates’ that you can reuse afterwards. My strategy is to define a data template for each one of the 3 parameters I want to chart, using the ‘Duplicate’ function applied to the ‘SNMP – Generic OID Template’.

On the duplicate datasource template, you have to change the datasource title, name to display in charts, data source type [use DERIVE for incremental counters and GAUGE for instantaneous values], specific OID and the snmp community. Do it for the three values.

Using the three new datasource templates, create a chart template for ‘MySQL Activity’. That’s a bit more complicated, but it boils down to the following procedure, repeated for each of the 3 data sources:

  • add a data source and associate a graph [I always use AREA for the first graph as a background and LINE3 for the other, but it's just a matter of taste]
  • associate labels with current or computed values: CURRENT, AVERAGE, MAX in this example

All the rest is really fine tuning – deciding for better colors, wether to use autoscale or fixed scale and so on. By now, your graph template should be ready to use.

Note that for the incremental values ['DERIVE' type data sources] I’ve used titles such as ‘Thousands queries/5 min’ – the 5 minutes come from the Cacti poller which is set to query for data each 5 minutes. The end result is something like this one :

On this real production chart you’ll see a few interesting patterns. For instance, at 3 o’clock in the morning, there is a huge spike in all the charted parameters – indeed, a cron’ed script was provoking this spike. From time to time, a small burst of slow queries is recorded – still under investigation. What is interesting here is that these spikes were previously undetectable on the load average chart, which look clean and innocuous:

To conclude, SNMP is a valuable resource for server performance monitoring. Often, investigating specific parameters and displaying them in tools such as Cacti can bring interesting insights upon the behavior of servers.

Some SNMP implementations in different programming languages:

  • Java: Westhawk’s Java SNMP stack [free w commercial support], AdventNet SNMP API [commercial, with a feature-restricted un-expiring free version], iREASONING SNMP API [commercial implementation], SNMP4J [free and feature-rich implementation - thank you Mathias for the tip]
  • PHP: client-only supported by the php-snmp extension, part of the PHP distribution [free]
  • Python: PySNMP is a Python SNMP framework, client+agents [free].
  • Ruby: client-only implementation Ruby SNMP [free]

1 If you’re running Debian, Cacti comes with apt so it’s a breeze to install and run [apt-get install cacti]

2 a bit out of the scope of this article, SNMP also allows writing values on remote servers, not only retrieving monitored values.

3 Replace [hostname] with the server hostname and [community_name] with the SNMP community – default being ‘public’. The SNMP community is a way of authenticating a client to a SNMP server; although the system can be used for pretty sophisticated stuff, most of the time the servers have a read-only passwordless community, visible only in the internal network for monitoring purposes.

4 In fact, a commercial implementation of SNMP for MySQL does exist.

5 The procedure described here applies to Cacti v0.8.6.c

Written by Adrian

March 5th, 2006 at 5:27 pm

Posted in Tools

Tagged with , , , , , , , ,

HTTP compression filter on servlets : good idea, wrong layer

3 comments

The Servlet 2.3 specifications introduced the notion of servlet filters, powerful tools but unfortunately used in quite unimaginative ways. Let’s take for instance this ONJava article (“Two Servlet Filters Every Web Application Should Have”) written by one of the coauthors to Servlets and JavaServer Pages; the J2EE Web Tier (a well-known servlets and JSP book from O’Reilly), Jayson Falkner*. This article has loads of trackbacks, it became so popular that the filters eventually got published on JavaPerformanceTuning along with an (otherwise very sensible and pragmatic) interview of the author. However, there is a more efficient way of performing these tasks, as undiscriminated page compression and simple time-based caching do not necessarily belong in the servlet container**. As one of the comments (on ONJava) put it : ‘good idea, wrong layer !’. Let’s see why…

There is a simple way to compress pages from any kind of site (be it Java, PHP, or Ruby on Rails), natively, in Apache web server. The trick consists in chaining two Apache modules : mod_proxy and mod_gzip.Via mod_proxy, it becomes possible to configure a certain path on one of your virtual hosts to proxy all requests to the servlet container, then you may selectively compress pages using mod_gzip.

Supposing that the two modules are compiled and loaded in the configuration, and your servlet is located at http://local_address:8080/b2b. You want to make it visible at http://external_address/b2b. To activate the proxy, add the following two lines :

ProxyPass /b2b/ http://local_address:8080/b2b/
ProxyPassReverse /b2b/ http://local_address:8080/b2b/

You can add as many directives as you like, proxy-ing all the servlets for the server (for instance, one of the configuration I’ve looked at has a special servlet for dynamic image generation and one for dynamic PDF documents generation – the output will not be compressed, but they all had to be proxy-ed). Time-based caching is also possible with mod_proxy, but this subject deserves a little article by itself. For the moment, we’ll stick to simple transparent proxying and compression.

Congratulations, just restart Apache and you have a running proxy. Mod_gzip is a little bit trickier. I’ve adapted a little bit the configuration from the article Getting mod_gzip to compress Zope pages proxied by Apache (haven’t been able to find anything better concerning integration with Java servlet containers) and here’s the result :

#module settings
mod_gzip_on Yes
mod_gzip_can_negotiate Yes
mod_gzip_send_vary Yes
mod_gzip_dechunk Yes
mod_gzip_add_header_count Yes
mod_gzip_minimum_file_size 512
mod_gzip_maximum_file_size	5000000
mod_gzip_maximum_inmem_size	100000
mod_gzip_temp_dir /tmp
mod_gzip_keep_workfiles No
mod_gzip_update_static No
mod_gzip_static_suffix .gz
#includes
mod_gzip_item_include mime ^text/*$
mod_gzip_item_include mime httpd/unix-directory
mod_gzip_item_include handler proxy-server
mod_gzip_item_include handler cgi-script
#excludes
mod_gzip_item_exclude reqheader  "User-agent: Mozilla/4.0[678]"
mod_gzip_item_exclude mime ^image/*$
#log settings
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" mod_gzip: %{mod_gzip_result}n In:%{mod_gzip_input_size}n Out:%{mod_gzip_output_size}n:%{mod_gzip_compression_ratio}npct." mod_gzip_info
CustomLog /var/log/apache/mod_gzip.log mod_gzip_info

Short explanation. The module is activated and allowed to negotiate (see if a static or cached file was already compressed and reuse it). The Vary header is useful for client-side caches to work, dechunking eliminates the ‘Transfer-encoding: chunked’ HTTP header and joins the page into one big packet before compressing. Header length is added for traffic measuring purposes (we’ll see the ‘right’ figures in the log). Minimum size of a file to be compressed is 512 bytes, setting maximum is also a good idea because a) compressing a huge file will stump your server and b) the limitation guards against infinite loops. Maximum file size to compress in memory is 100KB in my setting, but you should tune this value for optimum performance. Temporary directory is /tmp and workfiles should be kept only if you need to debug mod_gzip. Which you don’t.

We’ll include in the files to be gzipped everything that’s text type, directory listing and … the magic line is the one that specifies that everything coming from the proxy-server is susceptible to be compressed: this will assure the compression of your generated pages. And while you’re at it, why not add the cgi scripts…

The includes specified here are quite generous, let’s now filter some of it: we’ll exclude all the images because they SHOULD be already compressed and optimized for web. And last but not least, we’ll decide the format of the line to be added and the location of the compression log – it will allow us to see whether the filter is effectively running and compute how much bandwidth we have saved.

A compelling reason to use mod_gzip is its maturity. Albeit complex, this Apache module is stable and relatively bug free, which can hardly be said about the various compression filters found on the web. The original code from the O’Reilly article was behaving incorrectly under certain circumstances (corrected later on the book’s site, I’ve tested the code and it works fine). I also had some issues with Amy Roh’s filter (from Sun). Amy’s compression filter can be found in a lot of places on the web (JavaWorld, Sun), but unfortunately does not set the correct ‘Content-Length’ header, thus disturbing httpunit, which in turn has ‘turned 100% red’ my web tests suite – as soon as the compression filter was on. Argh.

For the final word, let’s compare the performance of the two solutions (servlet filter agains mod_proxy+mod_gzip). I’ve used a single machine to install both Apache and the servlet container (Jetty), and Amy Roh’s compression filter. A mildly complex navigation scenario was recorded in TestMaker (a cool free testing tool written in Java), then played a certain number of times (100, to be more specific). The results are expressed in TPS (transactions per second): the bigger, the better. The following median values were obtained : 3.10TPS direct connection to the servlet container, 2.64TPS via the compression filter and 2.81TPS via Apache mod_proxy+mod_gzip. That means a 5% performance hit between the Apache and the filter solution. Of course the figure is highly dependent on my test setup, the specific webapp and a lot of other parameters, however I am confident that Apache is superior in any configuration. You also have to consider that using a proxy has some nice bonuses. For instance, Apache HTTPS virtual sites may encrypt your content in a transparent manner. Apache has very good and fast logging, so it’d be cool to completely disable HTTP requests logging in your servlet container. Moreover, the Apache log format is understood by a myriad of traffic analyzer tools. Load balancing is possible using mod_proxy and another remarkably useful Apache module, mod_rewrite. As Apache runs in a completely different process, you might expect slightly better scalability on multiple processor boxes.

Nota bene: in all the articles I’ve read on the subject of compression, there is this strange statement that compression cannot be detected client-side. Of course you can do it… Supposing you use Firefox (which you should, if you’re serious about web browsing !) with the Web Developer plugin (which you should, if you’re serious about web development !). As depicted in the figure, the plugin helps you to “View Response Headers” (in “Information” menu): the presence or absence of Content-Encoding: gzip is what you’re looking for. Voila ! Just for kicks, look at the response headers on a few well-known sites, and prepare to be surprized (try Microsoft, for instance or Slashdot for some funny random quotes).

* Jayson Falkner has also authored this article (“Another Java Servlet Filter Most Web Applications Should Have”) which explains how to control the client-side cache via HTTP response headers. While the example is very simple, one can easily extend it to do more complex stuff such as caching according to rules (for instance, caching dynamically generated documents or images according to the context). This _is_ a pragmatic example of servlet filter.

** Unless of course – as one of the commenters explains here – you have some specific constraints against being able to use Apache, such as : embedded environment, forced to use another web server than Apache (alternative solutions might exist for those servers but I am not aware of them), mod_gzip unavailable on the target platform, etc.

Written by Adrian

February 2nd, 2005 at 8:28 am

Posted in Tools

Tagged with , , , , , ,

Scripting languages not just for toys (a Ruby web framework)

leave a comment

According to David Heinemeier Hansson, the RAILS 1KLOC (!) web framework written in Ruby was used to develop Basecamp, a mildly complex project management webapp. What's fascinating is the Ruby code used to develop Basecamp is only 4KLOC (according to the RAILS document) and was developed in less than 2man*month (including 212 test cases and all the bells and gingles). Although I strongly suspect that there was a lot of templating going on behind the hoods and I doubt that template development time was included, the efficiency is amazing, especially if you consider that they have started almost from scratch. Makes you wonder what would be possible to perform in Ruby with a comprehensive library such as PHP's PEAR (well, duh, ignoring the fact that it should be called REAR which is a very very very nasty name) ?

Written by Adrian

March 24th, 2004 at 2:07 pm

Posted in Tools

Tagged with , ,