Netuality

Taming the big, bad, nasty websites

Archive for the ‘book’ tag

Java going down, Python way up, and more …

8 comments

According to O’Reilly Radar, sales of Java books have declined in the last 4 years by almost 50%. C# is selling more books from year to year and will probably draw level with Java in 2008. JavaScript is on the rise (due to AJAX, for sure) and PHP is on a surprising downward path (although the job statistics indicate quite the contrary).


In 2007, more Ruby books were sold than Python books. The article qualifies Ruby as a “mid-major programming language” and Python as a “mid-minor programming language”. However, after the announcement of Google App Engine, the number of Python downloads from ActiveState tripled in May. This should become visible in the book sales statistics pretty soon.

Written by Adrian

May 24th, 2008 at 5:36 pm

Posted in Tools


Java Persistence with Hibernate – the book, my review


You have to know that I’ve tried. Honestly, I did. I hoped to be able to read each and every page of “Java persistence with Hibernate” (the revised edition of “Hibernate in action”), by Christian Bauer and Gavin King. But I gave up before reading a third of it, and from then on I only read selected sections. First, because I know Hibernate: I’ve used it in all the Java projects I’ve been involved with in the last 5 years or so. Second, because the content from the first edition is more than familiar to me. And third, because this second edition is a massive book of more than 800 pages (double the number of pages in the first edition). At that rate, the fourth edition will be sold together with some sort of transportation device, because a mere human will not be able to carry that amount of paper. How did this happen ?

Hibernate is the perfect example of a successful Java open-source project. Initially started as a free alternative to commercial object-relational mapping tools, it quickly became mainstream. Lots of Java developers around the world use Hibernate for the data layer inside their projects. It’s very comfortable: just set some attributes or ask for a business object instance and Hibernate does all the ugly SQL for you. As a developer, you are then comfortably protected from that nasty relational database, and gently swim in a sea of nicely bound objects. Right ? No, not exactly. Each object-relational mapping tool has its own ways of being handled efficiently, and this is where books like “Java persistence with Hibernate” come into play. This book teaches you how to work with Hibernate, with a “real-world” example: the Caveat-Emptor online auction application.

Since the first edition of the book was written, lots of things have happened in the Hibernate world and you can see their impact in “Java persistence with Hibernate”. Most notable is the release of the 3.x version line with its various improvements and new features: code annotations used as mapping descriptors, a reorganization of package naming inside the API, and above all the standardization under the umbrella of JPA (Java Persistence API) for a smooth integration with EJB 3 inside Java EE 5 servers. And this is a little bit funny. Yesterday, Hibernate was the main proof that it is possible to make industrial-quality projects within a “J2EE-less” environment. Today, Hibernate has put on a suit and a tie, joined the ranks of JBoss, then Red Hat, and it lures unsuspecting Java developers towards the wonderful and (sometimes) expensive world of Java EE 5 application servers. This is not necessarily a bad move for the Hibernate API, but it is proof that in order to thrive as an open-source project, you need so much more than a Sourceforge account and some passion …

Enough with that, let’s take a look at the book content. Some 75% of it is in fact the content of the first edition, updated and completed. You learn what object-relational mapping is, the advantages, the quirks, the recommended way of developing with Hibernate. For a better understanding, single chapters from the initial book were expanded into 2, sometimes more, chapters. The “unit of work” is now called “a conversation” and you’ve got a whole new chapter (11) about conversations, which is in fact pretty good stuff about session and transaction management. Christian and Gavin have done some great writing about concurrency and isolation in the relatively small 10th chapter, which is a must read even if you’re not interested in Hibernate but want to understand once and for all what these concurrent transaction behaviors everyone is talking about really are. The entire 13th chapter is dedicated to fetching strategy and caching, which is a must read if you want performance and optimization from your application. There is also a good deal of EJB, JPA and EE 5 related stuff scattered across multiple chapters. And finally, a solid 50-page chapter is pimping the JSF (Java Server Faces) compliant web development framework, JBoss Seam. I have only managed to read a few pages of this final chapter, so I cannot really comment. Note to self: play a little bit with that Seam thing.

To conclude, is this a fun book ? No. Is this a perfect book to convert young open-source fanatics to the wonders of Hibernate API ? Nope. Is this a book to read cover to cover during a weekend ? Not even close. Then, what is this ? First, it’s the best book out there about Hibernate (and there are quite a few on the market right now), maybe even the best book about ORM in Java, in general. Second, it has lots of references to EJB, JPA and EE, so it will help you to easily sell a Hibernate project to the management. Even if the final implementation uses Spring … And finally, it’s the best Hibernate reference money can buy. When you have an issue, open the darn index and search: there is a 90% chance your problem will be solved. And that’s a nice accomplishment. Don’t get this book because it’s fun, because it’s a nice read, or because it’s about a new innovative open-source project. Buy it because it helps you grok ORM, write better code, deliver quality projects.

Written by Adrian

December 17th, 2006 at 2:00 pm

Posted in Books


Unicode in Python micro-recipe : from MySQL to webpage via Cheetah


Very easy:

  • start by adding the default-character-set=utf8 in your MySQL configuration file and restart the database server
  • apply this recipe from Activestate Python Cookbook (“guaranteed conversion to unicode or byte string”)
  • inside the Cheetah template, use the ReplaceNone filter:


#filter ReplaceNone
${myUnicodeString}
#end filter

in order to prevent escaping non-ASCII characters.
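The conversion step in the second bullet can be sketched roughly as follows. This is not the code of the Cookbook recipe (which targeted Python 2); it is a minimal Python 3 sketch of the same idea, and the helper names `to_text` and `to_bytes` as well as the UTF-8 default are assumptions made for illustration.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a str, decoding byte strings when necessary.

    Hypothetical helper: undecodable bytes are replaced instead of
    raising, so the conversion always succeeds.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding, errors="replace")
    return str(value)


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding text when necessary."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    return to_text(value, encoding).encode(encoding)
```

With helpers like these, whatever comes back from the database driver (bytes or text) can be normalized with `to_text` before it is handed to the template.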

Now. That’s better.

Written by Adrian

April 14th, 2006 at 11:42 pm

Posted in Tools


HTTP compression filter on servlets : good idea, wrong layer

3 comments

The Servlet 2.3 specification introduced the notion of servlet filters, powerful tools but unfortunately used in quite unimaginative ways. Let’s take for instance this ONJava article (“Two Servlet Filters Every Web Application Should Have”) written by Jayson Falkner*, one of the coauthors of Servlets and JavaServer Pages; the J2EE Web Tier (a well-known servlets and JSP book from O’Reilly). This article has loads of trackbacks, and it became so popular that the filters eventually got published on JavaPerformanceTuning along with an (otherwise very sensible and pragmatic) interview of the author. However, there is a more efficient way of performing these tasks, as indiscriminate page compression and simple time-based caching do not necessarily belong in the servlet container**. As one of the comments (on ONJava) put it : ‘good idea, wrong layer !’. Let’s see why…

There is a simple way to compress pages from any kind of site (be it Java, PHP, or Ruby on Rails), natively, in Apache web server. The trick consists in chaining two Apache modules : mod_proxy and mod_gzip. Via mod_proxy, it becomes possible to configure a certain path on one of your virtual hosts to proxy all requests to the servlet container, then you may selectively compress pages using mod_gzip.

Supposing that the two modules are compiled and loaded in the configuration, and your servlet is located at http://local_address:8080/b2b. You want to make it visible at http://external_address/b2b. To activate the proxy, add the following two lines :

ProxyPass /b2b/ http://local_address:8080/b2b/
ProxyPassReverse /b2b/ http://local_address:8080/b2b/

You can add as many directives as you like, proxy-ing all the servlets for the server (for instance, one of the configurations I’ve looked at has a special servlet for dynamic image generation and one for dynamic PDF document generation – the output will not be compressed, but they all had to be proxy-ed). Time-based caching is also possible with mod_proxy, but this subject deserves a little article by itself. For the moment, we’ll stick to simple transparent proxying and compression.
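To make the effect of the two directives more concrete, here is a small Python model of the prefix mapping they perform. It is only an illustration under simplifying assumptions, not how Apache implements it: the function names `proxy_pass` and `proxy_pass_reverse` are made up for this sketch, and only the `Location` rewriting of redirects is modeled.

```python
# Hypothetical model of the two directives; these names are not Apache APIs.
PROXY_PREFIX = "/b2b/"
BACKEND_BASE = "http://local_address:8080/b2b/"
EXTERNAL_BASE = "http://external_address/b2b/"


def proxy_pass(request_path):
    """Map an incoming path to the backend URL, or None if it is not proxied."""
    if not request_path.startswith(PROXY_PREFIX):
        return None
    return BACKEND_BASE + request_path[len(PROXY_PREFIX):]


def proxy_pass_reverse(location_header):
    """Rewrite a backend redirect so the client never sees the internal address."""
    if location_header.startswith(BACKEND_BASE):
        return EXTERNAL_BASE + location_header[len(BACKEND_BASE):]
    return location_header
```

The first function mirrors what ProxyPass does with the request, the second what ProxyPassReverse does with redirects sent back by the servlet container.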

Congratulations, just restart Apache and you have a running proxy. Mod_gzip is a little bit trickier. I’ve slightly adapted the configuration from the article Getting mod_gzip to compress Zope pages proxied by Apache (I haven’t been able to find anything better concerning integration with Java servlet containers) and here’s the result :

#module settings
mod_gzip_on Yes
mod_gzip_can_negotiate Yes
mod_gzip_send_vary Yes
mod_gzip_dechunk Yes
mod_gzip_add_header_count Yes
mod_gzip_minimum_file_size 512
mod_gzip_maximum_file_size	5000000
mod_gzip_maximum_inmem_size	100000
mod_gzip_temp_dir /tmp
mod_gzip_keep_workfiles No
mod_gzip_update_static No
mod_gzip_static_suffix .gz
#includes
mod_gzip_item_include mime ^text/.*
mod_gzip_item_include mime httpd/unix-directory
mod_gzip_item_include handler proxy-server
mod_gzip_item_include handler cgi-script
#excludes
mod_gzip_item_exclude reqheader  "User-agent: Mozilla/4.0[678]"
mod_gzip_item_exclude mime ^image/.*
#log settings
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" mod_gzip: %{mod_gzip_result}n In:%{mod_gzip_input_size}n Out:%{mod_gzip_output_size}n:%{mod_gzip_compression_ratio}npct." mod_gzip_info
CustomLog /var/log/apache/mod_gzip.log mod_gzip_info

Short explanation. The module is activated and allowed to negotiate (see if a static or cached file was already compressed and reuse it). The Vary header is useful for client-side caches to work, dechunking eliminates the ‘Transfer-encoding: chunked’ HTTP header and joins the page into one big packet before compressing. Header length is added for traffic measuring purposes (we’ll see the ‘right’ figures in the log). Minimum size of a file to be compressed is 512 bytes, setting maximum is also a good idea because a) compressing a huge file will stump your server and b) the limitation guards against infinite loops. Maximum file size to compress in memory is 100KB in my setting, but you should tune this value for optimum performance. Temporary directory is /tmp and workfiles should be kept only if you need to debug mod_gzip. Which you don’t.

Among the files to be gzipped we’ll include everything that’s of text type, directory listings and … the magic line is the one specifying that everything coming from the proxy-server is eligible for compression: this will ensure the compression of your generated pages. And while you’re at it, why not add the cgi scripts…

The includes specified here are quite generous, so let’s now filter some of it: we’ll exclude all the images because they SHOULD be already compressed and optimized for web. And last but not least, we’ll decide the format of the line to be added and the location of the compression log – it will allow us to see whether the filter is actually running and compute how much bandwidth we have saved.
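Computing the saved bandwidth from that log can be sketched in a few lines of Python. The regular expression follows the tail of the LogFormat line defined above (`mod_gzip: <result> In:<bytes> Out:<bytes>:<ratio>pct.`); the function name `bandwidth_saved` is an assumption, and so is the choice to skip entries with a zero output size, which are treated here as "not compressed".

```python
import re

# Matches the suffix produced by the LogFormat above.
GZIP_SUFFIX = re.compile(r"mod_gzip: (\S+) In:(\d+) Out:(\d+):(\d+)pct\.")


def bandwidth_saved(log_lines):
    """Return (bytes_in, bytes_out, bytes_saved) summed over compressed entries."""
    total_in = total_out = 0
    for line in log_lines:
        match = GZIP_SUFFIX.search(line)
        if match is None:
            continue  # not a mod_gzip log line
        size_in, size_out = int(match.group(2)), int(match.group(3))
        if size_out == 0:
            continue  # assumption: zero output means nothing was compressed
        total_in += size_in
        total_out += size_out
    return total_in, total_out, total_in - total_out
```

Feeding it the lines of /var/log/apache/mod_gzip.log (for instance via `open(...)`) gives the totals for the logged period.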

A compelling reason to use mod_gzip is its maturity. Albeit complex, this Apache module is stable and relatively bug free, which can hardly be said about the various compression filters found on the web. The original code from the O’Reilly article was behaving incorrectly under certain circumstances (it was corrected later on the book’s site; I’ve tested the corrected code and it works fine). I also had some issues with Amy Roh’s filter (from Sun). Amy’s compression filter can be found in a lot of places on the web (JavaWorld, Sun), but unfortunately it does not set the correct ‘Content-Length’ header, thus disturbing httpunit, which in turn ‘turned 100% red’ my web test suite as soon as the compression filter was on. Argh.

For the final word, let’s compare the performance of the two solutions (servlet filter against mod_proxy+mod_gzip). I’ve installed both Apache and the servlet container (Jetty), with Amy Roh’s compression filter, on a single machine. A mildly complex navigation scenario was recorded in TestMaker (a cool free testing tool written in Java), then played 100 times. The results are expressed in TPS (transactions per second): the bigger, the better. The following median values were obtained : 3.10 TPS with a direct connection to the servlet container, 2.64 TPS via the compression filter and 2.81 TPS via Apache mod_proxy+mod_gzip. Measured against the direct connection, the filter costs about 15% of the throughput and the Apache solution about 9%, a gap of roughly 5 percentage points in favor of Apache. Of course the figure is highly dependent on my test setup, the specific webapp and a lot of other parameters, however I am confident that Apache is superior in any configuration. You also have to consider that using a proxy has some nice bonuses. For instance, Apache HTTPS virtual sites may encrypt your content in a transparent manner. Apache has very good and fast logging, so it’d be cool to completely disable HTTP request logging in your servlet container. Moreover, the Apache log format is understood by a myriad of traffic analyzer tools. Load balancing is possible using mod_proxy and another remarkably useful Apache module, mod_rewrite. As Apache runs in a completely different process, you might expect slightly better scalability on multiple processor boxes.
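For readers who want to redo the arithmetic on the three median values, here is a short Python sketch. The function name `relative_drop` is just an illustrative choice; the numbers are the ones measured above.

```python
def relative_drop(baseline_tps, measured_tps):
    """Throughput loss of measured_tps relative to baseline_tps, in percent."""
    return (baseline_tps - measured_tps) / baseline_tps * 100


direct, servlet_filter, apache = 3.10, 2.64, 2.81

filter_hit = relative_drop(direct, servlet_filter)      # about 14.8 %
apache_hit = relative_drop(direct, apache)              # about 9.4 %
gap_in_points = filter_hit - apache_hit                 # about 5.5 points
filter_vs_apache = relative_drop(apache, servlet_filter)  # about 6.0 %

print(f"filter: -{filter_hit:.1f}%, apache: -{apache_hit:.1f}%")
print(f"gap: {gap_in_points:.1f} points, filter vs apache: -{filter_vs_apache:.1f}%")
```

Depending on the baseline you pick, the filter comes out roughly 5.5 percentage points (against the direct connection) or 6% (against Apache) behind.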

Nota bene: in all the articles I’ve read on the subject of compression, there is this strange statement that compression cannot be detected client-side. Of course you can do it… Supposing you use Firefox (which you should, if you’re serious about web browsing !) with the Web Developer plugin (which you should, if you’re serious about web development !). As depicted in the figure, the plugin helps you to “View Response Headers” (in the “Information” menu): the presence or absence of Content-Encoding: gzip is what you’re looking for. Voilà ! Just for kicks, look at the response headers on a few well-known sites, and prepare to be surprised (try Microsoft, for instance, or Slashdot for some funny random quotes).
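The same check can be scripted instead of clicked. Below is a minimal Python sketch using only the standard library; the function names `is_gzip_compressed` and `fetch_headers` are made up for this example, and any URL you pass in is your own choice.

```python
import urllib.request


def is_gzip_compressed(headers):
    """Return True if the response headers announce gzip content encoding.

    headers is any mapping of header name to value; names are compared
    case-insensitively.
    """
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encodings = [token.strip().lower() for token in value.split(",")]
            return "gzip" in encodings
    return False


def fetch_headers(url):
    """Request url while advertising gzip support and return the response headers."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items())
```

Something like `is_gzip_compressed(fetch_headers("http://external_address/b2b/"))` then tells you whether the proxied pages really come back compressed.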

* Jayson Falkner has also authored this article (“Another Java Servlet Filter Most Web Applications Should Have”) which explains how to control the client-side cache via HTTP response headers. While the example is very simple, one can easily extend it to do more complex stuff such as caching according to rules (for instance, caching dynamically generated documents or images according to the context). This _is_ a pragmatic example of servlet filter.

** Unless of course – as one of the commenters explains here – you have some specific constraints against being able to use Apache, such as : embedded environment, forced to use another web server than Apache (alternative solutions might exist for those servers but I am not aware of them), mod_gzip unavailable on the target platform, etc.

Written by Adrian

February 2nd, 2005 at 8:28 am

Posted in Tools


Review : Holub on Patterns


In order to read Allen Holub's new book, you'll certainly need some programming skills (Java, OOP and patterns to be more specific). The back of the book says 'Intermediate to Advanced'. It certainly depends on what you mean by 'Intermediate'… because the book is not exactly a light read. But then again, we don't expect that from Allen Holub. We want interesting, insightful books from him, and 'Holub on Patterns' falls nicely into that category. However, some 'intermediates' should prepare themselves for a harsh ride.

The volume is structured in 4 chapters. The first one contains some 'preliminaries'. Meaning : short explanations about why OOP is still incorrectly used, design patterns are not fully understood, plus a bonus of controversial statements like 'getters and setters are evil' and 'Swing is an over-engineered piece of junk' [well, maybe not exactly these words]. As a direct consequence of reading this chapter, the 'intermediates' will start banging their heads on the closest wall available : “My code sucks ! I swear I'll never blindly copy/paste again !”.

In the second chapter things really start to heat up. Allen explains why 'extends is evil' and interfaces are not evil. In case we needed an example of the fragile-base-class problem, here we go with some MFC bashing (usual stuff). The chapter also focuses on some creational patterns such as Factory and (at great length) Singleton. I especially liked the cool explanations of how to shut down a Singleton.

The third chapter discusses an [overly complex, on purpose] implementation of the 'Game of Life'. Between huge chunks of code (a bit much for my taste) scattered throughout the chapter, the author explains all the implementation choices: from Visitor to Flyweight. Some 60% of the GoF patterns are encountered in this chapter's code.

The fourth and last chapter contains 'production code', as the author declares. It's a small in-memory database, with an embedded SQL interpreter and a JDBC driver. Very solid example, but it'll probably scare away a few 'intermediates'.

It all ends with an Appendix containing a great 'Design-Pattern Quick Reference', presenting the most used patterns in a very pragmatic format. Each pattern is explained via a diagram, some Java code snippets, its motivation, pros and cons, and a very original 'Often Confused With' paragraph.

Unlike all the other pattern books you've read before, this is not a reference. It's a real programming book that you'll have to read from cover to cover. You'll also need solid programming skills in order to understand the last two chapters (and especially the last one).

My gripes:

  • too much code. Probably more than 1/3 of the pages are just printed code.
  • typos. There is a slightly disturbing amount of typos in the book, even in some code snippets [like for instance 'Sting' instead of 'String'].

However, these problems should not scare away any potential readers. Because of its original pragmatic approach, 'Holub on Patterns' is surely in the Top 10 Java books for 2005.

Written by Adrian

January 4th, 2005 at 10:48 am

Posted in Books
