Netuality

Taming the big, bad, nasty websites

Archive for the ‘programming’ tag

If programming is like gardening …

… then a software team is like an aquarium.

“Programming is Gardening, not Engineering” says Andy Hunt (of Pragmatic Programmer fame) in one of his well-known Artima conversations.

Inspired by such an interesting ‘organic’ comparison, here is my own metaphor: a software team behaves quite like an aquarium. I assume not all my blog readers are aquarium hobbyists, so let me explain:

  • Permanent monitoring and adjusting. Left alone and unsupervised, an aquarium apparently manages to ‘survive’ by itself. However, subtle changes in water chemistry will slowly start to build up. Interestingly, fish seem to cope well with these changes – until a certain threshold is reached and they get sick and eventually die. In my experience, the margin is rather thin: one day everything seems ok and the next day it’s a major disaster. The effort necessary to clean up the situation is significantly bigger than the effort saved by not taking care of the aquarium. The parallel here is quite obvious : you can’t manage what you can’t measure, and you can’t control what you can’t manage. Software metrics, code reviews, frequent releases, testing and feedback: these practices are vital if you want a ‘healthy’ project and a ‘living’ team. Otherwise, beware, the tipping point might be just a few days away*.
  • However, changes must be made gradually. Suppose that a major shift in water parameters has been detected: taking immediate and radical measures will generally worsen the situation (unless the catastrophe is already there). It is highly recommended to spread the change over a reasonable period of time and never to try to influence two major water parameters at the same time (pH and GH, for instance). Explanation: all these parameters are interconnected in intricate ways, so by changing one you’ll automatically influence the others. By changing two or more, the outcome is hard to predict and might open the path to a disaster. There’s a nice parallel here. A major change in methodology, with the sudden introduction of multiple new or modified development practices, will only make the team unstable, even if, globally speaking, the change is a highly beneficial one. ‘Good things come to those who wait’ … and measure … and change … and wait … and measure … and change …
  • A beautiful aquarium is a visible one. Transparent glass, lights and everything. Would you feed and keep your fish if they were living in a black box and you were afraid to look inside it ?

*Of course [and fortunately], the developers do not get sick because of a reeking team/project, they simply leave.

Written by Adrian

November 4th, 2004 at 7:45 pm

Posted in Process

Review : Hibernate in Action

Disclaimer : this review is based on the MEAP draft. Things might be (a little) different in the final version.

From a documentation point of view, Hibernate is one of the most notable exceptions in the world of open-source LGPL'ed projects. Its website offers a plethora of information, from solid documentation (the reference has no fewer than 141 pages) and various FAQs to sample projects and third-party resources. The forum is quite active and you may get answers to tricky questions. Or a little bit of rough treatment in case you haven't RTFM – but that is understandable, given the number of questions that the authors have to answer every day.

Under these circumstances, one might wonder what Gavin King (Hibernate founder) and Christian Bauer (documentation/website maintainer and Hibernate core developer) can add in order to write a 400-page book about Hibernate. I mean – sure – just by joining the reference documentation, the different FAQs and the guides, one can easily 'extract' a hefty 'manuscript' of more than 200 pages.

Well, I am extremely glad to tell you that this is not the case. The book not only gets you up to speed with Hibernate and its features (which the documentation does quite well), it also introduces you to the right way of developing and tuning an industrial-quality Hibernate application. I consider myself a pretty seasoned Hibernate developer, being familiar with the API since version 1.2 in Q1 2002 (if I remember correctly the first app in which we used Hibernate). However, I was proved wrong by “Hibernate in Action”, which describes best practices and even API features that were unknown or only vaguely known to me. That is, until now.

The first chapter, in the good tradition of all first chapters in the world, is an introduction. It's a very well written introduction about why we need ORM solutions in OO applications. The chapter explains the O/R impedance mismatch, while quickly declaring that OODBs suck (immature and not widely adopted). We'll also find out that EJBs suck too from a persistence point of view (for various reasons). Which can be quite a surprise, knowing that Gavin is one of the authors of the EJB 3.0 specs. Or, on the contrary, this will explain a lot of things in the new EJB specs.

Now that we have cleared the “why Hibernate” issue, let's continue to the second chapter. Which – tradition obliges – is a “Hello, world” and a “Let's get started” chapter. Here you go, almost 50 pages later you should be able to write simple Hibernate-based persistence layers and integrate them within an application server, like for instance … JBoss ! Hmm, well, why not ? They are sponsors of the Hibernate project, after all.

In the 3rd chapter, our fresh knowledge will be put to good use by starting the development of an online auction application called CaveatEmptor. This app will follow our reading progress and will grow bigger and smarter chapter by chapter. But for the moment, we are at the inception phase. What we get : a little bit of analysis, a stylish class diagram of the domain model and the resulting mapping file. And if you thought (based on the 2nd chapter) that the mapping file is very intuitive and simple, you're in for a big surprise : it is, indeed, intuitive and simple ! Quite bizarre for an open-source project. As a matter of fact, the mapping file is one of the pivotal elements of Hibernate, since it directly addresses the O/R impedance mismatch: a recipe for transparently linking your POJOs and the constrained relational model. No wonder that a big part of this chapter is aimed at explaining why and how the mapping works in Hibernate. You'll see how class associations and inheritance translate at the metadata and mapping level. You'll start to understand the things that you took for granted in the previous chapter and you'll have that pleasant “uuh, I see” chain reaction. Hold on, it's just the beginning.

Because chapter 4 is going to explain once and for all the lifecycle of persistent objects in Hibernate, their behavior from a persistence point of view, as well as the available fetching strategies. And if you thought you already knew everything by heart from the documentation … well, maybe you do know everything by heart. Nevertheless, it's very well summarized in chapter 4 and I'll recommend it anytime to a coworker eager for Hibernate knowledge.

In the next chapter (the 5th) the rollercoaster slows down a bit. That is, if you already know the behavior associated with the four possible transaction isolation levels, what the different types of locking are, what (the hell) MVCC means and the importance of transaction scopes. Chances are you already know some of this stuff quite well, but everybody needs a refresher from time to time, especially when it's well explained and when it comes with versioning and caching (1st and 2nd level) in Hibernate for dessert. By the way, I thought that OSCache supports clustering too, not only SwarmCache and JBossCache, as stated in the book. There's even a thoroughly explained example of using JBossCache as a clustered second-level cache for Hibernate, but it shouldn't be too hard to convert to other types of caching systems.

Now, if I were the author of the book, I would have placed chapter 6 before chapter 5. But I am not the author, which is quite fortunate for you, dear readers, since Christian and Gavin are much more competent than me at writing books about Hibernate (and probably in some other unrelated domains). They have decided to go back to mapping in chapter 6, after the short transaction/caching intermezzo. Well, they should know better… it's time for a serious dose of advanced mapping. This chapter attacks interesting subjects such as custom mapping types (simple or composite) and (finally) the mapping of collections. Special guest stars: the whole gang of “sets, bags, lists and maps”, together with explanations about their relational equivalents (associations, associations and associations !). Oh, and yes, “polymorphic association” (section 6.4.3) – I wasn't even aware that Hibernate is able to do that… guess I'm not that 'seasoned' (as a Hibernate developer) after all.

The 7th chapter is about “Retrieving objects efficiently” : about 45 pages for the 'retrieving' part and 6 pages for the 'efficiently' part. Fair enough ! You'll learn how to master basic HQL queries (parameters, pagination …). You'll get a grip on the query by criteria API, as well as on advanced stuff such as dynamic queries, filters, subqueries and native SQL (very powerful). At the end of the chapter there's the Hibernate-specific solution for the n+1 selects problem, query caching and result iterators.

Following this wealth of useful knowledge, the 8th chapter starts a bit dry. Nevertheless, after a short introduction about Hibernate in managed environments, you'll find yourself again in the land of advanced programming techniques : application-level transaction implementation ! This is mostly new stuff (at least for me) – a great collection of best practices for transactional behavior management in industrial-quality apps. Somewhat unrelated but still interesting, the chapter ends with legacy schema integration and a smart implementation example for audit logging.

The 9th (and last) chapter is about the roundtrip development in Hibernate using the classical toolset : Middlegen and/or hbm2java and/or XDoclet. All the available techniques are presented in a very detailed, step-by-step manner.

Wait : don't close the book, there's more ! Ignore Appendix A (a short and rather uninteresting document about SQL fundamentals – that is, if you know SQL). Appendix B contains mildly un-fascinating ORM implementation strategies pour les connaisseurs (come on guys, I'm just a dumb user). But – Appendix C is a great collection of real-world stories and by all means read them all ! Especially the last one, a treasure of hard to find knowledge (no spoilers, please…).

In the end, I have to confess that there is something truly interesting about 'Hibernate In Action' : albeit very technical, it reads astonishingly easily – and this kind of book is unfortunately very rare nowadays. My congratulations to the authors for this excellent piece of work – it was worth the wait.

As for you, dear potential reader, if you already know all the information detailed in the book, I bow before you, great Hibernate wizard. But if you don't, what are you waiting for ? Because, if you're going to read only one technical book this summer, make sure that it's 'Hibernate In Action' (or, at least, chapters 6, 7 and 8, if you are that good !).

Written by Adrian

August 5th, 2004 at 10:42 pm

Posted in Books

JUnit : it's not [only] about the API

Being extremely busy lately, I arrive a bit late at the JUnit destruction feast. While it is probably true that some guys with a certain gift for writing blog articles may “come up with something far more useful in a couple of days”, I think the discussion is missing an important point: there's a whole ecosystem living around JUnit. We have Ant integration, we have the choice between code coverage tools (both commercial and open-source), plugins for mainstream IDEs and a certain number of more or less useful extensions. We have extensive documentation and a plethora of examples to feed the small fishes. Throwing JUnit down the drain means throwing all of these down the drain. Or, at least: write your own Ant integration, adapt a code coverage tool, rewrite the IDE integration, rewrite the documentation and examples – this is not going to be done in “a couple of days”.

Another JUnit advantage is that this little simplistic API is ubiquitous. I mean, every developer has heard about it and knows how to use it, unless of course he/she has been living under a rock for the last few years. And I don't mean every Java developer, but just about every developer working in a language under the xUnit umbrella. Meaning : all the programming languages (unless you count “languages” such as Whitespace, Brainfuck and INTERCAL).

Beck and Gamma have not only written some “crappy” classes and put the few “laughable” chunks of code on SourceForge, they have done it first. Now, there is some well-founded criticism about the lack of evolution in JUnit, but one thing is undeniable : it really did fill a niche, back in 2000. The code may not be beautiful (and this is not good coming from XPers) but it serves its purpose : to provide a simple framework for unit testing.

Competition is the key here, and smart newcomers on this “market” are good news for us programmers. But it's gonna take some time and a lot of work to build a similar ecosystem and a similar mindshare, and to usurp JUnit's kingdom. That would of course be more interesting to see than the denial of four years of JUnit influence in a few well-rounded, but futile phrases.

Written by Adrian

July 14th, 2004 at 9:55 am

Posted in AndEverythingElse

Hallowed be thy tablename !

If you haven't had the opportunity to work on a really big project, naming is probably not high on your list of programming best practices. And you are certainly going to regret that when your project grows.

Of course, everybody, including good old Scott, knows that CUST signifies CUSTOMER and DEPT signifies DEPARTMENT. And statistically speaking, the chances of these abbreviations meaning something else are very small – as long as your domain model is, also, quite small. But when the number of classes in the domain is in the hundreds or even in the thousands, you'll suddenly find out that CUST may signify CUSTOMS (as in 'Customs Tax'), CUSTOMIZATION or even CUSTARD. I am working right now in the development team of an ERP for the agro-food industry and wouldn't be amazed to see such an attribute name. I've seen worse: some details of the implemented business model are a total blasphemy against human logic and common sense.

Anyway, the problem is even worse in these big projects because domain model classes are not written by hand, they are generated. While this is hardly a novelty for you (please don't laugh in the audience), it also means that analysts compose the datamodel, then classes/mappings/SQL schema/docs are generated, and finally programmers write the business logic and infrastructure integration using the generated artifacts. Names are usually propagated all along the generation chain. And when a programmer finds 'Cust' in the name of an attribute, how does she know it's a 'Customer' and not some 'Custard' ? Especially when the documentation is scarce and the author analyst is on a well-deserved six-month sabbatical in Antarctica.

Hence, the need for standardization. This is usually done via a dictionary containing the abbreviations and their meaning(s). The rule is very simple : every name in the datamodel must be composed of abbreviations from the dictionary. Some programmers might argue that there is no need for abbreviations and that full words are ok – lovely code such as '.getSecondaryBillingAddressForService(currentBill.getBillableServicesList(i).get(currentService)).getStreet().getName()'. This is perfectly understandable, however let's not forget that some databases (Oracle, SAP DB, etc.) have issues with table and column names longer than about 30 characters (30 in Oracle's case), like for instance refusing to create them in the first place. Which is mildly bothersome if you use a relational database*.

And the golden rules of domain model naming are :

  • Be a pedantic bastard. Don't just throw the dictionary into the wild and tell people 'yeah, pleease follow this standard'. Run automatic checks on every piece of datamodel feeding the code generator. The automatic checking should be done at each save operation if possible. I have implemented this inside an Eclipse plugin used by the project analysts: when hitting save on an entity containing invalid names, a window immediately pops up and reports the errors. Don't just display the errors, but completely forbid saving if the entity has naming issues. This will keep the naming absolutely pristine, however the analysts might be tempted to form a lynch mob. Do not give up.
  • Avoid synonyms, plurals, etc. This is a software product, not a grammar contest.
  • Mail out some stats from time to time to tell how well the model is named. People will like that.
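
The dictionary check from the first rule could be sketched roughly as follows. This is a minimal Python sketch, not the Eclipse plugin itself: the dictionary entries, the underscore-separated naming convention, the 30-character limit and the names `check_name` and `check_entity` are all assumptions made up for illustration.

```python
# Hypothetical abbreviation dictionary: abbreviation -> meaning.
# The real project dictionary is not shown in the post; these entries are made up.
ABBREVIATIONS = {
    "CUST": "customer",
    "DEPT": "department",
    "ADDR": "address",
    "BILL": "billing",
    "SEC": "secondary",
}

# Assumed identifier length limit (30 matches older Oracle versions).
MAX_NAME_LENGTH = 30


def check_name(name, dictionary=ABBREVIATIONS, max_length=MAX_NAME_LENGTH):
    """Return a list of error messages for one table or column name.

    Assumes names are made of underscore-separated parts, e.g. CUST_BILL_ADDR.
    An empty list means the name is compliant.
    """
    errors = []
    if len(name) > max_length:
        errors.append("%s: longer than %d characters" % (name, max_length))
    for part in name.split("_"):
        if part.upper() not in dictionary:
            errors.append("%s: '%s' is not in the dictionary" % (name, part))
    return errors


def check_entity(names):
    """Check every name of an entity and collect all the errors."""
    errors = []
    for name in names:
        errors.extend(check_name(name))
    return errors


if __name__ == "__main__":
    # A save hook would refuse to save when this list is not empty.
    for problem in check_entity(["CUST_ADDR", "CUSTARD_DEPT"]):
        print(problem)
```

Returning the full list of errors, rather than stopping at the first one, lets the tool show the analyst everything that is wrong with an entity in a single popup.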

My current gig involves, among other interesting stuff, managing the naming tools in the various Java projects that we are developing. Unfortunately, the naming rules were not really enforced (they had no pedantic bastards before me ?), so the domain model is only partially compliant. Hence, I'm in the midst of developing tools for the automatic renaming of the model, and the new code is going to disrupt the activity for a while (thank God for autocompletion features in modern IDEs !). Things would have been much smoother if the naming had been enforced from the beginning. Still, I think there is no such thing as 'too late' to put naming in order in a big project. And it'll absolutely be done, because there's very strong managerial support for this kind of task (the main company shareholder and CEO is a former programmer himself, as well as a quality buff – 'when time permits'™).

Unfortunately, I had to allow some 'non-compliant' islands of code in the modules which are already deployed at customers. But, have no false hopes, sooner or later I'm gonna get that code too. I'm a pedantic bastard, and proud of it.

* Now, if you're using a wannabe storage solution like Prevayler to store gigabytes of business data (or more!), you have much bigger problems than naming. Please stop reading this article and do something about it.

Written by Adrian

June 26th, 2004 at 11:54 pm

Posted in Process

Using Jython to internationalize a PHP app

At first, this might seem a mind-boggling combination. What do Jython and PHP have in common (except for the fact that I am a Python fan and my current consulting task is in a PHP project) ?

Well, internationalizing a PHP app is pretty much a trivial task. If you are a sensible PHP programmer who insists on using PEAR instead of randomly choosing a script from the tons of snippets populating the “scripting websites”, I18N is probably the safest choice.

Maybe, for you, application maintainability and performance are not exactly important concerns. For me, they are. This is why I chose to store internationalized texts in files rather than in the database. I'd rather keep the database for real data, which is created, modified, aggregated and such. And I'd rather have an internationalized error message on the screen even if the database is down.

Now we know that we'll use I18N and that the texts will be kept in some PHP files. However, I am no professional translator, and I have no desire to translate or to manually maintain the correspondence between the translators' files and the PHP files (no, translators won't modify PHP code, stop this nonsense right away). Code generation comes immediately to mind.

Basically, my first idea was to investigate whether the files used by the translators can be quickly transformed to PHP, and whether I can generate their formats from my own files (a.k.a. “roundtrip internationalization process” ?). Unfortunately, this is not an easy task, as my only clue was that the translators use Office tools such as Word or Excel, because they rely upon some specialized translation software integrated with these products. The easy choice is Excel, since it allows a better organization of data than having to search for tables in a Word document. The hard choice is the tool that I'd use for automatically reading and even generating Excel files.

The difficulty comes from the fact that I don't have Windows with Office installed on my desktop, just Gentoo Linux and OpenOffice. Thus, I am unable to write a simple Python script which could perform my generation tasks via automation. Fortunately, this is not the first time I am confronted with the issue. I happen to know that there is a very nice Java tool that I wholeheartedly recommend for your Excel processing needs : JExcelApi.

Still, Java is a heavyweight programming language – it would be a really bad idea to fire up the monster just for some easy processing of Excel files. This is where Jython comes naturally into the equation. Four hours and about 100 lines of debugged code later, here I am sitting on top of a perfectly functional internationalization tool which :

  • generates PHP code from a big xls file (the root vocabulary) which centralizes all the internationalization texts
  • generates 2-language xls files for the translators' use
  • updates the root vocabulary starting from the files modified by the translators

Automation scripts are already in cron, and there's also a nice text document explaining to the translators where to get their files and where to put them after modification. The resulting script is not exactly fast, but this is tooling and not production, so this should not be a problem after all.

Whatever your project constraints are, give Jython a try and you'll be amazed … As they put it on the Useless Python site : if it were any simpler, it would be illegal.

Finally, there's a trick not quite related to Jython, but interesting nevertheless. There is an easy way to translate phrases with real data inside them, including easy parameter swapping. We'll use the good old sprintf, but not directly : we'll go through a not so popular but extremely useful function, call_user_func_array. Suppose that our example needs to display the user name and the authorization profile description inside a nice message. All you have to do is define placeholders in the I18N files that fit as the first argument for sprintf. The numbered placeholders (%1$s, %2$s) let each language put the parameters in its own order. The following example should make it clearer:

localization/en/login.php

    $messages = array(
        'loggedin' => 'You are authenticated successfully as user %1$s with profile %2$s.'
    );
    $this->set($messages);

localization/fr/login.php

    $messages = array(
        # the apostrophe must be escaped inside a single-quoted string
        'loggedin' => 'Vous avez le profil %2$s en tant qu\'utilisateur %1$s.'
    );
    $this->set($messages);

Simple passing of multiple parameters to I18N in PHP. This example function, a static method of my Tools helper class, does no error processing or data domain checking.

    #this is the multiple parameter function
    function complexTranslation($i18n, $label, $params)
    {
      # put the translated format string in front of the parameters, then pass everything to sprintf
      return call_user_func_array('sprintf', array_merge(array($i18n->_($label)), $params));
    }

Then, you have to initialize your I18N object. This can be done in a generic manner for all pages.

    #specific I18N initialization stuff
    # $g_langCode and $script_name are assumed to be set earlier in the page
    require_once 'I18N/Messages/File.php';
    $g_language_dir = dirname($_SERVER['PATH_TRANSLATED']).'/localization/';
    # '=& new' is the PHP 4 idiom; on PHP 5 and later a plain '=' is enough
    $i18n =& new I18N_Messages_File($g_langCode,$script_name,$g_language_dir);

Finally, use the function.

    #translate the successful login message
    $loginbox = Tools::complexTranslation($i18n,'loggedin',array($operator->name,$profile->description));


Written by Adrian

March 1st, 2004 at 5:20 pm

Posted in Tools
