Netuality

Taming the big, bad, nasty websites

Archive for March, 2004

Effective testing of database schema – the missing link

leave a comment

There is a certain contradiction which appears in modern projects concerning the unit testing strategy. On one hand, there is a powerful assertion stating that business logic testing should be completely disconnected from the database. This makes perfect sense in a certain way : the tests should check the business logic, not the database and/or the persistence layer. Then, generally the persistence layer is a fully-fledged product (such as the excellent Hibernate) or other JDO-esque solution which has its own testing suite – no need to check that it really works. Usually, the link between business objects and persistence is “faked” using mock objects. Basically, this means that testing the code doesn't need a running database (well, code testing doesn't need a database at all).

The database schema should also be tested – the only tool I am aware of is the excellent DbUnit. Although more targeted towards data testing, it copes quite well with schema testing. Nicely integrated with Ant, DbUnit is the right solution for your database testing needs. And yes you do need to test your database since it is supposed to evolve along with the code (there's a great article about Evolutionary database design on Martin Fowler's site).

Somehow, we instinctively feel that something is missing from this picture. We are testing the code, disconnected from the database – and also the database, in a independent manner. But how can we be sure that the persistence layer between the application model and the database is ok ? And I'm not talking about the persistence mechanics, but the data model itself. Basically, this goes down to mapping testing. I am aware of the fact that some special O/R bindings do not need mappings and there is a direct object-table correspondence, but I feel that this is generally a BadIdeaTM since it hampers the flexibility of both the application model structure and database schema.

In the small-to-medium-sized projects I've been working lately we didn't feel the need of mapping testing. This has a very simple reason : the person which is performing the change on the database schema is usually the same person which needs a certain modification in the application model. After performing the modification, quite often this same person starts the application and makes a functional test which implicitly checks the mapping. Most of the time this works just fine.

However, some nasty problems might appear when the project starts to grow :

  • changing the mapping is more difficult, some kind of testing might give indications about the nature of the problem.
  • there is a certain “schema decay” when some foreign keys cannot be created at a certain point, then their creation is forgotten when the data is finally consistent. Further with schema evolution, more and more objectual data model relations will not be backed up by integrity constraints.
  • you may sometimes end up with unmapped and unused tables/views/columns.

A really useful testing tool should be able to check one or multiple mapping files against a database schema (via DbUnit, why not). The tool should :

  • a) recognize different mapping formats (Hibernate, Castor, etc.) and different database types
  • b) match the mapping declarations with the tables from the database, check their existence also the type of primitive columns
  • c) warn if some constraints are wrong or missing (based on simple aggregation, cardinality or other hints from the mapping structure).
  • d) warn for unmapped tables/views/columns.

Here's the good news : a tool which is able to perform a) and b) does exist ! And the bad news (purists will jump with disgust) : just for a moment, you should forgot about testing your code without the database. The solution is quite simple, build a unit test which fires up the persistence layer and retrieves at least one of each type of mapped object from a test database. If no exceptions are encountered, the test is ok. This is a basic but effective approach and :

  • be prepared to have a testing database different from the development database but with schema automatically synchronized.
  • harden your test case by inserting the most “exotic” test data you can find. If the data goes in via SQL (dbunit) but you have problems retrieving it via persistence layer, then look for missing schema constraints and sometimes some subtle mapping problems.

You could go one step further by performing update and deletion operations and check them via dbunit, but we have found that if the retrieval works, the persistence layer is perfectly able to perform updates and deletions. Now if your data layer is more complex, then just use some mock objects to test it – because it's a code issue and not a mapping issue.

If you are interested in the topic, just let me know by mail (still waiting for comment integration with FreeRoller). And yes, I'm still looking for a tool able to do a), b), c) and d).

Note : There is a simple technique that we are using currently. The idea is that, when the application starts, a simple retrieval is performed via the persistence layer for some objects that we know for sure must exist in all test and production datbases. It this succeeds you may be sure of two things : that the database connection really works and that the mapping is probably fine. This way, you don't have to wait the first persistence operation in order to see an error. Coupled with a nightly build and rerun, this little trick proved quite effective at keeping the mapping clean.

Written by Adrian

March 1st, 2004 at 5:21 pm

Posted in Process

Tagged with

Using jython to internationalize a PHP app

leave a comment

At first, this might seem a mind-boggling combination. What do
jython and PHP have in common (excepting the fact that I am a Python fan
and my current consulting task is in a PHP project) ?
Well, internationalizing a PHP app is pretty much a trivial task.
If you are a sensible PHP programmer insisting to use PEAR instead of randomly choosing a script from the tons of snippets
populating the “scripting websites”, I18N is probably the
safest choice.
Maybe – for you – application maintainability and performance are not exactly important concerns.
For me, they are. This is why I chose to store internationalized texts in files rather than database.
I'd rather keep the database for real data, which is created, modified, aggregated and such.
And I'd rather like to have an internationalized error message on the screen even if the database is down.
Now we know that we'll use I18N and text will be kept in some php files. However, I am no professional translator and
have no desire to translate or to manually maintain the correspondence between translators files and PHP files
(no, translators won't modify PHP code, stop this nonsense right away).
Code generation comes immediately in mind.
Basically, my first idea was to investigate wether the files used by the translators can be quickly transformed to PHP,
and if I am able to generate their formats from my own files (aka. “roundtrip internationalization process” ?).
Unfortunately, this is not an easy task – as the only clue was that the translators use Office tools such as Word or Excel, because they
rely upon some specialized translation software integrated with these products.
The easy choice is Excel, since it allows a better organization of data than having to search for tables in a Word document.
The hard choice is the tool that I'd use for automatically reading and even generating Excel files.

The difficulty comes from the fact I don't have Windows with Office installed on my desktop, just Gentoo Linux and OpenOffice.
Thus, I am unable to write a simple Python script which could perform my generation tasks via automation.
Fortunately, this is not the first time I am confronted with the issue.
I happen to know that there is a very nice Java tool that I wholeheartedly
recommend for your Excel processing needs :
JExcelApi.

Still, Java is a heavyweight programming language – it would be a really bad idea to fire up the

monster just for some easy processing of Excel files.
Here's why Jython comes naturally into equation. Four hours and about
100 lines of debugged code later, here I am sitting on top of a perfectly functional internationalization tool which :

  • generates PHP code from a big xls file (the root vocabulary) which centralizes all the internationalization texts
  • generates 2-language xls files for translators usage
  • updates the root vocabulary starting from the files modified by the translators
  • Automation scripts are already in cron and there's also a nice text document explaining translators where to get
    their files and where to put them after modification. The resulting script is not exactly fast, but this is tooling
    and not production so this should not be a problem after all.

    Whatever your project contraints are, give Jython a try and you'll be amazed … As they put it on the
    Useless Python site – If it were any simpler, it would be illegal.
    Finally there's a trick not quite related with Jyhon, nevertheless interesting.
    There is an easy way of solving the problem of translating phrases with real data inside them, with easy parameter swapping.
    We'll use the good old sprintf but not directly. We'll pass through a not so popular but extremely useful function,
    call_user_func_array. Suppose that our example needs the
    user name and authorization profile description to display inside a nice message. All you have to do is to define placeholders
    in I18N files which would fit as the first argument for sprintf. The following example should make it clearer:

    localization/en/login.php
    $messages = array(
    'loggedin'=>'You are authenticated successfully as user %1$s with profile %2$s.'
    );
    $this->set($messages);
    
    localization/fr/login.php
    $messages = array(
    'loggedin'=>'Vous avez le profile %2$s en tant qu'utilisateur %1$s.'
    );
    $this->set($messages);
    
    Simple passing of multiple parameters to I18N in PHP. Example function without error processing or data domain checking.
    #this is the multiple parameter function
    function complexTranslation($i18n, $label, $params)
    {
      return call_user_func_array('sprintf',array_merge(array($i18n->_($label)),$params));
    }
    
    Then, you have to initialize your I18N object. This can be done in a generic manner for all pages.
    #specific I18N initialization stuff
    require_once 'I18N/Messages/File.php';
    $g_language_dir = dirname($_SERVER['PATH_TRANSLATED']).'/localization/';
    $i18n =& new I18N_Messages_File($g_langCode,$script_name,$g_language_dir);
    
    Finally, use the function.
    #translate the successfull login message
    $loginbox = Tools::complexTranslation($i18n,'loggedin',array($operator->name,$profile->description));
    

    Written by Adrian

    March 1st, 2004 at 5:20 pm

    Posted in Tools

    Tagged with , , , , , ,

    Ant goodies : extracting info from Eclipse .classpath

    leave a comment

    IMPORTANT UPDATE: Please note that 'antclipse' is now part of the ant-contrib at Sourceforge, under Apache licence.

    Original blogpost:

    I hate duplicating information manually – besides, it's a known fact that duplication is classic code smell that tells you to refactor. This time it's not Java code, but something somewhat different : Ant used in Eclipse context. The issue here is that .classpath files generated by Eclipse have important information which is usually duplicated by hand in the build.xml script. SO many times I've changed libraries in my project in Eclipse just to discover that Ant task was broken…

    There surely are some workarounds like the task written by Tom Davies but unfortunately:

    • It's an Eclipse plugin. I want to be able to build my project standalone, we don't need no stinkin' plugin.
    • I's rather old and with a Nazi style checking of tags so it pukes on my 3.0M3 complaining about a certain attribute of type “con” in the .classpath file (lesson learned: don't be picky about tags and attributes names, if you want the plugin to work with future versions of the software which produced the XML document, especially when you do not have a schema or DTD to rely on)
    • It's Friday evening, dark weather outside, I'm alone in the house and the TV is broken (and even if it worked, there's nothing to see on TV anyway). Boys and girls, let's write an Ant task !

    From the documentation, it appears that writing an Ant task should be an easy task :) . And yes, it is, once you go past all the little idiosyncracies. Like mandatory “to” string in a RegexpPatternMapper, although all you want to do is matching, not replacing. Like having completely different mechanisms for Path and FileSet (I've always thought a Path is a “dumbed down” FileSet, but I was completely wrong, a fileset is somewhat “smarter” but it only has a single directory).

    The result is here, and everything you have to do is to download and put the antclipse.jar (7kB) in your ant/lib library and you're set (just remember to refresh Ant classpath if you're launching Ant from Eclipse).

    What does it do ? Well, it creates classpaths or filesets based on your current .classpath file generated by Eclipse, according to the following parameters :

    Attribute Description Required
    produce This parameter tells the task wether to produce a “classpath” or a “fileset” (multiple filesets, as a matter of fact). Yes
    idcontainer The refid which will serve to identify the deliverables. When multiple filesets are produces, their refid is a concatenation between this value and something else (usually obtained from a path). Default “antclipse” No
    includelibs Boolean, whether to include or not the project libraries. Default is true. No
    includesource Boolean, whether to include or not the project source directories. Default is false. No
    includeoutput Boolean, whether to include or not the project output directories. Default is false. No
    verbose Boolean, telling the app to throw some info during each step. Default is false. No
    includes A regexp for files to include. It is taken into account only when producing a classpath, doesn't work on source or output files. It is a real regexp, not a “*” expression. No
    excludes A regexp for files to exclude. It is taken into account only when producing a classpath, doesn't work on source or output files. It is a real regexp, not a “*” expression. No

    Classpath creation is simple, it just produces a classpath that you can subsequently retrieve by its refid. The filesets are a little trickier, because the task is producing a fileset per directory in the case of sources and another separate fileset for the output file. Which is not necessarily bad, since the content of each directory usually serves a different purpose. Now, in order to avoit conflicting refids each fileset has a name composed by the idcontainer, followed by a dash and postfixed by the path. Supposing that your output path is bin/classes and the idcontainer is default, the task will create a fileset with refid antclipse-bin/classes. The fileset will include all the files contained in your output directory, but without the trailing path bin/classes (as you usually strip it when creating the distribution jar). If you have two source directories, called src and test, you'll be provided with two filesets, with refids like antclipse-src and antclipse-test.

    However, you don't have to code manually the path since some properties are created as a “byproduct” each time you execute the task. Their name is idref postfixed by “outpath” and “srcpath” (in the case of the source, you'll find the location of the first source directory).

    A pretty self-explanatory Ant script follows (“xml” is a forbidden file type on jroller, so just copy paste it into your favourite text editor). Note that nothing is hardcoded, it's an adaptable Ant script which should work in any Eclipse project.

    Created with Colorer-take5 Library. Type 'ant'
    <?xml version="1.0"?>
    <project default="compile" name="test" basedir="."> <taskdef name="antclipse" classname="fr.infologic.antclipse.ClassPathTask"/>
    <target name="make.fs.output">
    <!-- creates a fileset including all the files from the output directory, called ecl1-bin if your binary directory is bin/ -->
    <antclipse produce="fileset" idcontainer="ecl1" includeoutput="true" includesource="false"
    includelibs="false" verbose="true"/> </target>

    <target name="make.fs.sources">
    <!-- creates a fileset for each source directory, called ecl2-*source-dir-name*/ -->
    <antclipse produce="fileset" idcontainer="ecl2" includeoutput="false" includesource="true" includelibs="false" verbose="true"/>
    </target>

    <target name="make.fs.libs">
    <!-- creates a fileset sontaining all your project libs called ecl3/ -->
    <antclipse produce="fileset" idcontainer="ecl3" verbose="true"/>
    </target>

    <target name="make.cp">
    <!-- creates a fileset sontaining all your project libs called ecl3/ -->
    <antclipse produce="classpath" idcontainer="eclp" verbose="true" includeoutput="true"/>
    </target>

    <target name="compile" depends="make.fs.libs, make.fs.output, make.fs.sources, make.cp">
    <echo message="The output path is ${ecl1outpath}"/>
    <echo message="The source path is ${ecl2srcpath}"/>
    <!-- makes a jar file with the content of the output directory -->
    <zip destfile="out.jar"><fileset refid="ecl1-${ecl1outpath}"/></zip> <!-- makes a zip file with all your sources (supposing you have only source directory) -->
    <zip destfile="src.zip"><fileset refid="ecl2-${ecl2srcpath}"/></zip> <!-- makes a big zip file with all your project libraries -->
    <zip destfile="libs.zip"><fileset refid="ecl3"/></zip>
    <!-- imports the classpath into a property then echoes the property --> <property name="cpcontent" refid="eclp"/>
    <echo>The newly created classpath is ${cpcontent}</echo>
    </target>
    </project>

    TODOS : make “includes” and “excludes” to work on the source and output filesets, find an elegant solution to this multiple fileset/directories issues, and most important make it work with files referenced in other projects.

    I am aware that the task is very far from being perfect, so just download it if you're interested, try to use it, try to break it, and tell me what you think and how it can be improved. Also, if you're interested in the source, just send me an email, but be aware that it's Friday evening beer-induced source code, nothing to be proud of… It was only tested it with Ant 1.5.x so YMMV. I assume no responsibility if you use it a production environment.

    Written by Adrian

    March 1st, 2004 at 5:17 pm

    Posted in Tools

    Tagged with , , ,

    A smoother, gentler hibernation

    leave a comment

    Last week, while optimizing a Java app, we have stumbled upon an interesting trick.
    Well, I suppose it's interesting since I haven't been able to find out any trace of it in Hibernate docs or FAQ.
    So, you are using Hibernate for O/R persistence layer of your latest Java app. Welcome to the club.
    Suppose that your app is distributed or your business logic is on multiple servers for performance reasons. In other words, your database is very frequently accessed from multiple points. Thus, each and every display of a projection of your data (like, an innocent “patients list” screen) has to perform a data retrieval operation, aka SELECT. You are not yet at the point of giving up realtime functionality for performance reasons (via complex caching), but however your queries seem pretty slow. And still, you are using probably the faster O/R mapping tool alive.
    Easiest path to data persistence passes through Hibernate Transaction API. And your transaction looks like that (c/p from the docs) :

    [...]
    Session s = sessions.openSession();
    Transaction tx = null;
    try
    {
    tx = sessions.beginTransaction();
    fooList = s.find(
    "select yummy from Big where complex");
    tx.commit();
    }
    catch (Exception e)
    {
    if (tx!=null) tx.rollback();
    s.close();
    throw e;
    }
    return fooList; //or something similar, which goes to the view, via controller
    [...]


    And here's the trick, if you do only SELECTs, there is no point in commiting the transaction. Because (and I'll quote again the manual):

    “Flushing the session
    If you happen to be using the Transaction API, you don't need to worry about this step. It will be performed implicitly when the transaction is committed.”


    Even if you know for sure that you haven't modified your data, the API still has to check for modifications ! And when data is pretty complex, this might take a pretty long time. Therefore, the following approach :

    [...]
    Session s = sessions.openSession();
    Transaction tx = null;
    try
    {
        tx = sessions.beginTransaction();
        fooList = s.find(
        "select yummy from Big where complex");
        tx.rollback();
    }
    catch (Exception e)
    {
        if (tx!=null) tx.rollback();
        s.close();
        throw e;
    }
    return fooList; //or something similar, which goes to the view, via controller
    [...]


    shoudn't change anything in application behavior, all for a “transaction time” divided by 3.
    Hey, that is great… The app runs sensibly faster – for one line of code.

    Feedback: We do have now a “documentation-compliant” solution thanks to Viktor Szathmary, who suggested a session.setFlushMode(FlushMode.NEVER) for the specific session. We haven't yet profiled this, but. However, we have a problem due to the fact that each session is used a few times before being thrown out. No, we do not pool Hibernate sessions but there's a fair bit of reuse sometimes, behind the business logic. Depending on the type of transaction implied, the setFlushMode should change (or not). Ok I have to admit it's a legit idea, but it's a supplementary line of code. And where's the fun ? :)

    Written by Adrian

    March 1st, 2004 at 5:15 pm

    Posted in Tools

    Tagged with ,

    Real life Jakarta Velocity – simple optimizations

    leave a comment

    Here's another nice “real life” story I'd like to share, and this time is about Velocity. Nothing nasty, just a quick check of if and how well Velocity caching is performing, why and how to make your own Velocity ResourceLoader.

    The guys I'm working with are migrating a rather huge ERP app from mainframe to Java multi-tier. This isn't exactly an quick and easy job, so basically my first step here is to find out a lot of things about the old system (via some boring but extremely necessary training). However, in order not to lose my so-called “Java skills”, I am also performing some tasks, mainly testing and optimization stuff, preparing the baby to face the harsh real world.

    If you haven't seen an ERP tailored for production sites before, you'll be amazed at the massive number of barcode stickers which have to be printed. They are everywhere, from production to distribution, relaying boxes, smaller boxes, bigger boxes, packs, containers, everything your mind can think of.
    These barcodes are produced on special printers which are usually connected to the production systems via serial port (IP connection is possible, but quite expensive so it's used only in very special cases, like really large warehouses).


    Barcode image

    Then, there are these rather thick clients (SWT) deployed at different points in the production/packaging/distribution workflow. Each one has its particularities, however they ALL have to print barcodes, and print them FAST.

    This barcode stuff is not as simple as you might think. Depending on the specific point in the workflow, a different barcode must be printed, containg different data or maybe similar data but in other printable formats. This is a perfect fit for a tool such as Velocity.

    The main issue here is that the templates are not in the filesystem, but they are extracted from a central database, where they are stored and managed by specific tools (an IDE-like tool is used to position different barcode elements on the printed stickers). The first [and easiest] solution was to use the Velocity.evaluate() function. This one-liner worked just fine until the performance test, where it was decided that the barcode generation is slow. It wasn't apparent at first, but you see – in packaging half a second is a pretty long time and the cummulated delays might make the customer lose some serious money at the end of the day.

    The first idea was to look for a way of using Velocity's ResourceLoader, thus being able to forget the usage of evaluate() and use the classic VelocityEngine-Context-Template-merge mantra #:

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import org.apache.commons.collections.ExtendedProperties;
    import org.apache.velocity.exception.ResourceNotFoundException;
    import org.apache.velocity.runtime.resource.Resource;
    import org.apache.velocity.runtime.resource.loader.ResourceLoader;

    public class MyResourceLoader extends ResourceLoader { public void init(ExtendedProperties arg0) {
    // TODO Auto-generated method stub }
    public InputStream getResourceStream(String templateName) throws ResourceNotFoundException {
    //TODO: exceptions here
    InputStream in= new ByteArrayInputStream( (TemplateContentsSingleton.getUniqueInstance().getTemplate(templateName)).getBytes());
    if (in == null) { String msg= "*** BuiltInTemplateResourceLoader Error: cannot find resource " + templateName;
    throw new ResourceNotFoundException(msg);
    }

    return in;
    }
    [...]
    }

    This is pretty “spike-ish” and completely non-thread-safe code, kids don't try this at home without correctly processing all errors and managing modification flags. Basically, the loader is using a TemplateContentsSingleton class that wraps inside it a hashmap of templates, indexed by their key (which by the way is a String, and that's just about perfect).

    public class TemplateContentsSingleton {

    /** unique instance */
    private static TemplateContentsSingleton sInstance= null;
    /** template containers */
    private Map tmplContainers= new HashMap();
    /** * Private constuctor */
    private TemplateContentsSingleton() {
    super();
    }
    /**
    * Get the unique instance of this class.
    */

    public static TemplateContentsSingleton getUniqueInstance() {
    if (sInstance == null) {
    sInstance= new TemplateContentsSingleton();
    }
    return sInstance;
    }
    public void setTemplate(String key, String templateContent) {
    this.tmplContainers.put(key, templateContent);
    }
    public String getTemplate(String templateName) {
    return (String) this.tmplContainers.get(templateName);
    }
    }

    For your extreme comfort, this is an ultra-classic singleton contaning a hashmap.
    Don't forget to initialize Velocity with the corresponding properties :

    Properties veloProps= new Properties();
    veloProps.setProperty("resource.loader", "custom");
    veloProps.setProperty("custom.resource.loader.description","Customized Velocity Template Resource Loader");
    veloProps.setProperty("custom.resource.loader.class", MyResourceLoader.class.getName());
    veloProps.setProperty("custom.resource.loader.path", "");
    veloProps.setProperty("custom.resource.loader.cache", "false");

    Finally we'll be able to test the effect of caching different objects. Obvious candidates for caching are the template and the context. We'll render one of the (pretty small) templates 1000 times, then compute the mean rendering time. We'll do the benchmarking with Velocity caching disabled and enabled (by setting custom.resource.loader.cache to “true”). OK, let's get to work :

    timing graph for 100 rendering



    We see clearly that there is basically no performance difference between Velocity caching and “hand-picked” objects caching. However, wrapping String templates inside a Velocity ResourceLoader gives us an important speed improvement in template rendering, varying from a 10x (cache off) to 4x (cache on) factor. Interesting and rather unexpected here is that even with plain simple Velocity.evaluate() – Velocity caching decreases the merging time (probably Context caching). Meaning that a simple property set could speed up the barcode generation dividing by 2 the time necessary for the merge. Sometimes it really pays off to read the documentation.

    In conclusion, use a custom ResourceLoader and don't forget to enable caching for Velocity maximum performance. Well, this is nice but simple; right ? Something crunchier probably in the next episode …


    # Syntax coloring graciously provided by Codepaste

    Written by Adrian

    March 1st, 2004 at 5:10 pm

    Posted in Tools

    Tagged with ,