Netuality

Taming the big, bad, nasty websites

Archive for the ‘XML’ tag

… and a few lesser known Java tools


Very very busy lately, but I'd like to share some knowledge about a few useful Java OSS gems that were not easy to find. Mr. Google, please 'index this':

1. Aspirin is a self-contained, send-only SMTP server written in Java, open-sourced and free. It simplifies configuration and deployment by letting your app server send emails without passing through an external SMTP server. The project is heavily inspired by Apache James code (hence its licensing terms). The few problems I see right now are: possible performance issues when sending big volumes of mail, still-erratic behavior (sometimes sending fails for no plausible reason), and failure reports which do not give the reason for the failure. However, the thingie works pretty well and is a big time saver because, well, configuration is not the most pleasant part of a complex server.

2. If you produce a lot of reports and want to send them automatically to a remote print server, you may use JIPSI (quickstart in English, but site in German), which makes CUPS available through the Java Print Service API. This little beauty was found by one of my coworkers and the 'report guys' seem to be making good use of it.

3. Are you in for some serious processing of OpenOffice documents using the freely available DTDs (downloadable from the OOo CVS server)? Then hold your horses! I've tried to make sensible use of them and failed miserably. Let's just say that those DTDs are a big pain in the a**: to begin with, no tool I tried was able to transform them into a schema. I've tried XMLSpy and a few other exotic tools, without success. Even basic stuff like parsing with a validating parser does not work. So much for the usefulness of open standards. Eventually, I ended up using the excellent Writer2Latex. Don't be fooled by the name: you can do all sorts of conversions with it, including Writer to XHTML, which is what I was interested in. You can even write your own plugin to handle some exotic formats, because Writer2Latex is built around a modified version of XMerge. Officially, XMerge is the solution for viewing documents on 'Small Devices', but it really is a fancy plugin-based document converter. Most probably (too lazy to check the sources) it is SAX-based, with a non-validating parser. Go figure.

4. The Eclipse download site now has links to a BitTorrent tracker. I just used it successfully to download RC3 at a reasonable rate (anyway, being on wifi right now, I wasn't expecting blazing speed). I found it interesting that all the other peers were using Azureus, a torrent client written in Java+SWT. Azureus is a fantastic source of knowledge, chock-full of tips and techniques for writing professional-looking and very responsive SWT apps. But not only that: Azureus is also a great example of how to write a plugin-ready app which performs automatic updates from the net. Not bad at all.

Written by Adrian

June 21st, 2004 at 8:14 pm

Posted in Tools


Comparing FOP and JasperReports


Anybody looking for OSS reporting solutions in Java usually has to make a choice between Apache FOP and Jasper Reports*. While having somewhat different feature sets and addressing distinct reporting solutions, the two APIs boil down to the same basic thing : generate a report from an XML file (or stream/string/whatever). FOP has a clear advantage of standardization (based on XSL-Formatting Objects) while Jasper plays more in the pragmatic field of obtaining those 80% results with a minimum of effort and uses a proprietary XML format.

But FOP is not a standalone reporting solution : it's just a way of transforming XSL-FO files into a report. In order to fill the report with the necessary data, the obvious choice is a templating engine such as Jakarta Velocity. Thus a FOP report creation is a two-step operation :

  • create the XML report via Velocity
  • feed the XML stream to FOP
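The two steps above can be sketched in a few lines. This is only an illustration using the JDK: the `${...}` substitution stands in for a real Velocity merge, and the XML parse stands in for handing the stream to FOP. The class name, the template and the placeholder names are all hypothetical, not taken from the application discussed here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class TwoStepReport {
    // Hypothetical XSL-FO template; the ${...} placeholders mimic Velocity references.
    static final String TEMPLATE =
        "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<fo:layout-master-set><fo:simple-page-master master-name=\"p\">"
      + "<fo:region-body/></fo:simple-page-master></fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"p\">"
      + "<fo:flow flow-name=\"xsl-region-body\">"
      + "<fo:block>Customer: ${customer}</fo:block>"
      + "<fo:block>Total: ${total}</fo:block>"
      + "</fo:flow></fo:page-sequence></fo:root>";

    // Escape the characters that would break the generated XML.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Step 1 (stand-in for Velocity): merge the data into the template.
    static String fill(String template, Map<String, String> data) {
        String out = template;
        for (Map.Entry<String, String> e : data.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", escapeXml(e.getValue()));
        }
        return out;
    }

    // Step 2 (stand-in for FOP): consume the XML stream. Here it is only
    // parsed, which at least proves the generated FO is well-formed.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("generated FO is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("customer", "Smith & Sons");
        data.put("total", "42");
        Document fo = parse(fill(TEMPLATE, data));
        System.out.println("Root element: " + fo.getDocumentElement().getNodeName());
    }
}
```

In a real setup the first method would be a Velocity merge and the second a call into the FOP driver; the point is only that the data binding and the rendering are separate steps joined by an XML stream.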

Jasper alleviates this problem by including its own binding engine, the only restriction being that the input data must satisfy some constraints (such as putting your 'rows' inside a JRDataSource).

Both Jasper and FOP allow graphic files inside reports. The usual formats (GIF, JPEG) are supported; however, FOP has the nice bonus of rendering SVG inside reports. Unfortunately, this comes at the price of using the Batik SVG Toolkit, which is a bulky (close to 2MB) and rather slow API. While producing your dynamic charts as XML files (Velocity again) is a seductive idea, the abysmal performance of SVG rendering will make you give up in no time. Unfortunately, I speak from experience.

At first sight, FOP has a lot more options for output format compared to Jasper Reports. Of course there's PDF and direct printing via AWT, but also Postscript, PCL, MIF as well as SVG. These choices are quite intriguing, since Postscript and PCL are printing formats (easily obtained by redirecting the specific printer queue into a file), MIF is a rather obscure Adobe format (for FrameMaker) and SVG … well, an SVG report is too darn slow to be usable (yes, I was foolish enough to try this, too). Jasper again makes a pragmatic choice by offering really useful output formats such as HTML, CSV and XLS (never underestimate the power of Excel); and of course: direct printing via AWT and PDF.

While FOP's latest version (0.20.5) was released almost a year ago (summer 2003), Jasper Reports is bubbling with activity – Teodor releases a minor version every month or two (the latest being 0.5.3 on 18.05.2004).

I've decided to use as a 'lab rat' one of the apps developed during my 'startup days': the client GUI is written in Swing and features a few mildly complex reports generated using Velocity+FOP. The FOP version is 0.20.4 (the current version back in Q1-2003, when we had to quit dreaming about the 'next financing round' and development halted) but, as I already told you, FOP has evolved little since then. So it's perfectly reasonable to use this implementation as a baseline for comparison with Jasper (by contrast, Jasper has evolved a great deal since Q1-2003).

Back then, the report development cycle was quite simplistic. The XSL-FO templates were written by hand in a text editor, and the application code was run (via a JUnit test case and some necessary configuration and business data mocking) in order to generate a PDF report. In the case of errors, we got feedback by examining the error traces. Visual feedback was given by the PDF output. While simple to perform, this cycle became extremely tiresome after a while as there was significant overhead: start a new JVM, initialize FOP, fire up Acrobat Reader (plus we were using some crappy – even by the standards of 2003 – 1GHz machines with 256/512MB RAM).

A WYSIWYG editor would have been nice, so one of my coworkers did some research and the only solution he found was XMLSpy (StyleVision was not available back then) – but, at 800USD/seat, this was 'a bit' pricey** for us (only the Enterprise flavor covers FO WYSIWYG editing !?). Another interesting idea was to use one of the RTF-to-FO conversion tools such as Jfor, XMLMind or rtf2fo (of these products, only Jfor is free, but feature-poor). What stopped us was that the generated FO was overly complex: we needed comprehensible cut_the_crap files because we were going to integrate them into Velocity templates. And when you have tens of tags and blocks inside blocks and not the slightest idea which one is a row, which one is a column and which one is a transparent dumbass artefact, it's a gruesome trial-and-error task to integrate even simple VTL loops. And you'd have to do this each time you change something in the report: yikes! Conclusion: the report development cycle was primitive for FOP and there was no way we could change it.

Things are quite different for Jasper Reports: there are a lot of available report designers, and some of them are free. While the complete list is on the Jasper Reports site, I'd like to note at least three of them:

  • iReport is a Swing editor and very interesting because it covers not only the basic Jasper functionality but also supplementary features such as barcode support (which is admittedly as easy as embedding a barcode font in Jasper with two lines of XML, but easier still with a mouse click). iReport is free, which is excellent, but it is a standalone app without IDE integration and, like any complex Swing app, it is quite slow and a memory hog.
  • if you are a developer using Eclipse, you'll appreciate two graphical editors based on Eclipse GEF, available as Eclipse plugins: JasperAssistant and SunshineReports. Neither is free and, at least on paper, their functionality seems identical. SunshineReports offers only the older 1.1 version for download, which is free of charge but does NOT work with recent builds of Eclipse 3. How the heck am I supposed to test it? By contrast, JasperAssistant has a much more relaxed attitude, allowing the download of a free trial of the latest version of the product. Maybe too relaxed, though, because – even if (theoretically) limited in number of usages – you can use the trial as much as you want to***. But if you are serious about doing Jasper in Eclipse you should probably buy JasperAssistant, available for a rather decent 59USD price tag. I am currently using it and it's a good tool.

So much for the tools, let's get the job done. The bad part: if you're experienced with FO templates, don't expect to be immediately proficient with Jasper, even with a GUI editor. The structure of an FO document has strong analogies with HTML: you have tables, rows, cells, stuff like that, inside special constructs called blocks. It's relatively easy to use a language such as VTL to create nested tables, alternating colors and other data layout tricks. You can even render tree-organized data via a recursive VTL macro, and everything is smooth and easy to understand. Jasper is completely different and at first sight you'll be shocked by its apparent lack of functionality: only rectangles, lines, ellipses, images, boilerplate text and fields (variable text). Each of these elements has an extensive set of properties: when the element should be displayed, its stretch type, the associated expression for its value and so on. Basically, you have to write Java code instead of Velocity macros and call this code from the corresponding properties of the various report elements. It feels a little awkward at the beginning, but after a while it becomes quite natural and simple. As for nesting and other advanced layouts, there is the powerful concept of a 'subreport'. And yes, I've managed to render a tree using a recursive subreport, but given the poor performance the final choice was to flatten the data into a vector and then feed it into a simple Jasper report. So pay attention to the depth of 'subreporting'.
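Flattening a tree into a vector, as mentioned above, might look like the sketch below: a depth-first walk that emits one row per node and records the depth, so that a flat report can indent each line. The `Node` and `Row` types are hypothetical stand-ins for the application's own classes, and wrapping the resulting rows in a JRDataSource is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class TreeFlattener {
    // Hypothetical tree node; the real business objects would differ.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // One flat report row: the node label plus its depth in the tree.
    static final class Row {
        final int depth;
        final String label;

        Row(int depth, String label) {
            this.depth = depth;
            this.label = label;
        }
    }

    // Depth-first, parent-before-children traversal.
    static List<Row> flatten(Node root) {
        List<Row> rows = new ArrayList<>();
        visit(root, 0, rows);
        return rows;
    }

    private static void visit(Node node, int depth, List<Row> rows) {
        rows.add(new Row(depth, node.label));
        for (Node child : node.children) {
            visit(child, depth + 1, rows);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root")
            .add(new Node("a").add(new Node("a1")))
            .add(new Node("b"));
        for (Row row : flatten(root)) {
            System.out.println(row.depth + " " + row.label);
        }
    }
}
```

The report itself then needs no subreports at all: the depth value can drive the indentation of a single text field.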

Once the reports were completely migrated, I've benchmarked a simple one (without SVG, charts, barcodes or other 'exotic' characteristics). The test machine is a 2.4GHz P4 w 512MB Toshiba Satellite laptop. In the case of FOP, the compiled velocity template and the FOP Driver are cached between successive runs. In the case of Jasper, the report is precompiled and loaded only on first run, then refilled with new data before each generation. The lazy loading and caching of reporting engines is the cause of important time differences between the generation of the first report and the subsequent reports. Delta memory is measured after garbage collection. The values presented are median for 10 runs of the 'benchmark report'.

| | First run | Subsequent runs | Delta memory |
|---|---|---|---|
| Velocity + FOP | 10365 ms | 381 ms | 850 KB |
| Jasper Reports | 1322 ms | 82 ms | 1012 KB |
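For anyone wanting to repeat this kind of measurement, here is a minimal sketch of a harness in the same spirit: time the first run separately, take the median of the subsequent runs, and compare used memory after a garbage collection. It is not the code used for the numbers above; the `report` workload is a hypothetical placeholder, and `System.gc()` is only a hint to the JVM, so the memory delta is approximate.

```java
import java.util.Arrays;

final class ReportBenchmark {
    // Wall-clock time of one run, in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Median of the samples (mean of the two middle values for even counts).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Used heap after asking for a garbage collection (best effort only).
    static long usedMemoryAfterGc() {
        Runtime runtime = Runtime.getRuntime();
        System.gc();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for "generate the benchmark report".
        Runnable report = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };

        long memoryBefore = usedMemoryAfterGc();
        long firstRun = timeMillis(report);   // includes lazy loading and caching
        long[] later = new long[10];
        for (int i = 0; i < later.length; i++) {
            later[i] = timeMillis(report);
        }
        long deltaKb = (usedMemoryAfterGc() - memoryBefore) / 1024;

        System.out.println("First run: " + firstRun + "ms");
        System.out.println("Subsequent runs (median): " + median(later) + "ms");
        System.out.println("Delta memory: " + deltaKb + "KB");
    }
}
```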

While I am totally pro-Jasper after this short experiment, it is important to note that commercial and well-maintained FO rendering engines such as RenderX XEP claim improved performance over FOP. Depending on your requirements, environment and legacy reporting apps, an FO-based solution might be better, especially when report generation happens only on the server side.

Of course, the usual disclaimer applies: this benchmark is valid only for my specific report in my specific application, so YMMV.

* While I am aware that other OSS solutions do exist for Java, I consider these two as 'mainstream'.

** Did I mention that we were a startup with financing problems ?

*** No, I'm not going to explain here how it can be done.

Written by Adrian

May 25th, 2004 at 8:57 am

Posted in Tools


Eclipse plugins and Groovy : when binary compatibility is not enough


One of my current responsibilities is to maintain an internally developed plugin, used by various members of the team to generate code from the analysis model. As far as I can tell by the webstats of the update site, every version is downloaded by 18 people, a small but heterogeneous user base.

My biggest problem is the Eclipse version. The analysts are not exactly Java geeks waiting anxiously for nightly builds of Eclipse, they use a 'standard' 2.1.2, mainly because it's stable and well internationalized. Things go wilder in the programmers team : versions ranging from conservative (2.1.x) to liberal (3.0M4) and even the occasional dumbass with the latest integration build (that would be me, of course).

The 'enhanced binary compatibility' in 3.0M7 came as a relief, diminishing the need to switch between Eclipse versions in order to develop the plugin or work on other tasks. Well, I still have to briefly test the damn thing on Eclipse 2.1.x before releases. However, running simultaneously two or three Eclipse instances is no piece of cake for my 512Mb laptop (I still haven't found who I have to kill here in order to be awarded a memory upgrade). Unfortunately, checking out the plugin source into M7 has shown the invisible ugly face of 'binary compatibility': the plugin doesn't compile.

There are just a handful of lines of code, some emphasizing differences in Eclipse API which are somehow hidden in 'compatibility mode', some effectively showing small bugs in plugin behavior. But the real issue here is that I cannot really develop the plugin in M7 until I manage somehow to compile it, while not losing downwards compatibility.

Let's dissect one of the compilation issues. The bummer concerns automatically opening an editor (or giving it focus if already open) when clicking on a reference to it (somewhat similar to what happens when you Ctrl+click on a class name in JDT). In the older API it was a matter of page.openEditor(file); where page is an IWorkbenchPage and file is an IFile. This simple stuff worked well until 3.0M4, then (M5) things changed to page.openEditor(new org.eclipse.ui.part.FileEditorInput(file),editorId); where FileEditorInput implements (among others) IEditorInput. While this is certainly nice because you may directly link editors to something other than files***, obviously the old code does not compile under M7.

Maintaining separate 'old' and 'new' style projects for 10 or so lines of code is obviously overkill. A second solution is reflection, but it would mean more than a few lines of code and the result would be neither comprehensible nor maintainable. Only thing left: use a scripting language.
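To give an idea of what the reflection route involves, here is a rough sketch of the "try the new signature, fall back to the old one" pattern. `OldPage` and `NewPage` are hypothetical stand-ins for the two generations of the workbench page API (the real Eclipse types and parameter types are not used here), so this only shows the shape of the code, not something that runs inside Eclipse.

```java
import java.lang.reflect.Method;

final class ReflectiveCall {
    // Stand-in for the old API: openEditor(file).
    static final class OldPage {
        public String openEditor(String file) {
            return "old:" + file;
        }
    }

    // Stand-in for the new API: openEditor(input, editorId).
    static final class NewPage {
        public String openEditor(Object input, String editorId) {
            return "new:" + input + "@" + editorId;
        }
    }

    // Looks for the new two-argument method first and falls back to the
    // old one-argument method when it does not exist.
    static Object openEditor(Object page, String file, String editorId) {
        try {
            Method method;
            Object[] arguments;
            try {
                method = page.getClass().getMethod("openEditor", Object.class, String.class);
                arguments = new Object[] { file, editorId };
            } catch (NoSuchMethodException newStyleMissing) {
                method = page.getClass().getMethod("openEditor", String.class);
                arguments = new Object[] { file };
            }
            return method.invoke(page, arguments);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable openEditor method", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(openEditor(new OldPage(), "a.txt", "editor"));
        System.out.println(openEditor(new NewPage(), "a.txt", "editor"));
    }
}
```

Against the real API the new branch would also have to load FileEditorInput and call its constructor reflectively, which is where the line count and the readability start to suffer.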

Of course I could have taken any decent scripting language embedded in Java. Decision to go with Groovy was taken mainly because of its coolness factor, but I am sure the idea will apply easily with Jython (big favorite of mine) or the performance-aware Pnuts, for instance.

In a nutshell, you have to execute a different line of code depending on the current Eclipse version (it's a little bit trickier than that, but we'll discuss it later).

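groovy.lang.Binding binding = new groovy.lang.Binding();
binding.setVariable("page", page);
binding.setVariable("file", file);
// editorId is referenced by the new-style script, so it has to be bound too
binding.setVariable("editorId", editorId);
// pass the class loader of the current class so the script can see the Eclipse classes
groovy.lang.GroovyShell groovyShell = new groovy.lang.GroovyShell(getClass().getClassLoader(), binding);
if (newPlatform)
{
    // 3.0M5 and later: the editor input has to be wrapped explicitly
    return groovyShell.evaluate("page.openEditor(new org.eclipse.ui.part.FileEditorInput(file),editorId);", someExpressionId);
}
else
{
    // 2.x up to 3.0M4: the file can be passed directly
    return groovyShell.evaluate("page.openEditor(file);", someExpressionId);
}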

It's basically a vanilla-flavored ripoff of the Groovy embedding example from the docs. The boring part (caching the binding, hiding everything behind a nice facade) is left as an exercise for the [interested] reader. Remember to pass the class loader of the current class; do not create a GroovyClassLoader out of nowhere or you'll end up dealing with Eclipse's own class loader, which means trouble for simple tasks like these.

Knowing whether the Eclipse version is the 'new' or the 'old' one is not that simple because, remember, 'old' means 2.x up to 3.0M4. So finding out the Eclipse SDK version is not enough: you need another discriminant, which in our case is the presence of the 'org.eclipse.ui.ide' plugin. Result:

boolean newPlatform;
//find out if we are inside a new or an old platform
PluginVersionIdentifier pvi = Platform.getPluginRegistry().getPluginDescriptor("org.eclipse.platform").getVersionIdentifier();
newPlatform = pvi.getMajorComponent() >= 3 && Platform.getPluginRegistry().getPluginDescriptor("org.eclipse.ui.ide") != null;

No, we are not ready to deploy yet. A small trick has to be performed or the plugin won't start under older versions of Eclipse. We had to add some new plugins to the dependencies (in the plugin descriptor), such as the aforementioned 'org.eclipse.ui.ide'. Obviously the older versions of Eclipse will not find it, and hence will block our plugin's activation on startup. In order to overcome this, you have to add (by hand !) a lesser known attribute ('optional') to the corresponding tag in the plugin.xml file: <import plugin="org.eclipse.ui.ide"/> becomes <import plugin="org.eclipse.ui.ide" optional="true"/>. Now the plugin is ready to be deployed.

For those brave enough to dare distributing such a plugin via an update site remember to 'cheat' by not allowing new plugins such as 'org.eclipse.ui.ide' in the feature.xml file (again, delete by hand). The 'optional' attribute doesn't help in this case. Go figure…

I hope that some of you will find this recipe useful for maintaining compatibility between different Eclipse versions with a minimum of fuss. However, please note the specific prerequisites for this type of solution :

  • there are only simple 'few-lines' modifications
  • the code is not expected to evolve a lot in the 'affected' areas
  • the evaluated code is not in a performance-sensitive area

***Interestingly enough, this was one of the reasons I recommended against adopting RCP in one of our apps, a few weeks ago. It's nice to see that – now – the mechanism linking editors and resources is MUCH more flexible. Anyway, this probably won't change the decision not to use RCP, because the main issue is the volume of code we would have to change. Development of one of the app modules started almost a year ago and the animal is already sold and deployed on different production sites: upgrading would be a real nightmare. Maintaining a fork of the app is not an option either. Well, I guess we'll just have to cope with 'plain old' JFace and SWT.

PS After some days of 'silence', I have noticed from the logs that most popular posts on my blog are those concerning Eclipse plugins and Manning books (I seem to have a nice Google ranking on these topics). So, expect more of these (I am reading the MEAP of 'Tapestry in action' – a review should be up shortly).

Written by Adrian

March 1st, 2004 at 6:36 pm

Posted in Tools


Eclipse 2.1 workspace deadlock – and a dirty but small workaround


It also happened on older versions, but it happens more frequently on the “final” 2.1 version. FYI: Gentoo Linux, Eclipse GTK; it seems to be somehow related to bug 33138 (I don't have the time to dig further).

Sometimes the monster simply hangs during a [take your pick: refactoring, new class generation] with an empty progress bar in the dialog box and a completely useless “Cancel” button. Been there, done that: kill -9 …

Then, trying to restart Eclipse leads to a deadlock while recovering the workspace: dialog box, empty progress bar and useless “Cancel”. Frozen!

I have a lot of settings and projects, so deleting the whole .metadata directory is just too painful. Therefore, I had to find a smaller workaround: just delete the file .metadata/.plugins/org.eclipse.ui.workbench/workbench.xml and Eclipse restarts with a clean workspace. Some adjustments are lost but hey – my metadata is still there.
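If this happens often enough to be worth scripting, the workaround fits in a few lines of Java. The sketch below is a slight variation on the manual fix: instead of deleting workbench.xml it moves the file aside as workbench.xml.bak (my own choice, so the old state can still be inspected). The class name is hypothetical, and it assumes Eclipse is not running when it is called.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class WorkbenchReset {
    // Location of the workbench state file, relative to the workspace root.
    static final String WORKBENCH_XML =
        ".metadata/.plugins/org.eclipse.ui.workbench/workbench.xml";

    // Moves workbench.xml aside so Eclipse recreates it on the next start.
    // Returns false when there was nothing to move.
    static boolean reset(Path workspace) throws IOException {
        Path state = workspace.resolve(WORKBENCH_XML);
        if (!Files.exists(state)) {
            return false;
        }
        Path backup = state.resolveSibling("workbench.xml.bak");
        Files.move(state, backup, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // The workspace directory is taken from the command line (default: current directory).
        Path workspace = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(reset(workspace)
            ? "workbench.xml moved aside"
            : "no workbench.xml found");
    }
}
```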

Written by Adrian

March 1st, 2004 at 5:21 pm

Posted in Tools


Ant goodies : extracting info from Eclipse .classpath


IMPORTANT UPDATE: Please note that 'antclipse' is now part of the ant-contrib at Sourceforge, under Apache licence.

Original blogpost:

I hate duplicating information manually – besides, it's a known fact that duplication is a classic code smell that tells you to refactor. This time it's not Java code, but something a bit different: Ant used in an Eclipse context. The issue here is that the .classpath files generated by Eclipse contain important information which is usually duplicated by hand in the build.xml script. SO many times I've changed libraries in my Eclipse project just to discover that the Ant build was broken…

There surely are some workarounds like the task written by Tom Davies but unfortunately:

  • It's an Eclipse plugin. I want to be able to build my project standalone, we don't need no stinkin' plugin.
  • It's rather old and checks tags far too strictly, so it pukes on my 3.0M3, complaining about a certain attribute of type “con” in the .classpath file (lesson learned: don't be picky about tag and attribute names if you want the plugin to work with future versions of the software which produced the XML document, especially when you do not have a schema or DTD to rely on)
  • It's Friday evening, dark weather outside, I'm alone in the house and the TV is broken (and even if it worked, there's nothing to see on TV anyway). Boys and girls, let's write an Ant task !
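The "don't be picky" lesson can be illustrated with a tolerant reader for .classpath files. The sketch below is not the antclipse source; it is a JDK-only illustration (class and method names are hypothetical) that picks out the `classpathentry` elements of one kind and silently skips anything it does not recognise, such as entries of kind “con”.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ClasspathReader {
    // Returns the path attribute of every classpathentry of the wanted kind
    // ("src", "lib", "output", ...). Unknown kinds and unknown attributes
    // are ignored rather than rejected.
    static List<String> paths(String classpathXml, String wantedKind) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(classpathXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable .classpath document", e);
        }
        NodeList entries = root.getElementsByTagName("classpathentry");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (wantedKind.equals(entry.getAttribute("kind")) && entry.hasAttribute("path")) {
                result.add(entry.getAttribute("path"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample =
            "<classpath>"
          + "<classpathentry kind=\"src\" path=\"src\"/>"
          + "<classpathentry kind=\"con\" path=\"org.eclipse.jdt.launching.JRE_CONTAINER\"/>"
          + "<classpathentry kind=\"lib\" path=\"lib/velocity.jar\"/>"
          + "<classpathentry kind=\"output\" path=\"bin/classes\"/>"
          + "</classpath>";
        System.out.println("Libraries: " + paths(sample, "lib"));
        System.out.println("Sources: " + paths(sample, "src"));
    }
}
```

Because the reader filters on what it needs instead of validating everything it sees, a new entry kind in a future Eclipse version would simply be skipped.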

From the documentation, it appears that writing an Ant task should be an easy task :) . And yes, it is, once you get past all the little idiosyncrasies. Like the mandatory “to” string in a RegexpPatternMapper, although all you want to do is match, not replace. Like having completely different mechanisms for Path and FileSet (I've always thought a Path is a “dumbed down” FileSet, but I was completely wrong: a fileset is somewhat “smarter” but it only has a single directory).

The result is here, and all you have to do is download antclipse.jar (7kB), put it in your ant/lib directory and you're set (just remember to refresh the Ant classpath if you're launching Ant from Eclipse).

What does it do ? Well, it creates classpaths or filesets based on your current .classpath file generated by Eclipse, according to the following parameters :

| Attribute | Description | Required |
|---|---|---|
| produce | Tells the task whether to produce a “classpath” or a “fileset” (multiple filesets, as a matter of fact). | Yes |
| idcontainer | The refid which will serve to identify the deliverables. When multiple filesets are produced, their refid is a concatenation of this value and something else (usually obtained from a path). Default is “antclipse”. | No |
| includelibs | Boolean, whether or not to include the project libraries. Default is true. | No |
| includesource | Boolean, whether or not to include the project source directories. Default is false. | No |
| includeoutput | Boolean, whether or not to include the project output directories. Default is false. | No |
| verbose | Boolean, telling the task to print some info during each step. Default is false. | No |
| includes | A regexp for files to include. It is taken into account only when producing a classpath and doesn't work on source or output files. It is a real regexp, not a “*” expression. | No |
| excludes | A regexp for files to exclude. It is taken into account only when producing a classpath and doesn't work on source or output files. It is a real regexp, not a “*” expression. | No |

Classpath creation is simple: the task just produces a classpath that you can subsequently retrieve by its refid. The filesets are a little trickier, because the task produces one fileset per source directory and another, separate fileset for the output directory. Which is not necessarily bad, since the content of each directory usually serves a different purpose. Now, in order to avoid conflicting refids, each fileset has a name composed of the idcontainer, followed by a dash and postfixed by the path. Supposing that your output path is bin/classes and the idcontainer is left at its default, the task will create a fileset with refid antclipse-bin/classes. The fileset will include all the files contained in your output directory, but without the leading bin/classes path (as you usually strip it when creating the distribution jar). If you have two source directories, called src and test, you'll be provided with two filesets, with refids antclipse-src and antclipse-test.

However, you don't have to hardcode the path, since some properties are created as a “byproduct” each time you execute the task. Their names are the idcontainer value postfixed by “outpath” and “srcpath” (in the case of the sources, you'll get the location of the first source directory).
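The naming rules described above are easy to get wrong when writing a build script, so here they are spelled out as code. This is just the convention from the text restated with a hypothetical helper class; it is not taken from the antclipse source.

```java
final class AntclipseNames {
    // Refid of a per-directory fileset: idcontainer, a dash, then the path.
    static String filesetRefid(String idcontainer, String path) {
        return idcontainer + "-" + path;
    }

    // Name of the property holding the output path.
    static String outputPathProperty(String idcontainer) {
        return idcontainer + "outpath";
    }

    // Name of the property holding the first source directory.
    static String sourcePathProperty(String idcontainer) {
        return idcontainer + "srcpath";
    }

    public static void main(String[] args) {
        System.out.println(filesetRefid("antclipse", "bin/classes"));
        System.out.println(outputPathProperty("ecl1"));
        System.out.println(sourcePathProperty("ecl2"));
    }
}
```

In a build script this means that, with idcontainer set to ecl1, the output fileset can be referenced as ecl1-${ecl1outpath} without ever writing the directory name down.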

A pretty self-explanatory Ant script follows (“xml” is a forbidden file type on jroller, so just copy paste it into your favourite text editor). Note that nothing is hardcoded, it's an adaptable Ant script which should work in any Eclipse project.

<?xml version="1.0"?>
<project default="compile" name="test" basedir=".">
  <taskdef name="antclipse" classname="fr.infologic.antclipse.ClassPathTask"/>

  <target name="make.fs.output">
    <!-- creates a fileset including all the files from the output directory, called ecl1-bin if your binary directory is bin/ -->
    <antclipse produce="fileset" idcontainer="ecl1" includeoutput="true" includesource="false" includelibs="false" verbose="true"/>
  </target>

  <target name="make.fs.sources">
    <!-- creates a fileset for each source directory, called ecl2-*source-dir-name*/ -->
    <antclipse produce="fileset" idcontainer="ecl2" includeoutput="false" includesource="true" includelibs="false" verbose="true"/>
  </target>

  <target name="make.fs.libs">
    <!-- creates a fileset containing all your project libs, called ecl3 -->
    <antclipse produce="fileset" idcontainer="ecl3" verbose="true"/>
  </target>

  <target name="make.cp">
    <!-- creates a classpath with refid eclp, containing your project libs and the output directory -->
    <antclipse produce="classpath" idcontainer="eclp" verbose="true" includeoutput="true"/>
  </target>

  <target name="compile" depends="make.fs.libs, make.fs.output, make.fs.sources, make.cp">
    <echo message="The output path is ${ecl1outpath}"/>
    <echo message="The source path is ${ecl2srcpath}"/>
    <!-- makes a jar file with the content of the output directory -->
    <zip destfile="out.jar"><fileset refid="ecl1-${ecl1outpath}"/></zip>
    <!-- makes a zip file with all your sources (supposing you have only one source directory) -->
    <zip destfile="src.zip"><fileset refid="ecl2-${ecl2srcpath}"/></zip>
    <!-- makes a big zip file with all your project libraries -->
    <zip destfile="libs.zip"><fileset refid="ecl3"/></zip>
    <!-- imports the classpath into a property then echoes the property -->
    <property name="cpcontent" refid="eclp"/>
    <echo>The newly created classpath is ${cpcontent}</echo>
  </target>
</project>

TODOs: make “includes” and “excludes” work on the source and output filesets, find an elegant solution to this multiple fileset/directories issue and, most important, make it work with files referenced in other projects.

I am aware that the task is very far from being perfect, so just download it if you're interested, try to use it, try to break it, and tell me what you think and how it can be improved. Also, if you're interested in the source, just send me an email, but be aware that it's Friday evening beer-induced source code, nothing to be proud of… It was only tested with Ant 1.5.x so YMMV. I assume no responsibility if you use it in a production environment.

Written by Adrian

March 1st, 2004 at 5:17 pm

Posted in Tools
