Netuality

Taming the big bad websites

Archive for the ‘XML’ tag

Using HTTPUnit ‘out of the box’


Recently, the HTTPUnit project reached version 1.6. While this nifty API is mainly targeted at unit testing webapps, I have also successfully used it for other purposes, such as:

HTTPUnit as a benchmarking tool

There is a plethora of web benchmarking tools out there, both freeware and commercial. However, my customer requested testing features that I had trouble satisfying simultaneously with any of the existing tools:

  • the tests must run on headless systems (console mode, non GUI)
  • load testing should simulate complex and realistic user interactions with the site

AFAIK, all the testing tools that allow recording and replaying intricate web interaction scenarios are GUI-based. The command-line tools, on the other hand, are unfit for the job – take for instance Apache JMeter, which is basically a command-line tool with a Swing GUI slapped on it. While JMeter is great when it comes to bombing a server (or a cluster, for that matter) with requests, it seriously lacks features when it comes to scripting complex interactions (you'd better know your URLs and cookies by heart … well, almost).

Another problem I see with existing automated testing solutions is their error detection mechanisms. The vast majority of tools scan for standard HTTP error codes such as 404 or 500 in order to decide whether a response is erroneous, but errors in complex Java apps might come as plain web pages containing stack traces and environment information (a good example is the error page in Apache Tapestry).

So eventually I had to come up with an ad-hoc solution – the basic idea was to leverage the existing HTTP unit tests for benchmarking purposes. I had to get out of the toolbox another rather under-rated open-source gem: JUnitPerf, in fact a couple of decorators for JUnit tests. LoadTest is the decorator I'm interested in: it allows running a test in multithreaded mode, simulating any number of concurrent users. Thus, I am able to reproduce heavy loads of complex user interaction and precisely count the number of errors. The snippet of code is something like:

import junit.framework.Test;
import junit.framework.TestResult;
import junit.textui.TestRunner;
import com.clarkware.junitperf.LoadTest;

//decorate the HTTPUnit-based test with JUnitPerf's LoadTest
SimpleWebTestPerf swtp = new SimpleWebTestPerf("testOfMyWebapp");
Test loadTest = new LoadTest(swtp, nbConcurrentThreads);
TestResult tr = TestRunner.run(loadTest);
int nbErr = tr.errorCount();

Now, we'll call this code with increasing values of nbConcurrentThreads and see where the errors start to appear. We might as well write the results to a log file and even create a nice PNG via JFreeChart. Alas, things become a little trickier when we want to measure the bandwidth; in our case we'll have to write something rather lame into the TestCase itself, and it goes like this:

private long bytes = 0;

private synchronized void addBytes(long b)
{
	bytes += b;
}

/**
 * After a test was run, returns the volume of HTTP data.
 * @return the cumulated number of bytes received over HTTP
 */
public long getBytes()
{
	return bytes;
}

public void testProdsPerformance() throws MalformedURLException,
  IOException, SAXException
{
	[...]
	WebConversation wc = new WebConversation();
	WebResponse resp = wc.getResponse(something);
	addBytes(resp.getContentLength());
	[...]
}

Then, in the benchmarking code, we'll call swtp.getBytes() in order to find out how many bytes passed between the server and the test client. It is still unclear to me whether this value is correct when mod_gzip is activated on the server (we might actually be measuring the bandwidth of the 'deflated' pages!?).
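A ramp-up harness around the LoadTest snippet shown earlier might look like the sketch below. Note that runBenchmark() is a hypothetical stand-in for the real TestRunner.run(new LoadTest(...)).errorCount() call; here it just simulates a server that starts failing above 20 concurrent users, so the sketch stays runnable without the HTTPUnit/JUnitPerf jars.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RampUpHarness
{
	//hypothetical stand-in for: TestRunner.run(new LoadTest(swtp, threads)).errorCount()
	static int runBenchmark(int threads)
	{
		//simulated behavior: errors appear once we pass 20 concurrent users
		return threads > 20 ? threads - 20 : 0;
	}

	//run the benchmark with increasing thread counts, keeping insertion order
	public static Map<Integer, Integer> ramp(int[] threadCounts)
	{
		Map<Integer, Integer> errorsByLoad = new LinkedHashMap<Integer, Integer>();
		for (int threads : threadCounts)
			errorsByLoad.put(threads, runBenchmark(threads));
		return errorsByLoad;
	}

	public static void main(String[] args)
	{
		for (Map.Entry<Integer, Integer> e : ramp(new int[] {1, 5, 10, 20, 50}).entrySet())
			System.out.println(e.getKey() + " threads -> " + e.getValue() + " errors");
	}
}
```

In the real harness the loop body would build the LoadTest decorator, run it, and log the error count (and later the byte count) for each load level.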

In order to measure the elapsed time, we'll do a similar (lame) trick with a private long time member and a private synchronized void addTime(long millis). Unfortunately, we do not [yet?] have a getElapsedTime() on WebResponse, so we'll have to use the good old System.currentTimeMillis() before and after the extraction of each WebResponse. Of course, this also measures the parsing time of the WebResponse, but that isn't usually a problem when you are testing a large number of concurrent users, as the parsing time is much smaller than the response time of a stressed server. You will, however, need a reasonably powerful machine on the client side.
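The synchronized accumulator trick for both bytes and elapsed milliseconds can be sketched in plain Java, without HTTPUnit. The worker threads below simulate clients (a real one would fetch a WebResponse where the comment indicates); the class and method names are made up for the illustration.

```java
public class TrafficStats
{
	private long bytes = 0;
	private long millis = 0;

	//called concurrently by every simulated client, hence synchronized
	public synchronized void add(long b, long ms)
	{
		bytes += b;
		millis += ms;
	}

	public synchronized long getBytes() { return bytes; }
	public synchronized long getMillis() { return millis; }

	public static void main(String[] args) throws InterruptedException
	{
		final TrafficStats stats = new TrafficStats();
		Thread[] workers = new Thread[10];
		for (int i = 0; i < workers.length; i++)
		{
			workers[i] = new Thread(new Runnable()
			{
				public void run()
				{
					long start = System.currentTimeMillis();
					//... fetch a WebResponse here; we just pretend it was 1000 bytes ...
					long elapsed = System.currentTimeMillis() - start;
					stats.add(1000, elapsed);
				}
			});
			workers[i].start();
		}
		for (Thread w : workers)
			w.join();
		System.out.println("total bytes: " + stats.getBytes());
	}
}
```

After all the LoadTest threads have joined, the harness reads the totals to compute throughput per load level.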

Another tip I've found: use Random in order to test different slices of data on different test runs. This way, when you run, let's say, the 20-thread test, you'll hit different data than in the previous 10-thread test, and the results will be less influenced by the tested application's cache(s). It's perfectly possible to launch LoadTest threads with delays between thread activations, which means the Random seed can differ within each simulated client – if you're looking for even more realistic behavior.
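The randomized data-slice idea can be sketched like this (the class and the seeding scheme are my own illustration, not part of JUnitPerf): each simulated client seeds its own Random, e.g. from the current time plus a client id, and picks a different window of test data.

```java
import java.util.Random;

public class DataSlicePicker
{
	private final int sliceCount;

	public DataSlicePicker(int sliceCount)
	{
		this.sliceCount = sliceCount;
	}

	//deterministic for a given seed; seed differs per client,
	//e.g. System.currentTimeMillis() + clientId
	public int pickSlice(long seed)
	{
		return new Random(seed).nextInt(sliceCount);
	}

	public static void main(String[] args)
	{
		DataSlicePicker picker = new DataSlicePicker(10);
		for (long client = 0; client < 5; client++)
			System.out.println("client " + client + " -> slice "
				+ picker.pickSlice(System.currentTimeMillis() + client));
	}
}
```

Inside the TestCase, the picked slice index would select which block of URLs or form data that particular simulated user exercises.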

HTTPUnit as a Web integration API

Besides being a great testing tool, HTTPUnit is also a cool API for Web manipulation: you can use it to perform data integration with all sorts of websites. For instance, let's log into the public demo instance of the MantisBT bug tracking system as user 'developer' and extract the descriptions of the first three bugs in the list.

package webtests;

import java.io.IOException;
import java.net.MalformedURLException;

import org.xml.sax.SAXException;

import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebForm;
import com.meterware.httpunit.WebResponse;
import com.meterware.httpunit.WebTable;

/**
 * Simple demo class using httpunit to extract the description of
 * three most recent bugs from the MantisBT public demo,
 * logged as 'developer'.
 * @author Adrian Spinei [email protected]
 * @version $Id: $
 */
public class MantisTest
{

	public static void main(String[] args) throws MalformedURLException, IOException, SAXException
	{
		WebConversation wc = new WebConversation();
		WebResponse resp = wc.getResponse("http://mantisbt.sourceforge.net/mantis/login_page.php");
		WebForm wForm = resp.getFormWithName("login_form");
		wForm.setParameter("username", "developer");
		wForm.setParameter("password", "developer");
		//submit login, connect to front page
		resp = wForm.submit();
		//'click' on the 'View Bugs' link
		resp = resp.getLinkWith("View Bugs").click();
		//retrieve the table containing the bug list
		//you'll have to believe me on this one, I've counted the tables !
		WebTable webTable = resp.getTables()[3];
		//the first three rows are navigation, header and a blank formatting row;
		//interesting data starts at the 4th row (index 3), with the description
		//in the last column
		System.out.println(webTable.getCellAsText(3, webTable.getColumnCount() - 1));
		System.out.println(webTable.getCellAsText(4, webTable.getColumnCount() - 1));
		System.out.println(webTable.getCellAsText(5, webTable.getColumnCount() - 1));
	}
}

The code speaks for itself: HTTPUnit is beautiful, intuitive and easy to use.


Written by Adrian

February 2nd, 2005 at 8:21 am

Posted in Tools


JUnit Recipes – for the gourmets of Java unit testing


A mini-review

The book already has stellar ratings on Amazon, JavaRanch and other select places, and after reading a few chapters, the only thing I can do is add this post on the praise list.

Why only a few chapters? Well, you see, this is not exactly the type of book that you read from cover to cover; it's in fact a solid 720-page collection of JUnit best practices, the most comprehensive you'll ever find in organized, written form. So far, I have found precise answers to all my JUnit questions in it.

The book is organized in three big sections weighing some 200-250 pages each. The first one ('The building blocks') is hugely useful if you are a JUnit newbie or even an absolute beginner. It's a detailed introduction to everything you'll need to know in order to start using JUnit: basics, usage in mainstream IDEs, testing patterns, managing test suites, test data and troubleshooting advice. Even seasoned programmers will find useful pieces of advice. I especially liked the 5th chapter ('Working with test data'): an exhaustive description of all the possible ways of organizing your test data. To be honest, it's one of the few chapters from the first section that I've really read.

The second part ('Testing J2EE') covers testing of XML, JDBC, EJB, web components (JSP, servlets, Velocity templates, out-of-container testing & such) and J2EE applications (security and again some webapp testing – pageflow, broken links, Struts navigation). I can't really pass judgment on this section since I've read only a few subchapters (DBUnit and some related JDBC testing, as well as a few pages from the web testing chapter), but every piece of advice I've got from it was rock solid.

I've read the third part ('More JUnit techniques') in its entirety. Under this bland title you'll find a collection of not-so-common JUnit information, such as the usage of the GSBase addon (funny that I wasn't aware such a useful addon exists). There's also an intriguing 'Odds and ends' chapter containing some interesting recipes (here's a good one for QA freaks like me: 'verify that your test classes adhere to basic syntax rules of JUnit' – sure, why not?).

Something that I've really missed is a chapter dedicated to mock object recipes. Yes, there is a quick explanation in the first section, a reference to EasyMock or some other mocking API pops up now and then in different chapters, and there's even an essay about mocking at the end of the book. But the main mock objects dish isn't there. I would've also loved to see some automated GUI-testing recipes (Abbot, Marathon & related tools). But then again, it's a >700-page book, so I'm probably asking too much.

To conclude, 'JUnit Recipes' is the best JUnit book I've ever come into contact with, and on my list it supersedes 'JUnit in Action' – which, interestingly enough, was published by the same Manning almost a year ago. You can't have too many good JUnit books, can you?

Written by Adrian

November 21st, 2004 at 11:41 am

Posted in Books


XML descriptors in PHP frameworks – considered harmful


No, I am not a seasoned PHP programmer and I do not intend to become one. But we do live in a harsh economy where all IT projects are worth considering, hence my occasional incursions into the world of PHP-driven websites.

I am not new to PHP either, but – coming from the Java world – I immediately felt the need for a serious MVC framework.
Nobody wants to reinvent the wheel each time a new website is built. Just launch the obvious “PHP MVC framework” query on Google and the result pages will be dominated by four open-source projects:

  • PHPMVC is probably the oldest project and implements a Model 2 front controller
  • Ambivalence declares itself a simple port of the Java Maverick project
  • Eocene, a “simple and easy to use OO web development framework for PHP and ASP.NET”, implementing MVC and a front controller
  • Phrame is a Texas Tech University project released as LGPL, heavily inspired by Struts.

The choice is not easy. There are no examples of industrial-quality sites built with any of these frameworks
(some may say there are no examples of industrial-quality sites built with PHP at all, but let's ignore these nasty people for now).

There are no serious comparisons of the four frameworks, either feature-wise or performance-wise.
In the tradition of open-source projects, the documentation is rather scarce and the examples are “helloworld”-isms.
Yes, I am a bloody bastard for pointing out these aspects – since the authors are not paid to release these projects – and perhaps I could contribute some documentation myself. However, when under an aggressive schedule, I feel it's easier to write my own framework than to understand other people's code and document it thoroughly.
Still, I have a nice hint for you: the first three frameworks use XML files for controller initialization (call it a “sitemap”, a “descriptor” or otherwise; it's just a plain dumb XML file), so you can safely ignore them in a production environment.

Because the “controller” is nothing more than a glorified state machine, the succession of states and transitions (or “actions”, or whatever) has to be persisted somewhere. XML is probably a nice choice for Java frameworks, where the files are parsed once and the application server keeps a pool of controller objects in memory.

But: PHP sessions are stateless. The only way of keeping state is via the filesystem or a database, usually based on an ad-hoc generated unique key kept in the session cookie. More: PHP allows native serialization only for primitive variables; a complex object such as the controller cannot be persisted easily, so it has to be retrieved from XML and fully rebuilt. Unlike in Java appservers, objects cannot be shared between multiple sessions, so pooling is not an option. Thus, in PHP, the XML approach is strongly discouraged, since it means the XML files are parsed for every single page view on the site. Although PHP's parser is James Clark's Expat, one of the fastest parsers around (written in C), note that the DOM object must still be traversed in order to create the controller object (which becomes more and more complex as the site grows). This is called heavy overhead, no matter how you look at it.

There are a few reasons about why you need XML in a web framework, however this does NOT apply to PHP apps. Myth quicklist:

  • it's “human-readable”. Come on, PHP is stored in readable ASCII files anyway – and even if you use Zend products to compile and encrypt your code, why on earth would you allow readability and modification of the controller state machine on the deployment site?
  • it's easier to modify than code. This is probably true for Java and complex frameworks, but PHP code is already significantly simpler than Java, so the gain is marginal.
  • it can be automatically generated from code by tools such as XDoclet, or via an IDE. Only if you're writing it in Java, because PHP does not have such tools.

This means that the only serious candidate (among those considered here) for a PHP MVC framework is Phrame, which stores the sitemap as a multi-dimensional hashmap. So you should either consider Phrame or, for small sites (fewer than 50 screens), you'll be better off writing your own mini-framework, with a state machine implemented as a hashed array of arrays and some session persistence in the database. I chose to serialize and persist an array containing primitive variables, using PHPSESSID as the primary key in order to retrieve and unserialize the array, all coupled with a simple "aging" mechanism for those users with the nasty habit of leaving the site without logging out first.
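For illustration, the "state machine as a hashed array of arrays" idea can be sketched as follows – in Java, since that is this blog's usual language (in PHP it would be a nested array); all state and action names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class PageFlow
{
	//transitions[state][action] -> next state
	private final Map<String, Map<String, String>> transitions =
		new HashMap<String, Map<String, String>>();

	public void addTransition(String state, String action, String nextState)
	{
		Map<String, String> row = transitions.get(state);
		if (row == null)
		{
			row = new HashMap<String, String>();
			transitions.put(state, row);
		}
		row.put(action, nextState);
	}

	//returns the next state, or stays put on an unknown action
	public String next(String state, String action)
	{
		Map<String, String> row = transitions.get(state);
		if (row == null || !row.containsKey(action))
			return state;
		return row.get(action);
	}
}
```

The whole table is a plain nested map, so it serializes cheaply per request – which is exactly the advantage over rebuilding a controller object from XML on every page view.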

Finally, a last word of advice: use PEAR! This often overlooked library of PHP classes includes a few man-years of quality work. You'll get a generic database connection layer (PEAR-DB) along with automatic generation of model classes mapped on the database schema (DB_DataObjects), a plethora of HTML tools (tables, forms, menus) and some templating systems to choose from. All in a nice, easy to install and upgrade package.

Don't put a heavy burden on your upgrade cycle by using heterogeneous packages downloaded from different sites on the web – just use PEAR.

Or simply ignore the PHP offer and wait patiently for your next Java project. Vacations are almost over.

Written by Adrian

October 29th, 2004 at 8:53 am

Posted in Tools


MVC and Front Controller frameworks in PHP – more considerations


Having recently stumbled upon this thread on the Sitepoint community forums, I found a certain Mr. Selkirk advocating page controllers instead of a front controller – meaning that the state machine logic is distributed in each page/state. I have some pragmatic problems with this approach, since it means that on a large site (hundreds of pages), a new generic transition would imply modifying every page.

On the same thread, there's a sensible approach coming from an Interakt guy whom I also happen to know personally [hi, Costin]. He describes PHP website design using MVC (from a controller point of view) as having three steps:

  • Design your site with a fancy IDE which will generate a lot of XML gunk
  • Let the framework compile the XML to good old PHP code, perfectly human-readable and all
  • Enjoy ! Speed and structure.

Unfortunately his solution is neither open-source nor free, and I'll gladly use my 500 maccaronis for a shiny new flat screen. Besides, it looks like my PHP episode is coming to an end (I see some serious consulting on Java apps on the horizon). Anyway, my piece of advice to Costin (as a non-customer) is: “don't do any serialization, keep the code clean, as the bottleneck usually comes from the database – and the world will be a better place to live”.

On a lighter note, there is John telling us cool stuff about PHP:


Does this reloading of all data on every HTTP request mean that PHP is not scalable? Fortunately, as I've said before in other posts, that's a superficial view of things. The reloading of data is precisely the reason why PHP is scalable. The way PHP works encourages you to code using state-less designs. State information is not stored in the PHP virtual machine, but in separate mechanisms, a session variable handler or a database.
This makes PHP very suitable for highly scalable shared-nothing architectures. Each webserver in a farm can be made independent of each other, communicating with a central data and session store, typically a relational database. So you can have 10 or 1000 identically configured PHP web servers, and it will scale linearly, the only bottleneck being the database or other shared resources, not PHP.

Whew! If only vendors 'knew' that by removing state information from their appservers, these would instantly become very suitable for highly scalable shared-nothing architectures. Someone should tell this to IBM, BEA and Sun. And maybe to Microsoft. Oh, if only things were that simple!

PS For those wondering about my sudden passion into PHP, there is an older entry on my weblog explaining the whos and the whats.

Written by Adrian

October 29th, 2004 at 8:44 am

Posted in Tools


(Undocumented?) HTTP-based REST API in Jira


While the REST API is mentioned in the Top 10 Reasons to use Jira, I can hardly find any reference to such an external API in the documentation (I mean, besides XML-RPC and the upcoming SOAP support), and even Google can't clear up the issue. But I can confirm it: it is possible to fully(?) automate Jira using HTTP requests.

Recently, I was asked to programmatically log a user into Jira and open a browser with the results of an advanced search. This is part of a really slick integration between one of my employer's products and a JIRA server – mainly helping our testers avoid bug duplication and also providing insightful statistics for the QA dept.

I started doing it the hard way, trying to obtain and pass JSESSIONID and other useless stuff between my application and the browser, until I realized that Jira can be fully controlled through HTTP in a REST-like manner. Let me explain. Normally, if you are not logged into Jira and launch a browser with a carefully crafted search request (well, you know how to reverse-engineer POSTs into GETs, don't you?), then a graceful invitation to log in is all you'll ever obtain. But if you add “&os_username=” + USER + “&os_password=” + PASS at the end of your request – bang! Not only do you obtain the desired search results, but you are automagically logged into Jira for the rest of the browsing session. Yes, yes, yes: here I come, integration! A couple of hours later, I was able to programmatically insert bug reports, extract bug details, compute statistics and open customized advanced searches.
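The trick boils down to string concatenation; a minimal sketch follows. Only the os_username/os_password parameter names come from my experiments – the base URL, search parameters and credentials below are made up, and you should URL-encode the values:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class JiraUrlBuilder
{
	//appends Jira's os_username/os_password parameters to any crafted search URL
	public static String withCredentials(String searchUrl, String user, String pass)
		throws UnsupportedEncodingException
	{
		String sep = searchUrl.indexOf('?') >= 0 ? "&" : "?";
		return searchUrl + sep
			+ "os_username=" + URLEncoder.encode(user, "UTF-8")
			+ "&os_password=" + URLEncoder.encode(pass, "UTF-8");
	}

	public static void main(String[] args) throws UnsupportedEncodingException
	{
		//hypothetical Jira instance and search parameters
		String search = "http://jira.example.com/secure/IssueNavigator.jspa?reset=true&status=1";
		System.out.println(withCredentials(search, "tester", "s3cret"));
	}
}
```

Opening the resulting URL in a browser both runs the search and logs the user in for the rest of the session.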

To quote a classic: Jira! Jira! Jira! Docs would be nice, though.

PS I'm testing this on a Jira Professional 2.6.1-#65. YMMV.

Written by Adrian

September 17th, 2004 at 4:45 pm

Posted in Tools


… and a few lesser known Java tools


Very very busy lately, but I'd like to share some knowledge about a few useful Java OSS gems that were not easy to find. Mr. Google, please 'index this':

1. Aspirin is a self-contained SMTP server (send-only) written in Java, open-sourced and free. It simplifies configuration and deployment by allowing your app server to send emails without passing through an external SMTP server. The project is heavily inspired by the Apache James code (hence its licensing terms). The few problems I see right now are: possible performance issues when sending big volumes of mail, still-erratic behavior (sometimes sending fails for no plausible reason) and failure reports which do not state the reason of the failure. However, the thingie works pretty well and is a big time saver because, well, configuration is not the most pleasant part of a complex server.

2. If you produce a lot of reports and want to send them automatically on a remote printing server you may use JIPSI (quickstart in English, but site in German) which implements CUPS as a Java Print Service API. This little beauty was found by one of my coworkers and the 'report guys' seem to be making good use of it.

3. You're in for some serious processing of OpenOffice documents using the freely available DTDs (downloadable from the OOo CVS server)? Then hold your horses! I've tried to make sensible use of them and failed abruptly. Let's just say that those DTDs are a big pain in the a**: to begin with, no tool is able to transform them into a schema. I've tried XmlSpy and a few other exotic packages, without success. Even basic stuff like parsing with a validating parser does not work. So much for the usefulness of open standards. Eventually, I ended up using the excellent Writer2Latex. Don't be fooled by the name – you may do all sorts of conversions with it, including Writer to XHTML, which I was interested in. You can even write your own plugin to support some exotic formats, because Writer2Latex is built around a modified version of XMerge. Officially, XMerge is the solution for visualizing documents on 'Small Devices', but it really is a fancy plugin-based document converter. Most probably (too lazy to check the sources) SAX-based, with a nonvalidating parser. Go figure.

4. The Eclipse download site now links to a BitTorrent tracker. I just used it successfully to download RC3 at a reasonable download rate (anyway, being on wifi right now, I wasn't expecting blazing speed). I found it interesting that all the other peers were using Azureus, a torrent client written in Java+SWT. Azureus is a fantastic source of knowledge, chock-full of tips and techniques for writing professional-looking and very responsive SWT apps. And not only that: Azureus is also a great example of how to write a plugin-ready app which performs automatic updates from the net. Not bad at all.

Written by Adrian

June 21st, 2004 at 8:14 pm

Posted in Tools


Comparing FOP and JasperReports


UPDATE: this is a pretty old article, but still popular. If you want to know more about Jasper Reports, I recommend the JasperReports 3.6 Development Cookbook.

Anybody looking for OSS reporting solutions in Java usually has to make a choice between Apache FOP and Jasper Reports*. While they have somewhat different feature sets and address distinct reporting needs, the two APIs boil down to the same basic thing: generating a report from an XML file (or stream/string/whatever). FOP has the clear advantage of standardization (it is based on XSL Formatting Objects), while Jasper plays more in the pragmatic field of obtaining those 80% results with a minimum of effort, and uses a proprietary XML format.

But FOP is not a standalone reporting solution: it's just a way of transforming XSL-FO files into a report. In order to fill the report with the necessary data, the obvious choice is a templating engine such as Jakarta Velocity. Thus FOP report creation is a two-step operation:

  • create the XML report via Velocity
  • feed the XML stream to FOP

Jasper alleviates this problem by including its own binding engine, the only restriction being that the input data must satisfy some constraints (such as putting your ‘rows’ inside a JRDataSource).

Both Jasper and FOP allow the inclusion of graphics; the usual formats (GIF, JPEG) are supported, but FOP has the nice bonus of rendering SVG inside reports. Unfortunately, this comes at the price of using the Batik SVG Toolkit, which is a bulky (close to 2MB) and rather slow API. While processing your dynamic charts as XML files (Velocity again) is a seductive idea, the abysmal performance of SVG rendering will make you give up in no time. Unfortunately, I speak from experience.

At first sight, FOP has a lot more output format options than Jasper Reports. Of course there's PDF and direct printing via AWT, but also Postscript, PCL and MIF, as well as SVG. These choices are quite intriguing, since Postscript and PCL are printing formats (easily obtained by redirecting the specific printer queue into a file), MIF is a rather obscure Adobe format (for FrameMaker) and SVG … well, an SVG report is too darn slow to be usable (yes, I was foolish enough to try this, too). Jasper again makes the pragmatic choice by offering really useful output formats such as HTML, CSV and XLS (never underestimate the power of Excel) – and of course direct printing via AWT, and PDF.
While FOP's latest version (0.20.5) was released almost a year ago (summer 2003), Jasper Reports is bubbling with activity – Teodor releases a minor version every one or two months (the latest being 0.5.3 on 18.05.2004).

I’ve decided to use as a ‘lab rat’ one of the apps developed during my ‘startup days’: the client GUI is written in Swing and features a few mildly complex reports generated using Velocity+FOP. The FOP version is 0.20.4 (the current version back in Q1 2003, when we had to quit dreaming about the ‘next financing round’ and development halted), but as I already told you, FOP has evolved little since then. It's therefore perfectly reasonable to use this implementation as a baseline for the comparison with Jasper (which, by contrast, has evolved a great deal since Q1 2003).

Back then, the report development cycle was quite simplistic. In fact, the XSL-FO templates were written by hand in a text editor, and the application code was run (via a JUnit testcase and some necessary configuration and business data mocking) in order to generate a PDF report. In case of errors, we got feedback by examining the error traces; visual feedback was given by the PDF output. While simple to perform, this cycle became extremely tiresome after a while, as there was an important overhead: start a new JVM, initialize FOP, fire up Acrobat Reader (plus we were using some crappy – even by the standards of 2003 – 1GHz machines with 256/512MB RAM).

A WYSIWYG editor would have been nice, so one of my coworkers did some research, and the only solution he found was XMLSpy (StyleVision was not available back then) – but at 800USD/seat this was ‘a bit’ pricey** for us (only the Enterprise flavor covers FO WYSIWYG editing!?). Another interesting idea was to use one of the RTF-to-FO conversion tools such as Jfor, XMLMind or rtf2fo (of these products, only Jfor is free, but it is feature-poor). What stopped us from doing that was that the generated FO was overly complex: we needed comprehensible, cut_the_crap files because we were going to integrate them inside Velocity templates. And when you have tens of tags and blocks inside blocks and not the slightest idea which one is a row, which one is a column and which one is a transparent dumbass artefact, it’s a gruesome trial-and-error task to integrate even simple VTL loops. And you’d have to do this every time you change something in the report: yikes!

Conclusion: the report development cycle was primitive for FOP, and there was no way we could change it.

Things are quite different for Jasper Report : there are a lot of available report designers, and some of them are free. While the complete list is on Jasper Report site, I’d like to note at least three of them :

  • iReport is a Swing editor, very interesting because it covers not only the basic Jasper functionality but also supplementary features such as barcode support (which is admittedly as easy as embedding a barcode font in Jasper with two lines of XML, but much easier to do via a mouse click). iReport is free, which is excellent, but it is a standalone app without IDE integration and, like any complex Swing app, quite slow and a memory hog.
  • if you are a developer using Eclipse, you’d appreciate two graphical editors based on Eclipse GEF, available as Eclipse plugins: JasperAssistant and SunshineReports. Neither of them is free and, at least on paper, their functionality seems identical, but SunshineReports only offers the older 1.1 version for download, which is free but does NOT work with recent builds of Eclipse 3. How the heck am I supposed to test it? JasperAssistant, on the contrary, has a much more relaxed attitude, allowing the download of a free trial of the latest version of the product. Maybe too relaxed, though, because – even if (theoretically) limited in the number of usages – you can use the trial as much as you want***. But if you are serious about doing Jasper in Eclipse, you should probably buy JasperAssistant, available for a rather decent 59USD price tag. I am currently using it and it’s a good tool.

So much for the tools – let’s get the job done. The bad part: if you’re experienced with FO templates, don’t expect to be immediately proficient with Jasper, even with a GUI editor. The structure of an FO document has strong analogies with HTML: you have tables, rows, cells, stuff like that, inside special constructs called blocks. It’s relatively easy to use a language such as VTL in order to create nested tables, alternating colors and other data layout tricks. You can even render tree-organized data via a recursive VTL macro, and everything is smooth and easy to understand. Jasper is completely different, and at first sight you’ll be shocked by its apparent lack of functionality: only rectangles, lines, ellipses, images, boilerplate text and fields (variable text). Each of these elements has an extensive set of properties governing when the element should be displayed, its stretch type, the associated value expression and so on. Basically, you have to write Java code instead of Velocity macros and call this code from the corresponding properties of the various report elements. If at the beginning it feels a little awkward, after a while it comes quite naturally. As for nesting and other advanced layouts, there is a powerful ‘subreport’ concept. And yes, I’ve managed to render a tree using a recursive subreport, but given the poor performance, the final choice was to flatten the data into a vector and then feed it into a simple Jasper report. So pay attention to the depth of your ‘subreporting’.

Once the reports were completely migrated, I benchmarked a simple one (without SVG, charts, barcodes or other ‘exotic’ features). The test machine is a 2.4GHz P4 Toshiba Satellite laptop with 512MB RAM. In the case of FOP, the compiled Velocity template and the FOP Driver are cached between successive runs. In the case of Jasper, the report is precompiled and loaded only on the first run, then refilled with new data before each generation. The lazy loading and caching of the reporting engines is the cause of the important time difference between the generation of the first report and the subsequent ones. Delta memory is measured after garbage collection. The values presented are medians over 10 runs of the ‘benchmark report’.

                 First run   Subsequent runs   Delta memory
Velocity + FOP   10365ms     381ms             850KB
Jasper Reports   1322ms      82ms              1012KB
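The measurement methodology described above (median of 10 runs, memory delta after an explicit GC) can be sketched like this. generateReport() is a hypothetical stand-in for the actual Velocity+FOP or Jasper generation call, so the sketch runs without either library:

```java
import java.util.Arrays;

public class ReportBench
{
	//hypothetical stand-in for the report engine invocation being measured
	static void generateReport()
	{
		//Velocity+FOP or JasperReports generation would go here
	}

	//time each run and return the median, which is robust against outliers
	//such as the expensive first (cold) run
	public static long medianRunMillis(int runs)
	{
		long[] times = new long[runs];
		for (int i = 0; i < runs; i++)
		{
			long start = System.currentTimeMillis();
			generateReport();
			times[i] = System.currentTimeMillis() - start;
		}
		Arrays.sort(times);
		return times[runs / 2];
	}

	//approximate used heap after asking for a garbage collection
	public static long usedMemoryAfterGc()
	{
		Runtime rt = Runtime.getRuntime();
		rt.gc();
		return rt.totalMemory() - rt.freeMemory();
	}

	public static void main(String[] args)
	{
		System.out.println("median: " + medianRunMillis(10) + "ms");
		System.out.println("used after GC: " + usedMemoryAfterGc() + " bytes");
	}
}
```

Delta memory is then the difference between usedMemoryAfterGc() taken before and after the benchmark loop. Note that Runtime.gc() is only a hint to the JVM, so the memory figures are approximate.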

While I am totally pro-Jasper after this short experiment, it is important to note that commercial, well-maintained FO rendering engines such as RenderX XEP claim improved performance over FOP. Depending on your requirements, environment and legacy reporting apps, an FO-based solution might be better, especially when report generation happens only on the server side.

Of course, the usual disclaimers apply: this benchmark is valid only for my specific report in my specific application, so YMMV.

* While I am aware that other OSS solutions do exist for Java, I consider these two as ‘mainstream’.

** Did I mention that we were a startup with financing problems ?

*** No, I’m not going to explain here how it can be done.

Written by Adrian

May 25th, 2004 at 8:57 am

Posted in Tools


Eclipse plugins and Groovy : when binary compatibility is not enough

leave a comment

One of my current responsibilities is to maintain an internally developed plugin, used by various members of the team to generate code from the analysis model. As far as I can tell from the webstats of the update site, every version is downloaded by 18 people – a small but heterogeneous user base.

My biggest problem is the Eclipse version. The analysts are not exactly Java geeks waiting anxiously for nightly builds of Eclipse, they use a 'standard' 2.1.2, mainly because it's stable and well internationalized. Things go wilder in the programmers team : versions ranging from conservative (2.1.x) to liberal (3.0M4) and even the occasional dumbass with the latest integration build (that would be me, of course).

The 'enhanced binary compatibility' in 3.0M7 came as a relief, diminishing the need to switch between Eclipse versions in order to develop the plugin or work on other tasks. Well, I still have to briefly test the damn thing on Eclipse 2.1.x before releases. However, running two or three Eclipse instances simultaneously is no piece of cake for my 512MB laptop (I still haven't found out who I have to kill around here in order to be awarded a memory upgrade). Unfortunately, checking out the plugin source into M7 has shown the invisible ugly face of 'binary compatibility': the plugin doesn't compile.

There are just a handful of problem lines of code – some exposing differences in the Eclipse API which are somehow hidden in 'compatibility mode', some effectively revealing small bugs in the plugin behavior. But the real issue here is that I cannot really develop the plugin in M7 until I manage to compile it somehow, without losing backwards compatibility.

Let's dissect one of the compilation issues. The bummer concerns the automatic opening of an editor (or focusing it if already opened) when clicking on a reference to it (somewhat similar to what happens when you Ctrl+click on a class name in JDT). In the older API it was a simple matter of page.openEditor(file); where page is an IWorkbenchPage and file is an IFile. This simple stuff worked well until 3.0M4; then (M5) things changed to page.openEditor(new org.eclipse.ui.part.FileEditorInput(file),editorId); where FileEditorInput implements (among others) IEditorInput. While this is certainly nice, because you may directly link editors to something other than files***, the old code obviously does not compile under M7.

Maintaining separate 'old' and 'new' style projects for 10 or so lines of code is obviously overkill. The second solution – reflection – would mean more than a few lines of code, and the result would be neither comprehensible nor maintainable. Only one thing left: use a scripting language.

Of course, I could have taken any decent scripting language embeddable in Java. The decision to go with Groovy was made mainly because of its coolness factor, but I am sure the idea applies just as easily to Jython (a big favorite of mine) or the performance-aware Pnuts, for instance.

In a nutshell, you have to execute a different line of code depending on the current Eclipse version (it's a little bit trickier than that, but we'll discuss it later).

// bind everything the scripts reference -- note that editorId must be bound
// too, otherwise the 'new platform' script fails at evaluation time
groovy.lang.Binding binding = new Binding();
binding.setVariable("page", page);
binding.setVariable("file", file);
binding.setVariable("editorId", editorId);
groovy.lang.GroovyShell groovyShell = new GroovyShell(getClass().getClassLoader(), binding);
if (newPlatform)
{
    return groovyShell.evaluate("page.openEditor(new org.eclipse.ui.part.FileEditorInput(file), editorId);", someExpressionId);
}
else
{
    return groovyShell.evaluate("page.openEditor(file);", someExpressionId);
}

It's basically a vanilla-flavored ripoff of the Groovy embedding example from the docs. The boring part – caching the binding, hiding everything behind a nice facade – is left as an exercise for the [interested] reader. Remember to pass the class loader of the current class; do not create a GroovyClassLoader out of nowhere or you'll end up fighting Eclipse's own class loading, which means trouble even for simple tasks like these.

Finding out whether the Eclipse version is the 'new' or the 'old' one is not that simple, because remember: 'old' means anything from 2.x up to 3.0M4. So checking the Eclipse SDK version alone is not enough; you have to find another discriminant, which in our case is the presence of the 'org.eclipse.ui.ide' plugin. Result:

boolean newPlatform;
//find out if we are inside a new or an old platform
PluginVersionIdentifier pvi = Platform.getPluginRegistry().getPluginDescriptor("org.eclipse.platform").getVersionIdentifier();
newPlatform = pvi.getMajorComponent() >= 3 && Platform.getPluginRegistry().getPluginDescriptor("org.eclipse.ui.ide") != null;

No, we are not ready to deploy yet. A small trick has to be performed, or the plugin won't start under older versions of Eclipse. We had to add some new plugins to the dependencies (in the plugin descriptor), such as the aforementioned 'org.eclipse.ui.ide'; obviously, the older versions of Eclipse will not find it and will hence block our plugin activation on startup. To overcome this, you have to add (by hand !) a lesser known attribute ('optional') to the corresponding tag in the plugin.xml file : <import plugin="org.eclipse.ui.ide"/> becomes <import plugin="org.eclipse.ui.ide" optional="true"/>. Now the plugin is ready to be deployed.

For those brave enough to distribute such a plugin via an update site: remember to 'cheat' by not listing new plugins such as 'org.eclipse.ui.ide' in the feature.xml file (again, delete the entry by hand). The 'optional' attribute doesn't help in this case. Go figure…

I hope some of you will find this recipe useful for maintaining compatibility between different Eclipse versions with a minimum of fuss. However, please note the specific prerequisites for this type of solution :

  • there are only simple 'few-lines' modifications
  • the code is not expected to evolve a lot in the 'affected' areas
  • the evaluated code is not in a performance-sensitive area

***Interestingly enough, this was one of the reasons I recommended against adopting RCP in one of our apps a few weeks ago. It's nice to see that – now – the mechanism linking editors and resources is MUCH more flexible. This probably won't change the decision not to use RCP, though, because the main issue is the volume of code we would have to change. Development of one of the app modules started almost a year ago, and the animal is already sold and deployed on different production sites: upgrading would be a real nightmare. Maintaining a fork of the app is not an option either. Well, I guess we'll just have to cope with 'plain old' JFace and SWT.

PS After some days of 'silence', I have noticed from the logs that most popular posts on my blog are those concerning Eclipse plugins and Manning books (I seem to have a nice Google ranking on these topics). So, expect more of these (I am reading the MEAP of 'Tapestry in action' – a review should be up shortly).

Written by Adrian

March 1st, 2004 at 6:36 pm

Posted in Tools


Eclipse 2.1 workspace deadlock – and a dirty but small workaround

leave a comment

It happened on older versions too, but it happens more frequently on the “final” 2.1 version. FYI : Gentoo Linux, Eclipse GTK; it seems to be related somehow to bug 33138 (I don't have the time to dig further).
Sometimes the monster simply hangs during a [take your pick : refactoring, new class generation] with an empty progress bar in the dialog box and a completely useless “Cancel” button. Been there, done that : kill -9 …
Then, trying to restart Eclipse leads to a deadlock while recovering the workspace : dialog box, empty progress bar and useless “Cancel”. Frozen !
I have a lot of settings and projects, so deleting the whole .metadata directory is just too painful. Therefore, I had to find a smaller workaround : just delete the file .metadata/.plugins/org.eclipse.ui.workbench/workbench.xml and Eclipse restarts with a clean workspace. Some adjustments are lost, but hey – my metadata is still there.
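For the copy-paste inclined, the whole workaround boils down to something like this (the workspace location is an assumption – adjust it to your own; moving the file aside instead of deleting it lets you restore the old UI state later):

```shell
# Hypothetical workspace location -- point this at your own workspace.
WORKSPACE="$HOME/workspace"
WB="$WORKSPACE/.metadata/.plugins/org.eclipse.ui.workbench/workbench.xml"

# Move the file aside rather than deleting it outright, so the old
# workbench layout can be restored if needed.
if [ -f "$WB" ]; then
    mv "$WB" "$WB.bak"
fi
```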

Written by Adrian

March 1st, 2004 at 5:21 pm

Posted in Tools


Ant goodies : extracting info from Eclipse .classpath

leave a comment

IMPORTANT UPDATE: Please note that 'antclipse' is now part of the ant-contrib at Sourceforge, under Apache licence.

Original blogpost:

I hate duplicating information manually – besides, it's a known fact that duplication is a classic code smell telling you to refactor. This time it's not Java code, but something somewhat different : Ant used in an Eclipse context. The issue here is that the .classpath files generated by Eclipse contain important information which is usually duplicated by hand in the build.xml script. So many times I've changed libraries in my Eclipse project only to discover that an Ant task was broken…

There surely are some workarounds like the task written by Tom Davies but unfortunately:

  • It's an Eclipse plugin. I want to be able to build my project standalone, we don't need no stinkin' plugin.
  • It's rather old and its tag checking is draconian, so it pukes on my 3.0M3, complaining about a certain attribute of type “con” in the .classpath file (lesson learned: don't be picky about tag and attribute names if you want the code to work with future versions of the software which produced the XML document, especially when you have no schema or DTD to rely on)
  • It's Friday evening, dark weather outside, I'm alone in the house and the TV is broken (and even if it worked, there's nothing to see on TV anyway). Boys and girls, let's write an Ant task !

From the documentation, it appears that writing an Ant task should be an easy task :) . And yes, it is, once you get past all the little idiosyncrasies. Like the mandatory “to” string in a RegexpPatternMapper, although all you want to do is match, not replace. Like the completely different mechanisms for Path and FileSet (I always thought a Path was a “dumbed down” FileSet, but I was completely wrong: a fileset is somewhat “smarter”, but it only covers a single directory).
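The core of the task itself is unspectacular; here is a minimal sketch of the parsing side, using only the JDK's DOM parser. The class and method names are illustrative, not the actual antclipse source, and – per the lesson learned above – the code is deliberately lenient about unknown entry kinds:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads an Eclipse .classpath file and collects the paths of the "lib"
// entries. Entries of other kinds ("src", "con", "output", ...) are simply
// skipped rather than rejected.
public class ClasspathReader {

    public static List<String> libEntries(File dotClasspath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(dotClasspath);
        NodeList entries = doc.getElementsByTagName("classpathentry");
        List<String> libs = new ArrayList<String>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element e = (Element) entries.item(i);
            if ("lib".equals(e.getAttribute("kind"))) {
                libs.add(e.getAttribute("path"));
            }
        }
        return libs;
    }
}
```

From there it is mostly a matter of wiring the collected paths into Ant Path and FileSet objects and registering them under the chosen refids.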

The result is here, and everything you have to do is to download and put the antclipse.jar (7kB) in your ant/lib library and you're set (just remember to refresh Ant classpath if you're launching Ant from Eclipse).

What does it do ? Well, it creates classpaths or filesets based on your current .classpath file generated by Eclipse, according to the following parameters :

  • produce (required) : tells the task whether to produce a “classpath” or a “fileset” (multiple filesets, as a matter of fact).
  • idcontainer : the refid which identifies the deliverables. When multiple filesets are produced, each refid is a concatenation of this value and something else (usually derived from a path). Default is “antclipse”.
  • includelibs : boolean, whether or not to include the project libraries. Default is true.
  • includesource : boolean, whether or not to include the project source directories. Default is false.
  • includeoutput : boolean, whether or not to include the project output directories. Default is false.
  • verbose : boolean, telling the task to print some info about each step. Default is false.
  • includes : a regexp for files to include. Taken into account only when producing a classpath; it doesn't apply to source or output files. It is a real regexp, not a “*” glob.
  • excludes : a regexp for files to exclude. Taken into account only when producing a classpath; it doesn't apply to source or output files. It is a real regexp, not a “*” glob.

Classpath creation is simple: it just produces a classpath that you can subsequently retrieve by its refid. The filesets are a little trickier, because the task produces one fileset per directory for the sources and another separate fileset for the output. Which is not necessarily bad, since the content of each directory usually serves a different purpose. Now, in order to avoid conflicting refids, each fileset gets a name composed of the idcontainer, followed by a dash and postfixed by the path. Supposing that your output path is bin/classes and the idcontainer is the default, the task will create a fileset with refid antclipse-bin/classes. The fileset will include all the files contained in your output directory, but without the leading path bin/classes (as you usually strip it when creating the distribution jar). If you have two source directories, called src and test, you'll get two filesets, with refids antclipse-src and antclipse-test.

However, you don't have to hardcode the path, since some properties are created as a “byproduct” each time you execute the task. Their names are the idcontainer postfixed by “outpath” and “srcpath” (in the case of the sources, you'll find the location of the first source directory).

A pretty self-explanatory Ant script follows (“xml” is a forbidden file type on jroller, so just copy paste it into your favourite text editor). Note that nothing is hardcoded, it's an adaptable Ant script which should work in any Eclipse project.

<?xml version="1.0"?>
<project default="compile" name="test" basedir=".">
    <taskdef name="antclipse" classname="fr.infologic.antclipse.ClassPathTask"/>

    <target name="make.fs.output">
        <!-- creates a fileset including all the files from the output directory, called ecl1-bin if your output directory is bin/ -->
        <antclipse produce="fileset" idcontainer="ecl1" includeoutput="true" includesource="false" includelibs="false" verbose="true"/>
    </target>

    <target name="make.fs.sources">
        <!-- creates a fileset for each source directory, called ecl2-*source-dir-name* -->
        <antclipse produce="fileset" idcontainer="ecl2" includeoutput="false" includesource="true" includelibs="false" verbose="true"/>
    </target>

    <target name="make.fs.libs">
        <!-- creates a fileset containing all your project libs, called ecl3 -->
        <antclipse produce="fileset" idcontainer="ecl3" verbose="true"/>
    </target>

    <target name="make.cp">
        <!-- creates a classpath out of the project libraries and the output directory, with refid eclp -->
        <antclipse produce="classpath" idcontainer="eclp" verbose="true" includeoutput="true"/>
    </target>

    <target name="compile" depends="make.fs.libs, make.fs.output, make.fs.sources, make.cp">
        <echo message="The output path is ${ecl1outpath}"/>
        <echo message="The source path is ${ecl2srcpath}"/>
        <!-- makes a jar file with the content of the output directory -->
        <zip destfile="out.jar"><fileset refid="ecl1-${ecl1outpath}"/></zip>
        <!-- makes a zip file with all your sources (supposing you have only one source directory) -->
        <zip destfile="src.zip"><fileset refid="ecl2-${ecl2srcpath}"/></zip>
        <!-- makes a big zip file with all your project libraries -->
        <zip destfile="libs.zip"><fileset refid="ecl3"/></zip>
        <!-- imports the classpath into a property, then echoes the property -->
        <property name="cpcontent" refid="eclp"/>
        <echo>The newly created classpath is ${cpcontent}</echo>
    </target>
</project>

TODOs : make “includes” and “excludes” work on the source and output filesets, find an elegant solution to the multiple fileset/directory issue and, most important, make it work with files referenced in other projects.

I am aware that the task is very far from perfect, so just download it if you're interested, try to use it, try to break it, and tell me what you think and how it can be improved. Also, if you're interested in the source, just send me an email, but be aware that it's Friday-evening, beer-induced source code, nothing to be proud of… It was only tested with Ant 1.5.x, so YMMV. I assume no responsibility if you use it in a production environment.

Written by Adrian

March 1st, 2004 at 5:17 pm

Posted in Tools
