Netuality

Taming the big, bad, nasty websites

Archive for the ‘XML’ tag

Using HTTPUnit 'out of the box'


Recently, the HTTPUnit project reached version 1.6. While this nifty API is mainly targeted at unit testing webapps, I have also successfully used it for other purposes, such as:

HTTPUnit as a benchmarking tool

There is a plethora of web benchmarking tools out there, both freeware and commercial. However, my customer requested some testing features that I had trouble satisfying simultaneously with the existing tools:

  • the tests must run on headless systems (console mode, non GUI)
  • load testing should simulate complex and realistic user interactions with the site

AFAIK, all the testing tools that allow recording and replaying of intricate web interaction scenarios are GUI-based. The command-line tools are also unfit for the job. Take for instance Apache JMeter, which is basically a command-line tool with a Swing GUI slapped on it. While JMeter is great when it comes to bombing a server (or a cluster, for that matter) with requests, it seriously lacks features when it comes to scripting complex interactions (you'd better know your URLs and cookies by heart … well, almost).

Another problem I see with existing automated testing solutions is their error detection mechanism. The vast majority of tools scan for standard HTTP error codes such as 404 or 500 to decide whether a response is erroneous, but errors in complex Java apps might come as plain web pages containing stack traces and environment information (a good example is the error page in Apache Tapestry).
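To make the idea concrete, here is a minimal sketch of a body-level check that flags a page as erroneous when it appears to contain a Java stack trace, regardless of the HTTP status. The class name, the method name and both regular expressions are my own assumptions for illustration; a real application may need patterns tuned to its own error pages.

```java
import java.util.regex.Pattern;

/** Heuristic check for error pages that are served with a normal HTTP status. */
class ErrorPageDetector {

    // A typical Java stack frame, e.g. "at com.example.Page.render(Page.java:42)"
    private static final Pattern STACK_FRAME =
        Pattern.compile("\\bat\\s+[\\w$.]+\\.[\\w$<>]+\\([\\w$]+\\.java:\\d+\\)");

    // A fully qualified exception or error class, e.g. "java.lang.NullPointerException"
    private static final Pattern EXCEPTION_NAME =
        Pattern.compile("\\b(?:[a-z_][\\w$]*\\.)+(?:[A-Z][\\w$]*)?(?:Exception|Error)\\b");

    /** Returns true if the page body appears to contain a Java stack trace. */
    static boolean looksLikeErrorPage(String body) {
        if (body == null) {
            return false;
        }
        return STACK_FRAME.matcher(body).find() || EXCEPTION_NAME.matcher(body).find();
    }
}
```

In a test, the text of each fetched page would be passed to looksLikeErrorPage() and the test failed when it returns true. This is only a heuristic: a page that legitimately displays exception names (documentation, a bug tracker) would trigger a false positive.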

So eventually I had to come up with an ad-hoc solution. The basic idea was to leverage the existing HTTPUnit tests for benchmarking purposes. I had to get another rather underrated open-source gem out of the toolbox: JUnitPerf, in fact a couple of decorators for JUnit tests. LoadTest is the decorator I'm interested in: it allows running a test in multithreaded mode, simulating any number of concurrent users. Thus, I am able to reproduce heavy loads of complex user interaction and precisely count the number of errors. The snippet of code is something like:

SimpleWebTestPerf swtp = new SimpleWebTestPerf("testOfMyWebapp");
Test loadTest = new LoadTest(swtp, nbConcurrentThreads);
TestResult tr = TestRunner.run(loadTest);
int nbErr = tr.errorCount();

Now, we'll call this code with increasing values of nbConcurrentThreads and see where the errors start to appear. We might as well write the results to a log file and even create a nice PNG via JFreeChart. Alas, things become a little trickier when we want to measure the bandwidth; in our case we'll have to write something very lame in the TestCase, and it goes like this:

private long bytes = 0;

private synchronized void addBytes(long b)
{
	bytes += b;
}

/**
 * After a test was run, returns the volume of HTTP data.
 * @return the total number of bytes accumulated via addBytes()
 */
public long getBytes()
{
	return bytes;
}

public void testProdsPerformance() throws MalformedURLException,
  IOException, SAXException
{
	[...]
	WebConversation wc = new WebConversation();
	WebResponse resp = wc.getResponse(something);
	addBytes(resp.getContentLength());
	[...]
}

Then, in the benchmarking code, we'll call swtp.getBytes() to find out how many bytes passed between the server and the test client. It is still unclear to me whether this value is correct if mod_gzip is activated on the server (we might actually be measuring the bandwidth of the 'deflated', i.e. compressed, pages !?).
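To get a feeling for how large the gap could be, here is a small stand-alone sketch that compares the raw size of a page with its gzip-compressed size, using only the JDK. It does not touch HTTPUnit at all, and the sample page is made up; whether getContentLength() reports the compressed or the uncompressed figure for a given server setup is something I have not verified, so treat this purely as an illustration of the two numbers that could be confused.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Compares the size of a page before and after gzip compression. */
class GzipSizeDemo {

    /** Returns the number of bytes of the text before compression. */
    static int rawSize(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Returns the number of bytes the text occupies once gzip-compressed. */
    static int gzippedSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** Builds a repetitive HTML-like page; purely illustrative data. */
    static String samplePage(int rows) {
        StringBuilder html = new StringBuilder("<html><body><table>");
        for (int i = 0; i < rows; i++) {
            html.append("<tr><td>product ").append(i).append("</td></tr>");
        }
        return html.append("</table></body></html>").toString();
    }
}
```

Repetitive HTML compresses very well, so if the byte counter ends up recording compressed sizes, the measured bandwidth could be several times lower than the volume of markup the client actually parses.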

In order to measure the elapsed time, we'll do a similar (lame) trick with a private long time member and a private synchronized void addTime(long millis). Unfortunately, we do not [yet?] have a getElapsedTime() for the WebResponse, so we'll have to use the good old System.currentTimeMillis() before and after the extraction of each WebResponse. Of course, this also measures the parsing time of the WebResponse, but this isn't usually a problem when you are testing a large number of concurrent users, as the parsing time is much smaller than the response time of a stressed server. You will, however, need a powerful machine for the client-side tests.
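The timing trick can be sketched as a small helper. Note that this version swaps the synchronized addTime() described above for AtomicLong counters, which is simply another way of making the accumulation thread-safe; the class and method names are hypothetical, and the Callable stands in for whatever code fetches a WebResponse.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** Accumulates wall-clock time spent in requests issued by many threads. */
class ElapsedTimeRecorder {

    private final AtomicLong totalMillis = new AtomicLong();
    private final AtomicLong samples = new AtomicLong();

    /** Runs the request, records how long it took, and returns its result. */
    <T> T time(Callable<T> request) throws Exception {
        long start = System.currentTimeMillis();
        try {
            return request.call();
        } finally {
            // Recorded even when the request throws, so failures are timed too.
            addTime(System.currentTimeMillis() - start);
        }
    }

    /** Adds one measurement, in milliseconds. */
    void addTime(long millis) {
        totalMillis.addAndGet(millis);
        samples.incrementAndGet();
    }

    long getTotalMillis() {
        return totalMillis.get();
    }

    long getSamples() {
        return samples.get();
    }

    /** Average time per request, or 0 if nothing was recorded. */
    double getAverageMillis() {
        long n = samples.get();
        return n == 0 ? 0.0 : (double) totalMillis.get() / n;
    }
}
```

Inside a test method, the call that retrieves a response would be wrapped as recorder.time(() -> ...), with one recorder shared by all threads of the load test.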

Another tip I've found: use Random in order to test different slices of data on different test runs. This way, when you run, let's say, the 20-thread test, you'll hit different data compared to the previous test with 10 threads, and the results will be less influenced by the caches of the tested application. It's perfectly possible to launch LoadTest threads with delays between thread activations, which means that the Random seed could be different within each simulated client, if you're looking for even more realistic behavior.
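Here is a minimal sketch of the slice-picking idea. Instead of relying on activation delays to vary the seed, it derives each client's Random from an explicit run seed plus a client index, which is an assumption of mine that keeps runs reproducible; the class name, the method names and the notion of an "offset into a list of items" are all hypothetical.

```java
import java.util.Random;

/** Chooses which slice of the test data a simulated client will exercise. */
class DataSlicePicker {

    /** Derives a per-client Random from a run seed and the client's index. */
    static Random forClient(long runSeed, int clientIndex) {
        return new Random(runSeed * 31 + clientIndex);
    }

    /**
     * Picks the starting offset of a slice of sliceSize items
     * within a data set of totalItems items.
     */
    static int pickOffset(Random random, int totalItems, int sliceSize) {
        if (sliceSize <= 0 || sliceSize > totalItems) {
            throw new IllegalArgumentException("sliceSize must be in 1..totalItems");
        }
        // nextInt(bound) is exclusive, so +1 allows the very last valid offset.
        return random.nextInt(totalItems - sliceSize + 1);
    }
}
```

Changing the run seed between the 10-thread and the 20-thread runs moves every client to a different part of the data, while reusing a seed replays exactly the same slices when a result needs to be double-checked.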

HTTPUnit as a Web integration API

Besides being a great testing tool, HTTPUnit is also a cool API for Web manipulation: you can use it to perform data integration with all sorts of websites. For instance, let's log on to the public demo instance of the MantisBT bug tracking system as user 'developer', and extract the descriptions of the first three bugs in the list.

package webtests;

import java.io.IOException;
import java.net.MalformedURLException;

import org.xml.sax.SAXException;

import com.meterware.httpunit.WebConversation;
import com.meterware.httpunit.WebForm;
import com.meterware.httpunit.WebResponse;
import com.meterware.httpunit.WebTable;

/**
 * Simple demo class using httpunit to extract the description of
 * three most recent bugs from the MantisBT public demo,
 * logged as 'developer'.
 * @author Adrian Spinei aspinei@yahoo.com
 * @version $Id: $
 */
public class MantisTest
{

	public static void main(String[] args) throws MalformedURLException, IOException, SAXException
	{
		WebConversation wc = new WebConversation();
		WebResponse resp = wc.getResponse("http://mantisbt.sourceforge.net/mantis/login_page.php");
		WebForm wForm = resp.getFormWithName("login_form");
		wForm.setParameter("username", "developer");
		wForm.setParameter("password", "developer");
		//submit login, connect to front page
		resp = wForm.submit();
		//'click' on the 'View Bugs' link
		resp = resp.getLinkWith("View Bugs").click();
		//retrieve the table containing the bug list
		//you'll have to believe me on this one, I've counted the tables !
		WebTable webTable = resp.getTables()[3];
		//first three rows are : navigation and header, then a blank formatting row
		//interesting data starts from the 4th row (index 3); we print the last column
		System.out.println(webTable.getCellAsText(3, webTable.getColumnCount() - 1));
		System.out.println(webTable.getCellAsText(4, webTable.getColumnCount() - 1));
		System.out.println(webTable.getCellAsText(5, webTable.getColumnCount() - 1));
	}
}

The code speaks for itself: HTTPUnit is beautiful, intuitive and easy to use.


Written by Adrian

February 2nd, 2005 at 8:21 am

Posted in Tools


JUnit Recipes – for the gourmets of Java unit testing


A mini-review

The book already has stellar ratings on Amazon, JavaRanch and other select places, and after reading a few chapters, the only thing I can do is add this post to the list of praise.

Why only a few chapters? Well, you see, this is not exactly the type of book that you read from cover to cover. It is in fact a solid 720-page collection of JUnit best practices, the most comprehensive you'll ever find in organized, written form. So far, I have found precise answers to all my JUnit questions.

The book is organized in three big sections weighing some 200-250 pages each. The first one ('The building blocks') is hugely useful if you are a JUnit newbie or even an absolute beginner. It's a detailed introduction to everything you'll need to know in order to start using JUnit: basics, usage in mainstream IDEs, testing patterns, managing test suites, test data and troubleshooting advice. Even seasoned programmers will find useful pieces of advice. I especially liked the 5th chapter ('Working with test data'): an exhaustive description of all the possible ways of organizing your test data. To be honest, it's one of the few chapters from the first section that I've actually read.

The second part ('Testing J2EE') covers testing of XML, JDBC, EJB, web components (JSP, servlets, Velocity templates, out-of-container testing and such) and J2EE applications (security and again some webapp testing: pageflow, broken links, Struts navigation). I can't really pass judgment on this section since I've read only a few subchapters (DBUnit and some related JDBC testing, as well as a few pages from the web testing chapter). But every piece of advice I got was rock solid.

I've read the third part ('More JUnit techniques') in its entirety. Under this bland title you'll find a collection of not-so-common JUnit information, such as usage of the GSBase add-on (funny that I wasn't aware that such a useful add-on exists). There's also an intriguing 'Odds and ends' chapter containing some interesting recipes (here's a good one for QA freaks like me: 'verify that your test classes adhere to basic syntax rules of JUnit' – sure, why not?).

Something that I've really missed is a chapter dedicated to mock object recipes. Yes, there is a quick explanation in the first section, a reference to EasyMock or some other mocking API pops up now and then in different chapters, and there's even an essay about mocking at the end of the book. But the main mock objects dish isn't there. I would've also loved to see some automated GUI-testing recipes (Abbot, Marathon and related tools). But then again, it's a book of more than 700 pages, so I'm probably asking too much.

To conclude, 'JUnit Recipes' is the best JUnit book I've ever come across, and it supersedes 'JUnit in Action' on my list. Which, interestingly enough, was published by the same Manning almost a year ago. You can't have too many good JUnit books, can you?

Written by Adrian

November 21st, 2004 at 11:41 am

Posted in Books


XML descriptors in PHP frameworks – considered harmful


No, I am not a seasoned PHP programmer and I do not intend to become one. But we do live in a harsh economy where all IT projects are worth considering, hence my occasional incursions into the world of PHP-driven websites.

I am not new to PHP either, but, coming from the Java world, I immediately felt the need for a serious MVC framework. Nobody wants to reinvent the wheel each time a new website is built. Just run the obvious “PHP MVC framework” search on Google and the results pages will be dominated by four open-source projects:

  • PHPMVC is probably the oldest project and implements a Model 2 front controller
  • Ambivalence declares itself a simple port of the Java Maverick project
  • Eocene is a “simple and easy to use OO web development framework for PHP and ASP.NET”, implementing MVC and a front controller
  • Phrame is a Texas Tech University project released under the LGPL, heavily inspired by Struts

The choice is not easy. There are no examples of industrial-quality sites built with any of these frameworks (some may say there are no examples of industrial-quality sites built with PHP, but let's ignore these nasty people for now).

There are no serious comparisons of the four frameworks, neither feature-wise nor performance-wise. In the tradition of open-source projects, the documentation is rather scarce and the examples are “helloworld”-isms. Yes, I am a bloody bastard for pointing out these aspects, since the authors are not paid to release these projects, and perhaps I could contribute some documentation myself. However, when under an aggressive schedule, I feel it's easier to write my own framework than to understand other people's code and document it thoroughly.
Still, I have a nice hint for you. The first three frameworks use XML files for controller initialization (call it “sitemap”, “descriptor” or otherwise; it's just a plain dumb XML file). So you can safely ignore them in a production environment.

The reason is that the “controller” is nothing more than a glorified state machine. The succession of states and transitions (or “actions” or whatever) has to be persisted somewhere. XML is probably a nice choice for Java frameworks, where the files are parsed and the application server keeps a pool of controller objects in memory.

But PHP keeps nothing in memory between requests. The only way of keeping state is via the filesystem or a database, usually based on an ad-hoc generated unique key which is kept in the session cookie. What's more, PHP allows native serialization only for primitive variables; a complex object such as the controller cannot be persisted easily, so it has to be retrieved from XML and fully rebuilt. Unlike in Java appservers, objects cannot be shared between multiple sessions, so pooling is not an option. Thus, in PHP, the XML approach is strongly discouraged, since it means that the XML files are parsed for each page viewed on the site. Although PHP's parser is James Clark's Expat, one of the fastest parsers right now (written in C), note that the parsed document must still be traversed in order to create the controller object (which becomes more and more complex as the site grows). This is heavy overhead, no matter how you look at it.

There are a few reasons why you might want XML in a web framework; however, they do NOT apply to PHP apps. Myth quicklist:

  • it's “human-readable”. Come on, PHP is stored in readable ASCII files, and even if you use Zend products to compile and encrypt your code, why on earth would you allow reading and modifying the controller state machine on the deployment site?
  • it's easier to modify than code. This is probably true for Java and complex frameworks, but PHP is significantly simpler than Java.
  • it can be automatically generated from code by tools such as XDoclet or via an IDE. True if you're writing in Java, but PHP does not have such tools.

This means that the only serious candidate (among those considered here) for a PHP MVC framework is Phrame, which stores the sitemap as a multi-dimensional hashmap. Thus, you should either consider Phrame or, for small sites (< 50 screens), write your own mini-framework, with a state machine implemented as a hashed array of arrays and some session persistence in the database. I chose to serialize and persist an array containing primitive variables, using PHPSESSID as the primary key to retrieve and unserialize the array, all coupled with a simple "aging" mechanism for those users with the nasty habit of leaving the site without logging out first.
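The "hashed array of arrays" idea is language-neutral, so here is a minimal sketch of its shape in Java rather than PHP, to stay with the language of the other examples on this blog. The outer map is keyed by the current state, the inner one by the action, and the value is the next state. The class name and the state and action names in the usage note are made up; this is not how Phrame or any of the other frameworks is implemented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** A front-controller state machine stored as a map of maps. */
class SiteStateMachine {

    // state -> (action -> next state)
    private final Map<String, Map<String, String>> transitions = new HashMap<>();

    /** Registers a transition and returns this, so calls can be chained. */
    SiteStateMachine on(String state, String action, String nextState) {
        transitions.computeIfAbsent(state, k -> new HashMap<>()).put(action, nextState);
        return this;
    }

    /** Returns the next state, or the current one if the action is unknown. */
    String next(String state, String action) {
        Map<String, String> actions =
            transitions.getOrDefault(state, Collections.<String, String>emptyMap());
        return actions.getOrDefault(action, state);
    }
}
```

A site would be declared with chained calls such as new SiteStateMachine().on("login", "submit", "home").on("home", "viewBugs", "bugList"). Because the whole thing is a handful of hash lookups defined in code, nothing has to be parsed on each request, which is the point of avoiding the XML descriptor.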

Finally, a last word of advice: use PEAR! This often overlooked library of PHP classes includes a few man-years of quality work. You'll get a generic database connection layer (PEAR DB) along with automatic generation of model classes mapped to the database schema (DB_DataObject), a plethora of HTML tools (tables, forms, menus) and some templating systems to choose from. All in a nice package that is easy to install and upgrade.

Don't put a heavy burden on your upgrade cycle by using heterogeneous packages downloaded from different sites on the web; just use PEAR.

Or simply ignore the PHP offer and wait patiently for your next Java project. Vacations are almost over.

Written by Adrian

October 29th, 2004 at 8:53 am

Posted in Tools


MVC and Front Controller frameworks in PHP – more considerations


Having recently stumbled upon this thread on the Sitepoint community forums, I found a certain Mr. Selkirk advocating page controllers instead of a front controller, meaning that the state machine logic is distributed across the pages/states. I have some pragmatic problems with this approach, since on a large site (hundreds of pages) every page would have to be modified whenever a new generic transition appears.

In the same thread, there's a sensible approach coming from an Interakt guy whom I also happen to know personally [hi, Costin]. He describes PHP website design using MVC (from a controller point of view) as having 3 steps:

  • Design your site with a fancy IDE which will generate a lot of XML gunk
  • Let the framework compile the XML to good old PHP code, perfectly human-readable and all
  • Enjoy ! Speed and structure.

Unfortunately his solution is not exactly open-source nor free, and I'll gladly use my 500 maccaronis for a shiny new flat screen. Besides, it looks like my PHP episode is coming to an end (I see some serious consulting on Java apps on the horizon). Anyway my piece of advice to Costin (as a non-customer) is “don't do any serialization, keep the code clean as the bottleneck usually comes from the database – and the world will be a better place to live”.

On a lighter note, there is John telling us cool stuff about PHP:


Does this reloading of all data on every HTTP request mean that PHP is not scalable? Fortunately, as I've said before in other posts, that's a superficial view of things. The reloading of data is precisely the reason why PHP is scalable. The way PHP works encourages you to code using state-less designs. State information is not stored in the PHP virtual machine, but in separate mechanisms, a session variable handler or a database.
This makes PHP very suitable for highly scalable shared-nothing architectures. Each webserver in a farm can be made independent of each other, communicating with a central data and session store, typically a relational database. So you can have 10 or 1000 identically configured PHP web servers, and it will scale linearly, the only bottleneck being the database or other shared resources, not PHP.

Whew! If only vendors 'knew' that by removing state information from their appservers, these would instantly become very suitable for highly scalable shared-nothing architectures. Someone should tell this to IBM, BEA and Sun. And maybe to Microsoft. Oh, if only things were that simple!

PS For those wondering about my sudden passion for PHP, there is an older entry on my weblog explaining the whos and the whats.

Written by Adrian

October 29th, 2004 at 8:44 am

Posted in Tools


(Undocumented?) HTTP-based REST API in Jira


While the REST API is mentioned in the Top 10 Reasons to use Jira, I can hardly find any reference to such an external API in the documentation (I mean, besides XML-RPC and the upcoming SOAP support), and even Google can't clear up the issue. But I can confirm that it is possible to fully(?) automate Jira using HTTP requests.

Recently, I was asked to programmatically log a user into Jira and open a browser with the results of an advanced search. This is part of a really slick integration between one of my employer's products and a Jira server, mainly helping our testers avoid bug duplication and also providing insightful statistics for the QA department.

I started doing it the hard way, trying to obtain and pass JSESSIONID and other useless stuff between my application and the browser, until I realized that Jira can be fully controlled through HTTP in a REST-like manner. Let me explain. Normally, if you are not logged into Jira and launch a browser with a carefully crafted search request (well, you know how to reverse engineer POSTs into GETs, don't you?), then a graceful invitation to log in is all you'll ever obtain. But if you add "&os_username=" + USER + "&os_password=" + PASS at the end of your request, bang! Not only do you obtain the desired search results, but you are automagically logged into Jira for the rest of the browsing session. Yes, yes, yes: here I come, integration! A couple of hours later, I was able to programmatically insert bug reports, extract bug details, compute statistics and open customized advanced searches.
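For illustration, appending the two parameters can be sketched as below. Only os_username and os_password come from what I observed; the class name, the method name and any host or search URL passed in are hypothetical. The values are URL-encoded so that special characters in a password don't break the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Appends the os_username / os_password parameters to a Jira request URL. */
class JiraLoginUrl {

    /** Returns searchUrl with the credentials appended as query parameters. */
    static String withCredentials(String searchUrl, String user, String password) {
        // Start the query string if the URL does not have one yet.
        String separator = searchUrl.contains("?") ? "&" : "?";
        return searchUrl + separator
            + "os_username=" + encode(user)
            + "&os_password=" + encode(password);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string is what gets handed to the browser. Keep in mind that credentials placed in a URL tend to end up in browser history and server logs, so a dedicated low-privilege account seems a sensible choice for this kind of integration.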

To quote a classic : Jira ! Jira ! Jira !. Docs would be nice, though.

PS I'm testing this on a Jira Professional 2.6.1-#65. YMMV.

Written by Adrian

September 17th, 2004 at 4:45 pm

Posted in Tools
