Netuality

Taming the big bad websites

Archive for the ‘programming’ tag

Java going down, Python way up, and more …

8 comments

According to O’Reilly Radar, sales of Java books have declined by almost 50% over the last 4 years. C# is selling more books from year to year and will probably catch up with Java in 2008. JavaScript is on the rise (due to AJAX, for sure) and PHP is, surprisingly, on a downward path (although the job statistics indicate quite the contrary).

In 2007, more Ruby books were sold than Python books. In their article they qualify Ruby as a “mid-major programming language” and Python as a “mid-minor programming language”. However, after the announcement of Google App Engine, the number of Python downloads from ActiveState tripled in May. This should become visible in the book sales statistics pretty soon.

Written by Adrian

May 24th, 2008 at 5:36 pm

Posted in Tools

Programming is hard – the website

A newcomer in the world of “code snippets” sites is programmingishard.com. Although the site is a few months old, it has only recently started to gain some steam. Unlike its competitors Krugle and Koders, this is not a code search engine but an entirely tag-based, user-built snippet repository. The author has a blog at tentimesbetter.com.

To whet your appetite, this is a Python code fragment that I found on the site, for the classic inline conditional, which does not exist as such in Python:

n = ['no', 'yes'][thing == 1]

Obviously it has the big disadvantage of having to compute both values no matter what the condition thing is, but is very short and elegant. Simple but nice code sugar.
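
A small sketch of that trade-off, for the curious. It compares the list-indexing trick with the conditional expression that arrived later, in Python 2.5. The costly helper and the function names are hypothetical, added only to count how many values get computed:

```python
def label_by_index(thing):
    # The trick from the snippet site: bool is a subclass of int,
    # so False picks index 0 and True picks index 1.
    return ['no', 'yes'][thing == 1]


def label_by_expression(thing):
    # Conditional expression: only the selected branch is evaluated.
    return 'yes' if thing == 1 else 'no'


calls = []


def costly(value):
    # Hypothetical helper that records every evaluation.
    calls.append(value)
    return value


# The list is fully built before indexing, so both values are computed.
eager = [costly('no'), costly('yes')][1 == 1]
eager_calls = len(calls)

del calls[:]

# Here only the chosen branch runs.
lazy = costly('yes') if 1 == 1 else costly('no')
lazy_calls = len(calls)

print(eager, eager_calls, lazy, lazy_calls)
```

When both values are plain constants, as in the original fragment, the cost difference is negligible and the short form remains a fair choice.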

Written by Adrian

August 2nd, 2006 at 11:07 pm

Posted in Tools

Unicode in Python micro-recipe : from MySQL to webpage via Cheetah

Very easy:

  • start by adding the default-character-set=utf8 in your MySQL configuration file and restart the database server
  • apply this recipe from the ActiveState Python Cookbook (“guaranteed conversion to unicode or byte string”)
  • inside the Cheetah template, use the ReplaceNone filter:


#filter ReplaceNone
${myUnicodeString}
#end filter

in order to prevent escaping non-ASCII characters.
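
The idea behind the “guaranteed conversion” step can be sketched as follows. This is not the Cookbook recipe itself, just a minimal illustration written in Python 3 terms; the names to_text and to_bytes are hypothetical, and UTF-8 is assumed because that is what the MySQL setting above selects:

```python
def to_text(value, encoding='utf-8', errors='strict'):
    """Return value as text, decoding byte strings when needed."""
    if isinstance(value, str):
        return value
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding, errors)
    # Anything else (numbers, None, ...) goes through its string form.
    return str(value)


def to_bytes(value, encoding='utf-8', errors='strict'):
    """Return value as a byte string, encoding text when needed."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    return to_text(value, encoding, errors).encode(encoding, errors)


# A UTF-8 encoded value, as it might come back from the database driver.
row = b'caf\xc3\xa9'
print(to_text(row))
```

Calling such a function on every value before it reaches the template means the template only ever sees one kind of string.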

Now. That’s better.

Written by Adrian

April 14th, 2006 at 11:42 pm

Posted in Tools

Monitor everything on your Linux servers – with SNMP and Cacti

5 comments

Two free open-source tools are running the show for network and server-activity monitoring. The older one, quite popular among network and system administrators, is Nagios. Nagios does not only do monitoring, but also event traps, escalation and notification. The younger challenger is called Cacti. Unlike Nagios, it’s written in a scripting language [PHP] so no compiling is necessary – it just runs out of the box [1]. Cacti’s problem is that, in its current version, it is missing lots of real-time features such as monitoring and notification. All these features are scheduled to be integrated in future versions of the product, but as with any open-source roadmap nothing is guaranteed. Anyway, this article focuses on Cacti integration because it’s what I am currently using.

Cacti is built upon an open-source graphing tool called RRDtool (written by the author of MRTG) and a communication protocol, SNMP. SNMP is not exactly a developer’s cup of tea, being more of a network administrator’s tool [2]. However, a monitoring server comes in extremely handy for performance measurement and tuning, especially for complex performance behavior which can only be benchmarked long-term : such as the impact of large caches on a web application, or the performance of long-running operations.

But is the specific variable you need to monitor available with SNMP out of the box ? There is a strong chance it is. SNMP being an extensible protocol, lots of organizations have registered their own MIBs and respective implementations. Basically, a MIB is a group of unique identifiers called OIDs. An OID is a sequence of numbers separated by dots, for instance ‘.1.3.6.1.4.1.2021.11’; each number has a special meaning in a standard object tree. In this example, the meaning of ‘.1.3.6.1.4.1.2021.11’ is ‘.iso.org.dod.internet.private.enterprises.ucdavis.systemStats’. You can even have your own MIB in the ‘.iso.org.dod.internet.private.enterprises’ tree, by applying to IANA.
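
To make the number-to-name mapping concrete, here is a minimal Python sketch that resolves the OID quoted above. The table is hard-coded from that single example and the function name is hypothetical; real tools read these names from MIB files:

```python
# Names for the arcs of the one OID quoted in the text.
OID_TREE = {
    (1,): 'iso',
    (1, 3): 'org',
    (1, 3, 6): 'dod',
    (1, 3, 6, 1): 'internet',
    (1, 3, 6, 1, 4): 'private',
    (1, 3, 6, 1, 4, 1): 'enterprises',
    (1, 3, 6, 1, 4, 1, 2021): 'ucdavis',
    (1, 3, 6, 1, 4, 1, 2021, 11): 'systemStats',
}


def oid_to_name(oid):
    """Translate a dotted numeric OID into its dotted name form."""
    arcs = tuple(int(part) for part in oid.strip('.').split('.'))
    names = []
    for depth in range(1, len(arcs) + 1):
        prefix = arcs[:depth]
        # Unknown arcs are kept as plain numbers.
        names.append(OID_TREE.get(prefix, str(arcs[depth - 1])))
    return '.' + '.'.join(names)


print(oid_to_name('.1.3.6.1.4.1.2021.11'))
```

An OID outside the table, such as a private branch under ‘enterprises’, simply keeps its trailing numbers.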

Most probably you don’t really need your own MIB, no matter how ‘exotic’ your monitoring is, because:

a) it’s already there, in the huge list of existing MIBs and implementations

and

b) you are not bound to the existing official MIBs; in fact you can create your own MIB, as long as you replicate it in the SNMP configuration on all the servers that you want to monitor.

To take a look at existing MIBs, free tools are available on the net, IMHO the best one being MibBrowser. This multiplatform [Java] MIB browser has a free version which should be more than enough for our basic task. The screen capture shown here depicts a “Get Subtree” operation on the ‘.1.3.6.1.4.1.2021.11’ MIB; the result is a list of single-value MIBs, such as ‘.1.3.6.1.4.1.2021.11.11.0’, which has the alias ‘ssCpuIdle.0’ and value 97 [meaning that the CPU is 97% idle]. You can see the alias by loading the corresponding MIB file [select File/Load MIB then choose 'UCD-SNMP-MIB.txt' from the list of predefined MIBs].

From the command line, in order to display existing MIB values, you can use snmpwalk [3]:

snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.2021.11

and the result is:

.1.3.6.1.4.1.2021.11 OID (.iso.org.dod.internet.private.enterprises.ucdavis.systemStats)
snmpwalk -v 1 -c sncq localhost .1.3.6.1.4.1.2021.11
UCD-SNMP-MIB::ssIndex.0 = INTEGER: 1
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssSwapIn.0 = INTEGER: 0
UCD-SNMP-MIB::ssSwapOut.0 = INTEGER: 0
UCD-SNMP-MIB::ssIOSent.0 = INTEGER: 4
UCD-SNMP-MIB::ssIOReceive.0 = INTEGER: 2
UCD-SNMP-MIB::ssSysInterrupts.0 = INTEGER: 4
UCD-SNMP-MIB::ssSysContext.0 = INTEGER: 1
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 2
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 1
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 96
UCD-SNMP-MIB::ssCpuRawUser.0 = Counter32: 17096084
UCD-SNMP-MIB::ssCpuRawNice.0 = Counter32: 24079
UCD-SNMP-MIB::ssCpuRawSystem.0 = Counter32: 6778580
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 599169454
UCD-SNMP-MIB::ssCpuRawKernel.0 = Counter32: 6778580
UCD-SNMP-MIB::ssIORawSent.0 = Counter32: 998257634
UCD-SNMP-MIB::ssIORawReceived.0 = Counter32: 799700984
UCD-SNMP-MIB::ssRawInterrupts.0 = Counter32: 711143737
UCD-SNMP-MIB::ssRawContexts.0 = Counter32: 1163331309
UCD-SNMP-MIB::ssRawSwapIn.0 = Counter32: 23015
UCD-SNMP-MIB::ssRawSwapOut.0 = Counter32: 13730

Each of these values has its own significance, like for instance ‘ssCpuIdle.0’ which announces that the CPU is 96% idle.
In order to retrieve just a single value from the list, use its alias as a parameter to the snmpget command, for instance

snmpget -Os -c [community_name] -v 1 [hostname] UCD-SNMP-MIB::ssCpuIdle.0
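
If you want to use these values from a script rather than read them on screen, the text output is easy to parse. The following is a rough Python sketch that works on output shaped like the listing above; the function name is hypothetical and only the value types seen in that listing are handled:

```python
import re

# "<name> = <TYPE>: <value>", as printed in the listing above.
LINE = re.compile(r'^(?P<name>\S+) = (?P<type>[A-Za-z0-9]+): (?P<value>.*)$')


def parse_snmpwalk(text):
    """Turn snmpwalk text output into a {name: value} dictionary."""
    values = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # headers, echoed commands, empty values
        # Drop the "UCD-SNMP-MIB::" module prefix if present.
        name = match.group('name').split('::')[-1]
        raw = match.group('value').strip()
        if match.group('type') in ('INTEGER', 'Counter32'):
            try:
                values[name] = int(raw)
                continue
            except ValueError:
                pass  # not a plain number, keep it as text
        values[name] = raw.strip('"')
    return values


sample = """\
UCD-SNMP-MIB::ssErrorName.0 = STRING: systemStats
UCD-SNMP-MIB::ssCpuUser.0 = INTEGER: 2
UCD-SNMP-MIB::ssCpuSystem.0 = INTEGER: 1
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 96
UCD-SNMP-MIB::ssCpuRawIdle.0 = Counter32: 599169454
"""
stats = parse_snmpwalk(sample)
print(stats['ssCpuIdle.0'])
```

For anything serious, a proper SNMP library (see the list at the end of this article) is a better choice than scraping command output.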

Sometimes, you want to monitor something which you do not seem to find in the list of MIBs. Say, for instance, the performance of a MySQL database that you’re pounding pretty hard with your webapp [4]. The easiest way of doing this is to pass through a script – SNMP implementations can take the result of any script and expose it through the protocol, line by line.

Supposing you want to keep track of the values obtained with the following script:

#!/bin/sh
# Print connections/10, queries/1000 and slow queries, one value per line
/usr/bin/mysqladmin -uroot status | /usr/bin/awk '{printf("%f\n%d\n%d\n",$4/10,$6/1000,$9)}'

The mysqladmin command and a bit of simple awk magic display the following three values, each on a separate line:

  • number of opened connections / 10
  • number of queries / 1000
  • number of slow queries
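
For readers less fluent in awk, the same field extraction can be sketched in Python. The status line below is a hypothetical sample shaped like the output of mysqladmin status, and the function name is made up for the illustration:

```python
def mysql_values(status_line):
    """Mimic the awk one-liner: fields $4, $6 and $9, scaled."""
    fields = status_line.split()
    # awk fields are 1-based, Python indexes are 0-based.
    threads = float(fields[3])
    questions = int(fields[5])
    slow_queries = int(fields[8])
    return (threads / 10, questions // 1000, slow_queries)


# Hypothetical sample with made-up numbers.
sample = ('Uptime: 86400  Threads: 9  Questions: 18551000  '
          'Slow queries: 108  Opens: 512  Flush tables: 1  '
          'Open tables: 64  Queries per second avg: 214.710')

print('%f\n%d\n%d' % mysql_values(sample))
```

The scaling by 10 and by 1000 only serves to bring the three values into a comparable range on a single chart.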

It is interesting to note that, while the first value is an instantaneous, gauge-like one, the following two are incremental, growing and growing as long as new queries and new slow queries are recorded. We will keep this in mind for later, when we track these values.

But for now, let’s see how these three values are exposed through SNMP. The first step is to tell the SNMP daemon that the script has an associated MIB. This is done in the configuration file, usually located at /etc/snmp/snmpd.conf. The following line attaches the execution of the script [for example /home/user/myscript.sh] to a certain OID:

exec .1.3.6.1.4.1.111111.1 MySQLParameters /home/user/myscript.sh

The ‘.1.3.6.1.4.1.111111.1’ OID is a branch of ‘.1.3.6.1.4.1’ [meaning '.iso.org.dod.internet.private.enterprises']. We tried to make it look ‘legitimate’ but obviously you can use here any sequence you want to.

After restarting the daemon, let’s query the freshly created OID, either with MibBrowser or from the command line with snmpwalk -Os -c [community_name] -v 1 [hostname] .1.3.6.1.4.1.111111.1 ; the result is:

enterprises.111111.1.1.1 = INTEGER: 1
enterprises.111111.1.2.1 = STRING: "MySQLParameters"
enterprises.111111.1.3.1 = STRING: "/etc/snmp/mysql_params.sh"
enterprises.111111.1.100.1 = INTEGER: 0
enterprises.111111.1.101.1 = STRING: "0.900000"
enterprises.111111.1.101.2 = STRING: "18551"
enterprises.111111.1.101.3 = STRING: "108"
enterprises.111111.1.102.1 = INTEGER: 0
enterprises.111111.1.103.1 = ""
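
In this output, the three lines printed by the script show up under the ‘.101’ branch, one index per line. A short Python sketch for pulling them back out of the text; the function name is hypothetical, and the layout is assumed only from the listing above:

```python
import re

# Lines of the form: enterprises.111111.1.101.<n> = STRING: "<value>"
ROW = re.compile(r'^enterprises\.111111\.1\.101\.(\d+) = STRING: "(.*)"$')


def script_output(walk_text):
    """Return the script's output lines, ordered by their index."""
    rows = {}
    for line in walk_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows[int(match.group(1))] = match.group(2)
    return [rows[index] for index in sorted(rows)]


walk = '''\
enterprises.111111.1.2.1 = STRING: "MySQLParameters"
enterprises.111111.1.100.1 = INTEGER: 0
enterprises.111111.1.101.1 = STRING: "0.900000"
enterprises.111111.1.101.2 = STRING: "18551"
enterprises.111111.1.101.3 = STRING: "108"
'''
threads, questions, slow = script_output(walk)
print(threads, questions, slow)
```

These three indexes are also the OIDs you will point Cacti at: one per charted value.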

Great ! Now we have the proof that it really works and our specific values extracted with a custom script are visible through SNMP. Let’s go back to Cacti and see how we can make some nice charts out of them [5].

Cacti has this nice feature of defining ‘templates’ that you can reuse afterwards. My strategy is to define a data template for each one of the 3 parameters I want to chart, using the ‘Duplicate’ function applied to the ‘SNMP – Generic OID Template’.

On the duplicate datasource template, you have to change the datasource title, name to display in charts, data source type [use DERIVE for incremental counters and GAUGE for instantaneous values], specific OID and the snmp community. Do it for the three values.
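
The difference between the two data source types is roughly the following: a GAUGE is charted as it is read, while for a DERIVE source what matters is the change between two consecutive polls. A simplified Python sketch of that idea, with hypothetical readings and function names; the real handling of time steps and missing data in the graphing backend is more involved:

```python
def deltas(readings):
    """Change between consecutive readings of an ever-growing counter."""
    return [later - earlier
            for earlier, later in zip(readings, readings[1:])]


def per_second(readings, step_seconds=300):
    """The same changes expressed as a rate, for a fixed poll interval."""
    return [delta / step_seconds for delta in deltas(readings)]


# Hypothetical readings from three consecutive polls, 5 minutes apart.
connections = [0.9, 1.2, 0.8]          # gauge-like: chart as-is
queries_k = [18551, 18563, 18590]      # incremental: chart the change

print(connections)
print(deltas(queries_k))
print(per_second(queries_k))
```

Charting the raw counter instead would only give an ever-rising line, which says little about activity at a given moment.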

Using the three new datasource templates, create a chart template for ‘MySQL Activity’. That’s a bit more complicated, but it boils down to the following procedure, repeated for each of the 3 data sources:

  • add a data source and associate a graph [I always use AREA for the first graph as a background and LINE3 for the other, but it's just a matter of taste]
  • associate labels with current or computed values: CURRENT, AVERAGE, MAX in this example

All the rest is really fine tuning – choosing better colors, whether to use autoscale or fixed scale and so on. By now, your graph template should be ready to use.

Note that for the incremental values ['DERIVE' type data sources] I’ve used titles such as ‘Thousands queries/5 min’ – the 5 minutes come from the Cacti poller, which is set to query for data every 5 minutes. The end result is something like this one :

On this real production chart you’ll see a few interesting patterns. For instance, at 3 o’clock in the morning, there is a huge spike in all the charted parameters – indeed, a cron’ed script was causing this spike. From time to time, a small burst of slow queries is recorded – still under investigation. What is interesting here is that these spikes were previously undetectable on the load average chart, which looks clean and innocuous:

To conclude, SNMP is a valuable resource for server performance monitoring. Often, investigating specific parameters and displaying them in tools such as Cacti can bring interesting insights into the behavior of servers.

Some SNMP implementations in different programming languages:

  • Java: Westhawk’s Java SNMP stack [free, with commercial support], AdventNet SNMP API [commercial, with a feature-restricted un-expiring free version], iREASONING SNMP API [commercial implementation], SNMP4J [free and feature-rich implementation - thank you Mathias for the tip]
  • PHP: client-only supported by the php-snmp extension, part of the PHP distribution [free]
  • Python: PySNMP is a Python SNMP framework, client+agents [free].
  • Ruby: client-only implementation Ruby SNMP [free]

[1] If you’re running Debian, Cacti is available through apt, so it’s a breeze to install and run [apt-get install cacti].

[2] A bit out of the scope of this article : SNMP also allows writing values on remote servers, not only retrieving monitored values.

[3] Replace [hostname] with the server hostname and [community_name] with the SNMP community – default being ‘public’. The SNMP community is a way of authenticating a client to an SNMP server; although the system can be used for pretty sophisticated stuff, most of the time the servers have a read-only passwordless community, visible only in the internal network for monitoring purposes.

[4] In fact, a commercial implementation of SNMP for MySQL does exist.

[5] The procedure described here applies to Cacti v0.8.6.c

Written by Adrian

March 5th, 2006 at 5:27 pm

Posted in Tools

Review : Holub on Patterns

In order to read Allen Holub's new book, you'll certainly need some programming skills (Java, OOP and patterns to be more specific). The back of the book says 'Intermediate to Advanced'. It certainly depends on what you mean by 'Intermediate'… because the book is not exactly a light read. But then again, we don't expect that from Allen Holub. We want interesting, insightful books from him, and 'Holub on Patterns' falls nicely into that category. However, some 'intermediates' should prepare themselves for a harsh ride.

The volume is structured in 4 chapters. The first one contains some 'preliminaries'. Meaning : short explanations about why OOP is still incorrectly used, design patterns are not fully understood, plus a bonus of controversial statements like 'getters and setters are evil' and 'Swing is an over-engineered piece of junk' [well, maybe not exactly these words]. As a direct consequence of reading this chapter, the 'intermediates' will start banging their heads on the closest wall available : “My code sucks ! I swear I'll never blindly copy/paste again !”.

In the second chapter things really start to heat up. Allen explains why 'extends is evil' and interfaces are not evil. In case we needed an example of the fragile-base-class problem, here we go with some MFC bashing (usual stuff). The chapter also focuses on some creational patterns such as Factory and (at great length) Singleton. I especially liked the cool explanations of how to shut down a Singleton.

The third chapter discusses an [overly complex, on purpose] implementation of the 'Game of Life'. Between huge chunks of code (a bit much for my taste) scattered throughout the chapter, the author explains all the implementation choices: from Visitor to Flyweight. Some 60% of the GoF patterns are encountered in this chapter's code.

The fourth and last chapter contains 'production code', as the author declares. It's a small in-memory database, with an embedded SQL interpreter and a JDBC driver. Very solid example, but it'll probably scare away a few 'intermediates'.

It all ends with an Appendix containing a great 'Design-Pattern Quick Reference', presenting the most used patterns in a very pragmatic format. Each pattern is explained via a diagram, some Java code snippets, its motivation, pros and cons, and a very original 'Often Confused With' paragraph.

Unlike all the other pattern books you've read before, this is not a reference. It's a real programming book that you'll have to read from cover to cover. You'll also need solid programming skills in order to understand the last two chapters (and especially the last one).

My gripes:

  • too much code. Probably more than 1/3 of the pages are just printed code.
  • typos. There is a slightly disturbing amount of typos in the book, even in some code snippets [like for instance 'Sting' instead of 'String'].

However, these problems should not scare away any potential readers. Because of its original pragmatic approach, 'Holub on Patterns' is surely in the Top 10 Java books for 2005.

Written by Adrian

January 4th, 2005 at 10:48 am

Posted in Books

If programming is like gardening …

… then a software team is like an aquarium.

“Programming is Gardening, not Engineering” says Andy Hunt (of Pragmatic Programmer fame) in one of his well-known Artima conversations.

Inspired by such an interesting ‘organic’ comparison, here is my own metaphor : a software team behaves quite like an aquarium. I assume not all my blog readers are aquarium hobbyists, so let me explain:

  • Permanent monitoring and adjusting. Left alone and unsupervised, an aquarium apparently manages to ‘survive’ by itself. However, subtle changes in water chemistry will slowly start to build up. Interestingly, the fish seem to cope well with these changes – until a tipping point is reached and they get sick and eventually die. In my experience, the margin is rather thin : one day everything seems ok and the next day it’s a major disaster. The effort necessary to clean up the situation is significantly bigger than the effort spared by not taking care of the aquarium. The parallel here is quite obvious : you can’t manage what you can’t measure, you can’t control what you can’t manage. Software metrics, code reviews, frequent releases, testing and feedback : these practices are vital if you want a ‘healthy’ project and a ‘living’ team. Otherwise, beware, the inflexion point might be just a few days away*.
  • However, changes must be made gradually. Supposing that a major shift in water parameters was detected, taking immediate and radical measures will generally worsen the situation (unless the catastrophe is already there). It is highly recommended to distribute the change over a reasonable period of time and generally never try to influence two major water parameters at the same time (pH and GH for instance). Explanation: all these parameters are interconnected in intricate ways, by changing one you’ll automatically influence the others. By changing two or more, the outcome is hard to predict and might open the path to a disaster. There’s a nice parallel here. A major change in methodology, with the sudden introduction of multiple new/modified development practices, will only make the team unstable, even if, globally speaking, the change is a highly beneficial one. ‘Good things come to those who wait’ … and measure … and change … and wait … and measure … and change …
  • A beautiful aquarium is a visible one. Transparent glass, lights and everything. Would you feed and keep your fish if they were living in a black box that you were afraid to look inside ?

*Of course [and fortunately], the developers do not get sick because of a reeking team/project, they simply leave.

Written by Adrian

November 4th, 2004 at 7:45 pm

Posted in Process

Review : Hibernate in Action

Disclaimer : this review is based on the MEAP draft. Things might be (a little) different in the final version.

From a documentation point of view, Hibernate is one of the most notable exceptions in the world of open-source LGPL'ed projects. Its website offers a plethora of information, from solid documentation (the reference has no less than 141 pages) and various FAQs to sample projects and third-party resources. The forum is quite active and you may get answers to tricky questions. Or a little bit of rough treatment in case you haven't RTFM – but that is understandable, given the number of questions that the authors have to answer every day.

Under these circumstances, one might wonder what Gavin King (Hibernate founder) and Christian Bauer (documentation/website maintainer and Hibernate core developer) can add in order to be able to write a 400-page book about Hibernate. I mean – sure – only by joining the reference documentation, different FAQs and guides, one can easily 'extract' a hefty 'manuscript' with more than 200 pages.

Well, I am extremely glad to tell you that this is not the case. The book not only gets you up to speed with Hibernate and its features (which the documentation does quite well). It also introduces you to the right way of developing and tuning an industrial-quality Hibernate application. I consider myself a pretty seasoned Hibernate developer, being familiar with the API since its 1.2 version in Q1-2002 (if I remember correctly the first app in which we used Hibernate). However, I was proved wrong by “Hibernate in Action”, which describes best practices and even API features that were unknown or vaguely known to me. That is, until now.

The first chapter, in the good tradition of all first chapters in the world, is an introduction. It's a very well written introduction about why we need ORM solutions in OO applications. The chapter explains the O/R impedance mismatch, while declaring quickly that OODBs suck (immature and not widely adopted). We'll also find out that EJBs suck too from a persistence point of view (for various reasons). Which can be quite a surprise knowing that Gavin is one of the authors of the EJB 3.0 specs. Or, on the contrary, this will explain a lot of things in the new EJB specs.

Now that we have cleared the “why Hibernate” issue, let's continue to the second chapter. Which – tradition obliges – is a “Hello, world” and a “Let's get started” chapter. Here you go, almost 50 pages later you should be able to write simple Hibernate-based persistence layers and integrate them within an application server, like for instance … JBoss ! Humm, well, why not ? They are sponsors of the Hibernate project, after all.

In the 3rd chapter, our fresh knowledge will be put to good use by starting the development of an online auction application called CaveatEmptor. This app will follow our reading progression and will grow bigger and smarter chapter by chapter. But for the moment, we are at the inception phase. The result : a little bit of analysis, a stylish class diagram of the domain model and the resulting mapping file. And if you thought (based on the 2nd chapter) that the mapping file is very intuitive and simple, you're in for a big surprise : it is, indeed, intuitive and simple ! Quite bizarre for an open-source project. As a matter of fact, the mapping file is one of the pivotal elements of Hibernate, since it directly addresses the O/R impedance mismatch, a recipe for transparently linking your POJOs and the constrained relational model. No wonder that a big part of this chapter is aimed at explaining why and how the mapping works in Hibernate. You'll see how class associations and inheritance translate at the metadata and mapping level. You'll start to understand the things that you took for granted in the previous chapter and you'll have that pleasant “uuh, I see” chain reaction. Hold on, it's just the beginning.

Because chapter 4 is going to explain once and for all the lifecycle of persistent objects in Hibernate, their behavior from a persistence point of view as well as the available fetching strategies. And if you thought you already knew everything by heart from the documentation … well, maybe you do know everything by heart. Nevertheless, it's very well synthesized in chapter 4 and I'll recommend it anytime to a coworker eager for Hibernate knowledge.

In the next chapter (the 5th) the rollercoaster slows down a bit. That is, if you already know the behavior associated with the four possible isolation levels in transactions, what the different types of locking are, what (the hell) MVCC means and the importance of transaction scopes. Chances are you already know some of this stuff quite well, but everybody needs a refresher from time to time, especially when it's well explained and when it comes with versioning and caching (1st and 2nd level) in Hibernate for dessert. By the way, I thought that OSCache supports clustering, not only SwarmCache and JBossCache, as stated in the book. There's even a thoroughly explained example of using JBossCache as a level 2 clustered cache for Hibernate, but it shouldn't be too hard to convert to other types of caching systems.

Now, if I were the author of the book, I would have placed chapter 6 before chapter 5. But I am not the author, which is quite fortunate for you dear readers since Christian and Gavin are much more competent than me at writing books about Hibernate (and probably in some other unrelated domains). They have decided to go back to mapping in chapter 6, after the short transaction/caching intermezzo. Well, they should know better… it's time for a serious dose of advanced mapping. This chapter tackles interesting subjects such as custom mapping types (simple or composite) and (finally) the mapping of collections. Special guest stars: the whole gang of “sets, bags, lists and maps”, together with explanations about their relational equivalent (associations, associations and associations !). Oh and yes “polymorphic association” (section 6.4.3) – I wasn't even aware that Hibernate is able to do that… guess I'm not that 'seasoned' (as a Hibernate developer) after all.

The 7th chapter is about “Retrieving objects efficiently” : about 45 pages for the 'retrieving' part and 6 pages for the 'efficiently' part. Fair enough ! You'll learn how to master basic HQL queries (parameters, pagination …). You'll get a grip on the query by criteria API, as well as on advanced stuff such as dynamic queries, filters, subqueries and native SQL (very powerful). At the end of the chapter there's the Hibernate-specific solution for the n+1 selects problem, query caching and result iterators.

Following this wealth of useful knowledge, the 8th chapter starts a bit dry. Nevertheless, after a short introduction about Hibernate in managed environments, you'll find yourself again in the land of advanced programming techniques : application-level transaction implementation ! This is mostly new stuff (at least for me) – a great collection of best practices for transactional behavior management in industrial-quality apps. Somewhat unrelated but still interesting, the chapter ends with legacy schemas integration and a smart implementation example for audit logging.

The 9th (and last) chapter is about the roundtrip development in Hibernate using the classical toolset : Middlegen and/or hbm2java and/or XDoclet. All the available techniques are presented in a very detailed, step-by-step manner.

Wait : don't close the book, there's more ! Ignore Appendix A (a short and rather uninteresting document about SQL fundamentals – that is, if you know SQL). Appendix B contains mildly un-fascinating ORM implementation strategies pour les connaisseurs (come on guys, I'm just a dumb user). But – Appendix C is a great collection of real-world stories and by all means read them all ! Especially the last one, a treasure of hard to find knowledge (no spoilers, please…).

In the end, I have to confess that there is something truly interesting about 'Hibernate In Action' : albeit very technical, it reads astonishingly easily – and this kind of book is unfortunately very rare nowadays. My congratulations to the authors for this excellent piece of work – it was worth the wait.

As for you, dear potential reader, if you already know all the information detailed in the book, I bow before you, great Hibernate wizard. But if you don't, what are you waiting for ? Because, if you're going to read only one technical book this summer, make sure that it's 'Hibernate In Action' (or, at least chapters 6, 7 and 8, if you are that good !).

Written by Adrian

August 5th, 2004 at 10:42 pm

Posted in Books

JUnit : it’s not [only] about the API

Being extremely busy lately, I am arriving a bit late at the JUnit destruction feast. While it is probably true that some guys with a certain gift for writing blog articles may “come up with something far more useful in a couple of days”, I think the discussion is missing an important point: there's a whole ecosystem living around JUnit. We have Ant integration, we have the choice between code coverage tools (both commercial and open-source), plugins for mainstream IDEs and a certain number of useful or less-useful extensions. We have extensive documentation and a plethora of examples to feed the small fishes. Throwing JUnit down the drain means throwing all these down the drain. Or, at least: write your own Ant integration, adapt a code coverage tool and rewrite the IDE integration, rewrite documentation and examples – this is not going to be done in “a couple of days”.

Another JUnit advantage is that this little simplistic API is ubiquitous. I mean, every developer has heard about it and knows how to use it, unless of course he/she was living under a rock for the last few years. And I don't mean every Java developer, but just about every developer for a language under the xUnit umbrella. Meaning : all the programming languages (unless you consider “languages” such as Whitespace, Brainfuck and INTERCAL).

Beck and Gamma have not only written some “crappy” classes and put the few “laughable” chunks of code on SourceForge, they have done it first. Now, there is some well-founded criticism about the lack of evolution in JUnit, but one thing is undeniable : it really did fill a niche, back then in 2000. The code may not be beautiful (and this is not good coming from XPers) but it serves its purpose : to provide a simple framework for unit testing.

Competition is the key here and smart newcomers on this “market” are good news for us programmers. But, it's gonna take some time and a lot of work to build a similar ecosystem, a similar mindshare and usurp JUnit's kingdom. That would be of course more interesting to see than the denial of four years of JUnit influence in a few well-rounded, but futile phrases.

Written by Adrian

July 14th, 2004 at 9:55 am

Posted in AndEverythingElse

Hallowed be thy tablename !

If you haven't had the opportunity to work on a really big project, naming is probably not on your top list of programming best practices. And you are certainly going to regret that when your project grows.

Of course, everybody, including good old Scott, knows that CUST signifies CUSTOMER and DEPT signifies DEPARTMENT. And statistically speaking, the chances for these abbreviations to mean something else are very small – as long as your domain model is, also, quite small. But, when the number of classes in the domain is in the hundreds or even in the thousands you'll suddenly find out that CUST may signify CUSTOMS (as in 'Customs Tax'), CUSTOMIZATION or even CUSTARD. I am working right now in the development team of an ERP for the agro-food industry and wouldn't be amazed to see such an attribute name. I've seen worse, some details of the implemented business model are a total blasphemy for human logic and common sense.

Anyway, the problem is even worse in these big projects because domain model classes are not written by hand, they are generated. While this is hardly a novelty for you (please don't laugh in the audience), it also means that analysts are composing the datamodel, then classes/mappings/SQL schema/docs are generated, finally programmers will write the business logic and infrastructure integration using the generated artifacts. Names are usually propagated all along the generation chain. And when a programmer finds 'Cust' in the name of an attribute, how does she know it's a 'Customer' and not some 'Custard' ? Especially when the documentation is scarce and the author analyst is on a well-deserved six-month sabbatical in Antarctica.

Hence, the need for standardization. This is usually done via a dictionary containing the abbreviations and their meaning(s). The rule is very simple : every word in the datamodel must be composed of abbreviations from the dictionary. Some programmers might argue that there is no need for abbreviations and full words are ok – lovely code such as '.getSecondaryBillingAddressForService(currentBill.getBillableServicesList(i).get(currentService)).getStreet().getName()'. This is perfectly understandable, however let's not forget that some databases (Oracle, SAP DB, etc.) have issues with table and column names longer than 32 characters, like for instance refusing to create them in the first place. Which is mildly bothersome if you use a relational database*.

And the golden rules of domain model naming are :

  • Be a pedantic bastard. Don't just throw the dictionary into the wild and tell people 'yeah, pleease follow this standard'. Run automatic checks on every piece of datamodel that feeds the code generator, at each save operation if possible. I have implemented this inside an Eclipse plugin used by the project analysts: when they hit save on an entity containing invalid names, a window immediately pops up and lists the errors. Don't just display the errors: completely forbid saving while the entity has naming issues. This will keep the naming absolutely pristine; however, the analysts might be tempted to form a lynch mob. Do not give up.
  • Avoid synonyms, plurals, etc. This is a software product, not a grammar contest.
  • Send some stats around by mail from time to time to show how well the model is named. People will like that.
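
The first and last rules could look roughly like the Python sketch below. It is not the Eclipse plugin mentioned above, just an illustration under stated assumptions: the DICTIONARY contents, the 30-character limit, and the save_entity and compliance helpers are all hypothetical, and names are assumed to be camelCase.

```python
import re

MAX_LENGTH = 30  # assumed identifier limit of the target database
DICTIONARY = {"Cust", "Dept", "Addr", "Bill", "Id", "Nm"}  # hypothetical entries


class NamingError(Exception):
    """Raised when an entity must not be saved because of its names."""


def check_name(name):
    """Return the list of problems found in one name (empty if valid)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("%s: longer than %d characters" % (name, MAX_LENGTH))
    for part in re.findall(r"[A-Za-z][a-z0-9]*", name):
        if part.capitalize() not in DICTIONARY:
            problems.append("%s: '%s' is not in the dictionary" % (name, part))
    return problems


def save_entity(entity_name, attributes, store):
    """Save the entity into store, or refuse if any name is invalid."""
    problems = check_name(entity_name)
    for attribute in attributes:
        problems.extend(check_name(attribute))
    if problems:
        # Forbid the save instead of merely displaying the errors.
        raise NamingError("\n".join(problems))
    store[entity_name] = list(attributes)


def compliance(names):
    """Share of valid names, e.g. for the periodic stats mail."""
    if not names:
        return 1.0
    valid = sum(1 for name in names if not check_name(name))
    return valid / float(len(names))
```

Raising an exception from the save path is what makes the check impossible to ignore; the compliance ratio is the kind of number that can go into the periodic mail.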

My current gig involves, among other interesting stuff, managing the naming tools in the various Java projects that we are developing. Unfortunately, the naming rules were not really enforced (they had no pedantic bastards before me?), so the domain model is only partially compliant. Hence, I'm in the midst of developing tools for automatic renaming of the model, and the new code is going to disrupt the activity for a while (thank God for autocompletion features in modern IDEs!). Things would have been much smoother if the naming had been enforced from the beginning. Still, I think there is no such thing as 'too late' to put naming in order in a big project. And it'll absolutely be done, because there's very strong managerial support for this kind of task (the main company shareholder and CEO is a former programmer himself, as well as a quality buff, 'when time permits'™).

Unfortunately, I had to allow some 'non-compliant' islands of code in the modules which are already deployed at customers. But, have no false hopes, sooner or later I'm gonna get that code too. I'm a pedantic bastard, and proud of it.

* Now, if you're using a wannabe storage solution like Prevayler to store gigabytes of business data (or more!), you have much bigger problems than naming. Please stop reading this article and do something about it.

Written by Adrian

June 26th, 2004 at 11:54 pm

Posted in Process


Using jython to internationalize a PHP app

At first, this might seem a mind-boggling combination. What do Jython and PHP have in common (except for the fact that I am a Python fan and my current consulting task is in a PHP project)?

Well, internationalizing a PHP app is pretty much a trivial task. If you are a sensible PHP programmer who insists on using PEAR instead of randomly choosing a script from the tons of snippets populating the “scripting websites”, I18N is probably the safest choice.

Maybe, for you, application maintainability and performance are not exactly important concerns. For me, they are. This is why I chose to store internationalized texts in files rather than in the database. I'd rather keep the database for real data, which is created, modified, aggregated and such. And I'd like to have an internationalized error message on the screen even if the database is down.

Now we know that we'll use I18N and that the texts will be kept in some PHP files. However, I am no professional translator and have no desire to translate, or to manually maintain the correspondence between the translators' files and the PHP files (no, translators won't modify PHP code, stop this nonsense right away). Code generation comes immediately to mind.

Basically, my first idea was to investigate whether the files used by the translators can be quickly transformed to PHP, and whether I am able to generate their formats from my own files (aka “roundtrip internationalization process”?). Unfortunately, this is not an easy task, as the only clue was that the translators use Office tools such as Word or Excel, because they rely upon some specialized translation software integrated with these products. The easy choice is Excel, since it allows a better organization of data than having to search for tables in a Word document. The hard choice is the tool that I'd use for automatically reading and even generating Excel files.

The difficulty comes from the fact that I don't have Windows with Office installed on my desktop, just Gentoo Linux and OpenOffice. Thus, I am unable to write a simple Python script which could perform my generation tasks via Office automation. Fortunately, this is not the first time I am confronted with the issue. I happen to know that there is a very nice Java tool that I wholeheartedly recommend for your Excel processing needs: JExcelApi.

Still, Java is a heavyweight programming language, and it would be a really bad idea to fire up the monster just for some easy processing of Excel files. This is where Jython naturally enters the equation. Four hours and about 100 lines of debugged code later, here I am sitting on top of a perfectly functional internationalization tool which:

  • generates PHP code from a big xls file (the root vocabulary) which centralizes all the internationalization texts
  • generates 2-language xls files for the translators' use
  • updates the root vocabulary starting from the files modified by the translators

Automation scripts are already in cron and there's also a nice text document explaining to translators where to get their files and where to put them after modification. The resulting script is not exactly fast, but this is tooling and not production, so this should not be a problem after all.

Whatever your project constraints are, give Jython a try and you'll be amazed … As they put it on the Useless Python site: “If it were any simpler, it would be illegal.”

Finally, there's a trick not quite related to Jython, but interesting nevertheless. There is an easy way of solving the problem of translating phrases with real data inside them, with easy parameter swapping. We'll use the good old sprintf, but not directly. We'll pass through a not so popular but extremely useful function, call_user_func_array. Suppose that our example needs the user name and authorization profile description to display inside a nice message. All you have to do is define placeholders in the I18N files which would fit as the first argument for sprintf. The following example should make it clearer:

    localization/en/login.php
    $messages = array(
    'loggedin'=>'You are authenticated successfully as user %1$s with profile %2$s.'
    );
    $this->set($messages);
    
    localization/fr/login.php
    $messages = array(
    'loggedin'=>'Vous avez le profil %2$s en tant qu\'utilisateur %1$s.'
    );
    $this->set($messages);
    
Simple passing of multiple parameters to I18N in PHP. This example function has no error processing or data domain checking.
    #this is the multiple parameter function (called below as a static method of a Tools class)
    function complexTranslation($i18n, $label, $params)
    {
      return call_user_func_array('sprintf',array_merge(array($i18n->_($label)),$params));
    }
    
Then, you have to initialize your I18N object. This can be done in a generic manner for all pages.
    #specific I18N initialization stuff
    #($g_langCode and $script_name are expected to be set earlier in the page)
    require_once 'I18N/Messages/File.php';
    $g_language_dir = dirname($_SERVER['PATH_TRANSLATED']).'/localization/';
    $i18n = new I18N_Messages_File($g_langCode,$script_name,$g_language_dir);
    
Finally, use the function.
    #translate the successful login message
    $loginbox = Tools::complexTranslation($i18n,'loggedin',array($operator->name,$profile->description));
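
Since translators are free to reorder the numbered placeholders but should neither drop nor invent any, the tooling side could verify this when files come back. The Python sketch below is an assumption about how such a check might look, not part of the tool described above; the function names and the dictionaries of messages are hypothetical.

```python
import re

# Matches sprintf-style numbered string placeholders such as %1$s or %2$s.
PLACEHOLDER = re.compile(r"%(\d+)\$s")


def placeholders(text):
    """Return the set of argument positions used in a message."""
    return set(int(number) for number in PLACEHOLDER.findall(text))


def mismatched_labels(root, translation):
    """Labels whose translation uses different placeholders than the root text."""
    bad = []
    for label in sorted(root):
        if label not in translation:
            continue  # missing translations are a separate concern
        if placeholders(root[label]) != placeholders(translation[label]):
            bad.append(label)
    return bad
```

Comparing sets rather than lists is deliberate: the order of the placeholders is allowed to change between languages, only their presence is checked.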
    

Written by Adrian

March 1st, 2004 at 5:20 pm

Posted in Tools
