September 2010


In honour of the Mozilla QA Haiku list:

Current tests are odd
Use jUnit for output
Many tools are free

I'm currently rewriting a Thunderbird plugin, and in the last few years I have caught the unit-testing and test-driven development bug. So, how do I make my life easy by integrating Hudson and Thunderbird?

It turned out to be surprisingly difficult, so here are lots of instructions plus a download.

The first job was to find a JavaScript interpreter and unit-test framework:

  • jsunit – jsunit is no longer actively maintained and has become Jasmine.
  • Jasmine – tries to be a whole way of life, very very young, almost no documentation whatsoever.
  • jstest – no longer maintained and has a fatal version dependency conflict: jstest requires version 1.6R5 of js.jar but envjs requires 1.7R2 or later.
  • rhinounit – Rhino is an implementation of JavaScript in Java. rhinounit has a really horrible output format that dumps the entire Java call stack when a test fails.
  • xpcshell – a command-line version of the JavaScript engine in Firefox and Thunderbird. It provides a full JavaScript browser environment, including an XMLHttpRequest implementation, so envjs is not needed. It also includes runxpcshelltests.py for executing tests.

So xpcshell it is (believe me – that took much longer to research than you took to read it!).

You need to compile a Mozilla Thunderbird package on your Hudson server to get access to xpcshell. These instructions are boiled down from Simple Thunderbird build. Note that my version does not have debug enabled; this is deliberate and important.

apt-get build-dep thunderbird
apt-get install mercurial libasound2-dev libcurl4-openssl-dev libnotify-dev libiw-dev autoconf2.13
mkdir -p /opt/kits/thunderbird
cd /opt/kits/thunderbird

# this takes a minute or two
hg clone http://hg.mozilla.org/releases/comm-1.9.2/
cd comm-1.9.2

# this takes several minutes
python client.py checkout

# edit/create .mozconfig in this directory and enter these three lines
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-tb
mk_add_options MOZ_MAKE_FLAGS="-j4"
ac_add_options --enable-application=mail

# this takes ages, 2hrs on an EC2 m1.small! Come back tomorrow...
make -f client.mk

runxpcshelltests.py has a very non-standard output format, so I've implemented a set of plugins for TAP and JUnit output formats: download runxpcsheltests.tgz. This is a drop-in replacement for /opt/kits/thunderbird/comm-1.9.2/mozilla/testing/xpcshell (if you've followed the build instructions above), but you can unpack it anywhere on your Hudson server. For example, if you have a source directory, create a directory "scripts" in it and unpack the tgz file there. This is also the reason for building Mozilla without debug: if debug is enabled then xpcshell prints out various usage information that can't be trapped and excluded from the formatted test output.
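To give a feel for what a JUnit-style formatter has to produce, here is a minimal Python sketch. It is not the code in the download: the function name, the tuple layout of the results and the second test file name are all assumptions made for illustration, and it emits only the basic testsuite/testcase/failure elements that JUnit report readers commonly expect.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name, results):
    """Render (test_name, passed, message) tuples as a JUnit-style XML string."""
    failures = sum(1 for _, passed, _ in results if not passed)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for test_name, passed, message in results:
        case = ET.SubElement(suite, "testcase", classname=suite_name, name=test_name)
        if not passed:
            # A <failure> child marks the test case as failed.
            failure = ET.SubElement(case, "failure", message=message)
            failure.text = message
    return ET.tostring(suite, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical results; the second test name is invented for illustration.
    print(to_junit_xml("xpcshell", [
        ("test_001_pass.js", True, ""),
        ("test_002_fail.js", False, "false a true-ish value?"),
    ]))
```

The point of the format is that one XML element per test file, with an optional failure child, is all a build server needs to draw its pass/fail trend.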

Create a directory test/xpcshell in your source root and create a file all.sh in it containing the following:

#!/bin/bash

# Directory containing this script and the test files
D=$(dirname "$0")
# Replacement test runner unpacked under scripts/ (see above)
X="$D/../../scripts/xpcshell"

/usr/bin/python2.6 -u /opt/kits/thunderbird/comm-1.9.2/mozilla/config/pythonpath.py \
   -I/opt/kits/thunderbird/comm-1.9.2/mozilla/build \
   "$X/runxpcshelltests.py" \
   --output-type=junit --no-leaklog --no-logfiles \
   /opt/kits/thunderbird/comm-1.9.2/objdir-tb/mozilla/dist/bin/xpcshell \
   "$D"

Now you can add test files to that directory, e.g. test_001_pass.js:

function run_test() {
        do_check_true(true);
}

The do_check_true function effectively checks against "arg == true" so I also created a head_test_funcs.js file in that directory to add more testing functions, e.g.:

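// Pass if item is truthy (non-zero, non-empty, not null/undefined),
// rather than strictly equal to true.
function do_check_trueish(item, stack) {
  // Default to the caller's stack frame so failures point at the test.
  if (!stack)
    stack = Components.stack.caller;

  var text = item + " a true-ish value?";
  if (item) {
    ++_passedChecks;
    xpcshell_output.pass(stack, text);
  } else {
    do_throw(text, stack);
  }
}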

The last step is to integrate with Hudson. Click on the Configure link in a Hudson job. In the Execute Shell section add the line

trunk/test/xpcshell/all.sh > report_xpcshell.xml

In the Post-Build Actions section tick Publish JUnit test result report and in the Test Report XMLs section enter

report_*.xml

If you're already using JUnit tests then you may need different output file names to suit.

Groovy! We can now do automated unit/regression testing on plugin base classes! The next step is to figure out how to provide the XUL document environment and perform functional testing like Selenium does for browsers…

NB. I'd really like a Mozilla developer to pick up runxpcsheltests.tgz and drop it into the current Mozilla system: standardised test output is an item on the Mozilla software testing wishlist.

Update: the Mozilla team have taken this up as bug 595866.

There's been a meme going around recently that SQL and relational databases are somehow "too complicated", antiquated and "old hat" and should be replaced with something simpler and therefore more efficient.

This opinion is misguided (and perhaps slightly juvenile). Nevertheless, a kind of "NoSQL" movement formed which has created some very useful things in the Distributed Hash Table (DHT) space. (In a video on Cassandra, Eric Evans claims to have invented the term NoSQL and wishes he hadn't!)

I hope to show that SQL and DHT (NoSQL) systems are complementary to each other and not in competition.

Useful data storage systems have "ACID" characteristics (Atomicity, Consistency, Isolation, Durability). SQL systems are very strong on Atomicity, Consistency and Isolation and can also achieve "5 nines" or more reliability in terms of Durability. But, even with highly partitioned data stores, the Consistency requirements often prove to be a bottleneck in terms of performance. This can be seen as an impact on Durability: database performance under sufficient write load can drop to a point where the database is effectively unavailable.

Sharding, i.e. completely splitting the database into isolated parts, can be used to increase performance very effectively, but Consistency, and queries that require access to the whole database, can become costly and complicated. In the latter case a proxy is usually required to submit the same query to all shards and then combine the results before returning them to the client. This can be very inefficient when making range queries.
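The proxy behaviour described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the Shard and ShardProxy classes are hypothetical, keys are integers, each shard is an in-memory dict, and rows are placed by key modulo the shard count. It shows why a range query is costly: placement by hash scatters neighbouring keys, so every shard has to be asked and the partial results merged.

```python
import heapq


class Shard:
    """A toy shard: an in-memory dict standing in for one isolated database."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def range_query(self, low, high):
        # Each shard returns its own matches, sorted by key.
        return sorted((k, v) for k, v in self.rows.items() if low <= k <= high)


class ShardProxy:
    """Routes each write to one shard, but must fan range reads out to all."""

    def __init__(self, shard_count):
        self.shards = [Shard() for _ in range(shard_count)]

    def put(self, key, value):
        # Assumes integer keys; placement is simply key modulo shard count.
        self.shards[key % len(self.shards)].put(key, value)

    def range_query(self, low, high):
        # Neighbouring keys live on different shards, so every shard is asked.
        partials = [shard.range_query(low, high) for shard in self.shards]
        # Merge the already-sorted partial results into one sorted list.
        return list(heapq.merge(*partials))
```

With three shards, a query for four adjacent keys still costs three sub-queries plus a merge, and that overhead grows with the number of shards rather than with the size of the result.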

DHT systems trade Atomicity and Consistency even further for more Durability under load (i.e. performance scaling). Strictly speaking, NoSQL can be implemented by a simple hash table on a single host, e.g. Berkeley DB, but these implementations have no scaling capability so are not included in this discussion.

SQL implementations include: MySQL, Oracle, PostgreSQL, SQL Server etc. DHT implementations include: Cassandra, HBase, Membase, Voldemort etc. MapReduce implementations (e.g. Hadoop) are a form of DHT but one that can trade key uniqueness for the speed of "stream/tail processing".

 

SQL | DHT
Immediate (or blocking) consistency. | Eventual consistency: reads don't wait for a write to completely propagate. Last write wins, conflict resolution on read, etc.
Transactional. | Multiple-operation transactions implemented in the application.
Scale write performance by partitioning (utilise multiple disk spindles). Writes go to a privileged master or master cluster (which may also service reads). Scale read performance by "fan out": multiple read slaves replicating from the master. | All nodes are functionally equal; no privileged "name" or meta nodes. Scale reads and writes by adding new nodes (heterogeneous preferably).
Relational. Indexes available on multiple columns (one column optionally a "primary" unique key). | Non-relational, single-index, key-value stores ("column family" DHT systems are just an extension of the single key).
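The "last write wins, conflict resolution on read" entry in the DHT column can be illustrated with a small Python sketch. Everything here is hypothetical: the Replica class, the integer timestamps supplied by the caller, and the choice to push the winning version back to stale replicas during the read (often called read repair) are assumptions for illustration, not a description of any particular product.

```python
class Replica:
    """One node's copy of the data: key -> (timestamp, value)."""

    def __init__(self):
        self.store = {}

    def write(self, key, timestamp, value):
        current = self.store.get(key)
        # Keep only the newest version seen by this replica.
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        return self.store.get(key)


def read_last_write_wins(replicas, key):
    """Read every replica, return the newest value and repair stale copies."""
    versions = [v for v in (r.read(key) for r in replicas) if v is not None]
    if not versions:
        return None
    timestamp, value = max(versions, key=lambda version: version[0])
    for replica in replicas:
        # Assumed behaviour: stale replicas are updated as a side effect.
        replica.write(key, timestamp, value)
    return value
```

A write that has reached only one replica is still visible to a reader that consults all of them, which is the sense in which consistency is "eventual" rather than blocking.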

 

The metric is then quite simple: if high capacity (data volume or operations per second) is required, data is only ever accessed by primary key, and eventual consistency is good enough, then you have an excellent candidate for storage in a DHT.
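Written as code, the rule of thumb is a three-way AND. The Workload class and its field names below are hypothetical and exist only to make the three conditions explicit.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    high_capacity: bool            # large data volume or many operations per second
    primary_key_access_only: bool  # never queried by anything but the key
    eventual_consistency_ok: bool  # readers can tolerate briefly stale data


def is_dht_candidate(workload):
    """Apply the three-part rule of thumb: all conditions must hold."""
    return (
        workload.high_capacity
        and workload.primary_key_access_only
        and workload.eventual_consistency_ok
    )
```

Failing any one of the three conditions is enough to keep the data in a relational store.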

Other relational storage can be replaced with DHT systems but only at the cost of denormalising the data (the data is structured for reads, not writes), and this should probably be avoided!

You can, however, use a DHT to speed up an RDBMS with regard to the storage of blobs. Some RDBMSs have a separate disk space for blobs; some include them in the normal memory space along with the rest of the data. If you have a DHT to hand then another technique is to split any update into two halves: the first uses the RDBMS to store the simple, relational data and returns a primary key; the second then stores the blobs in the DHT against that primary key instead of in the RDBMS. This shortens the write thread, and any associated locking, in the RDBMS as much as possible.
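The two-halves update can be sketched with Python's built-in sqlite3 module standing in for the RDBMS and a plain dict standing in for the DHT client. The table name, column names and function names are assumptions for illustration; a real DHT client would replace the dict.

```python
import sqlite3


def save_document(db, blob_store, title, blob):
    """Store relational fields in SQL, then the blob in a key-value store."""
    # First half: a short relational write that yields the primary key.
    with db:  # commits on success, rolls back on error
        cursor = db.execute("INSERT INTO documents (title) VALUES (?)", (title,))
        doc_id = cursor.lastrowid
    # Second half: the bulky payload goes to the DHT, outside the SQL transaction.
    blob_store[doc_id] = blob
    return doc_id


def load_document(db, blob_store, doc_id):
    """Fetch the relational row by primary key, then the blob by the same key."""
    row = db.execute("SELECT title FROM documents WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        return None
    return row[0], blob_store.get(doc_id)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
    blobs = {}  # a plain dict stands in for the DHT client
    doc_id = save_document(db, blobs, "report", b"\x00" * 1024)
    print(load_document(db, blobs, doc_id)[0])
```

One trade-off of this split is that the two writes are no longer atomic: if the second half fails, the relational row exists without its blob, so the application has to tolerate or clean up that case.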