Short Stories

// Tales from software development

Archive for August 2010

The unit-tested code has the most bugs

leave a comment »

One of the projects that I worked on last year has now been running in a live environment for 14 months. It handles around 5000 patient data updates per day and has performed without a single bug showing up. Until today…

It didn’t take long to find the code that was throwing the IndexOutOfRangeException. It was a method that attempted to parse a string containing a name into name components – forename, middle name, and surname. The puzzle was why the exception was occurring now when the code had executed successfully for the past year.

I looked at the data being processed at the time of the exception and saw that the string containing the name was actually just a single comma. Although this isn’t a valid name, I still wanted to update the method so that it returned the correct output, a name structure with blank name components, rather than failing with an exception.
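The fix amounts to handling degenerate input defensively. This is an illustrative sketch in Python, not the original C# method, and it assumes a ‘surname, forename middlename’ layout, which may differ from the real format:

```python
# Illustrative sketch (not the original C# code): split a name string into
# components, returning blank strings for degenerate input such as ",".
def parse_name(raw):
    """Parse 'surname, forename middlename' into a dict of components."""
    surname, _, rest = raw.partition(",")
    parts = rest.split()
    return {
        "surname": surname.strip(),
        "forename": parts[0] if parts else "",
        "middlename": " ".join(parts[1:]),
    }
```

A lone comma now yields a structure with three blank components instead of an index-out-of-range failure.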

When I reviewed the code I noticed that the one assembly with any unit tests was the one where this method was implemented. There was severe time pressure on this project and some of our normal process engineering was bypassed. I’m not making any excuses for this but it happened. Unit tests were not written. Except for this one assembly (out of 18 assemblies in the project).

So, why did the one bug that’s shown up in the live environment occur in the assembly that had the unit tests? At the very least it seems ironic. At worst, could it be inferred that writing unit tests makes code more buggy?

The answer is more prosaic. The assembly with the unit tests mostly contains utility methods used by other assemblies. This makes it easy to write unit tests for these methods because they are all public and have well defined inputs and outputs. It’s also a no-brainer that if there’s only sufficient time to write unit tests for one assembly then this one is a prime candidate because so many of the other assemblies depend on it. And, finally, I had a suspicion that if a bug was going to lurk somewhere in the code I’d written then it’d be in some of the routines in this assembly.


Written by Sea Monkey

August 27, 2010 at 8:00 am

Posted in Debugging, Development

Tagged with

It’s not impossible, it just costs more

leave a comment »

Scope creep: During the implementation phase of a software project clients always ask for more functionality than was originally agreed.

I usually do what I can to incorporate their requests but there’s always the caveat that it might take more time and therefore cost more. And, usually, once the project budget has been agreed it’s difficult to get more funding. I usually say “Almost anything is possible with software but it will take more time and money”.

In season 4 episode 5 of American Chopper, “Senior’s Vintage Project 1”, Mikey can be seen wearing a t-shirt with the slogan:

It’s not impossible, it just costs more

Well put!

Written by Sea Monkey

August 24, 2010 at 8:00 am

Posted in Development

Tagged with

NUnit StringAssert.DoesNotContain()

with one comment

I’ve just finished writing a small test application that needed to verify that an error message did not occur in an output file.

I looked through the list of methods implemented on the NUnit StringAssert class and couldn’t see anything that would suffice. There is a Contains() but no DoesNotContain() or NotContains(). That’s a shame because, although Contains() is a standard string method and DoesNotContain() isn’t, in code you can easily negate the result, e.g.:

if (!stringValue.Contains("something"))

but you can’t do this with an NUnit assertion because the result is an exception, not a boolean value.

First I used a quick fix that consisted of testing whether the string value contained the error message and throwing an NUnit AssertionException if it did. But this wasn’t pretty and I started looking at how to implement the assertion properly using NUnit’s extensibility features.

Although the correct way to implement the new assertion is to write a custom constraint I found there was another way to get the functionality I needed. Using Reflector on the NUnit.Framework assembly, I could see that the StringAssert.Contains() method is just a wrapper for this:

Assert.That(actual, new SubstringConstraint(expected), message);

Looking down the list of Constraint classes I noticed the NotConstraint which, as implied by its name, negates the result of another constraint. So, I didn’t need to implement anything at all, I just needed this:

Assert.That(actual, new NotConstraint(new SubstringConstraint(expected)), message);


Written by Sea Monkey

August 17, 2010 at 8:00 am

Posted in Development

Tagged with

Writing an NUnit test runner

leave a comment »

I would guess that most developers using NUnit use the supplied console or GUI test runners to run their tests and have never considered why they might write their own test runner. That’s certainly true of me but I recently wrote a test application where it wasn’t really possible to use the NUnit test runners but it still made sense to use the classes implemented in the NUnit.Framework assembly.

The requirement was to create a test application for a command line database update tool. The tests would be defined in files that could easily be created and updated using a text editor.

My first attempt used a base class that did all the hard work of performing the test setup, running the database update program, performing the tests, and the test teardown (all defined in the test file) and a set of concrete classes derived from this that did nothing except specify the name of the test.  The tests were executed using the NUnit GUI test runner.

The only problem was that adding a new test meant adding a new concrete class and recompiling the assembly. There were two possible solutions: (1) create a list of test files, use code generation to create the concrete classes, compile the assembly, and then invoke the NUnit test runner; or (2) make the base class concrete, create an instance of it for each test file, and call its methods using a custom test runner.

I started working on option (2) because it didn’t require much additional work. I quickly realised that no changes were required in the test code that I’d already written. What was really required was simply a test runner that would load the test definition files, one at a time, create an instance of the test class, and then do exactly what the NUnit test runners do: call any methods marked with the TestFixtureSetUp attribute, then call methods marked with the Test attribute, and finally the methods marked with the TestFixtureTearDown attribute. It would have to handle exceptions thrown by all these calls but specifically by the Test methods because NUnit assertion failures result in an AssertionException being thrown.

The result is exactly what was needed. Each test file is an XML file with this structure:

<TestDefinition name="Test1" description="Tests update">
  <Setup>
    <SQLCommand>DELETE FROM person WHERE idno = '9999999999';</SQLCommand>
    <SQLCommand>INSERT INTO person(lastname, forename, idno) VALUES('SMITH', 'JOHN', '9999999999');</SQLCommand>
  </Setup>
  <Data>
    S 00003 9999999999  ; Select row / Item / value
    U 00002 DAVID       ; Update / Item / value
  </Data>
  <SQLAssertions>
    <SQLAssert name="check_forename" description="Check that forename was updated">
      <Query>SELECT forename FROM person WHERE idno = '9999999999';</Query>
      <ResultColumn name="forename" value="DAVID" />
    </SQLAssert>
  </SQLAssertions>
  <ErrorFileAssertions>
    <ErrorFileAssertion name="no_errors" description="Check that no errors occurred" doesNotContain="***ERROR***" />
  </ErrorFileAssertions>
  <Teardown>
    <SQLCommand>DELETE FROM person WHERE idno = '9999999999';</SQLCommand>
  </Teardown>
</TestDefinition>

The <Setup> and <Teardown> elements specify the SQL statements required to create the test scenario and then clean up afterwards. The <Data> element contains the input to the database update program that is executed. The <SQLAssertions> and <ErrorFileAssertions> specify assertions that are applied to the database and the contents of the database update error log, respectively.

These files are easily created and modified by the testers and all that’s required to run the tests is to execute the test runner.

Written by Sea Monkey

August 13, 2010 at 8:00 am

Posted in Development

Tagged with

Using SQLite as an ‘in memory’ database engine

leave a comment »

I recently needed a lightweight database engine to process data collected from logfiles. Ideally, I wanted a small database engine with minimal deployment requirements that stored and processed data in memory.

After considering a number of possibilities I went with SQLite. No deployment configuration is required and creating a database is as simple as specifying a filename in the connection string.

I was pleasantly surprised by the capabilities and performance of SQLite but then I noticed that it implements an option to store temporary tables in memory rather than on disk. So, I wondered if it could provide the ‘in memory’ requirement too?

After a connection is opened, my application executes the following Pragma statement:

PRAGMA temp_store = MEMORY;

Then all tables are created as temporary tables, e.g.:

CREATE TEMPORARY TABLE table_name (...);
I did a few quick tests to check the performance improvement. Previously, writing 150,000 rows of data took around 5.5 seconds. With the changes the same processing dropped to around 0.25 seconds.

Written by Sea Monkey

August 11, 2010 at 8:00 am

Posted in Development

Tagged with

CodePoint Open and MySQL spatial extensions

leave a comment »

The geographical location data in the Ordnance Survey’s CodePoint Open dataset is expressed as 6 digit eastings and northings values in the British National Grid Reference System. It’s not immediately obvious how this data can be used.

In fact, because the 6 digit values represent an easting or northing at 1 metre resolution, SQL spatial extensions can be used on the data directly to produce useful results. For example, the Distance() function can be used with two Point values to return the distance between them in the units of the geometry system in use, i.e. in this case, metres. Or it would be, if MySQL implemented the Distance() function, but there’s an easy workaround for this.
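Because eastings and northings are planar coordinates in metres, the distance these spatial functions return is simply the Euclidean distance between the two points. A minimal sketch in Python (the function name is my own, not part of any spatial library):

```python
# Distance between two British National Grid points, each given as an
# (easting, northing) pair in metres, returned in kilometres.
import math

def grid_distance_km(p1, p2):
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) / 1000.0
```

This is exactly what the LineString/GLength workaround described below computes inside the database.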

First, how do we create a Point value? The OpenGIS standard defines two formats for expressing geometry data: Well Known Text (WKT) and Well Known Binary (WKB). For example, expressing a Point value in the British National Grid Reference System in WKT might look like this:

POINT(530622 183959)

(This is actually the easting and northing for the London postcode N1 0AA.)

To create a Point value in MySQL you need to pass this WKT expression, as text, to the PointFromText() function:

SELECT PointFromText('POINT(530622 183959)');

If you display the value created you’ll see that it’s binary data. It can be rendered as WKT or WKB using the appropriate function:

SELECT AsWKT(PointFromText('POINT(530622 183959)'));

Ideally, to calculate the distance between two postcodes you’d create Point values for each location and then use the Distance() function to return the distance between them. As MySQL doesn’t yet implement Distance() we have to use this workaround: create a LineString value using the two location values and then call the GLength() function to return the length of the line.

The WKT for a LineString is:

LINESTRING(x1 y1, x2 y2)

The following SQL statement creates the LineString value by constructing the WKT value from two postcode locations and returns the GLength value scaled from metres to kilometres:

SELECT
    p1.PC,
    p2.PC,
    GLength(LineStringFromText(CONCAT('LINESTRING(', p1.`EA`, ' ', p1.`NO`, ', ', p2.`EA`, ' ', p2.`NO`, ')'))) / 1000 AS Distance_Km
FROM
    postcode p1,
    postcode p2
WHERE
    p1.PC = 'N1 0AA'
AND p2.PC = 'EH1 1AD';

The second postcode is in the Edinburgh area and the result is:

PC          PC          Distance_Km
N1  0AA     EH1 1AD     530.76

One thing to note about the CodePoint Open dataset is that the formatting of the postcode in the PC field is inconsistent. A postcode consists of an outward code and an inward code separated by a space but some of the postcodes in the dataset do not have a separating space, some have a single space, and others have two spaces. The CodePoint Open User Guide suggests normalizing the formatting of this data to make it usable in your application.
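Normalization is straightforward if you rely on the fact that the inward code is always the final three characters of a postcode. A sketch (the function name is my own):

```python
# Normalise a CodePoint Open postcode: strip all spaces, then re-insert a
# single space before the three-character inward code.
def normalise_postcode(pc):
    compact = pc.replace(" ", "")
    return compact[:-3] + " " + compact[-3:]
```

Applying this to both the dataset and any user-supplied postcode gives a consistent key for lookups.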

Written by Sea Monkey

August 9, 2010 at 8:00 am

Posted in Development

Tagged with

Free postcode data

leave a comment »

The government and Royal Mail have prevaricated for years over how postcode data should be made available. Last year the government promised that it would make the data available by April 2010 at no charge to encourage e-business initiatives but then backed down on this, apparently because Royal Mail was not prepared to release its postcode dataset for free. As a compromise, a subset of the Ordnance Survey’s CodePoint product has now been released as a free to use dataset.

CodePoint Open is a list of postcodes, geographical co-ordinates of the centre of each postcode area, and the government and health service units associated with each postcode.

Many potential users have complained that this is too little data to be of use. For example, it does not allow the address lookups that most UK based web sites now use. However, it is still potentially very useful data and actually provides data that is missing from most commercially available datasets.

My recent projects have all been health service related and a common issue is trying to identify the relationship between a health service area and the patient’s postal address. The CodePoint Open dataset includes this data, making it possible to determine which postcodes occur within a health service unit.

Details of CodePoint Open are here:


The dataset is requested from this page:


Click the ‘download’ checkbox next to the CodePoint Open entry then click ‘Next’. You’ll be asked to provide an email address to which the download instructions are sent.

Written by Sea Monkey

August 4, 2010 at 8:00 am

Posted in Development

Tagged with