Short Stories

// Tales from software development

Archive for November 2008

Wabi sabi

Whether we like it or not, there is a philosophical aspect to software development. For the past thirty years there has been a debate as to whether computer programming is a craft or a science. The reality is that, like traditional engineering itself, what we do is a combination of science, art, and craft.

The competing pressures of time, scope, and quality mean that a compromise has to be agreed and adhered to by the development team. A good developer adapts to what a particular project requires: typically, the quality of the code is compromised by the short timescale available to implement it. Some developers, often the best, find it difficult to write code that they know to be less than optimal, and many find it hard to stay positive about a project when they feel its code quality has been compromised.

Wabi sabi is a Japanese concept that is usually translated as ‘imperfect, impermanent, incomplete’ or ‘nothing is permanent, nothing is forever, nothing is perfect’. It’s usually used to describe an artistic philosophy that recognises that the artist cannot create a perfect piece of art and should not only accept this but also embrace it.

‘Imperfect, impermanent, incomplete’ sounds like a description of every software development project that I’ve ever been involved with. Perhaps wabi sabi is something we should use to help us. If we recognise the limitations of a project in terms of the time available and the agreed quality level then perhaps we can learn to embrace these facts and find something positive in what we’re doing.

From a personal perspective, understanding wabi sabi has helped me in some of my own software projects. I find it easier to accept that the code I write is not perfect. As long as it does what is required of it then it shouldn’t matter that it is not perfectly written or isn’t a complete solution for all possible scenarios. All software is impermanent.

Written by Sea Monkey

November 25, 2008 at 8:00 pm

Posted in Development

Ken's First Law revisited

In Ken’s First Law of Problem Diagnosis I recounted how one of my former colleagues had observed that the length of time it takes to diagnose a problem is often inversely proportional to the size of the fix. Ken’s First Law struck with a vengeance last week.

On Wednesday Rob asked me to look at a bug that had been reported with our client’s ecommerce system. The order system automatically places an order on hold if it fails one of several validation rules. Manual intervention is then required to either accept or reject the order. If the order is rejected then an email is automatically generated and sent to the customer to inform them. The bug report indicated that rejected orders were being processed correctly except that the customer email was not being generated and sent.

Unfortunately, the code in the systems that make up the ecommerce application was written several years ago and, although there is some basic error logging, it doesn’t have diagnostic features such as tracing. So, getting to the bottom of this problem wasn’t going to be easy.

I built and ran the web site code on my development machine and confirmed that the correct values were being passed from the UI. After about half a dozen calls down through various components, the web site code called a web service, part of the Order Administration Services (OAS) system, to reject the order.

Debugging the web service locally wasn’t an option because of the complexity of configuring OAS to run on my development machine. So, I carefully examined the code that was being called and followed the control flow down through about four levels. Eventually, the code made a call to another web service to create and send the customer email. This web service was one of another group of web services called Commerce Website (CW) services. Configuring the CW services to run locally was feasible, so I did this and then called the web service and debugged it. The control flow passed down about five levels and eventually invoked a stored procedure that added a row to a table. Each row on this table represents an email request. A separate task runs every few minutes to process these rows and generate and send the emails.

Over the next few days I seemed to go around in circles, chasing every possibility but eventually coming up with nothing. There didn’t appear to be a problem in the code itself, but it was impossible to run all the code, end to end, on my development machine, so it was difficult to prove conclusively that there wasn’t a problem with the code. I seemed to be coming at this problem from the wrong angle anyway: I was trying to replicate the problem rather than prove that there wasn’t one.

The problem had been logged in the test system and had been replicated there. I logged into the test system and tried the replication steps and confirmed that the problem was still occurring.

I wandered over to where the Environments guys sit. They’re responsible for the test and live systems and for deploying code builds into these environments. One of them spared me some time to go through the SQL logs, the Windows event logs, and the application’s own SQL logging table, looking for any evidence of exceptions or errors in the test system. We didn’t find anything.

Back at my desk I created a simple web service client to call the OAS web service. I configured the client to call the service in the test environment. This turned out to be more difficult than I expected because the environments are configured with DNS aliases and suffixing. In the end I specified the IP address in the web service URL to ensure that I was calling the correct machine. When I ran the client I found that the call to the OAS service didn’t generate an email. So I knew the problem was somewhere between the call into the OAS service and the execution of the stored procedure in the CW web service.

The next step was to create a client to call the CW web service. This was a bit more complicated because the web service method needed to be passed an instance of an Order object. After 30 minutes of cutting and pasting code from the source code for the OAS web service, I had a working client. Again, I configured the URL using the IP address of the machine where the test system’s copy of the CW services was located. The result of running the client was interesting: the call to the CW web service successfully generated the customer email.

As the call to the CW web service successfully generated an email but the call to the OAS service (which then called the CW web service) did not, it appeared that the OAS web service method was failing or was not successfully calling the CW service.

At least this was some progress. I had narrowed the failure down to a few call layers in the OAS web service and the possibility that the OAS web service was failing to call the CW web service.

I spent some more time with the Environments guys. We tried to determine if there was any reason why the OAS service would fail to call the CW service. Because these services run on different machines, connectivity issues sometimes arise if, for example, firewalls are not correctly configured. We couldn’t find anything to indicate that the OAS web service would fail to call the CW web service. So, that was another possibility ruled out.

Back at my desk, I carefully looked through the code yet again. I couldn’t see any obvious point of failure in the OAS web service. The code was straightforward enough and much of it was replicated in the web service client that I’d created to call the CW web service. So, I started leaning towards the idea that the problem was that the CW service was not being called, even though we’d just failed to find a connectivity issue that might explain the failure.

I ran my two web service clients again just to make sure that I wasn’t imagining the results and to confirm my suspicions that the OAS web service was failing to successfully call the CW web service. At this point Rob wandered over to see how I was getting on. When I explained where I’d got to and that I was stumped on what to do next, he said that I needed to talk to Dave. He’s the Environments guy responsible for deploying releases.

Within 15 minutes we’d identified the problem. The web.config for the OAS web services contains settings for the URLs of all the web services that are called. Someone must have been doing some debugging or testing on the OAS services machine and had copied their own web.config into the folder of the particular service that the order rejection processing was calling. The URL settings in this web.config were almost the same, except for two that began with http://localhost/. One of them was the web service that OAS calls to create an email. So when the OAS web service called the CW web service, it wasn’t the ‘real’ CW web service on the machine hosting the CW services that was being called, it was a local copy of the CW web service that didn’t successfully create the email. After the web.config file was replaced with a correctly configured version the bug was resolved.
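
To make that concrete, the culprit was a setting along these lines (the key name and URLs here are invented for illustration; I don’t have the real file to hand):

<!-- Hypothetical excerpt from the stray web.config on the OAS machine.
     The value should have pointed at the machine hosting the CW services,
     e.g. http://cw-services/..., but the discarded debug copy pointed at localhost. -->
<appSettings>
  <add key="CreateCustomerEmailServiceUrl"
       value="http://localhost/CW/CreateCustomerEmail.asmx" />
</appSettings>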

So, after debugging all the way down through about 15 levels of the call stack, across three systems, on multiple machines, over a period of five days, the cause of the bug turned out to be someone’s discarded web.config.

Written by Sea Monkey

November 20, 2008 at 9:00 pm

Posted in Debugging

%~dp0

Batch file enhancements

Batch files. Can’t live with them, can’t live without them.

Despite all the options available today for ad hoc programming under Windows sometimes a quick and dirty batch file really is the best option.

There are some interesting enhancements that Microsoft has implemented over the years that many people who only occasionally use batch files, like me, are probably unaware of. A couple that I think are particularly useful are the %0 parameter and the syntax that can be used to modify the way that a value is substituted for a batch parameter.

The %0 parameter

You probably remember that any command line parameters passed to a batch file are accessed using %1, %2, %3, etc. What you might not know is that %0 is also implemented and, like argv[0] in C, represents the executable itself. Or, in this case, the batch file.

Having access to the name of the file that’s executing can sometimes be useful but it’s the way that batch parameter substitution can be modified that makes it really useful.

Batch parameter substitution

If a batch parameter is a file path then there is a syntax that allows you to extract exactly the parts of the path that you want:

If %0 is the path (including filename) of the batch file that’s executing then %~d0 is the drive specifier from the path, %~p0, is the path component, %~n0 is the filename, etc.

These modifiers can be combined so that, for example, %~dp0 is the drive and path of the batch file that is currently executing. Now this is useful! For example, it used to be a problem if you wanted your batch file to write to a log file in the same folder as the batch file. If you used a relative path for the log file then it would only work correctly when the current working directory was the directory where the batch file and log file were located. The only way around this was to hard code the full path to the log file, but then, if you moved the batch file and forgot to update the path, the log file would still be written to the original location. Using %~dp0 in the log file path avoids both problems:

ECHO Starting backup >>%~dp0Backup.log

This syntax and the modifiers are documented on MSDN but you can also get information by typing Help Call or Call /? at a command window prompt. There is some other goodness too:

  • %* represents all the batch parameters
  • %~0 is the batch file’s path with surrounding quotes removed
  • %~$PATH:1 searches your PATH for the file named by the first parameter and returns the full path of the first match. For example, if you called a batch file with notepad.exe as a parameter, then %~$PATH:1 will usually return C:\Windows\System32\notepad.exe. The modifier syntax can be combined with it, so %~dp$PATH:1 will return C:\Windows\System32\.
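
As a rough illustration, here’s a small batch file that exercises these modifiers (the file name and log file name are just made up for the example):

@ECHO OFF
REM ShowParams.bat - demonstrates %0 and batch parameter substitution
ECHO Full path of this batch file:     %~f0
ECHO Drive and path of this file:      %~dp0
ECHO Name of this file (no extension): %~n0
ECHO All parameters passed in:         %*
ECHO First parameter, quotes removed:  %~1
ECHO First parameter found on PATH:    %~$PATH:1
ECHO Backup started %DATE% %TIME% >>%~dp0Backup.log

Calling it as ShowParams.bat notepad.exe should, on most systems, print C:\Windows\System32\notepad.exe for the PATH lookup line, and the log line will always be written next to the batch file no matter what the current directory is.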

Written by Sea Monkey

November 19, 2008 at 8:30 pm

NUnit or MSTest? The answer might be xUnit…

Now that Visual Studio 2008 includes MSTest in the Professional Edition and above, I thought it’d be worth investigating whether to switch from NUnit to MSTest. There’s not much to choose between the two in terms of functionality and my preference for NUnit comes down to these reasons:

  • There’s currently more third party support for NUnit
  • NUnit is arguably the more mature and stable tool
  • NUnit is freely available, but MSTest only ships with the Professional edition and above
  • NUnit provides better compatibility across Visual Studio versions

That said, MSTest is likely to gain traction (as Microsoft would phrase it) as time goes on and the balance may tilt in its favour.

Just when I thought I’d made my mind up, I discovered xUnit on CodePlex. Definitely worth keeping an eye on and it’s interesting to read why xUnit was written and how it compares with NUnit and MSTest.

Written by Sea Monkey

November 17, 2008 at 8:00 pm

Posted in Development

Why Azure is different

I was in the US three weeks ago when Microsoft announced Windows Azure. I imagined that it was creating a storm of emails and messages amongst my colleagues but when I got back I found… well, not a single email about it.

I’m surprised at the level of apparent apathy. The reason that I’m convinced this is a significant announcement is that, unlike comparable announcements in the past, Microsoft is providing the hardware platform to support Azure. The BBC got a peek at the new data centre in California that Microsoft has built to host Azure services. According to the BBC report, the centre contains 330,000 servers.

Written by Sea Monkey

November 16, 2008 at 8:00 pm

Posted in Deployment

The answer isn't always obvious until you know what it is

“It is perfectly true, as philosophers say, that life must be understood backwards. But they forget the other proposition, that it must be lived forwards. And if one thinks over that proposition it becomes more and more evident that life can never really be understood in time simply because at no particular moment can I find the necessary resting place from which to understand it—backwards.”

Søren Kierkegaard

Once you know the cause of a problem it’s often so obvious that you can’t understand why you didn’t see it immediately. Hindsight is a wonderful thing and here’s a good example…

Having given up on WebDAV to transfer the backups from the desktop PC I use at work to a remote WAN location, I wrote a WCF service to provide FTP-like functionality using HTTP on port 80. I installed and configured the service on a server running at home and then tested the client application on various machines to ensure that I could connect to the service from any environment. This is what I found:

  • Computers on my home LAN – all connected and authenticated successfully.
  • Desktop at work (WAN) – connected and authenticated successfully.
  • Laptop connected to the internet via a 3G mobile data service – connected but failed authentication.

I investigated the problem with the laptop and found that if I established a VPN connection to my employer’s domain (I was using a domain login) then I could successfully connect to my WCF service and authenticate. If I dropped the VPN connection then the laptop would fail to connect and authenticate.

I should explain at this point that I work for a services company. So I connect to my employer’s network via VPN from time to time, but I’m actually based at a client site most of the time, using a desktop workstation that they provide, on their LAN with internet access.

I spent a couple of days researching the problem, having convinced myself that authentication was failing because I was using a domain login on a machine that was disconnected from the domain. Maybe this was some type of anti-spoofing mechanism in Windows? I asked a few friends and colleagues but no one had come across this problem.

A few days later I logged into the management UI of my home router/modem to tighten up the security on the HTTP/80 forwarding that I’d configured for the WCF service and saw that I’d already implemented what I was about to configure – the forwarding rule was already configured to filter on a WAN address range. I’d entered a range that would cover the block of addresses used at my workplace so that the service could only be called from my desktop at work.

However, because I didn’t know the exact range of addresses, I’d made the range a bit wider than was probably required. Just wide enough, in fact, to also include my employer’s internet address range. So, the reason I could connect to the WCF service when I was connected to the VPN had nothing to do with authentication succeeding when I was connected to the domain; it was simply that the VPN was acting as a gateway and the address it presented to my router/modem was not being blocked.

As soon as I saw the WAN address range on the HTTP/80 forwarding rule it was obvious why I hadn’t been able to connect from my laptop. It’s a shame it wasn’t obvious when I first experienced the symptoms.

Written by Sea Monkey

November 14, 2008 at 8:00 pm

Posted in Debugging

How to tell if your MSBuild project is running in Visual Studio

Steve asked me a question the other day: “What’s the name of the property that tells you if your MSBuild project is running from the Visual Studio IDE?”

I’d forgotten its name too and it took me a while to find an example of where I’d used it. The property is called BuildingInsideVisualStudio and is set to ‘true’ when the IDE is running MSBuild.

Why would you want to know if your project is being run from the IDE? Well, as an example, on the project that Steve and I worked on, the automated builds used the Visual Studio project files to compile the projects in one step and then executed StyleCop (now publicly available at http://code.msdn.microsoft.com/sourceanalysis) in another step. The developers had to manually execute StyleCop on their source code and sometimes forgot to do this before checking code in. Actually, the testers who were writing test automation code were the worst for forgetting to run StyleCop.

So, I modified the pre-build project to check if the BuildingInsideVisualStudio property was set to ‘true’. If it was, StyleCop was executed; otherwise it wasn’t. This gave us the behaviour we wanted: StyleCop would run automatically when a project was built in the Visual Studio IDE but would not be executed when the project file was used to compile the project in our automated builds.
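
A rough sketch of the idea in an MSBuild project file looks something like this (BeforeBuild is one of the standard extensibility targets in C# project files; the StyleCop invocation itself is omitted because the task name depends on which StyleCop targets file you import):

<!-- Only runs when Visual Studio is driving the build -->
<Target Name="BeforeBuild"
        Condition="'$(BuildingInsideVisualStudio)' == 'true'">
  <Message Text="Running StyleCop because this build was started from the IDE..."
           Importance="high" />
  <!-- StyleCop task invocation goes here -->
</Target>

When the same project file is built from the command line or by an automated build, the property isn’t set, the condition evaluates to false, and the target is skipped.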

Written by Sea Monkey

November 12, 2008 at 7:45 pm

Posted in MSBuild
