S h o r t S t o r i e s

// Tales from software development

Don’t assume anything…

leave a comment »

Over the past year I’ve written a series of interfaces that integrate external data sources with the medical data application that is my company’s main product line. Along the way I’ve developed several libraries of reusable code for diagnostics, logging, exception handling, etc.

Last week one of the first interfaces I’d written started misbehaving. It’s a Windows Service that polls several medical data stores every few minutes to check if medical test results are available for patients defined in our application’s database. What was odd was that it was experiencing an IOException while writing to its log file. The service has been running 24×7 for at least six months without any problems but within the last week was now failing every few days with this problem.

Text editors like Notepad don’t usually lock files or open them exclusively so I was fairly sure that the problem wasn’t simply due to someone viewing the log file while it was being written to. Also, the exception message indicated that the file, or a portion of it, was locked – not that it was opened exclusively by another application. So, another application was accessing the log file and locking it in a way that prevented the service writing to it.

I should have guessed… McAfee AntiVirus!

The customer is using the Enterprise Edition of McAfee and had changed a policy that resulted in the log file being scanned although previously it had been excluded.

The service creates a new log file each day and a day’s worth of activity typically results in a log file of a few hundred Kb. However, periodically a new external data store is added to the interface and this requires some initialisation processes to run that result in the log file growing to a few Mb. The combination of a larger log file and more frequent write requests makes a clash, between the service attempting to write to the log file and McAfee locking and scanning the file, more likely.

When I wrote the code it seemed so unlikely that the service could fail just by attempting to write to its own log file that I hadn’t really coded for this possibility. There was some exception handling code but it did little beyond log the exception to the Windows Event Log (but not its log file, of course!)

The solution isn’t very elegant but it is practical. The single line of code that called the log file StreamWriter’s WriteLine method has been replaced by about 15 lines of code that catches the IOException and retries the WriteLine every few seconds for up to two minutes. After that it gives up as continued failure is likely to indicate a different problem such as lack of disk space.

One oddity about the solution is that if the retries eventually succeed then the log file shows multiple copies of the message. Presumably, each time WriteLine() is called the message text is copied to the write buffer and stays there even when the I/O fails. Each retry adds another copy of the message to the buffer so that when the I/O eventually succeeds, all the copies of the message in the write buffer are written. One possible solution would be to repeatedly try calling Flush() instead of retrying the WriteLine() but there was a question mark in my mind as to whether the entire message was in the buffer and whether the buffer had been corrupted. I took the safe option and chose to rewrite the entire message.

I’m still slightly surprised by the fact that McAfee is blocking the service from writing to its own log file but it’s a good example of why you should be very careful about any, and all, assumptions you make when writing code.


Written by Sea Monkey

March 5, 2010 at 9:00 pm

Posted in Development

Tagged with

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: