Short Stories

// Tales from software development

The value of a control file when writing data files


A decade ago Vitality’s data interfaces were all file based. Patient Administration Systems (PAS) and Laboratory Information Management Systems (LIMS) would send patient and lab results data as files using FTP. These data files typically had an associated control file. Sometimes the control file contained a checksum, but often the file itself was simply an indicator that the data file had been successfully written to disk.

For example, the presence of a file with a .ok extension indicates that the associated data file was successfully written to disk.
If the control file is not present then it is assumed that the data file is incomplete and should not be processed. This may happen because the write operation failed or because it is still in progress; in other words, the absence of a control file ensures that the data file is not prematurely opened for processing while the process writing it is still active.
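The reader-side rule above can be sketched in a few lines. This is a minimal Python illustration, not code from the original interfaces; the .dat extension, the function name, and the directory layout are assumptions made for the example.

```python
import os

DATA_EXT = ".dat"    # hypothetical data file extension
CONTROL_EXT = ".ok"  # control file extension from the article

def ready_data_files(directory):
    """Return the data files whose control file is present.

    A data file is only considered complete when its companion
    control file exists; otherwise the write may have failed or
    may still be in progress, so the file is skipped.
    """
    ready = []
    for name in sorted(os.listdir(directory)):
        base, ext = os.path.splitext(name)
        if ext != DATA_EXT:
            continue
        if os.path.exists(os.path.join(directory, base + CONTROL_EXT)):
            ready.append(name)
    return ready
```

Files without a matching .ok file are simply left alone; they will be picked up on a later pass once the control file appears.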

When I joined the company and began developing new PAS and LIMS interfaces, typically based on TCP/IP transports such as HL7 MLLP, I decoupled the processing stages (receive message, process message, update database) by writing the intermediate data to disk. All our customers’ servers were running Microsoft Windows Server, and writing the data to the NTFS file system was a simple, reliable, and robust means to store it.

A few years ago I noticed that one of my colleagues had written an application that relied on file locking to ensure that downstream processing did not open the file before the write had completed. I discussed this with him and pointed out that all of our other interfaces were consistent in their use of a control file, but he was adamant that his use of file locking meant there was no need for one.

Something about this bothered me but, at the time, I couldn’t work out what it was.

Late last year I was diagnosing an error on one of our customer sites when I realised that some data files from several months earlier remained unprocessed. These files were missing their associated control files, which was causing the processing task to ignore them. I checked the Windows Event Log for the dates on which the files had been created and could see that free disk space had reached zero. Examining the data files showed that most of them appeared to be complete, although it was difficult to be sure, but several were definitely incomplete or corrupt. So the control files were absent because the data files had not been written successfully.

And then I realised what had been bothering me about my colleague’s reliance on file locking: it only addresses the problem of the file being read before it is completely written. It does nothing about the write process failing and leaving an incomplete or corrupt file on disk.
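The writer side is where the control file earns its keep: it is created only after the data file has been written and flushed, so any failure (a full disk, a killed process) leaves no control file behind and the downstream processor skips the damaged file. A minimal Python sketch, assuming the .ok naming convention described above (the function name and payload handling are illustrative, not from the original system):

```python
import os

def write_with_control_file(data_path, payload):
    """Write a data file, then create its companion control file.

    The control file is created only after the data has been
    flushed to disk, so an interrupted or failed write leaves
    no control file and the data file is never processed.
    """
    with open(data_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ensure the data actually reaches the disk
    # Reaching this point means the write succeeded; only now do we
    # signal completeness by creating the (empty) control file.
    control_path = os.path.splitext(data_path)[0] + ".ok"
    open(control_path, "w").close()
```

If an exception is raised anywhere inside the `with` block, for example when the disk fills up, the function exits before the control file is created, which is exactly the behaviour that file locking cannot provide.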

Control files may seem ‘old fashioned’ but they remain a simple, reliable, and robust mechanism for signalling that a data file is complete and safe to process.


Written by Sea Monkey

January 23, 2017 at 8:00 pm
