Short Stories

// Tales from software development

Reading FLAC files


I’ve just finished writing a small application to read the tags in a FLAC file. The numeric values in the various FLAC metadata blocks are generally big endian, which meant implementing my own methods to read them, as Windows and .NET use little endian values.
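Because BinaryReader only reads little endian values, big endian fields have to be assembled by hand. The sketch below is a hypothetical helper (the BigEndian class and its method names are illustrative, not the application's code). It reads a big endian integer of up to four bytes and uses it to parse a FLAC metadata block header, which packs a 1-bit "last block" flag, a 7-bit block type and a 24-bit big endian body length into four bytes.

```csharp
using System;
using System.IO;

// Hypothetical helper: BinaryReader's ReadInt32()/ReadUInt32() are little
// endian only, so FLAC's big endian fields are decoded manually here.
public static class BigEndian
{
    // Reads 'count' bytes (1..4) as an unsigned big endian integer.
    public static uint ReadUIntBE(BinaryReader reader, int count)
    {
        if (count < 1 || count > 4)
            throw new ArgumentOutOfRangeException(nameof(count));

        byte[] bytes = reader.ReadBytes(count);
        if (bytes.Length != count)
            throw new EndOfStreamException();

        uint value = 0;
        foreach (byte b in bytes)
            value = (value << 8) | b;   // most significant byte comes first
        return value;
    }

    // Parses a FLAC METADATA_BLOCK_HEADER: 1-bit last-block flag,
    // 7-bit block type, 24-bit big endian length of the block body.
    public static (bool IsLast, int BlockType, int Length) ReadBlockHeader(BinaryReader reader)
    {
        byte first = reader.ReadByte();
        bool isLast = (first & 0x80) != 0;
        int blockType = first & 0x7F;               // 4 = VORBIS_COMMENT
        int length = (int)ReadUIntBE(reader, 3);    // at most 0xFFFFFF, fits in int
        return (isLast, blockType, length);
    }
}
```

On newer versions of .NET, System.Buffers.Binary.BinaryPrimitives.ReadUInt32BigEndian() does the same job for a four-byte span.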

The exception is the VORBIS_COMMENT block, where the tags are stored. This uses little endian values, so I used the various ReadXXXX() methods of the BinaryReader class to process the data in this block. Each tag is stored as a 32 bit integer indicating the length of the tag string, followed by the tag string itself.
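To make that layout concrete, the sketch below writes a single tag the way it sits in the block: a little endian 32 bit length followed by the tag text. The VorbisTagLayout class and EncodeTag method are hypothetical. That the text is UTF-8 and that the length counts bytes comes from the Vorbis comment specification, and it is the detail at the heart of the bug described below.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper that lays out one VORBIS_COMMENT tag:
// a little endian 32 bit length followed by the tag text.
public static class VorbisTagLayout
{
    public static byte[] EncodeTag(string tag)
    {
        byte[] text = Encoding.UTF8.GetBytes(tag);   // tag text is stored as UTF-8
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(text.Length);   // BinaryWriter.Write(int) is little endian; this is a BYTE count
            writer.Write(text);          // the raw UTF-8 bytes, no terminator
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```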

My initial tests suggested that the implementation was correct and bug free, but when I ran the application against several hundred FLAC files it failed on one of them. When I debugged the failure I found that a call to ReadInt32() appeared to be returning the wrong value.

I noticed that the 32 bit integer value in the file was not word or half-word aligned, and my initial suspicion was that ReadInt32() was expecting an aligned value. So, I implemented my own method to read a little endian 32 bit integer, based on the methods I’d already written for the big endian values. It simply reads four bytes from the file stream and then multiplies and adds the individual bytes as required to give the 32 bit integer value. When I ran a test against the problem file the result was the same, so something else was causing the failure.
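A hand-rolled reader like the one described above might look something like this sketch (LittleEndian.ReadInt32LE is a hypothetical name). A Stream is just a sequence of bytes with no notion of alignment, so this returns the same value as BinaryReader.ReadInt32() at any position, which is why swapping it in changed nothing.

```csharp
using System.IO;

public static class LittleEndian
{
    // Reads four bytes and combines them least significant byte first,
    // which is what BinaryReader.ReadInt32() does. The stream position
    // can be anywhere; there is no alignment requirement.
    public static int ReadInt32LE(Stream stream)
    {
        byte[] b = new byte[4];
        int read = 0;
        while (read < 4)
        {
            int n = stream.Read(b, read, 4 - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        // Multiply-and-add form: b0 + b1*256 + b2*256^2 + b3*256^3
        return (int)(b[0] + b[1] * 256u + b[2] * 65536u + b[3] * 16777216u);
    }
}
```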

The exception was occurring on the second tag in the block, but the problem appeared to lie in the way my application was handling the first tag.

What was odd was that the two lines of code that read the length of a tag and then the tag string itself just didn’t seem to be working as expected on the first tag. The string that was read appeared to be one byte longer than the string data in the file stream. This didn’t make any sense, because the string was being read using the BinaryReader’s ReadChars() method with the length passed in as the number of characters to read. So why wasn’t it reading the correct number of characters?

I should have seen it sooner… I noticed that the first tag string had a non-ASCII character in it. When I looked carefully at the data for the tag value I realised that this character was represented by a two-byte sequence (Vorbis comment text is UTF-8 encoded). Hmm… So does the length value indicate the number of characters in the string, or the number of bytes required to store it?
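The question turns on two different counts. In .NET, a string's Length counts UTF-16 chars, while Encoding.UTF8.GetByteCount() counts the bytes the same text takes when stored as UTF-8. In this sketch the tag "Café" is a hypothetical stand-in for the problem tag, and TagLengths.Measure is an illustrative helper, not code from the application.

```csharp
using System.Text;

public static class TagLengths
{
    // Returns the number of UTF-16 chars in the string and the number
    // of bytes the same text occupies when encoded as UTF-8.
    public static (int Chars, int Bytes) Measure(string tag)
    {
        return (tag.Length, Encoding.UTF8.GetByteCount(tag));
    }
}
```

For "Caf\u00E9" this gives 4 chars but 5 bytes, because "é" is encoded as the two bytes C3 A9.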

It turned out to be the latter: the tag length value indicates the number of bytes used to store the string, not the number of characters in it. So, when I called ReadChars() with the length, it read one character too many and moved the current position in the file stream one byte further than it should have. When I then called ReadInt32() to get the next tag’s length, instead of reading x'35 00 00 00' (53 bytes) it read the three bytes of zeros and the first byte of the string: x'00 00 00 54'. As a little endian value that is 0x54000000, a length of 1409286144. Calling ReadChars() with this value understandably results in:

An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
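The misread can be reproduced in memory. This is a sketch with made-up data: MisreadDemo, the tag "Café" and the 53-byte second tag (starting with 'T', byte 0x54) are hypothetical, chosen to mirror the values above. ReadChars(5) consumes six bytes, including the 0x35 of the next length, so the following ReadInt32() sees x'00 00 00 54'.

```csharp
using System.IO;
using System.Text;

public static class MisreadDemo
{
    // Builds two length-prefixed tags. The first contains one two-byte UTF-8
    // character, so its length (in bytes) is one more than its char count.
    public static byte[] BuildSample()
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (string tag in new[] { "Caf\u00E9", "T" + new string('x', 52) })
            {
                byte[] text = Encoding.UTF8.GetBytes(tag);
                writer.Write(text.Length);   // little endian byte count
                writer.Write(text);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reproduces the bug: treating the byte count as a char count.
    public static int ReadSecondLengthWrongly(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8))
        {
            int firstLength = reader.ReadInt32();   // 5 bytes, but only 4 chars
            reader.ReadChars(firstLength);          // reads 5 chars = 6 bytes
            return reader.ReadInt32();              // now one byte out of step
        }
    }
}
```

ReadSecondLengthWrongly(BuildSample()) returns 1409286144 rather than 53.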

There are probably more sophisticated ways of handling this, but my quick solution was to override ReadChars() in a class derived from BinaryReader:

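        // Note: count is treated as a number of BYTES (as stored in the tag
        // length field), which are then decoded from UTF-8 (needs System.Text).
        public override char[] ReadChars(int count)
        {
            byte[] bytes = this.ReadBytes(count);
            return Encoding.UTF8.GetString(bytes).ToCharArray();
        }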

This ensures that the number of bytes, not characters, specified in the tag length field is read and then decoded into a Unicode character sequence.

Since then I’ve realised that this isn’t a good idea, as it confuses exactly what the ReadChars() method should do: the name says it reads chars, but I’m passing in the number of bytes to read, not chars. A better solution would be to implement this as a new method with a name like ReadCharBytes().
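One way the ReadCharBytes() idea might look is as an extension method on BinaryReader, so ReadChars() keeps its normal meaning. The extension-method approach, the ReadTags helper and returning a string rather than a char[] are assumptions for illustration, not the original code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical extension methods following the ReadCharBytes() naming idea:
// the argument is explicitly a BYTE count, decoded as UTF-8.
public static class VorbisReaderExtensions
{
    public static string ReadCharBytes(this BinaryReader reader, int byteCount)
    {
        byte[] bytes = reader.ReadBytes(byteCount);
        if (bytes.Length != byteCount)
            throw new EndOfStreamException();   // truncated tag data
        return Encoding.UTF8.GetString(bytes);
    }

    // Reads 'tagCount' length-prefixed tags. Each length is a little endian
    // 32 bit count of UTF-8 bytes, so the stream position stays in step.
    public static List<string> ReadTags(this BinaryReader reader, int tagCount)
    {
        var tags = new List<string>(tagCount);
        for (int i = 0; i < tagCount; i++)
        {
            int byteLength = reader.ReadInt32();
            tags.Add(reader.ReadCharBytes(byteLength));
        }
        return tags;
    }
}
```

In a VORBIS_COMMENT block the vendor string is stored the same way (length, then UTF-8 bytes) and is followed by a 32 bit tag count, so ReadCharBytes() and ReadTags() cover the whole block.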


Written by Sea Monkey

April 9, 2009 at 1:00 pm

Posted in Debugging, Development

