Short Stories

// Tales from software development

MySQL covering indexes for performance


Vitality’s application has a function to create clinical letters. While most customers print a physical copy of the letter and post it, increasingly we are asked to implement an interface to a document management system that sends the letter electronically to the recipient. Typically, this is the patient’s GP.

The interface executes a SELECT query on the correspondence table in the Vitality database similar to this:

    SELECT c.FileName
    FROM correspondence AS c
    INNER JOIN codesets AS cs ON cs.CodeId = c.Type
    WHERE c.Approved = true
    AND c.Processed IS NULL
    AND cs.CodeSetNo = 20;

Note that the columns referenced on the correspondence table are FileName (of the Microsoft Word document), Type (the type of letter), Approved (indicating the letter is complete and may be sent), and Processed (indicating whether the interface has processed the letter). Codeset 20 contains a list of the letter templates and their descriptions.

This SELECT query has executed in less than a second for the past five years that the interface has been installed at one of our largest customers, where approximately 60,000 letters are created each year.

However, a few weeks ago it was noticed that the interface was intermittently failing to send letters. Investigation showed that the SELECT query was sometimes taking in excess of 90 seconds to execute; the command was timing out and throwing an exception that caused the interface to suspend processing. The puzzle was: why was a query that typically executed in less than a second now taking over 90 seconds?

I’m not sure that we’re ever going to fully understand the change in behaviour but, in part, it appears to be that the MySQL query planner changed its execution plan as the correspondence table grew to over 300,000 rows. After rebuilding the indexes on the table and repairing it, the execution time was consistently around 10 seconds.
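For reference, an index rebuild and repair of this kind is typically done with MySQL’s standard maintenance statements. This is a sketch rather than the exact commands we ran, and it assumes a MyISAM table (REPAIR TABLE does not apply to InnoDB):

    -- Rebuild the table and its indexes
    OPTIMIZE TABLE correspondence;

    -- Refresh the index statistics used by the query planner
    ANALYZE TABLE correspondence;

    -- Check for and repair corruption (MyISAM, ARCHIVE, and CSV engines only)
    REPAIR TABLE correspondence;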

We realised that although we had always thought the execution time was less than a second, and the interface’s processing logs usually showed this, it was likely because much of the time MySQL was returning a cached query result.
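When timing the query it’s worth bypassing the query cache explicitly. MySQL’s SQL_NO_CACHE modifier does this for a single statement:

    SELECT SQL_NO_CACHE c.FileName
    FROM correspondence AS c
    INNER JOIN codesets AS cs ON cs.CodeId = c.Type
    WHERE c.Approved = true
      AND c.Processed IS NULL
      AND cs.CodeSetNo = 20;

With the cache out of the picture, each run reflects the true cost of executing the query plan.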

Because the correspondence table contains a summary of the letter content, each row is quite large and a table scan is expensive. With over 300,000 rows in the table, a table scan is very expensive.

So we started to look at how the query could be optimised. Initially we thought that a suitable index would help the query engine to efficiently access the table data, but after a number of failed experiments we realised that whilst this was better than a table scan, the performance was still quite poor.
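A typical first attempt is an index on just the filtered columns, which narrows the candidate rows but still forces a lookup into the table data for each match to fetch the remaining columns. The index name here is illustrative:

    -- Narrows the scan, but each matching entry still needs a row lookup
    CREATE INDEX ix_corr_approved_processed
        ON correspondence (Approved, Processed);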

The last resort was to try creating a covering index: an index that contains all the columns referenced in the query, so that the query engine doesn’t need to access the table data at all. Our first attempt showed no improvement, until we realised that whilst we had included the FileName, Approved, and Processed columns referenced in the SELECT column list and WHERE clause, we had overlooked the reference to the Type column in the INNER JOIN condition. As soon as this column was added to the index there was a massive improvement, and the query now consistently executes in 0.3 to 0.6 seconds with caching disabled.
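The resulting index needs all four correspondence columns that the query touches. The name and exact column order below are illustrative; placing the filtered columns first is the usual convention:

    -- Approved and Processed cover the WHERE clause, Type the JOIN,
    -- and FileName the SELECT list
    CREATE INDEX ix_corr_covering
        ON correspondence (Approved, Processed, Type, FileName);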

Using EXPLAIN to display the query plan shows that the correspondence table is now not being accessed at all. Instead, the covering index provides all the correspondence column data referenced in the query.
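In the EXPLAIN output this is indicated by "Using index" appearing in the Extra column of the row for the correspondence table:

    EXPLAIN
    SELECT c.FileName
    FROM correspondence AS c
    INNER JOIN codesets AS cs ON cs.CodeId = c.Type
    WHERE c.Approved = true
      AND c.Processed IS NULL
      AND cs.CodeSetNo = 20;
    -- Extra column for table c: Using index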


Written by Sea Monkey

April 13, 2016 at 8:00 pm

Posted in MySQL
