2016-02-27

Improving performance with covering index.

To appreciate this post you must read the post  when fast is not enough first.


I was asked some weeks ago, ‘can you explain your PHP structure assembling program?’.
Why? I asked.
‘It seems we have a problem, it takes about 10 hours to assemble the spare parts structure for the CPD factory!’


Ten hours is definitely too much, in this spare BOM tree there are some 8000 structures or bills of meterial and some +400000 tree nodes. I expected this to take some 15 minutes at the most.
Something must be awful wrong, so I decided to take a look myself first. Sure enough it was the tree assembling PHP program that took some 10 hours. It was setup to distribute the work over 7 parallel threads like this:
This was obviously just cut and pasted from a job with a much larger BOM where the chunks assigned to threads was optimized for that special BOM. As you can see from the forevery iterator, the first chunk takes the first 11000 bills or top nodes. In this example we only have 8000 BOMs to assemble, which means all go into the first chunk effectively single-threading the assembly job, in this case it would probably be better to split the BOMs evenly over 7 chunks like this:
  
Here we do not try to optimize each chunk, it will be good enough to split the BOMs evenly (on 7 chunks) and parallel process them all. Said and done, now we were down to about 1 hour 20 minutes, which was expected since we now run seven parallel threads, still this was not good enough, since I know the PHP program iterates the same SQL query for every node I took a look into PhpMyAdmin to see what’s going on:
  
We run this query over and over again, and a close examine of the table we found an index missing. In this case you really want a covering index as this will drastically improve performance.  This is easily fixed:


We just add an alter statement in the job creating the table.
(A covering index is an index that satisfies a query without accessing the table.)


And now we are down to about three minutes for the tree assembly job.

The moral of this post is: indexes are good, but optimizing too much is not necessarily good.

No comments:

Post a Comment