Some time ago I wrote a post about how we improved performance of a job a lot by parallelize the process and introduce a covering index. With the covering index and 10 parallel processes we cut down execution time from +10 hours to a few minutes. The job uses map and reduce design pattern, where 8000 top nodes with some 400000 lower level nodes are assembled into three structures.
Last weekend I decided to see how far I could take the parallelizing within the present infrastructure. How much could we improve the performance by increase the number of threads? Measure parallel processes in a live production environment is quite complex there are many variables to take into account like system load, caches etc. I decided to run my tests during low traffic hours many times with warm caches and buffers, then take the average from many test runs. The figures were consistent with only small variations so I think result is relevant. As you see in the graph I started with 5 parallel processes and ended with 100 processes. The job scales reasonable well down to about 15 processes, while more than 80 process do not improve the performance much.
Running this job with 10 parallel processes gives a total execution time well below 3 minutes which is much lower than what is required. There is no need to add more parallel processes. But it’s nice to know we can go further.