Friday, 3 August 2012

Finding the bottleneck using KCachegrind


I spent the afternoon trying to find a bottleneck in the scene file generation for the Render Module. It was not immediately obvious what was causing the slowdown, which, with complicated geometry, was making me wait a precious 5 extra seconds. I wanted to write this up because I couldn't quickly find on Google how to use KCachegrind when I needed to refresh my memory.

Here's a classic case of how Valgrind and KCachegrind came to help. Valgrind runs your program on a simulated CPU and instruments it as it executes, and KCachegrind visualises the profile data written by Valgrind's Callgrind tool. The program only needs to be compiled with debug information. Valgrind's tools are very useful for developers, and just as importantly testers, to track down memory leaks, segfaults, 100% CPU load caused by infinite loops and so forth.

First, download both Valgrind and the KDE graphical frontend, KCachegrind.

Ensure the program you're using is built with debug information (-g) and without any compiler optimisation flags (e.g. -O3).
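As a rough sketch of what such a build could look like with g++: the file names main.cpp and programName below are placeholders I made up for illustration, and a real project like FreeCAD would set the equivalent flags through its own build system instead.

```shell
# Hypothetical file names; replace with your own source and target.
SRC=main.cpp
OUT=programName

# -g keeps debug information, -O0 switches optimisation off.
CXXFLAGS="-g -O0"

if [ -f "$SRC" ]; then
    # CXXFLAGS is left unquoted on purpose so it splits into two flags.
    g++ $CXXFLAGS -o "$OUT" "$SRC"
else
    echo "No $SRC here; the build command would be: g++ $CXXFLAGS -o $OUT $SRC"
fi
```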

Now you can run the Valgrind profiler from a terminal in the same directory as the program executable by issuing a command like this:


valgrind --tool=callgrind -v programName

The program will run, but it will be around 20-50x slower than normal during profiling.

-----
For developers: if you know which area of the program is causing problems, you can dump the output each time a class method/function is entered, e.g.

valgrind --tool=callgrind -v --dump-before=Renderer::generateScene FreeCAD

Ideally the function should be called twice. This will produce 3 separate files:
  1. Output up to the first call
  2. Output from the first call up to the second call
  3. The remainder of the program until termination
This can make it easier to diagnose the problem later using KCachegrind.
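
To check which dump files a run actually left behind, something like the following could be used. It is only a sketch: the helper name list_callgrind_dumps is my own, and it assumes Callgrind's default naming, where output files start with callgrind.out. and, as far as I know, extra dumps get a part number appended.

```shell
# Print the Callgrind dump files found in a directory (default: current).
# Files come back in the shell's alphabetical glob order, so part 10
# would be listed before part 2.
list_callgrind_dumps() {
    dir=${1:-.}
    count=0
    for f in "$dir"/callgrind.out.*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        count=$((count + 1))
        echo "$count: $f"
    done
    echo "$count dump file(s) found"
}

list_callgrind_dumps .
```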
-----
Using KCachegrind

Start KCachegrind and open the output file generated by Valgrind (named callgrind.out.<pid> by default). My first instinct was to look at Renderer::generateScene, where I knew FreeCAD was being slow.
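
If you would rather open the file from the terminal, a small helper along these lines could pick the newest output file and hand it to KCachegrind. The function name latest_callgrind_out is hypothetical, and the sketch assumes the default callgrind.out.* naming and that the kcachegrind command is on your PATH.

```shell
# Print the newest callgrind.out.* file in a directory (default: current).
# Prints an empty line if there is none.
latest_callgrind_out() {
    dir=${1:-.}
    latest=""
    for f in "$dir"/callgrind.out.*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
            latest=$f
        fi
    done
    printf '%s\n' "$latest"
}

log=$(latest_callgrind_out .)
if [ -z "$log" ]; then
    echo "No callgrind.out.* file here; run the profiler first."
elif command -v kcachegrind >/dev/null 2>&1; then
    kcachegrind "$log" &
else
    echo "kcachegrind not found; open $log manually."
fi
```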

Clicking on it in the left-hand window updates the right-hand window area to show the call tree.

Scrolling down, we can find the culprit (GeomAPI_ProjectPointOnSurf) that is causing the slowdown:



This can be verified using the source browser:


Looking at the source code and simply commenting this call out, I found that it was not actually required. With it removed, the scene file is generated near instantly!

I hope this short guide may help some developers and testers!