
/**


@page dev-profiling System Profiling
@tableofcontents


@section dev-profiling-compute Profiling Processing Time

One way to profile the processing time (besides inserting timing statements into the code) is to leverage a profiler such as [valgrind](https://www.valgrind.org/).
Its callgrind tool records the call graph of the program along with the cost attributed to each function.
To use this with a ROS node, we can do the following (based on [this](http://wiki.ros.org/roslaunch/Tutorials/Roslaunch%20Nodes%20in%20Valgrind%20or%20GDB) guide):

- Edit the `pgeneva_serial_eth.launch` launch file of the `ov_msckf` package
- Add `launch-prefix="valgrind --tool=callgrind --callgrind-out-file=/tmp/callgrind.txt"` as an attribute of the `<node>` tag of your ROS node. This will cause the node to run under valgrind and write the profile to `/tmp/callgrind.txt`.
- Change the bag length to be only 10 or so seconds (since the node runs much slower while being profiled)

@code{.shell-session}
sudo apt install valgrind
roslaunch ov_msckf pgeneva_serial_eth.launch
@endcode

After the profiling run has finished, we will want to visualize the result.
There are some good tools for that; here we use [gprof2dot](https://github.com/jrfonseca/gprof2dot) and [xdot.py](https://github.com/jrfonseca/xdot.py).
First we post-process the callgrind output into a dot graph, and then open it in xdot for inspection.

@image html example_callgrind.png width=80%

@code{.shell-session}
# install the visualization programs
sudo apt-get install python3 graphviz
sudo apt-get install gir1.2-gtk-3.0 python3-gi python3-gi-cairo
pip install gprof2dot xdot
# convert the callgrind output into a dot graph and then view it
gprof2dot --format callgrind --strip /tmp/callgrind.txt --output /tmp/callgrind.xdot
xdot /tmp/callgrind.xdot
@endcode
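
For the timing statements mentioned at the start of this section, a minimal sketch using `std::chrono` is given below.
The helper `time_call_seconds` is a hypothetical name used only for illustration and is not part of the codebase.
It measures wall-clock time with a monotonic clock, so it is only a coarse alternative to a full callgrind run.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of the codebase): runs a callable once
// and returns the elapsed wall-clock time in seconds.
template <typename Func>
double time_call_seconds(Func &&func) {
  // steady_clock is monotonic, so the difference cannot go negative
  auto t_start = std::chrono::steady_clock::now();
  std::forward<Func>(func)();
  auto t_end = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t_end - t_start).count();
}

// Example usage (process_frame is a placeholder for the code being timed):
//   double dt = time_call_seconds([&]() { process_frame(); });
//   std::printf("process_frame took %.4f seconds\n", dt);
```

Wrapping the timed code in a lambda keeps the start and stop calls paired, which avoids forgetting one of them when code is moved around.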


@section dev-profiling-leaks Memory Leaks

One can leverage a profiler such as [valgrind](https://www.valgrind.org/) to check the codebase for memory leaks.
Ensure you have installed the `valgrind` package (see above).
We can change the node launch file as follows:

- Edit the `pgeneva_serial_eth.launch` launch file of the `ov_msckf` package
- Add `launch-prefix="valgrind --tool=memcheck --leak-check=yes"` as an attribute of the `<node>` tag of your ROS node. This will cause the node to run under valgrind.
- Change the bag length to be only 10 or so seconds (since the node runs much slower under memcheck)


This [page](https://web.stanford.edu/class/archive/cs/cs107/cs107.1206/resources/valgrind.html) has some helpful supporting material and answers to common questions.
An example of a leak reported by memcheck is shown below.

@code{.text}
==5512== 1,578,860 (24 direct, 1,578,836 indirect) bytes in 1 blocks are definitely lost in loss record 6,585 of 6,589
==5512==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
....
==5512==    by 0x543F868: operator[] (unordered_map.h:973)
==5512==    by 0x543F868: ov_core::TrackKLT::feed_stereo(double, cv::Mat&, cv::Mat&, unsigned long, unsigned long) (TrackKLT.cpp:165)
==5512==    by 0x4EF8C52: ov_msckf::VioManager::feed_measurement_stereo(double, cv::Mat&, cv::Mat&, unsigned long, unsigned long) (VioManager.cpp:245)
==5512==    by 0x1238A9: main (ros_serial_msckf.cpp:247)
@endcode
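
To help read a report like the one above, the following is a minimal sketch of one generic pattern that produces a small "direct" loss together with a large "indirect" loss.
It is not the actual code behind the trace; the `Buffer` type and both function names are hypothetical and chosen only for illustration.
The sketch assumes the leaked object is a small heap-allocated container whose storage becomes unreachable along with it.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical buffer type used only for illustration.
using Buffer = std::vector<unsigned char>;

// Leaky pattern: the map holds raw owning pointers. Overwriting an entry
// drops the only pointer to the old Buffer, so the Buffer object itself is
// "definitely lost" and the storage it owns is lost "indirectly".
inline void store_leaky(std::unordered_map<std::size_t, Buffer *> &cache, std::size_t id, std::size_t n_bytes) {
  cache[id] = new Buffer(n_bytes); // any previous pointer is never deleted
}

// Safer pattern: std::unique_ptr deletes the old Buffer when the entry is
// overwritten, and deletes all remaining Buffers when the map is destroyed.
inline void store_owned(std::unordered_map<std::size_t, std::unique_ptr<Buffer>> &cache, std::size_t id, std::size_t n_bytes) {
  cache[id].reset(new Buffer(n_bytes));
}
```

Calling `store_leaky` twice with the same `id` would leak the first buffer, while the same two calls to `store_owned` leave nothing for memcheck to report.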


@section dev-profiling-compiler Compiler Profiling

Here is a small guide on how to profile the compiler while building the codebase.
This should be used to try to minimize compile times, since long builds in general hurt developer productivity.
It is recommended to read the following pages, of which this guide is a condensed form:

- https://aras-p.info/blog/2019/01/16/time-trace-timeline-flame-chart-profiler-for-Clang/
- https://aras-p.info/blog/2019/09/28/Clang-Build-Analyzer/

First we need to ensure we have a compiler that can profile the build time.
Clang 9 or newer should work, but we have tested only with 11.
We can get it from the [LLVM apt repository](https://apt.llvm.org/) by using the following auto-install script and passing the version we want:


@code{.shell-session}
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 11
export CC=/usr/bin/clang-11
export CXX=/usr/bin/clang++-11
@endcode

We then need to clone the analyzer repository, which allows for summary generation.

@code{.shell-session}
git clone https://github.com/aras-p/ClangBuildAnalyzer
cd ClangBuildAnalyzer
cmake . && make
@endcode

We can finally build our ROS package and time how long it takes.
Note that we are using [catkin tools](https://catkin-tools.readthedocs.io/en/latest/) to build here.
The prefix *CBA* means the command should be run in the folder where the ClangBuildAnalyzer repository was cloned, while the prefix *WS* means it should be run in the root of your ROS workspace.

@code{.shell-session}
(WS) cd ~/workspace/
(WS) catkin clean -y && mkdir build
(CBA) ./ClangBuildAnalyzer --start ~/workspace/build/
(WS) export CC=/usr/bin/clang-11 && export CXX=/usr/bin/clang++-11
(WS) catkin build ov_msckf -DCMAKE_CXX_FLAGS="-ftime-trace"
(CBA) ./ClangBuildAnalyzer --stop ~/workspace/build/ capture_file.bin
(CBA) ./ClangBuildAnalyzer --analyze capture_file.bin > timing_results.txt
@endcode

The `-ftime-trace` flag should generate a number of `.json` files in your build folder.
These can be opened in the Chrome browser at `chrome://tracing` for viewing.
In general the ClangBuildAnalyzer is more useful for finding which files take the longest to compile.
An example of what is generated in the `timing_results.txt` file is:


@code{.text}
Analyzing build trace from 'capture_file.bin'...
**** Time summary:
Compilation (86 times):
  Parsing (frontend):          313.9 s
  Codegen & opts (backend):    222.9 s

**** Files that took longest to parse (compiler frontend):
 13139 ms: /build//ov_msckf/CMakeFiles/ov_msckf_lib.dir/src/update/UpdaterSLAM.cpp.o
 12843 ms: /build//ov_msckf/CMakeFiles/run_serial_msckf.dir/src/ros_serial_msckf.cpp.o
 ...

**** Functions that took longest to compile:
  1639 ms: main (/src/open_vins/ov_eval/src/error_comparison.cpp)
  1337 ms: ov_core::BsplineSE3::get_acceleration(double, Eigen::Matrix<double, ... (/src/open_vins/ov_core/src/sim/BsplineSE3.cpp)
  1156 ms: ov_eval::ResultSimulation::plot_state(bool, double) (/src/open_vins/ov_eval/src/calc/ResultSimulation.cpp)
  ...

 *** Expensive headers:
 27505 ms: /src/open_vins/ov_core/src/track/TrackBase.h (included 12 times, avg 2292 ms), included via:
   TrackKLT.cpp.o TrackKLT.h  (4372 ms)
   TrackBase.cpp.o  (4297 ms)
   TrackSIM.cpp.o TrackSIM.h  (4252 ms)
   ...
@endcode

Some key methods to reduce compile times are as follows:

- Only include headers that are required for your class
- Don't include headers in your header files `.h` that are only required in your `.cpp` source files
- Consider [forward declarations](https://www.wikiwand.com/en/Forward_declaration) of functions and types instead of including their headers
- Ensure you are using an include guard in your headers
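
The last two points can be sketched as follows.
This is a minimal illustration with a header and its source file shown in one listing; `ExampleTracker` and `ExampleDatabase` are hypothetical names and not classes from the codebase.
The header only forward declares `ExampleDatabase`, so files that include the header do not pay for parsing the database definition.

```cpp
// ---- ExampleTracker.h (hypothetical header) ----
#ifndef EXAMPLE_TRACKER_H // include guard: the body is parsed once per translation unit
#define EXAMPLE_TRACKER_H

#include <memory>

// Forward declaration: the header only stores a pointer to the type, so
// the full definition (and the headers it pulls in) is not needed here.
class ExampleDatabase;

class ExampleTracker {
public:
  ExampleTracker();
  ~ExampleTracker(); // defined in the source file, where the type is complete
  int num_features() const;

private:
  std::unique_ptr<ExampleDatabase> database;
};

#endif // EXAMPLE_TRACKER_H

// ---- ExampleTracker.cpp (hypothetical source) ----
// In a real project the source would include "ExampleTracker.h" and the
// header that defines ExampleDatabase; here the definition is inlined.
#include <vector>

class ExampleDatabase {
public:
  std::vector<int> feature_ids;
};

ExampleTracker::ExampleTracker() : database(new ExampleDatabase()) {}
ExampleTracker::~ExampleTracker() = default;
int ExampleTracker::num_features() const { return static_cast<int>(database->feature_ids.size()); }
```

The destructor is declared in the header and defined in the source file because `std::unique_ptr` needs the complete type at the point where it deletes the object.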


*/