157 lines
6.4 KiB
Plaintext
157 lines
6.4 KiB
Plaintext
/**
|
|
|
|
|
|
@page dev-profiling System Profiling
|
|
@tableofcontents
|
|
|
|
|
|
@section dev-profiling-compute Profiling Processing Time
|
|
|
|
One way (besides inserting timing statements into the code) is to leverage a profiler such as [valgrind](https://www.valgrind.org/).
|
|
This tool allows for recording of the call stack of the system.
|
|
To use this with a ROS node, we can do the following (based on [this](http://wiki.ros.org/roslaunch/Tutorials/Roslaunch%20Nodes%20in%20Valgrind%20or%20GDB) guide):
|
|
|
|
- Edit `roslaunch ov_msckf pgeneva_serial_eth.launch` launch file
|
|
- Append `launch-prefix="valgrind --tool=callgrind --callgrind-out-file=/tmp/callgrind.txt"` to your ROS node. This will cause the node to run with valgrind.
|
|
- Change the bag length to be only 10 or so seconds (since profiling is slow)
|
|
|
|
@code{.shell-session}
|
|
sudo apt install valgrind
|
|
roslaunch ov_msckf pgeneva_serial_eth.launch
|
|
@endcode
|
|
|
|
After running the profiling program we will want to visualize it.
|
|
There are some good tools for that, specifically we are using [gprof2dot](https://github.com/jrfonseca/gprof2dot) and [xdot.py](https://github.com/jrfonseca/xdot.py).
|
|
First we will post-process it into a xdot graph format and then visualize it for inspection.
|
|
|
|
@image html example_callgrind.png width=80%
|
|
|
|
@code{.shell-session}
|
|
// install viz programs
|
|
apt-get install python3 graphviz
|
|
apt-get install gir1.2-gtk-3.0 python3-gi python3-gi-cairo graphviz
|
|
pip install gprof2dot xdot
|
|
// actually process and then viz call file
|
|
gprof2dot --format callgrind --strip /tmp/callgrind.txt --output /tmp/callgrind.xdot
|
|
xdot /tmp/callgrind.xdot
|
|
@endcode
|
|
|
|
|
|
|
|
@section dev-profiling-leaks Memory Leaks
|
|
|
|
One can leverage a profiler such as [valgrind](https://www.valgrind.org/) to perform memory leak check of the codebase.
|
|
Ensure you have installed the `valgrind` package (see above).
|
|
We can change the node launch file as follows:
|
|
|
|
- Edit `roslaunch ov_msckf pgeneva_serial_eth.launch` launch file
|
|
- Append `launch-prefix="valgrind --tool=memcheck --leak-check=yes"` to your ROS node. This will cause the node to run with valgrind.
|
|
- Change the bag length to be only 10 or so seconds (since profiling is slow)
|
|
|
|
|
|
This [page](https://web.stanford.edu/class/archive/cs/cs107/cs107.1206/resources/valgrind.html) has some nice support material for FAQ.
|
|
An example loss is shown below which was found by memcheck.
|
|
|
|
@code{.text}
|
|
==5512== 1,578,860 (24 direct, 1,578,836 indirect) bytes in 1 blocks are definitely lost in loss record 6,585 of 6,589
|
|
==5512== at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
|
|
....
|
|
==5512== by 0x543F868: operator[] (unordered_map.h:973)
|
|
==5512== by 0x543F868: ov_core::TrackKLT::feed_stereo(double, cv::Mat&, cv::Mat&, unsigned long, unsigned long) (TrackKLT.cpp:165)
|
|
==5512== by 0x4EF8C52: ov_msckf::VioManager::feed_measurement_stereo(double, cv::Mat&, cv::Mat&, unsigned long, unsigned long) (VioManager.cpp:245)
|
|
==5512== by 0x1238A9: main (ros_serial_msckf.cpp:247)
|
|
@endcode
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
@section dev-profiling-compiler Compiler Profiling
|
|
|
|
Here is a small guide on how to perform compiler profiling for building of the codebase.
|
|
This should be used to try to minimize compile times which in general hurt developer productivity.
|
|
It is recommended to read the following pages which this is a condenced form of:
|
|
|
|
- https://aras-p.info/blog/2019/01/16/time-trace-timeline-flame-chart-profiler-for-Clang/
|
|
- https://aras-p.info/blog/2019/09/28/Clang-Build-Analyzer/
|
|
|
|
First we need to ensure we have a compiler that can profile the build time.
|
|
Clang greater then 9 should work, but we have tested only with 11.
|
|
We can get the [latest Clang](https://apt.llvm.org/) by using the follow auto-install script:
|
|
|
|
|
|
@code{.shell-session}
|
|
sudo bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"
|
|
export CC=/usr/bin/clang-11
|
|
export CXX=/usr/bin/clang++-11
|
|
@endcode
|
|
|
|
We then need to clone the analyzer repository, which allows for summary generation.
|
|
|
|
@code{.shell-session}
|
|
git clone https://github.com/aras-p/ClangBuildAnalyzer
|
|
cd ClangBuildAnalyzer
|
|
cmake . && make
|
|
@endcode
|
|
|
|
We can finally build our ROS package and time how long it takes.
|
|
Note that we are using [catkin tools](https://catkin-tools.readthedocs.io/en/latest/) to build here.
|
|
The prefix *CBA* means to run the command in the ClangBuildAnalyzer repository clone folder.
|
|
While the prefix *WS* means run in the root of your ROS workspace.
|
|
|
|
@code{.shell-session}
|
|
(WS) cd ~/workspace/
|
|
(WS) catkin clean -y && mkdir build
|
|
(CBA) ./ClangBuildAnalyzer --start ~/workspace/build/
|
|
(WS) export CC=/usr/bin/clang-11 && export CXX=/usr/bin/clang++-11
|
|
(WS) catkin build ov_msckf -DCMAKE_CXX_FLAGS="-ftime-trace"
|
|
(CBA) ./ClangBuildAnalyzer --stop ~/workspace/build/ capture_file.bin
|
|
(CBA) ./ClangBuildAnalyzer --analyze capture_file.bin > timing_results.txt
|
|
@endcode
|
|
|
|
The `time-trace` flag should generate a bunch of .json files in your build folder.
|
|
These can be opened in your chrome browser `chrome://tracing` for viewing.
|
|
In general the ClangBuildAnalyzer is more useful for finding what files take long.
|
|
An example output of what is generated in the timing_results.txt file is:
|
|
|
|
|
|
@code{.text}
|
|
Analyzing build trace from 'capture_file.bin'...
|
|
**** Time summary:
|
|
Compilation (86 times):
|
|
Parsing (frontend): 313.9 s
|
|
Codegen & opts (backend): 222.9 s
|
|
|
|
**** Files that took longest to parse (compiler frontend):
|
|
13139 ms: /build//ov_msckf/CMakeFiles/ov_msckf_lib.dir/src/update/UpdaterSLAM.cpp.o
|
|
12843 ms: /build//ov_msckf/CMakeFiles/run_serial_msckf.dir/src/ros_serial_msckf.cpp.o
|
|
...
|
|
|
|
**** Functions that took longest to compile:
|
|
1639 ms: main (/src/open_vins/ov_eval/src/error_comparison.cpp)
|
|
1337 ms: ov_core::BsplineSE3::get_acceleration(double, Eigen::Matrix<double, ... (/src/open_vins/ov_core/src/sim/BsplineSE3.cpp)
|
|
1156 ms: ov_eval::ResultSimulation::plot_state(bool, double) (/src/open_vins/ov_eval/src/calc/ResultSimulation.cpp)
|
|
...
|
|
|
|
*** Expensive headers:
|
|
27505 ms: /src/open_vins/ov_core/src/track/TrackBase.h (included 12 times, avg 2292 ms), included via:
|
|
TrackKLT.cpp.o TrackKLT.h (4372 ms)
|
|
TrackBase.cpp.o (4297 ms)
|
|
TrackSIM.cpp.o TrackSIM.h (4252 ms)
|
|
...
|
|
@endcode
|
|
|
|
Some key methods to reduce compile times are as follows:
|
|
- Only include headers that are required for your class
|
|
- Don't include headers in your header files `.h` that are only required in your `.cpp` source files.
|
|
- Consider [forward declarations](https://www.wikiwand.com/en/Forward_declaration) of methods and types
|
|
- Ensure you are using an include guard in your headers
|
|
|
|
|
|
|
|
|
|
|
|
|
|
*/ |