
# Benchmark

## Build

```bash
cd small_gicp
mkdir build && cd build
cmake .. -DBUILD_WITH_TBB=ON -DBUILD_WITH_PCL=ON -DBUILD_BENCHMARKS=ON

# [optional] Build and install Iridescence (visualization)
git clone https://github.com/koide3/iridescence --recursive
mkdir iridescence/build && cd iridescence/build
cmake .. && make -j
sudo make install
# Back in small_gicp/build, enable visualization support
cmake .. -DBUILD_WITH_IRIDESCENCE=ON

# [optional] Build with fast_gicp
export FAST_GICP_INCLUDE_DIR=/your/fast_gicp/include
cmake .. -DBUILD_WITH_FAST_GICP=ON

# Build
make -j

# Test
# Show options
./odometry_benchmark

# USAGE: odometry_benchmark <dataset_path> <output_path> [options]
# OPTIONS:
#   --visualize
#   --num_threads <value> (default: 4)
#   --num_neighbors <value> (default: 20)
#   --downsampling_resolution <value> (default: 0.25)
#   --voxel_resolution <value> (default: 2.0)
#   --engine <pcl|small_gicp|small_gicp_omp|small_vgicp_omp|small_gicp_tbb|small_vgicp_tbb|small_vgicp_model_tbb|small_gicp_tbb_flow> (default: small_gicp)

# Run odometry benchmark
./odometry_benchmark /your/kitti/dataset/velodyne /tmp/traj_lidar.txt --visualize --num_threads 16 --engine small_gicp_tbb
```

## Results

All benchmarks were conducted on the KITTI 00 sequence.

### Downsampling

```bash
cd small_gicp/scripts
./run_downsampling_benchmark.sh
python3 plot_downsampling.py
```

- Single-threaded `small_gicp::voxelgrid_sampling` is about 1.3x faster than `pcl::VoxelGrid`.
- Multi-threaded `small_gicp::voxelgrid_sampling_tbb` (6 threads) is about 3.2x faster than `pcl::VoxelGrid`.
- `small_gicp::voxelgrid_sampling` produces accurate downsampling results (almost identical to those of `pcl::VoxelGrid`), while `pcl::ApproximateVoxelGrid` yields spurious points (up to 2x as many points).
- `small_gicp::voxelgrid_sampling` can process larger point clouds with a finer voxel resolution than `pcl::VoxelGrid` (for a point cloud 1000 m wide, the minimum voxel resolution can be 0.5 mm).
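The idea being benchmarked is simple: each point is bucketed into a voxel cell of the given resolution, and every cell is reduced to the centroid of its points. Below is a minimal, single-threaded sketch of that idea (illustrative only, not small_gicp's actual implementation; the library additionally provides TBB/OMP-parallel variants):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { double x, y, z; };

// Bucket each point into its voxel cell and reduce every cell to the centroid
// of its points. Illustrative sketch only -- not small_gicp's implementation.
std::vector<Point> voxelgrid_sampling(const std::vector<Point>& points, double resolution) {
  struct Accum { double x = 0, y = 0, z = 0; int n = 0; };
  std::unordered_map<std::int64_t, Accum> cells;
  for (const auto& p : points) {
    // Pack the three integer voxel coordinates into a single 64-bit key
    // (21 bits each); collisions are only possible for extremely large clouds.
    const auto ix = static_cast<std::int64_t>(std::floor(p.x / resolution));
    const auto iy = static_cast<std::int64_t>(std::floor(p.y / resolution));
    const auto iz = static_cast<std::int64_t>(std::floor(p.z / resolution));
    const std::int64_t key = ((ix & 0x1FFFFF) << 42) | ((iy & 0x1FFFFF) << 21) | (iz & 0x1FFFFF);
    Accum& c = cells[key];
    c.x += p.x; c.y += p.y; c.z += p.z; ++c.n;
  }
  std::vector<Point> out;
  out.reserve(cells.size());
  for (const auto& kv : cells) {
    const Accum& c = kv.second;
    out.push_back({c.x / c.n, c.y / c.n, c.z / c.n});
  }
  return out;
}
```

A parallel variant can hash points into per-thread maps and merge them afterwards, which is roughly where the multi-threaded speedups above come from.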

*(figure: downsampling_comp)*

- While TBB shows slightly better scalability, neither parallelism backend gains additional speed beyond 16 threads.

*(figure: downsampling_threads)*

### KdTree construction

```bash
cd small_gicp/scripts
./run_kdtree_benchmark.sh
python3 plot_kdtree.py
```

- The multi-threaded implementations (TBB and OMP) can be up to 4x faster than the single-threaded one (all implementations are based on nanoflann).
- Processing speed improves as the number of threads increases, but the gain is not always monotonic (possibly due to the scheduling algorithm or issues specific to the CPU used, an AMD 5995WX).
- This benchmark compares only construction time (query time is not included).
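The construction cost being measured is dominated by recursive median splits. A minimal kd-tree build sketch is shown below (illustrative only; the benchmarked implementations are nanoflann-based). The parallel variants spawn the two child-subtree builds as independent tasks (e.g. OMP tasks or TBB `parallel_invoke`), which helps explain why scaling is imperfect: near the root of the recursion there are fewer subtrees than threads.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// One kd-tree node: a splitting point, the split axis, and two subtrees.
struct KdNode {
  std::array<double, 3> point;
  int axis;
  std::unique_ptr<KdNode> left, right;
};

// Recursive median-split construction over pts[begin, end).
// std::nth_element partially sorts so the median along the current axis
// lands at `mid`; the two halves become the child subtrees.
std::unique_ptr<KdNode> build_kdtree(std::vector<std::array<double, 3>>& pts,
                                     std::size_t begin, std::size_t end, int depth = 0) {
  if (begin >= end) return nullptr;
  const int axis = depth % 3;
  const std::size_t mid = begin + (end - begin) / 2;
  std::nth_element(pts.begin() + begin, pts.begin() + mid, pts.begin() + end,
                   [axis](const std::array<double, 3>& a, const std::array<double, 3>& b) {
                     return a[axis] < b[axis];
                   });
  auto node = std::make_unique<KdNode>();
  node->point = pts[mid];
  node->axis = axis;
  // In a parallel build, these two calls would run as independent tasks.
  node->left = build_kdtree(pts, begin, mid, depth + 1);
  node->right = build_kdtree(pts, mid + 1, end, depth + 1);
  return node;
}

std::size_t count_nodes(const KdNode* n) {
  return n ? 1 + count_nodes(n->left.get()) + count_nodes(n->right.get()) : 0;
}
```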

*(figure: kdtree_time)*

### Odometry estimation

```bash
cd small_gicp/scripts
./run_odometry_benchmark.sh
python3 plot_odometry.py
```

- Single-threaded `small_gicp::GICP` is about 2.4x and 1.9x faster than `pcl::GICP` and `fast_gicp::GICP`, respectively.
- `small_gicp::(GICP|VGICP)` shows better multi-thread scalability than `fast_gicp::(GICP|VGICP)`.
- `small_gicp::GICP` parallelized with the TBB flow graph scales excellently to many-thread situations (~128 threads), at the cost of increased latency.
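At its core, each odometry step aligns the current scan to the previous one by alternating correspondence search and transform update. The translation-only ICP sketch below illustrates the loop structure being timed. It is illustrative only: real GICP estimates a full SE(3) pose using per-point covariances, and uses a KdTree instead of the brute-force neighbor search here; the parallel engines distribute exactly this per-point work across threads.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

using Point = std::array<double, 3>;

// Translation-only ICP sketch: each iteration (1) finds nearest-neighbor
// correspondences (brute force here, a KdTree in practice) and (2) shifts the
// source cloud by the mean residual. Returns the accumulated translation.
Point estimate_translation(std::vector<Point> source, const std::vector<Point>& target,
                           int iterations = 10) {
  Point total{0.0, 0.0, 0.0};
  for (int it = 0; it < iterations; ++it) {
    Point delta{0.0, 0.0, 0.0};
    for (const auto& p : source) {
      // Nearest neighbor of p in the target cloud (squared distance).
      double best = std::numeric_limits<double>::max();
      Point nn{};
      for (const auto& q : target) {
        const double d = (q[0] - p[0]) * (q[0] - p[0]) + (q[1] - p[1]) * (q[1] - p[1]) +
                         (q[2] - p[2]) * (q[2] - p[2]);
        if (d < best) { best = d; nn = q; }
      }
      delta[0] += nn[0] - p[0]; delta[1] += nn[1] - p[1]; delta[2] += nn[2] - p[2];
    }
    for (auto& d : delta) d /= static_cast<double>(source.size());
    for (auto& p : source) { p[0] += delta[0]; p[1] += delta[1]; p[2] += delta[2]; }
    for (int k = 0; k < 3; ++k) total[k] += delta[k];
  }
  return total;
}
```

The inner per-point loop is embarrassingly parallel, which is why the OMP/TBB engines scale well; the TBB flow-graph engine additionally pipelines whole frames, trading per-frame latency for throughput.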

*(figure: odometry_time)*

### SIMD intrinsics (`-march=native`)

We recommend keeping this feature disabled unless you are 100% sure of what it does.

- `BUILD_WITH_MARCH_NATIVE=ON` enables platform-specific SIMD intrinsics, squeezing out extra performance (a roughly 1.1x speedup for free).
- However, you must ensure that all involved libraries are built with `-march=native`; otherwise the program will crash.
- In practice it is difficult to build every library with `-march=native` consistently, so we recommend keeping `BUILD_WITH_MARCH_NATIVE=OFF`.

Results:

- `BUILD_WITH_MARCH_NATIVE=OFF`: `Eigen::SimdInstructionSetsInUse()=SSE, SSE2`
- `BUILD_WITH_MARCH_NATIVE=ON`: `Eigen::SimdInstructionSetsInUse()=AVX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2`

*(figure: odometry_native)*