odometry benchmark (native)

This commit is contained in:
k.koide 2024-04-02 13:05:03 +09:00
parent 734bc905c8
commit 0e6ad9a157
7 changed files with 195 additions and 3 deletions

104
BENCHMARK.md Normal file
View File

@ -0,0 +1,104 @@
# Benchmark
## Build
```bash
cd small_gicp
mkdir build && cd build
cmake .. -DBUILD_WITH_TBB=ON -DBUILD_WITH_PCL=ON -DBUILD_BENCHMARKS=ON
# [optional] Build with Iridescence (visualization)
git clone https://github.com/koide3/iridescence --recursive
mkdir iridescence/build && cd iridescence/build
cmake .. && make -j
sudo make install
cmake .. -DBUILD_WITH_IRIDESCENCE=ON
# [optional] Build with fast_gicp
export FAST_GICP_INCLUDE_DIR=/your/fast_gicp/include
cmake .. -DBUILD_WITH_FAST_GICP=ON
# Build
make -j
# Test
# Show options
./odometry_benchmark
# USAGE: odometry_benchmark <dataset_path> <output_path> [options]
# OPTIONS:
# --visualize
# --num_threads <value> (default: 4)
# --num_neighbors <value> (default: 20)
# --downsampling_resolution <value> (default: 0.25)
# --voxel_resolution <value> (default: 2.0)
# --engine <pcl|small_gicp|small_gicp_omp|small_vgicp_omp|small_gicp_tbb|small_vgicp_tbb|small_vgicp_model_tbb|small_gicp_tbb_flow> (default: small_gicp)
# Run odometry benchmark
./odometry_benchmark /your/kitti/dataset/velodyne /tmp/traj_lidar.txt --visualize --num_threads 16 --engine small_gicp_tbb
```
## Results
All benchmarks were conducted on the KITTI 00 sequence.
### Downsampling
```bash
cd small_gicp/scripts
./run_downsampling_benchmark.sh
python3 plot_downsampling.py
```
- Single-threaded `small_gicp::voxelgrid_sampling` is about **1.3x faster** than `pcl::VoxelGrid`.
- Multi-threaded `small_gicp::voxelgrid_sampling_tbb` (6 threads) is about **3.2x faster** than `pcl::VoxelGrid`.
- `small_gicp::voxelgrid_sampling` gives accurate downsampling results (almost identical to those of `pcl::VoxelGrid`) while `pcl::ApproximateVoxelGrid` yields spurious points (up to 2x points).
- `small_gicp::voxelgrid_sampling` can process a larger point cloud with a fine voxel resolution compared to `pcl::VoxelGrid` (for a point cloud of 1000m width, minimum voxel resolution can be 0.5 mm).
![downsampling_comp](docs/assets/downsampling_comp.png)
- While TBB shows slightly better scalability, both the parallelism backends do not obtain a speed gain for the cases with threads more than 16.
![downsampling_threads](docs/assets/downsampling_threads.png)
### KdTree construction
```bash
cd small_gicp/scripts
./run_kdtree_benchmark.sh
python3 plot_kdtree.py
```
- Multi-threaded implementation (TBB and OMP) can be up to **4x faster** than the single-threaded one (All the implementations are based on nanoflann).
- Basically the processing speed get faster as the number of threads increases, but the speed gain is not monotonic sometimes (because of the scheduling algorithm or some CPU(AMD 5995WX)-specific issues?).
- This benchmark only compares the construction time (query time is not included).
![kdtree_time](docs/assets/kdtree_time.png)
### Odometry estimation
```bash
cd small_gicp/scripts
./run_odometry_benchmark.sh
python3 plot_odometry.py
```
- Single-thread `small_gicp::GICP` is about **2.4x and 1.9x faster** than `pcl::GICP` and `fast_gicp::GICP`, respectively.
- `small_gicp::(GICP|VGICP)` shows a better multi-thread scalability compared to `fast_gicp::(GICP|VGICP)`.
- `small_gicp::GICP` parallelized with [TBB flow graph](src/odometry_benchmark_small_gicp_tbb_flow.cpp) shows an excellent scalablity to many-threads situations (**~128 threads**) but with latency degradation.
![odometry_time](docs/assets/odometry_time.png)
**SIMD intrinsics (-march=native)** (We recommend keeping this feature disabled unless you are 100% sure what it is)
- `BUILD_WITH_MARCH_NATIVE=ON` enables platform-specific intrinsics and squeezing the performance (**1.1x speedup for free**).
- However, you must ensure that all involved libraries are built with `-march=native`, otherwise the program will crash.
- Generally, it is difficult to properly set `-march=native` for all libraries, and we recommend keeping `BUILD_WITH_MARCH_NATIVE=OFF`.
Results:
- `BUILD_WITH_MARCH_NATIVE=OFF` : `Eigen::SimdInstructionSetsInUse()=SSE, SSE2`
- `BUILD_WITH_MARCH_NATIVE=ON` : `Eigen::SimdInstructionSetsInUse()=AVX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2`
![odometry_native](docs/assets/odometry_native.png)

View File

@ -335,7 +335,7 @@ open3d.visualization.draw_geometries([target_o3d, source_o3d])
</details>
## Benchmark
## [Benchmark](BENCHMARK.md)
### Downsampling

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

View File

@ -21,16 +21,17 @@ def parse_result(filename):
num_points.append(int(matched.group(1)))
continue
matched = re.match(r'(\S+)_times=(\S+) \+\- (\S+)', line)
matched = re.match(r'(\S+)_times=(\S+) \+\- (\S+) \(median=(\S+)\)', line)
if matched:
method = matched.group(1)
mean = float(matched.group(2))
std = float(matched.group(3))
med = float(matched.group(4))
if method not in results:
results[method] = []
results[method].append(mean)
results[method].append(med)
continue
return (num_threads, num_points, results)

View File

@ -0,0 +1,66 @@
#!/usr/bin/python3
import os
import re
import numpy
from collections import namedtuple
from matplotlib import pyplot
Result = namedtuple('Result', ['reg_mean', 'reg_std', 'tp_mean', 'tp_std'])
def parse_result(filename):
reg_mean = None
reg_std = None
throughput_mean = None
throughput_std = None
with open(filename, 'r') as f:
for line in f.readlines():
found = re.findall(r'([^=]+)\s*\+\-\s*(\S+)', line)
if not found or len(found) != 2:
found = re.findall(r'total_throughput=(\S+)', line)
if found:
throughput_mean = float(found[0])
continue
reg_mean = float(found[0][0].strip())
reg_std = float(found[0][1].strip())
throughput_mean = float(found[1][0].strip())
throughput_std = float(found[1][1].strip())
return Result(reg_mean, reg_std, throughput_mean, throughput_std)
def main():
results_path = os.path.dirname(__file__) + '/results'
results = {}
for filename in os.listdir(results_path):
found = re.findall(r'odometry_benchmark_(\S+)_(native|nonnative)_(\d+).txt', filename)
if not found:
continue
rets = parse_result(results_path + '/' + filename)
results['{}_{}_{}'.format(found[0][0], found[0][1], found[0][2])] = rets
fig, axes = pyplot.subplots(1, 1, figsize=(12, 2))
axes = [axes]
num_threads = [1, 2, 4, 8, 16, 32, 64, 128]
print(results['small_gicp_native_1'], results['small_gicp_tbb_native_1'])
print(results['small_gicp_nonnative_1'], results['small_gicp_tbb_nonnative_1'])
native = [results['small_gicp_tbb_native_{}'.format(N)].reg_mean for N in num_threads]
nonnative = [results['small_gicp_tbb_nonnative_{}'.format(N)].reg_mean for N in num_threads]
axes[0].plot(num_threads, native, label='small_gicp_tbb (-march=native)', marker='o')
axes[0].plot(num_threads, nonnative, label='small_gicp_tbb (nonnative)', marker='o')
axes[0].set_xlabel('Number of threads [1, 2, ..., 128]')
axes[0].set_ylabel('Time [msec/scan]')
axes[0].set_xscale('log')
axes[0].grid()
axes[0].legend()
pyplot.show()
if __name__ == "__main__":
main()

View File

@ -0,0 +1,20 @@
#!/bin/bash
dataset_path=$1
exe_path=../build/odometry_benchmark
mkdir results
engines=(small_gicp small_gicp_tbb)
for engine in ${engines[@]}; do
N=1
$exe_path $dataset_path $(printf "results/traj_lidar_%s_nonnative_%d.txt" $engine $N) --num_threads $N --engine $engine | tee $(printf "results/odometry_benchmark_%s_nonnative_%d.txt" $engine $N)
done
engines=(small_gicp_tbb)
num_threads=(128 96 64 32 16 8 4 2)
for N in ${num_threads[@]}; do
for engine in ${engines[@]}; do
$exe_path $dataset_path $(printf "results/traj_lidar_%s_nonnative_%d.txt" $engine $N) --num_threads $N --engine $engine | tee $(printf "results/odometry_benchmark_%s_nonnative_%d.txt" $engine $N)
done
done

View File

@ -54,6 +54,7 @@ int main(int argc, char** argv) {
}
}
std::cout << "SIMD in use=" << Eigen::SimdInstructionSetsInUse() << std::endl;
std::cout << "dataset_path=" << dataset_path << std::endl;
std::cout << "output_path=" << output_path << std::endl;
std::cout << "registration_engine=" << engine << std::endl;