Real-time multi-camera instance segmentation for AI Racing Tech, implemented as a ROS 2 C++ node that runs a fine-tuned YOLOv8 segmentation model through TensorRT. Designed for ROS 2 Iron on Ubuntu 22.04 with NVIDIA GPUs.
The node consumes one image topic per camera, batches a fixed-size group of frames per inference tick (matching the model's batch size), and publishes detections, optional overlay images, and an optional one-channel mask image suitable for LiDAR projection.
This package assumes you have already fine-tuned a YOLOv8 segmentation model and exported it to ONNX. The companion training repos are YOLOv8-Fine-Tune and SAM2_YOLOv8_autolabeler_finetune.
For each camera topic /<camera> in CAMERA_TOPICS, the node publishes:
Per-frame detection bundle. All arrays are parallel: the i-th entries of labels, probabilities, class_names, and bounding_boxes describe the same detection.
- header: copied from the source camera image (timestamp + frame_id).
- indexes: 1-based instance ids. 0 is reserved for background in seg_mask_one_channel.
- labels: model class ids.
- probabilities: confidence in [0, 1].
- class_names: human-readable names, indexed by CLASS_NAMES.
- seg_mask_one_channel: sensor_msgs/Image (mono8). All instance masks combined into a single channel, with pixel value = instance index. Same header/height/width/step as the source camera image.
- bounding_boxes: yolov8_interfaces/Yolov8BBox[], each containing top-left Point2D, rect_width, rect_height.
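As a rough illustration of how a consumer might walk the parallel arrays and decode the one-channel mask, here is a minimal Python sketch. The Detections dataclass is a hypothetical stand-in for the real message (field names follow the list above, bounding boxes are omitted for brevity), the mask is modelled as flat row-major bytes, and the sample values are made up. A real subscriber would read these fields from the ROS message instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detections:
    """Hypothetical stand-in for the detection message (not the real ROS type)."""
    indexes: List[int]          # 1-based instance ids
    labels: List[int]           # model class ids
    probabilities: List[float]  # confidence in [0, 1]
    class_names: List[str]      # human-readable names
    mask: bytes                 # mono8 pixels, row-major; 0 = background
    width: int
    height: int


def pixels_per_instance(det: Detections) -> Dict[int, int]:
    """Count mask pixels per instance id, skipping background (0)."""
    counts = {idx: 0 for idx in det.indexes}
    for value in det.mask:
        if value != 0 and value in counts:
            counts[value] += 1
    return counts


def summarize(det: Detections) -> List[str]:
    """Build one line per detection from the parallel arrays."""
    counts = pixels_per_instance(det)
    return [
        f"#{idx} {name} (class {label}, p={prob:.2f}, {counts[idx]} px)"
        for idx, label, prob, name in zip(
            det.indexes, det.labels, det.probabilities, det.class_names
        )
    ]


# A 4x2 mask with two instances; all values are made up for illustration.
example = Detections(
    indexes=[1, 2],
    labels=[0, 0],
    probabilities=[0.91, 0.58],
    class_names=["car", "car"],
    mask=bytes([0, 1, 1, 0,
                0, 1, 2, 2]),
    width=4,
    height=2,
)
for line in summarize(example):
    print(line)
```

Because the mask stores the instance index rather than the class id, a pixel's class is found by looking up its value in indexes and reading the entry at the same position in labels.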
Published when visualize_masks=true. The source camera image with bounding boxes, labels, and translucent segmentation masks rendered on top — useful for RViz2 / Foxglove.
Published when both enable_one_channel_mask=true and visualize_one_channel_mask=true. The same data as detections.seg_mask_one_channel, but normalized to [0, 255] and converted to RGB so it can be displayed. The normalization changes pixel colors when the number of detections changes — this image is only for visualization, never for downstream consumers.
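To see why the visualization image is unsuitable for downstream use, consider a minimal sketch of one plausible normalization: linear scaling so that the largest instance index maps to 255. This scaling rule is an assumption made for illustration; the node's exact normalization may differ.

```python
from typing import List


def normalize_for_display(mask: List[int]) -> List[int]:
    """Scale instance indexes linearly so the largest index maps to 255.

    Assumed scaling rule for illustration only; background (0) stays 0.
    """
    top = max(mask, default=0)
    if top == 0:
        return list(mask)  # nothing detected: every pixel is background
    return [value * 255 // top for value in mask]


# Instance 1 is the same object in both frames, but its display value
# depends on how many instances were detected in that frame.
two_instances = normalize_for_display([0, 1, 2])
three_instances = normalize_for_display([0, 1, 2, 3])
print(two_instances)
print(three_instances)
```

Under this rule the same instance index lands on a different gray level from one frame to the next, which is why consumers should read detections.seg_mask_one_channel instead.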
Tested on Ubuntu 22.04 with ROS 2 Iron. Other distributions are unsupported.
Install from the Debian packages: https://docs.ros.org/en/iron/Installation/Ubuntu-Install-Debians.html
Do not source a conda environment when building or running this package — ROS 2 does not officially support conda and it tends to break rclpy.
Install via the runfile (not the deb): https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local
Confirm nvidia-smi runs cleanly afterwards. If it errors out, the deb installer likely replaced your low-level NVIDIA drivers — reinstall the appropriate proprietary driver from Software & Updates → Additional Drivers before rebooting.
Download from https://developer.nvidia.com/cudnn and install per https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html.
Install either the DEB package or the TAR archive — not both. By default this package looks for TensorRT at /usr/src/tensorrt (the DEB layout). To use a TAR install, set TensorRT_DIR or the TENSORRT_DIR environment variable to the TAR's root directory:
export TENSORRT_DIR="$HOME/libs/TensorRT-10.x.x.x/"
- DEB (recommended): TensorRT 10 GA from https://developer.nvidia.com/tensorrt/download/10x (install full C++ and Python runtimes). Headers live in /usr/src/tensorrt, shared libs in /usr/lib/x86_64-linux-gnu/.
- TAR: TensorRT 8.6 or 10 GA. Unpack into ~/libs/ and follow https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar. Remember to add the TensorRT lib/ directory to LD_LIBRARY_PATH in your shell rc.
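For TAR installs, a quick sanity check is to confirm that the lib/ directory under TENSORRT_DIR actually appears in LD_LIBRARY_PATH. The sketch below is a hypothetical helper, not part of this package; it only looks at the TENSORRT_DIR environment variable and ignores any TensorRT_DIR setting, and the paths in the sample environment are made up.

```python
import os
from typing import Mapping, Optional


def tar_lib_dir(env: Mapping[str, str]) -> Optional[str]:
    """Return <TENSORRT_DIR>/lib, or None when TENSORRT_DIR is unset or empty."""
    root = env.get("TENSORRT_DIR")
    if not root:
        return None
    return os.path.normpath(os.path.join(root, "lib"))


def tar_libs_on_loader_path(env: Mapping[str, str]) -> bool:
    """True if the TAR install's lib/ directory is listed in LD_LIBRARY_PATH."""
    wanted = tar_lib_dir(env)
    if wanted is None:
        return False  # no TAR root configured, so there is nothing to find
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return any(os.path.normpath(entry) == wanted for entry in entries if entry)


# Hypothetical environment; pass os.environ to check the real one.
env = {
    "TENSORRT_DIR": "/home/user/libs/TensorRT-10.x.x.x/",
    "LD_LIBRARY_PATH": "/usr/local/cuda/lib64:/home/user/libs/TensorRT-10.x.x.x/lib",
}
print(tar_libs_on_loader_path(env))
```

A False result for a TAR install suggests the shell rc export is missing or the shell has not been re-sourced since it was added.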
We need OpenCV built with CUDA support. This will overwrite any existing system OpenCV. From the workspace root:
make install-opencv-cuda OPENCV_VERSION=4.8.0 CUDA_BIN_ARCH=<your-compute-capability>
Find your GPU's compute capability at https://developer.nvidia.com/cuda-gpus. The build takes a while.
To revert (pass the version that was originally installed):
make uninstall-opencv-cuda OPENCV_VERSION=4.8.0
Create a ROS 2 workspace and copy (or symlink) the yolov8 and yolov8_interfaces packages into its src/ directory. Then from the workspace root:
source /opt/ros/iron/setup.bash
cp example.env yolov8.env
# Edit yolov8.env for your model path, camera topics, etc.
make build
source install/setup.bash
make build-debug builds with debug symbols for use with GDB / the VS Code ROS extension.
ros2 launch yolov8 yolov8.launch.py
The launch file reads parameters from install/yolov8/share/yolov8/yolov8.env, which is installed from the workspace-root yolov8.env. You must rebuild after editing yolov8.env so the share copy is refreshed.
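Since a stale or inconsistent yolov8.env only shows up at launch, it can help to validate the file before rebuilding. The following is a minimal sketch, assuming the file uses plain KEY=VALUE lines with '#' comments and that CAMERA_TOPICS is a comma-separated list; both are assumptions about the format, and the sample values are hypothetical rather than taken from example.env. It checks the rule that BATCH_SIZE must equal the number of camera topics.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blanks and '#' comments (assumed format)."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip().strip('"')
    return params


def check_batch_size(params: Dict[str, str]) -> List[str]:
    """Return the camera topics, raising if BATCH_SIZE does not match their count."""
    # Assumption: CAMERA_TOPICS is a comma-separated list in the env file.
    topics = [t.strip() for t in params["CAMERA_TOPICS"].split(",") if t.strip()]
    batch = int(params["BATCH_SIZE"])
    if batch != len(topics):
        raise ValueError(f"BATCH_SIZE={batch} but {len(topics)} camera topics given")
    return topics


sample = """
# hypothetical values, not taken from example.env
BATCH_SIZE=2
CAMERA_TOPICS=/front_left,/front_right
"""
print(check_batch_size(parse_env(sample)))
```

Running a check like this against the workspace-root file before make build catches a mismatch before an engine is built for the wrong batch size.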
The first run on a given GPU will build the TensorRT engine and cache it in install/yolov8/share/yolov8/models/engines/. Subsequent runs load the cached engine and start in seconds. Use make copy-engine to preserve the engine across make clean.
Two tmuxp sessions are provided in tmuxp_configs/:
- rviz_yolov8.yaml: node + rosbag play + RViz2 + topic echo.
- foxglove_yolov8.yaml: node + rosbag play + Foxglove Bridge + topic echo.
Edit the rosbag play path before running.
Install tmuxp with sudo apt install tmuxp.
Use debug_yolov8.launch.py with the ROS 2 VSCode extension. It is a thin wrapper around yolov8.launch.py that passes debug:=true, which drops the nice -n prefix (GDB cannot attach across nice). Equivalent to running ros2 launch yolov8 yolov8.launch.py debug:=true directly.
Check the most recent ROS log first:
cat ~/.ros/log/latest.log
Common errors:
- Exit code -9 during engine build/load: the process was OOM-killed. Close other apps; monitor with htop + nvidia-smi.
- "CUDA initialization failure with error: 46": another process is holding the GPU. Sometimes a reboot is the only fix.
- "No module named rclpy": you are in a conda environment. Deactivate it.
- "libnvinfer.so.*: cannot open shared object file": TAR install of TensorRT not on LD_LIBRARY_PATH. Re-source after exporting it.
- "cuda_runtime_api.h could not determine number of CUDA-capable devices": usually appears after long uptime; reboot.
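Two of the errors above can be recognized mechanically. The sketch below assumes the launcher reports exit codes using the Python subprocess convention, where a negative code -N means the process was killed by signal N (so -9 is SIGKILL, which is what the kernel OOM killer sends). It also uses the fact that conda sets CONDA_PREFIX when an environment is activated. Both helper names are hypothetical.

```python
import signal
from typing import Mapping


def describe_exit_code(code: int) -> str:
    """Explain a process exit code; negative values mean 'killed by signal'."""
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"signal {-code}"  # not a named signal on this platform
        return f"killed by {name}"
    return "exited normally" if code == 0 else f"exited with status {code}"


def in_conda_env(env: Mapping[str, str]) -> bool:
    """True if a conda environment appears to be active (CONDA_PREFIX is set)."""
    return bool(env.get("CONDA_PREFIX"))


print(describe_exit_code(-9))
print(in_conda_env({"CONDA_PREFIX": "/home/user/miniconda3"}))  # made-up path
```

Passing os.environ to in_conda_env before building or launching gives an early warning for the rclpy import failure.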
A few deliberate choices that may look odd at first glance:
- src/yolov8/libs/tensorrt-cpp-api/ is vendored MIT code from YOLOv8-TensorRT-CPP. It is kept close to upstream so future syncs are tractable; we only apply bug fixes and small cleanups, not structural refactors. It logs to std::cout/std::cerr because it is designed to be usable outside ROS.
- -Ofast (not -O3) in the CMakeLists: this is a real-time inference workload, the fast-math relaxations are acceptable, and the measured speedup over -O3 is non-trivial. Debug builds drop the flag.
- BATCH_SIZE is fixed at engine-build time and must equal len(CAMERA_TOPICS). The TensorRT engine is specialized to exactly that batch size for performance; varying it would require an engine rebuild.
- Camera frame buffering: the node accumulates the next frame from each camera and flushes either when all cameras have published (most of the time) or when the period set by CAMERA_BUFFER_HZ elapses (graceful degradation when a camera stalls). Cameras that did not publish in a given window contribute a zero placeholder image to the batch but do not produce detections.
- No unit tests are shipped. The hot path is GPU-coupled (TensorRT + CUDA + OpenCV-with-CUDA), and meaningful tests require the target hardware; we lean on integration testing via rosbags instead.
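The camera frame buffering described above can be sketched in a few lines of Python. This is an illustrative model only, not the node's C++ implementation. The class and method names are hypothetical, frames are modelled as bytes, the window is assumed to start at the first frame received after a flush and to last 1 / CAMERA_BUFFER_HZ seconds, and a later frame from the same camera is assumed to replace an earlier one.

```python
from typing import Dict, List, Optional, Tuple

Frame = bytes  # stand-in for an image; the real node holds camera images


class FrameBuffer:
    """Collect one frame per camera; flush when full or when the window expires."""

    def __init__(self, topics: List[str], window_s: float, frame_size: int):
        self.topics = topics
        self.window_s = window_s              # assumed to be 1 / CAMERA_BUFFER_HZ
        self.placeholder = bytes(frame_size)  # all-zero placeholder frame
        self.pending: Dict[str, Frame] = {}
        self.window_start: Optional[float] = None

    def add(self, topic: str, frame: Frame, now: float) -> None:
        """Store a frame; the window opens with the first frame after a flush."""
        if self.window_start is None:
            self.window_start = now
        self.pending[topic] = frame  # assumption: newest frame per camera wins

    def try_flush(self, now: float) -> Optional[Tuple[List[Frame], List[str]]]:
        """Return (batch, live_topics) if it is time to flush, else None."""
        if not self.pending:
            return None
        complete = len(self.pending) == len(self.topics)
        expired = now - self.window_start >= self.window_s
        if not (complete or expired):
            return None
        # Stalled cameras get the zero placeholder so the batch size stays fixed.
        batch = [self.pending.get(t, self.placeholder) for t in self.topics]
        live = [t for t in self.topics if t in self.pending]
        self.pending = {}
        self.window_start = None
        return batch, live


buf = FrameBuffer(["/cam_a", "/cam_b"], window_s=0.05, frame_size=4)
buf.add("/cam_a", b"\x01\x02\x03\x04", now=0.00)
early = buf.try_flush(now=0.01)   # window still open, /cam_b missing
late = buf.try_flush(now=0.06)    # window expired, flush with a placeholder
print(early, late)
```

Only the topics in live_topics would have detections published; the placeholder exists purely to keep the batch at the size the engine was built for.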
This project incorporates code from YOLOv8-TensorRT-CPP (MIT).