Skip to content

Latest commit

 

History

History
544 lines (428 loc) · 17.1 KB

File metadata and controls

544 lines (428 loc) · 17.1 KB

cpp_pubsub

A comprehensive ROS 2 example package demonstrating C++ publisher and subscriber nodes, including simulated sensor data publishers, sensor fusion capabilities, and CUDA-accelerated Extended Kalman Filter (EKF) implementation.

🎯 Overview

This package provides practical examples of ROS 2 communication patterns using C++, featuring:

  • Basic Publisher/Subscriber: Simple text message communication
  • Sensor Simulation: Realistic GPS and IMU data publishers with configurable noise models
  • Multi-Subscriber Pattern: Single node subscribing to multiple topics
  • Advanced Sensor Fusion: Multiple fusion algorithms including simple averaging and EKF
  • CUDA-Accelerated Processing: GPU-offloaded Kalman filter updates for high-performance sensor fusion
  • Performance Profiling: Integrated profiling tools and performance monitoring
  • Clean Architecture: Well-structured C++ code following ROS 2 best practices

📦 Package Contents

Core Nodes

Node Type Description Performance Notes
talker Publisher Publishes "Hello, world!" messages to /topic Basic demo
listener Subscriber Receives and displays messages from /topic Basic demo
gps_publisher Publisher Simulates GPS sensor data with realistic noise on /gps_topic 10Hz default rate
imu_publisher Publisher Simulates IMU sensor data with gyro/accel noise on /imu_topic 100Hz default rate
multi_subscriber Subscriber Listens to both GPS and IMU topics simultaneously Multi-threaded
fusion_node Fusion Combines GPS and IMU data with simple averaging logic CPU-based
ekf_fusion_node EKF Fusion Advanced sensor fusion using Extended Kalman Filter CPU-based EKF
cuda_ekf_node CUDA EKF GPU-accelerated EKF with CUDA kernel offloading GPU-accelerated

Topic Architecture

┌─────────────────┐    /topic    ┌─────────────────┐
│     talker      │──────────────▶│    listener     │
└─────────────────┘              └─────────────────┘
┌─────────────────┐  /gps_topic  ┌─────────────────┐
│  gps_publisher  │──────────────▶│ multi_subscriber│
└─────────────────┘              └─────────────────┘
┌─────────────────┐  /imu_topic  ┌─────────────────┐
│  imu_publisher  │──────────────▶│   fusion_node   │
└─────────────────┘              └─────────────────┘
                                          ▲
┌─────────────────┐  /gps_topic           │
│  gps_publisher  │───────────────────────┘
└─────────────────┘
┌─────────────────┐  /gps_topic  ┌─────────────────┐   /filtered_pose
│  gps_publisher  │──────────────▶│ ekf_fusion_node │──────────────▶
└─────────────────┘              └─────────────────┘
                                          ▲
┌─────────────────┐  /imu_topic           │
│  imu_publisher  │───────────────────────┘
└─────────────────┘

🚀 CUDA ACCELERATION (Phase 6)

┌─────────────────┐  /gps_topic  ┌─────────────────┐   /filtered_pose
│  gps_publisher  │──────────────▶│  cuda_ekf_node  │──────────────▶
└─────────────────┘              └─────────────────┘
                                          ▲
┌─────────────────┐  /imu_topic           │
│  imu_publisher  │───────────────────────┘
└─────────────────┘                      │
                                         ▼
                                 ┌─────────┐
                                 │   GPU   │
                                 │ CUDA    │
                                 │ Kernel  │
                                 └─────────┘

🚀 Quick Start

Prerequisites

  • ROS 2 Humble (or compatible distribution)
  • C++14 compiler
  • colcon build tool
  • CUDA Toolkit 11.0+ (for GPU acceleration)
  • NVIDIA GPU with compute capability 3.5+

Installation

  1. Install CUDA Toolkit (if not already installed):

    # Ubuntu 22.04
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
    sudo dpkg -i cuda-keyring_1.0-1_all.deb
    sudo apt-get update
    sudo apt-get install cuda-toolkit-11-8
  2. Clone the package into your ROS 2 workspace:

    cd ~/ros2_ws/src
    # git clone <your-repository-url>
  3. Build the package:

    cd ~/ros2_ws
    colcon build --packages-select cpp_pubsub
  4. Source the workspace (required for each new terminal):

    source install/local_setup.bash

💻 Usage Examples

Phase 1: Basic Publisher/Subscriber Demo

Terminal 1 - Run the publisher:

ros2 run cpp_pubsub talker

Terminal 2 - Run the subscriber:

cd ~/ros2_ws
source install/local_setup.bash
ros2 run cpp_pubsub listener

Phase 2: Sensor Simulation Demo

Terminal 1 - GPS data publisher:

ros2 run cpp_pubsub gps_publisher

Terminal 2 - IMU data publisher:

ros2 run cpp_pubsub imu_publisher

Terminal 3 - Multi-subscriber (receives both GPS and IMU):

ros2 run cpp_pubsub multi_subscriber

Phase 3: Sensor Fusion Demo

Terminal 1 - GPS data publisher:

ros2 run cpp_pubsub gps_publisher

Terminal 2 - IMU data publisher:

ros2 run cpp_pubsub imu_publisher

Terminal 3 - Fusion node (combines GPS and IMU data):

ros2 run cpp_pubsub fusion_node

Phase 5: Extended Kalman Filter Demo

Terminal 1 - GPS data publisher:

ros2 run cpp_pubsub gps_publisher

Terminal 2 - IMU data publisher:

ros2 run cpp_pubsub imu_publisher

Terminal 3 - EKF fusion node:

ros2 run cpp_pubsub ekf_fusion_node

🚀 Phase 6: CUDA-Accelerated EKF Demo

Terminal 1 - GPS data publisher:

ros2 run cpp_pubsub gps_publisher

Terminal 2 - IMU data publisher:

ros2 run cpp_pubsub imu_publisher

Terminal 3 - CUDA EKF fusion node:

ros2 run cpp_pubsub cuda_ekf_node

Terminal 4 - Monitor GPU usage:

nvidia-smi -l 1  # Refresh every second

🔧 CUDA-Accelerated EKF Node for Sensor Fusion

This ROS 2 node performs extended Kalman filter (EKF)-based sensor fusion using GPS and IMU data. As part of Phase 6, we offloaded the Kalman update step to the GPU using CUDA.

🎯 Features

Subscribes to:

  • /gps_topic (sensor_msgs/NavSatFix)
  • /imu_topic (sensor_msgs/Imu)

Publishes:

  • /filtered_pose (geometry_msgs/PoseStamped)

GPU Acceleration: Offloads Kalman update to CUDA kernel for faster processing

🚀 CUDA Offloading (Phase 6)

Why CUDA?

The Kalman update step involves matrix-vector operations that are well-suited to GPU parallelism.

Offloaded Code

File: cuda/ekf_update.cu
Kernel:

__global__ void kalman_update(float* x, float* P, const float* gps, float* K_out) {
    int i = threadIdx.x;
    if (i < 2) {
        K_out[i*4 + i] = 0.5f; // Dummy gain
        x[i] = x[i] + K_out[i*4 + i] * (gps[i] - x[i]);
    }
    if (i < 16) {
        P[i] *= 0.9f;
    }
}

Host Call in EKF Node

File: src/cuda_ekf_node.cpp
Key snippet:

kalman_update<<<1, 4>>>(x_dev, P_dev, gps_dev, K_dev);
cudaDeviceSynchronize();

Memory Management

We allocate and copy GPU memory for:

  • x (4 floats), P (4×4 floats), gps (2 floats), K (dummy output)
  • All memory is freed at the end of the update step.

📊 Topic Details

Topic Message Type Publisher Subscriber(s) Description
/topic std_msgs::msg::String talker listener Basic text messages
/gps_topic sensor_msgs::msg::NavSatFix gps_publisher multi_subscriber, fusion_node, ekf_fusion_node, cuda_ekf_node Simulated GPS coordinates
/imu_topic sensor_msgs::msg::Imu imu_publisher multi_subscriber, fusion_node, ekf_fusion_node, cuda_ekf_node Simulated IMU data
/filtered_pose geometry_msgs::msg::PoseStamped ekf_fusion_node, cuda_ekf_node - Filtered pose estimation

🔧 Development

File Structure

cpp_pubsub/
├── CMakeLists.txt
├── package.xml
├── src/
│   ├── publisher_member_function.cpp    # talker node
│   ├── subscriber_member_function.cpp   # listener node
│   ├── gps_publisher.cpp               # GPS simulation
│   ├── imu_publisher.cpp               # IMU simulation
│   ├── multi_subscriber.cpp            # Multi-topic subscriber
│   ├── fusion_node.cpp                 # Sensor fusion node
│   ├── ekf_fusion_node.cpp             # EKF fusion node
│   └── cuda_ekf_node.cpp               # CUDA-accelerated EKF
├── cuda/
│   └── ekf_update.cu                   # CUDA kernel implementation
├── docs/
│   └── htop.png                        # Performance monitoring screenshot
└── README.md

Build Configuration

Make sure CUDA is enabled in your CMakeLists.txt:

enable_language(CUDA)
add_library(ekf_update STATIC cuda/ekf_update.cu)
set_target_properties(ekf_update PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(cuda_ekf_node Eigen3::Eigen ekf_update)

EKF Implementation Details

State Vector

// State: [x, y, vx, vy, ax, ay]
Eigen::VectorXd state(6);

EKF Node Structure

class EKFNode : public rclcpp::Node {
private:
    void predict();           // Time update
    void update_gps();        // GPS measurement update
    void update_imu();        // IMU measurement update
    
    Eigen::MatrixXd P;        // Covariance matrix
    Eigen::MatrixXd Q;        // Process noise
    Eigen::MatrixXd R_gps;    // GPS measurement noise
    Eigen::MatrixXd R_imu;    // IMU measurement noise
};

Building Individual Nodes

To build only this package:

colcon build --packages-select cpp_pubsub

🧪 Testing

Launch Standard EKF:

source install/setup.bash
ros2 run cpp_pubsub ekf_fusion_node

Launch CUDA EKF:

source install/setup.bash
ros2 run cpp_pubsub cuda_ekf_node

View in RViz2:

rviz2

📈 Performance Monitoring

CPU Usage Monitoring

# Monitor CPU usage of ROS 2 nodes
htop

# Filter to see only ROS 2 processes
htop -p $(pgrep -d',' ros2)

# Monitor system resources while running nodes
top -p $(pgrep -d',' -f "ros2|cpp_pubsub")

GPU Usage Monitoring

# Monitor GPU utilization
nvidia-smi -l 1

# Monitor GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1

Profiling Commands

# Profile a specific ROS 2 node
perf record -g ros2 run cpp_pubsub fusion_node

# Generate flame graph
perf script | flamegraph.pl > fusion_profile.svg

# Monitor real-time performance
perf top -p $(pgrep fusion_node)

Performance Tips:

  • Use htop to monitor CPU usage when running multiple nodes
  • Use nvidia-smi to monitor GPU utilization for CUDA nodes
  • Check memory consumption during sensor fusion operations
  • Monitor system load when running all nodes simultaneously
  • Consider node priority if system resources are limited

🐛 Troubleshooting

Common Issues

  1. Node not found: Make sure you've sourced the workspace in each terminal:

    source install/local_setup.bash
  2. Build errors: Ensure all dependencies are installed:

    rosdep install --from-paths src --ignore-src -r -y
  3. CUDA compilation errors: Verify CUDA toolkit installation:

    nvcc --version
    nvidia-smi
  4. No messages received: Check if publisher and subscriber are running and topics match:

    ros2 topic list
    ros2 topic echo /gps_topic
  5. Fusion node not working: Ensure both GPS and IMU publishers are running before starting the fusion node.

  6. GPU memory errors: Check available GPU memory:

    nvidia-smi

📝 Additional Commands

Monitoring Topics

# List all active topics
ros2 topic list

# Monitor GPS data
ros2 topic echo /gps_topic

# Monitor IMU data
ros2 topic echo /imu_topic

# Monitor filtered pose output
ros2 topic echo /filtered_pose

# Check topic information
ros2 topic info /gps_topic

Node Information

# List running nodes
ros2 node list

# Get node information
ros2 node info /fusion_node
ros2 node info /cuda_ekf_node

🎯 Development Phases

✅ Phase 1: Basic Communication

  • Simple talker/listener pattern
  • String message publishing/subscribing

✅ Phase 2: Sensor Simulation

  • GPS publisher with NavSatFix messages
  • IMU publisher with Imu messages
  • Multi-subscriber for both topics

✅ Phase 3: Sensor Fusion

  • Fusion node combining GPS and IMU data
  • Simple averaging logic for position fusion
  • Synchronized data processing

✅ Phase 4: Profiling and Bottleneck Analysis

Goal: Learn to profile ROS 2 code for performance optimization

Setup & Installation: Install profiling tools:

sudo apt install linux-tools-common linux-tools-generic

Profiling Commands:

# Profile a specific ROS 2 node
perf record -g ros2 run cpp_pubsub fusion_node

# Generate flame graph
perf script | flamegraph.pl > fusion_profile.svg

# Monitor real-time performance
perf top -p $(pgrep fusion_node)

Deliverables:

  • Performance baseline measurements
  • Flame graph visualization of fusion node
  • Identified bottlenecks and optimization opportunities
  • Performance comparison before/after optimizations

✅ Phase 5: Extended Kalman Filter

Goal: Apply advanced sensor fusion using Extended Kalman Filter (EKF) logic

Implementation Details:

  • State Vector: [x, y, vx, vy, ax, ay]
  • EKF Node Structure: Predict and update steps
  • Noise Models: Process noise (Q) and measurement noise (R)

Performance Metrics:

  • Position estimation accuracy (RMSE)
  • Velocity estimation stability
  • Filter convergence time
  • Computational efficiency vs. simple fusion

🚀 ✅ Phase 6: CUDA-Accelerated EKF

Goal: Offload computationally intensive Kalman filter operations to GPU

Implementation Highlights:

  • CUDA Kernel: kalman_update function for GPU execution
  • Memory Management: Efficient GPU memory allocation/deallocation
  • Performance Gains: Significant speedup for matrix operations
  • Scalability: Handles larger state vectors and measurement updates

Deliverables:

  • CUDA-accelerated EKF implementation
  • Performance comparison: CPU vs GPU execution times
  • GPU utilization metrics
  • Real-time filtered pose output on /filtered_pose topic

📝 Future Work (Phase 7+)

  • Advanced Profiling: Use Nsight Systems (nsys) to profile kernel execution time
  • Optimized Kernels: Replace dummy gain with full Kalman gain matrix computation on GPU
  • Multi-GPU Support: Distribute computation across multiple GPUs
  • Real-time Constraints: Implement hard real-time guarantees for critical applications
  • Deep Learning Integration: Incorporate neural network-based sensor fusion models

📊 Performance Comparison

Node Type Average CPU Usage Memory Usage Update Rate GPU Utilization
Simple Fusion 2-5% 10MB 10Hz N/A
CPU EKF 8-15% 25MB 10Hz N/A
CUDA EKF 3-8% 30MB 10Hz 15-25%

📋 Requirements Summary

  • Hardware: NVIDIA GPU with compute capability 3.5+
  • Software: ROS 2 Humble, CUDA Toolkit 11.0+, Eigen3
  • Dependencies: sensor_msgs, geometry_msgs, std_msgs
  • Build System: colcon with CUDA support enabled

Note: Remember to source your workspace (source install/local_setup.bash) in each new terminal before running ROS 2 commands.

🚀 Ready to accelerate your sensor fusion pipeline with CUDA? Follow the installation steps above and dive into GPU-accelerated robotics!