This file is meant to define the Autoware Reference System and all of its nodes, topics and message types.
To get started and test this system yourself, head down to the Quick Start section and follow the instructions there.
See the section on generating a node graph using graphviz below for how to generate the above image.
The Autoware Reference System was made with a few goals in mind. See the list below to get the complete picture of why certain tests are run and why this system was chosen as a good system to benchmark different executors.
Each item below should have a corresponding test report with it or be able to be extracted from an existing test report. See the testing section for more details on how to generate your own test reports.
If you believe we are missing another metric to measure executors by, please create an issue and let us know!
- CPU utilization
- In general a lower CPU utilization is better since it enables you to choose a smaller CPU or have more functionality on a larger CPU for other things.
- The lower the CPU utilization, the better
- Memory utilization
- In general a lower memory utilization is better since it enables you to choose hardware with less memory or have more space left over for other things
- The lower the memory utilization, the better
- Use latest samples, count dropped samples
- This is representative of the real-world where old sensor data is much less valuable than new sensor data
- For example, an image from 30 seconds ago won't help you to drive down the road as much as an image from 0.1 seconds ago
- If there is more than one new sample, the old ones will be dropped in favor of the newest sample
- As a result, dropped messages may mean that information was lost
- Fusion Nodes may drop messages by design if their inputs have different frequencies, so do not count dropped messages for these nodes
- Transform nodes, however, should not drop messages, and their dropped messages should be counted
- The lower the number of dropped samples, the better
- Every Front Lidar sample should cause an update in the Object Collision Estimator
- The Front and Rear Lidars have the same publishing frequency
- This means Object Collision Estimator should run for every lidar sample
- Count the number of executions of the Object Collision Estimator and the Front Lidar driver and report any difference
- The smaller the difference in executions, the better
- Lowest possible latency from Front Lidar to Object Collision Estimator
- As in the real world, we want to know as soon as possible if the reference system will collide with something
- Measure the mean and max latency for this chain of nodes
- The lower the latency of the signal chain, the better
- The Behavior Planner should be as cyclical as possible
- The Behavior Planner should be executed as close to its set cycle time of 100 milliseconds as possible
- Measure the jitter and drift over time of the timer callback
- The lower the jitter and drift of the Behavior Planner timer callback, the better (a measurement sketch follows this list)
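To make the jitter and drift measurement concrete, here is a minimal sketch of how a timer callback's deviation from its ideal 100 millisecond period could be logged with plain rclcpp. This is an illustrative probe, not the reference system's actual measurement code, and the node and variable names are made up:

```cpp
// Minimal, hypothetical jitter/drift probe (not part of the reference
// system): log how far each timer callback fires from its ideal 100 ms
// period (jitter) and the running sum of those deviations (drift).
#include <chrono>
#include <cstdint>
#include <memory>
#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

class JitterProbe : public rclcpp::Node
{
public:
  JitterProbe() : Node("jitter_probe")
  {
    timer_ = create_wall_timer(
      100ms,
      [this]() {
        const auto now = std::chrono::steady_clock::now();
        if (last_.time_since_epoch().count() > 0) {
          const auto deviation = std::chrono::duration_cast<std::chrono::microseconds>(
            (now - last_) - 100ms);
          drift_us_ += deviation.count();
          RCLCPP_INFO(
            get_logger(), "jitter: %lld us, accumulated drift: %lld us",
            static_cast<long long>(deviation.count()),
            static_cast<long long>(drift_us_));
        }
        last_ = now;
      });
  }

private:
  rclcpp::TimerBase::SharedPtr timer_;
  std::chrono::steady_clock::time_point last_{};
  std::int64_t drift_us_{0};
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<JitterProbe>());
  rclcpp::shutdown();
  return 0;
}
```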
A single message type is used for the entire reference system when generating results in order to simplify the setup as well as make it more repeatable and extensible.
This means only one message type from the list below is used during any given experimental run for every node in the reference system.
- Message4kB
- reference message with a fixed size of 4 kilobytes (kB)
Other messages with different fixed sizes could be added here in the future.
When reporting results it is important to include the message type used during the experiment so that comparisons can be done "apples to apples" and not "apples to pears".
Built from a handful of building-block node types, each of these nodes is meant to simulate a real-world node from the Autoware.Auto project's lidar data pipeline.
Under each node type are the requirements used for this specific reference system, autoware_reference_system. Future reference systems could have slightly different requirements and still use the same building-block node types.
For simplicity's sake, every node except the command nodes publishes exactly one topic, and that topic has the same name as the node that publishes it. However, each topic can be subscribed to by multiple different nodes.
- Message Type
- all nodes use the same message type during any single test run
- default message type: Message4kB
- to be implemented:
- Message64kB
- Message256kB
- Message512kB
- Message1024kB
- Message5120kB
- Sensor Nodes
- all sensor nodes have a publishing period (cycle time) of 100 milliseconds
- all sensor nodes publish the same message type
- total of 5 sensor nodes:
- Transform Nodes
- all transform nodes have one subscriber and one publisher
- all transform nodes process for 50 milliseconds after a message is received (see the sketch after this list)
- publishes a message after processing is complete
- total of 10 transform nodes:
- Fusion Nodes
- all fusion nodes have two subscribers and one publisher for this reference system
- all fusion nodes process for 25 milliseconds after a message has been received from all subscriptions
- all fusion nodes have a max input time difference of 9999 seconds (effectively unlimited) between the first and last input received before publishing
- publishes a message after processing is complete
- total of 5 fusion nodes:
- Cyclic Nodes
- for this reference system there is only 1 cyclic node
- this cyclic node has six subscribers and one publisher
- this cyclic node processes for 1 millisecond after a message is received from any single subscription
- publishes a message after processing is complete
- Command Nodes
- all command nodes have 1 subscriber and zero publishers
- all command nodes print out the final latency statistics after a message is received on the specified topic
- total of 2 command nodes:
- Intersection Nodes
- for this reference system there is only one intersection node, the EuclideanClusterDetector
- this intersection node has 2 subscribers and 2 publishers
- publishes a message after processing is complete on the corresponding publisher
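To make the building blocks above concrete, here is a minimal sketch of a transform-style node following the naming convention described earlier (the published topic shares the node's name). The node and topic names and the stand-in message type are illustrative; the real nodes in autoware_reference_system also perform configurable number-crunching work and carry bookkeeping for latency statistics:

```cpp
// Illustrative transform-style building block: one subscriber, one
// publisher, ~50 ms of processing before republishing. Names and the
// message type are stand-ins, not the actual reference-system code.
#include <chrono>
#include <memory>
#include <thread>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"  // stand-in for the 4 kB reference message

class TransformNode : public rclcpp::Node
{
public:
  TransformNode()
  : Node("PointsTransformerFront")  // hypothetical node name
  {
    // by convention, the published topic has the same name as the node
    pub_ = create_publisher<std_msgs::msg::String>("PointsTransformerFront", 10);
    sub_ = create_subscription<std_msgs::msg::String>(
      "FrontLidarDriver", 10,  // hypothetical upstream sensor topic
      [this](std_msgs::msg::String::SharedPtr msg) {
        // the real nodes burn CPU crunching numbers; sleeping here only
        // models the 50 ms processing duration, not the CPU load
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        pub_->publish(*msg);  // publish once processing is complete
      });
  }

private:
  rclcpp::Publisher<std_msgs::msg::String>::SharedPtr pub_;
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sub_;
};
```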
This section will go over how to clone, build and run the autoware_reference_system
in order to generate your own test reports.
Before running the tests there are a few prerequisites to complete:
- Install python dependencies used during test runs and report generation
python3 -m pip install psrecord bokeh networkx numpy pandas
- Install dependencies using the following command from the colcon_ws directory:
rosdep install --from-paths src --ignore-src -y
- Install LTTng and ros2_tracing following the instructions in the ros2_tracing repository
- Note: if you are setting up a real-time Linux kernel for a Raspberry Pi using this docker file, it should already include LTTng
- Note: make sure to clone ros2_tracing into the same workspace as the reference-system; the tests will not run properly if they are not in the same workspace.
Tests will fail if any of the above dependencies are missing on the machine.
Once the above steps are complete you should be ready to configure the setup for your platform and run the tests to generate some results.
Many nodes in the reference system perform some pseudo work by finding prime numbers up to some maximum value. Depending on the platform, this maximum value will need to be changed so that these nodes do not take an absurd amount of time. This maximum value should be chosen on a platform-by-platform basis so that the total run time of this work takes some desired length of time.
To make finding this maximum value a bit easier across many different platforms, a simple number_cruncher_benchmark is provided that loops over various maximum values and prints how long each one takes to run. After running this executable on your platform you should have a good idea of what maximum value to use in your timing configuration so that each node does some measurable work for the desired amount of time.
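Conceptually, the pseudo work is just a CPU-bound prime-counting loop. A minimal sketch (with an assumed function name; see the repository for the actual implementation) could look like this:

```cpp
// Sketch of the "number cruncher" pseudo work: count primes below
// maximum_number by trial division. The run time grows with
// maximum_number, which is what the timing configuration tunes.
#include <cstdint>

std::uint64_t number_cruncher(std::uint64_t maximum_number)
{
  std::uint64_t primes_found = 0;
  for (std::uint64_t i = 2; i < maximum_number; ++i) {
    bool is_prime = true;
    for (std::uint64_t j = 2; j * j <= i; ++j) {
      if (i % j == 0) {
        is_prime = false;
        break;
      }
    }
    if (is_prime) {
      ++primes_found;
    }
  }
  return primes_found;
}
```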
Here is an example output of the number_cruncher_benchmark
run on a typical development platform (Intel i7):
ros2 run autoware_reference_system number_cruncher_benchmark
maximum_number run time
64 0.001609ms
128 0.002896ms
256 0.006614ms
512 0.035036ms
1024 0.050957ms
2048 0.092732ms
4096 0.22837ms
8192 0.566779ms
16384 1.48837ms
32768 3.64588ms
65536 9.6687ms
131072 24.1154ms
262144 62.3475ms
524288 162.762ms
1048576 429.882ms
2097152 1149.79ms
Run the above command on your system, select your desired run_time
and place the corresponding maximum_number
in the timing configuration file for the desired nodes.
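As a purely illustrative example (the real timing configuration files in the repository may be structured differently), the benchmark table above suggests that on the example Intel i7 a maximum_number of 131072 yields roughly 24 ms of work, a close match for the fusion nodes' 25 milliseconds of processing:

```cpp
// Hypothetical timing entry -- illustrative only. The value comes from the
// benchmark table above: 131072 takes ~24 ms on the example platform,
// approximating the fusion nodes' 25 ms of processing time.
#include <cstdint>

struct FusionNodeTiming  // assumed name; see the repo's timing headers
{
  static constexpr std::uint64_t maximum_number = 131072;
};
```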
Source your ROS distribution as well as your ros2_tracing
overlay, compile this repository using the proper CMake arguments and generate some test results:
Make sure you've installed the required dependencies as outlined above before trying to run these tests.
RUN_BENCHMARK
- Tell CMake to build the benchmark tests that will check the reference system against its requirements before running a sweep of tests to generate trace files and reports
- Without the RUN_BENCHMARK variable set to True, only the standard linter tests will be run
TEST_PLATFORM
- Tell CMake to build the tests that check whether the tests are being run on a supported platform
- This flag can be omitted if you would like to run the tests on a development system before running them on a supported platform
ALL_RMWS
- Set this to ON if you'd like to run tests on all available RMWs
- Otherwise only the default RMW is used (the first one listed by the CMake function get_available_rmw_implementations)
- Defaults to OFF
# source your ROS distribution
source /opt/ros/galactic/setup.bash
# cd to your colcon_ws with this repo and `ros2_tracing` inside
cd /path/to/colcon_ws
# build packages with benchmark tests enabled
colcon build --cmake-args -DRUN_BENCHMARK=TRUE -DTEST_PLATFORM=TRUE
# IMPORTANT
# source the newly built workspace to make sure to use the updated tracetools package
source install/local_setup.bash
# run tests, generate traces and reports
colcon test
Note: during testing, trace data generated by LTTng will be placed in $ROS_HOME/tracing.
If the $ROS_HOME/tracing directory is missing, the tests will automatically generate it for you.
This directory should now hold tracing data and reports for all ros2_tracing tests performed.
Additionally, the CPU and memory usage tests generate data and reports and save them to $ROS_HOME/memory.
To build the PICAS executor, you can use the PICAS
CMake variable:
# build packages with the PICAS executor enabled
colcon build --cmake-args -DRUN_BENCHMARK=TRUE -DTEST_PLATFORM=TRUE -DPICAS=TRUE
This compiles autoware_default_singlethreaded_picas_single_executor.cpp
and autoware_default_singlethreaded_picas_multi_executors.cpp
, where the first one launches a single instance (thread) of the PICAS executor and the second launches four threads on different CPUs (CPUs 0-3). PICAS allows the user to assign priorities to individual callbacks. The default callback priorities are defined in system/priority/default.hpp
.
Configuration for PICAS: PICAS leverages real-time priorities with the SCHED_FIFO scheduling policy on Linux, so modify the /etc/security/limits.conf file as shown below, then reboot the system.
<userid> hard rtprio 99
<userid> soft rtprio 99
Reports are automatically generated depending on which tests are run. Below are the locations where each report is stored after successfully running the tests as described above.
- CPU and Memory Tests
- results are stored in your ${ROS_HOME}/memory directory
- if ${ROS_HOME} is not set, it defaults to ${HOME}/.ros/memory
- Executor KPI tests (Latency, Dropped Messages and Jitter)
- results are written directly to the test's streams.log file using std::cout prints
- reports are generated and stored in the log/latest_test/autoware_reference_system directory
- ros2_tracing Tests
- results and reports are stored in your ${ROS_HOME}/tracing directory
- if ${ROS_HOME} is not set, it defaults to ${HOME}/.ros/tracing
More reports can be added going forward.
To generate the image shown above you can take advantage of a program called graphviz, which provides the command line interface (CLI) command dot.
First, check out the provided .dot
file within this directory to get an idea of how the dot
syntax works (feel free to modify it for your use case or future reference systems).
To render the .dot file into an .svg image, run the following command:
dot -Tsvg autoware_reference_system.dot -o autoware_reference_system.svg
Note: you can change the generated image type to any of the supported output formats if you would like a different filetype.