fix: segmentation fault due to race condition. #285

Merged
merged 6 commits into from
May 21, 2024


@xygyo77 xygyo77 commented May 15, 2024

Description

A segmentation fault occurred in the Autoware pandar_node_container.
The backtrace confirmed that it originates inside TracingController::add_allowed_messages() in caret_trace.

Related links

https://tier4.atlassian.net/browse/RT2-1592

Notes for reviewers

After adding an exclusive lock to add_allowed_messages(), the segmentation fault no longer occurred, but the root cause was still unknown.
Further examination revealed that writes were being performed inside the scope of a shared lock, and that the segmentation fault was caused by corruption of the map data when multiple threads performed those writes concurrently.
The use of shared and exclusive locks was reviewed, focusing on TracingController, and the following modifications were made.

  • Move add_allowed_messages() from is_allowed_publisher_handle_and_add_message() to ros_trace_rclcpp_intra_publish()
  • As a result, is_allowed_publisher_handle_and_add_message() can now be replaced by is_allowed_publisher_handle()
  • is_allowed_node() → std::shared_lock
  • is_allowed_subscription_handle() → changed to std::shared_lock
  • is_allowed_timer_handle() → changed to std::lock_guard *
  • is_allowed_state_machine() → changed to std::lock_guard *
  • add_allowed_messages() → std::lock_guard
    Functions marked with * could suffer thread races independently of this issue.
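
For reviewers who want the locking rule in isolation, here is a minimal sketch of the pattern described above: lookups take a std::shared_lock, anything that mutates a container takes an exclusive std::lock_guard, and the caller performs the check and the insert as two separate steps. This is not the caret_trace source. The class AllowList, the function on_intra_publish(), add_allowed_publisher(), is_allowed_message(), and all parameter types are hypothetical names chosen for illustration; only the names is_allowed_publisher_handle() and add_allowed_messages() are borrowed from this PR, and their real signatures may differ.

```cpp
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for TracingController (assumption, not the real class).
class AllowList
{
public:
  // Write path: mutates the set, so it takes the exclusive lock.
  void add_allowed_publisher(const void * publisher_handle)
  {
    std::lock_guard<std::shared_mutex> lock(mutex_);
    allowed_publishers_.insert(publisher_handle);
  }

  // Read path: lookup only, so a shared lock is sufficient.
  bool is_allowed_publisher_handle(const void * publisher_handle) const
  {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    return allowed_publishers_.count(publisher_handle) > 0;
  }

  // Write path: inserts into the map, so it must NOT run under a shared lock.
  void add_allowed_messages(const void * message, bool is_allowed)
  {
    std::lock_guard<std::shared_mutex> lock(mutex_);
    allowed_messages_[message] = is_allowed;
  }

  // Read path: find() does not modify the map.
  bool is_allowed_message(const void * message) const
  {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    const auto it = allowed_messages_.find(message);
    return it != allowed_messages_.end() && it->second;
  }

private:
  mutable std::shared_mutex mutex_;
  std::unordered_set<const void *> allowed_publishers_;
  std::unordered_map<const void *, bool> allowed_messages_;
};

// Hypothetical caller, loosely modelled on the role of
// ros_trace_rclcpp_intra_publish(): the read-only check and the write are
// separate calls, each taking the lock type that matches what it does.
bool on_intra_publish(AllowList & list, const void * publisher_handle, const void * message)
{
  const bool allowed = list.is_allowed_publisher_handle(publisher_handle);
  list.add_allowed_messages(message, allowed);
  return allowed;
}
```

The design point is that a std::shared_lock only excludes writers holding the exclusive lock. Two threads that both hold the shared lock and both insert into the same std::unordered_map are unsynchronized with respect to each other, which is the kind of contention this PR attributes the map corruption to.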

Pre-review checklist for the PR author

In-review checklist for the PR reviewers

The PR reviewers must check the checkboxes below before approval.

  • The PR has been properly tested.
  • The PR has been reviewed.

Post-review checklist for the PR author

The PR author must check the checkboxes below before merging.

  • There are no open discussions or they are tracked via tickets.
  • The PR is ready for merge.

After all checkboxes are checked, anyone who has write access can merge the PR.

codecov-commenter commented May 15, 2024

Codecov Report

Attention: Patch coverage is 17.64706%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 55.18%. Comparing base (75e1c3f) to head (b4095fc).
Report is 15 commits behind head on main.

Files                                     Patch %   Lines
CARET_trace/src/tracing_controller.cpp    21.42%    7 Missing and 4 partials ⚠️
CARET_trace/src/ros_trace_points.cpp      0.00%     3 Missing ⚠️

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #285       +/-   ##
===========================================
+ Coverage   25.02%   55.18%   +30.16%     
===========================================
  Files          58       28       -30     
  Lines        3321     2180     -1141     
  Branches     1085     1215      +130     
===========================================
+ Hits          831     1203      +372     
+ Misses       1756      697     -1059     
+ Partials      734      280      -454     
Flag           Coverage Δ
differential   55.18% <17.64%> (?)
total          ?


@xygyo77 xygyo77 changed the title fix: add mutex lock to add_allowed_messages() that takes nesting into account. fix: segmentation fault occurrence due to ‘writes’ mixed in the scope of the shared lock and map multiple releases. May 16, 2024
@xygyo77 xygyo77 changed the title fix: segmentation fault occurrence due to ‘writes’ mixed in the scope of the shared lock and map multiple releases. fix: segmentation fault due to map corruption caused by thread contention due to ‘writes’ being mixed in the scope of the shared lock. May 16, 2024
@ymski ymski requested review from ymski and isp-uetsuki May 16, 2024 02:44

@isp-uetsuki isp-uetsuki left a comment

LGTM

@xygyo77 xygyo77 changed the title fix: segmentation fault due to map corruption caused by thread contention due to ‘writes’ being mixed in the scope of the shared lock. fix: segmentation fault due to race condition. May 16, 2024

@ymski ymski left a comment

LGTM

@xygyo77 xygyo77 merged commit b4d8c98 into tier4:main May 21, 2024
16 of 17 checks passed
4 participants