-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTEST/IB: Don't fatal if an SL is blocked #10197
base: master
Are you sure you want to change the base?
Conversation
can you pls post an output of filed test with this PR? |
Log is still dumped and it contains the corresponding SL. [ RUN ] rc_mlx5/test_uct_ib_sl.check_ib_sl_config/1 <rc_mlx5/mlx5_2:1>
[ INFO ] < Transport retry count exceeded on mlx5_2:1/IB (synd 0x15 vend 0x81 hw_synd 0/0) >
[ INFO ] < RC QP 0x6782 wqe[0]: SEND --e [inl len 18] [rqpn 0x6780 dlid=17 sl=8 port=1 src_path_bits=0 dgid=fe81::ec0d:9a03:c0:45bc flow_label=0x0 sgid_index=0 traffic_class=0] >
uct/ib/test_ib.cc:74: Failure
Expected equality of these values:
sizeof(send_data)
Which is: 8
recv_buffer->length
Which is: 0
...
[ INFO ] < Transport retry count exceeded on mlx5_2:1/IB (synd 0x15 vend 0x81 hw_synd 0/0) >
[ INFO ] < RC QP 0x679e wqe[0]: SEND --e [inl len 18] [rqpn 0x679c dlid=17 sl=15 port=1 src_path_bits=0 dgid=fe81::ec0d:9a03:c0:45bc flow_label=0x0 sgid_index=0 traffic_class=0] >
uct/ib/test_ib.cc:74: Failure
Expected equality of these values:
sizeof(send_data)
Which is: 8
recv_buffer->length
Which is: 0
[ FAILED ] rc_mlx5/test_uct_ib_sl.check_ib_sl_config/1, where GetParam() = rc_mlx5/mlx5_2:1 (129841 ms)
[ RUN ] rc_mlx5/test_uct_ib_sl.check_ib_sl_config/2 <rc_mlx5/mlx5_3:1>
[ SKIP ] (OOO SL mask couldn't be detected for mlx5_3:1)
[ OK ] rc_mlx5/test_uct_ib_sl.check_ib_sl_config/2 (59 ms)
[ RUN ] rc_mlx5/test_uct_ib_sl.check_ib_sl_config/3 <rc_mlx5/mlx5_4:1>
[ OK ] rc_mlx5/test_uct_ib_sl.check_ib_sl_config/3 (822 ms)
[----------] 4 tests from rc_mlx5/test_uct_ib_sl (131975 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (131975 ms total)
[ PASSED ] 3 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] rc_mlx5/test_uct_ib_sl.check_ib_sl_config/1, where GetParam() = rc_mlx5/mlx5_2:1
1 FAILED TEST
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure it solves the issue, since we still want to fail if the SL is mapped to a valid VL
The test still fails as it's behavior is not changed. What is changed is that previously when it would be crashing, now it fails instead. |
/azp run UCX PR |
Azure Pipelines successfully started running 1 pipeline(s). |
What
Proposal to keep running gtests when TB does not support some SL.
Why ?
We have test bed that does not let traffic through for SL > 7, and it is causing gtest crash. In this specific traffic test, we can keep going, signaling failure anyways.
How ?
Ignore fatal.