Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAL_ETH_ReleaseTxPacket() does not test gState and races with itself and TX functions when accessing TX descriptors and counters #72

Open
KarstenHohmeier opened this issue Sep 8, 2024 · 1 comment
Assignees
Labels
bug Something isn't working eth Ethernet-related issue or pull-request hal HAL-LL driver-related issue or pull-request. internal bug tracker Issue confirmed and reported into a ticket in the internal bug tracking system

Comments

@KarstenHohmeier
Copy link

HAL_ETH_ReleaseTxPacket() does NOT check gState before execution AT ALL.

https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/ceda3ceeca2ee77a76d2de2ee6b560ad87c90a48/Src/stm32h7xx_hal_eth.c#L1400-L1413

It just runs and modifies TX descriptors, decrements buffer counters and reads state from &heth into local variables.
This buffered state can become stale/outdated if HAL_ETH_ReleaseTxPacket() gets preempted and other threads issuing calls to HAL_ETH_Transmit_IT() or HAL_ETH_Transmit() get scheduled instead.

Among other things HAL_ETH_ReleaseTxPacket() contains the decrement of dmatxdesclist->BuffersInUse.

https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/ceda3ceeca2ee77a76d2de2ee6b560ad87c90a48/Src/stm32h7xx_hal_eth.c#L1477

for which the increment has been guarded by a critical section

https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/ceda3ceeca2ee77a76d2de2ee6b560ad87c90a48/Src/stm32h7xx_hal_eth.c#L3333-L3340

So somebody suspected there might be race conditions there but failed to follow through ;)

You really, really, really need to lock this properly with atomic Test-and-Set of gState, because even the read-modify-write of TX descriptor OWN bits is potentially racy (see #71 #70 and #69)

This is not a theoretical bug! Under high incoming packet load (~8000 packets per second RX) there are frequent preemptions by the input thread which runs with realtime priority (code generated by CubeIDE).

grafik

This causes the TX calls and TX packet releases in other threads to be preempted and scheduled randomly and thrashes the data in &heth to a point where the system would not send TX packets any more. Under low packet load this can still occur and leads to unstable TX operation where systems will stop transmitting data after multiple days when these race conditions randomly occur but with much lower probability.

@ASEHSTM
Copy link
Contributor

ASEHSTM commented Sep 13, 2024

ST Internal Reference: 191088

@ASEHSTM ASEHSTM moved this from To do to Analyzed in stm32cube-mcu-hal-dashboard Sep 13, 2024
@ASEHSTM ASEHSTM added the internal bug tracker Issue confirmed and reported into a ticket in the internal bug tracking system label Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working eth Ethernet-related issue or pull-request hal HAL-LL driver-related issue or pull-request. internal bug tracker Issue confirmed and reported into a ticket in the internal bug tracking system
Projects
Development

No branches or pull requests

3 participants