-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HAL_ETH does non-atomic Test-and-Set operations on gState which is not thread-safe #70
Comments
Hello @KarstenHohmeier, Thank you for your contribution. This function has been reliably used in previous projects without any race condition issues, and tests show that the code functions correctly. If this point has caused you any issues, please inform us about your specific use case. We remain open to further discussion. Thank you for your understanding and cooperation. |
The code referenced is a general example of the code-pattern ST Micro uses to deal with concurrency and locking inside HAL_ETH. The real issues arise in the TX and RX functions in HAL_ETH where you omit this locking pattern entirely or execute this Test-and-Set pattern only half-way (test for STARTED but no set to BUSY) see #71 and #72. If you would try and fix #71 and #72 by using the gState locking pattern - as you do throughout HAL_ETH - it would still not be entirely thread-safe because you don't do it atomically. This is what this issue is about. To remind you of the locking pattern you seem to use and make the case that it is conceptually broken by being non-atomic and would thus not be suitable (without being fixed itself) for fixing the real concurrency issues related to the TX and RX function (#71, #72). This is also not something your tests would show. The realization as a developer that non-atomic test-and-set of a locking variable is flawed and should be fixed regardless of prior track record or existing tests. |
Hello @KarstenHohmeier, The HAL driver is designed to be a generic library, usable in a variety of projects, whether they use an RTOS or not. This flexibility means that certain responsibilities, such as managing race conditions, are delegated to the end user. In many cases, users do not need these mechanisms, especially in non-preemptive environments or simple applications. With Regards, |
If you do not guarantee thread-safety for HAL_ETH - which is a valid design decision - then you should add this as a giant warning to the API / function descriptions. Also you should probably talk to your colleagues that are responsible for generating If you are right, then the code they generate for
To think a little more about "non-preemptive environments": Imagine a scenario without preemption (no RTOS) where the user code runs in a main-loop and sends one TX packet every second. Then you connect a button to the system that triggers an interrupt and sends a time-critical notification packet directly from the IRQ handler. You would expect this to be ok, since the packets are handed to the DMA via descriptor lists. But effectively your IRQ code is preempting the main-loop and might cause the same concurrency issues that RTOS preemption would if the calls to HAL_ETH are not thread-safe. This gets even worse with nested IRQs where even interrupts can be interrupted. So even in a bare-metal scenario with only IRQs the user would have to guard the HAL_ETH usage themselves to make it function correctly under all circumstances. So whatever performance you think you gain by omitting thread-safety cannot be gained "legitimately" because the user has to add back thread-safety in the user code. But this is more thought experiment and colored by my opinion. I am sure you will do the right thing and sort this out in a sensible fashion and thank you for taking these bug reports seriously. |
HAL_ETH uses gState as a "poor man's mutex" to guard it's internal state and associated hardware registers against concurrent access and enforce a specific order of initialization.
When gState gets tested and set to BUSY for example, HAL_ETH does critical things to its own state or the hardware:
https://github.com/STMicroelectronics/stm32h7xx_hal_driver/blob/ceda3ceeca2ee77a76d2de2ee6b560ad87c90a48/Src/stm32h7xx_hal_eth.c#L863-L866
Yet these test and set operations are not done atomically (e.g. inside a critical section) so the HAL_ETH locking via gState is vulnerable to race conditions in situations with preemption (e.g. with lwIP + FreeRTOS)
The text was updated successfully, but these errors were encountered: