ia32-generic-qemu: intermittent system reboots during fork() and exit #1012
Comments
It may be related to: |
Why reopen? It wasn't related; this issue was caused by a kstack overflow on ia32 |
|
The issue was caught again in the exit test; here is the output: https://github.com/phoenix-rtos/phoenix-rtos-project/actions/runs/9072269435/job/24927640850
|
I've encountered this issue in an automated test.
|
Issue encountered in the psh-history test: https://github.com/phoenix-rtos/phoenix-rtos-project/actions/runs/9477002770/job/26110890308
|
I've noticed that in every crash report in this thread, there is some garbage data in FPU registers that looks like data from stack. AFAIK it is not supposed to happen, but in theory, it shouldn't damage the stack. |
I've decided to look into it. Since there is an issue with |
I've managed to reproduce the issue locally on QEMU 6.2.0 (4096M of RAM allocated for the machine). The reproducer is not very efficient: the additional RAM is required to avoid crashes caused by zombie processes. In my case it crashed during the second execution of this program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>   /* fork() */
    #include <sys/wait.h> /* wait() */

    static void func(size_t id)
    {
        if (fork() == 0) {
            /* Grandchild: keep executing x87 instructions in a long loop */
            for (size_t i = 0; i < 10000000; ++i) {
                __asm__ volatile ("fwait");
                __asm__ volatile ("fldz");
                __asm__ volatile ("nop");
            }
        }
        else {
            /* Child: touch the FPU once, then reap the grandchild */
            int xxx;
            __asm__ volatile ("fwait");
            __asm__ volatile ("fldz");
            __asm__ volatile ("nop");
            wait(&xxx);
            printf("%zu\n", id); /* size_t needs %zu, not %u */
        }
        exit(0);
    }

    int main(void)
    {
        /* Spawn 12800 children; each one forks again inside func() */
        for (size_t i = 0; i < 12800; ++i) {
            if (fork() == 0) {
                func(i);
            }
        }
        /* Reap all direct children */
        for (size_t i = 0; i < 12800; ++i) {
            int status;
            wait(&status);
        }
        puts("");
        return 0;
    }

Crash register dump:
As you can see, once again there is an issue with garbage data in the FPU. I'll try to reproduce this error again, and then check if my patch works. EDIT: Another Triple fault.
|
I've found the reason for the triple fault. After we execute |
I've submitted changes that decrease the likelihood of a crash in this branch: https://github.com/phoenix-rtos/phoenix-rtos-kernel/tree/astalke/RTOS-858 (at least with the test code that I included in one of the comments above). Unfortunately, these changes don't fix the issue, and I think the last commit may cause errors in FPU calculations. I don't have enough time to make a proper fix. |
Update
The issue has been reopened as it may also be related to #885 and problems with the psh runfile test on ia32-generic-qemu. Currently, I haven't observed it occurring directly in exit tests.
The problem occurs with the merge of 28ab383e627fe1d26df5737b12a938fe5ec473a3 in phoenix-rtos-kernel.
Encountering intermittent system reboots on ia32-generic-qemu. Specifically, the issue occurs approximately 5 out of 100 times when executing a test that involves fork() followed by test_common.test_exitPtr(EXIT_SUCCESS). The expected behavior is for a SIGCHLD signal to be sent after the child process exits, but instead, the system reboots unexpectedly.
Output from CI:
Example workflow from GitHub:
https://github.com/phoenix-rtos/phoenix-rtos-ports/actions/runs/7846751690/job/21414331565
Project version: 0f35de2