Raspberry Pi 2B fail to boot occasionally CPUx: failed to come online #253

Open
antoniosk opened this issue Jan 31, 2021 · 7 comments

Comments

@antoniosk

Kernel version:
Linux P1 5.4.83-v7+ #1379 SMP Mon Dec 14 13:08:57 GMT 2020 armv7l GNU/Linux

The first Pi runs the "lite" Raspbian Buster image and the second the "desktop and recommended software". Both Raspberries boot to the console and all the installed packages come from Raspbian repositories.

Since last December, however, both Pis occasionally fail to boot. The network does not come up and, when I connect a monitor, the devices are frozen at the login: prompt. After I unplug and replug the Pis, everything is back to normal.

While investigating this issue I found that on every unsuccessful boot a CPU core does not come up. The following is logged in kern.log:

On an unsuccessful boot:
Jan 18 21:42:08 PI kernel: [ 0.007635] smp: Bringing up secondary CPUs ...
Jan 18 21:42:08 PI kernel: [ 1.040987] CPU1: failed to come online
Jan 18 21:42:08 PI kernel: [ 1.042804] CPU2: update cpu_capacity 1024
Jan 18 21:42:08 PI kernel: [ 1.042816] CPU2: thread -1, cpu 2, socket 15, mpidr 80000f02
Jan 18 21:42:08 PI kernel: [ 1.044511] CPU3: update cpu_capacity 1024
Jan 18 21:42:08 PI kernel: [ 1.044524] CPU3: thread -1, cpu 3, socket 15, mpidr 80000f03
Jan 18 21:42:08 PI kernel: [ 1.044740] smp: Brought up 1 node, 3 CPUs
Jan 18 21:42:08 PI kernel: [ 1.044866] SMP: Total of 3 processors activated (115.20 BogoMIPS).

On a successful boot:
Jan 19 17:00:46 PI kernel: [ 0.007643] smp: Bringing up secondary CPUs ...
Jan 19 17:00:46 PI kernel: [ 0.009263] CPU1: update cpu_capacity 1024
Jan 19 17:00:46 PI kernel: [ 0.009276] CPU1: thread -1, cpu 1, socket 15, mpidr 80000f01
Jan 19 17:00:46 PI kernel: [ 0.011320] CPU2: update cpu_capacity 1024
Jan 19 17:00:46 PI kernel: [ 0.011333] CPU2: thread -1, cpu 2, socket 15, mpidr 80000f02
Jan 19 17:00:46 PI kernel: [ 0.012983] CPU3: update cpu_capacity 1024
Jan 19 17:00:46 PI kernel: [ 0.012995] CPU3: thread -1, cpu 3, socket 15, mpidr 80000f03
Jan 19 17:00:46 PI kernel: [ 0.013205] smp: Brought up 1 node, 4 CPUs
Jan 19 17:00:46 PI kernel: [ 0.013333] SMP: Total of 4 processors activated (153.60 BogoMIPS).
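The two excerpts above differ only in the "failed to come online" line and the CPU count in the "Brought up" line, so a boot can be classified by scanning for those two messages. A minimal Python sketch, assuming the kern.log line format shown above (the function name `summarize_boot` and the regular expressions are illustrative, not part of any Raspberry Pi tooling):

```python
import re

# Patterns taken from the kernel messages quoted in this issue.
FAILED_RE = re.compile(r"\bCPU(\d+): failed to come online")
BROUGHT_UP_RE = re.compile(r"smp: Brought up \d+ nodes?, (\d+) CPUs?")


def summarize_boot(lines):
    """Return (failed_cpu_ids, cpus_brought_up) for one boot's log lines.

    cpus_brought_up is None if no "smp: Brought up" line is present.
    """
    failed = []
    brought_up = None
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            failed.append(int(match.group(1)))
            continue
        match = BROUGHT_UP_RE.search(line)
        if match:
            brought_up = int(match.group(1))
    return failed, brought_up


# Two lines copied from the unsuccessful boot above.
sample = [
    "Jan 18 21:42:08 PI kernel: [ 1.040987] CPU1: failed to come online",
    "Jan 18 21:42:08 PI kernel: [ 1.044740] smp: Brought up 1 node, 3 CPUs",
]
print(summarize_boot(sample))
```

Feeding it the lines of each boot from kern.log would give a per-boot record of which core, if any, failed.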

I don't know whether this issue is related to issue #232 "CPU1: failed to come online with 5.4.51-v7l+", but I had no such problems with kernel 5.4.51.

Thank you in advance and hope you are all well and safe!

@clivem

clivem commented Jan 31, 2021

I posted about this several times in the "Moving Linux Kernel to 5.10" forum thread. I first thought it was something new with the 5.10.x kernel, which I was testing at the time, until I saw it on a Pi2 with the 5.4.83-v7+ kernel.

It seems this isn't new behaviour with 5.10. Just witnessed it on a Pi2 with "official" stable apt 5.4.83 kernel.

@pelwell
Collaborator

pelwell commented Jan 31, 2021

Have a read through this issue for some history: #232

So far it seems like a problem in the CPUs that only appears before the caches are enabled. There is nothing wrong with the code being executed, but sometimes it doesn't work as it should. Code placement might be a factor, otherwise I can think of no explanation why some builds are affected and not others. The fact that the failure is probabilistic rather than guaranteed only makes it harder to diagnose.

@antoniosk
Author

Hard to diagnose indeed. I am also using a 4GB Pi4 with the official Raspbian Buster and all the updates installed as a secondary desktop without any problem so far.

On my Pi2 B I can confirm that:

  1. The problem appears in 1 out of 7 to 9 reboots / cold boots.
  2. Only three raspberry images appear on the screen during an unsuccessful boot.
  3. Usually CPU1 fails. CPU2 failed only once, on one of my Pi2 Bs.
  4. The problem first appeared in December. According to apt history.log, the kernel (raspberrypi-kernel:armhf (1.20201126-1, 1.20201201-1)) was updated on 4 December 2020. Unfortunately, the oldest kern.log records date from the beginning of January 2021 (I have now changed the log retention to cover a longer period).

I can provide any other information, e.g. log files, should you need it.
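Because the failure is probabilistic, a handful of clean boots says little about whether a given kernel is affected. A rough Python sketch of the arithmetic, assuming each boot fails independently with a fixed rate (an assumption, not something established in this thread; the function names are illustrative):

```python
import math


def prob_all_boots_clean(failure_rate, boots):
    """Chance that `boots` consecutive boots all succeed.

    Assumes independent boots with the same failure rate.
    """
    return (1.0 - failure_rate) ** boots


def boots_needed(failure_rate, confidence=0.95):
    """Smallest number of boots that shows at least one failure
    with the given confidence, under the same assumption."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Rate of roughly 1 in 8, the middle of the "1 out of 7 to 9" range above.
rate = 1 / 8
print(boots_needed(rate))
print(round(prob_all_boots_clean(rate, 10), 3))
```

With a rate near 1 in 8, a run of ten clean boots is still fairly likely on an affected kernel, so longer test runs are needed before concluding that a build is fine.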

@clivem

clivem commented Feb 5, 2021

[    0.000000] Linux version 5.10.11-v7+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1399 SMP Thu Jan 28 12:06:05 GMT 2021
[    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B Rev 1.1
[    1.040915] CPU2: failed to come online

@pelwell
Collaborator

pelwell commented Feb 5, 2021

Oh good. Having just fixed an interesting I2C bug I was looking for another rabbit hole to disappear down.

@antoniosk
Author

That's interesting. I updated to kernel 5.10.11-v7+ on Thursday, and the freeze after boot seems to be fixed. I did around 30 reboots over ssh without a freeze, but I also noticed the line

Feb 5 13:37:17 PI kernel: [ 1.040913] CPU2: failed to come online
in kern.log.

As I am not familiar with the Raspberry Pi internals (revision numbers, variants, etc.) that may be relevant to the issue, I am posting some information from /proc/cpuinfo, which applies to both devices I own:

Hardware : BCM2835
Revision : a01041
Model : Raspberry Pi 2 Model B Rev 1.1

CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc07
CPU revision : 5

I will repeat the boot test from the console during the weekend checking kern.log for each reboot.

@antoniosk
Author

Kernel 5.10.11-v7+ made the problem harder to reproduce. Here are my results:

  1. The failure appeared in 1 out of 30 to 50 reboots/cold boots.
  2. If CPU1 fails, the network and keyboard do not work, which makes the Pi appear to "freeze". The term "freeze" is not exactly accurate; I assume that the USB hub depends on CPU1 and, if CPU1 is down, the hub does not work.
  3. If CPU2 fails, the USB hub (network and keyboard) works and /proc/cpuinfo reports only the 3 working cores: 0, 1 and 3.
  4. Only CPU1 or CPU2 failed during my tests.

As a workaround for (2) and based on (3), I use a small shell script that checks the number of CPU cores listed in /proc/cpuinfo on every boot. If this number is less than 4, the script reboots the Pi.
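The script itself was not posted. The following is a Python sketch of the same idea, not the author's shell script. It assumes that each online core appears as a `processor : N` entry in /proc/cpuinfo and that `systemctl reboot` is available (as on a systemd-based Raspbian Buster); the names `count_online_cores`, `cores_missing` and `reboot_if_cores_missing` are illustrative.

```python
import subprocess

EXPECTED_CORES = 4  # the Pi 2 Model B has four cores


def count_online_cores(cpuinfo_text):
    """Count the "processor : N" entries in /proc/cpuinfo content."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def cores_missing(cpuinfo_text, expected=EXPECTED_CORES):
    """True if fewer cores than expected are listed."""
    return count_online_cores(cpuinfo_text) < expected


def reboot_if_cores_missing(path="/proc/cpuinfo"):
    """Reboot when a core failed to come online (needs root).

    Assumption: `systemctl reboot` is the reboot mechanism in use.
    """
    with open(path) as f:
        text = f.read()
    if cores_missing(text):
        subprocess.run(["systemctl", "reboot"], check=False)
```

`reboot_if_cores_missing()` would be called once from a boot-time hook of your choice. Since the check runs locally, it does not depend on the network or keyboard being up. If a core were to fail on every boot, this would loop indefinitely, so a retry limit may be worth adding.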


3 participants