Red Hat Bugzilla – Bug 156905
System report badness, hangs on reboot on Dell 6450
Last modified: 2012-06-20 09:17:13 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; X11; Linux i686) Opera 7.54 [en]
Description of problem:
We have RHEL4 AS installed on a Dell 6450 4-way system with 8 GB of RAM. This
system has been unable to properly reboot since the release of RHEL3 (Bugzilla
102504). With RHEL4 the system still does not reboot unless you pass the
reboot=b,s option (both b and s are required, RHEL3 didn't work even with these
options). We were happy that our systems were finally able to reboot properly
again with RHEL4.
In troubleshooting another issue (apparently a memory leak/OOM issue) we decided
to install the Beta U1 update for RHEL4. With this version the system will no
longer reboot. This appears to be a problem with the reboot=s parameter.
Instead of rebooting, the system gives a "Badness in smp_call_function" error and
simply hangs. Now we're back to a system that won't reboot with manual
The 2.6.9-5.ELsmp kernels all work fine, including 5.0.5, as long as we use the
We tested the 2.6.9-6.37.ELsmp, 2.6.9-6.40.ELsmp, and 2.6.9-7.ELsmp and they all
hang with the badness error and stack trace.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot a 4-way, Dell 6450 with a 2.6.9-6.37.ELsmp or newer kernel
2. Attempt to reboot the system with the reboot command
Actual Results: System hangs with "Badness in smp_call_function"
Expected Results: System should reboot
This is the stack trace, copied by hand so possibly with an error, but I attempted
to be accurate.
Badness in smp_call_function at arch/i386/kernel/smp.c:557
This happens with reboot=b,s or reboot=s. It does not happen with no reboot=
statement or reboot=b. The system hangs instead of rebooting in all cases.
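For anyone comparing machines, the following is a minimal sketch (not part of the original report) for checking which reboot= flags a kernel was booted with, since the badness only shows up when the s flag is present. The function names and the sample command line in the usage note are hypothetical, and the sketch assumes only the first letter of each comma-separated token matters, which is consistent with the thread using both reboot=s and reboot=smp.

```python
def reboot_flags(cmdline):
    """Return the first letter of each comma-separated reboot= token.

    Assumption: only the first character of each token is significant,
    so "s", "s1" and "smp" are all reported as "s".
    """
    flags = []
    for word in cmdline.split():
        if word.startswith("reboot="):
            value = word[len("reboot="):]
            # Skip empty tokens such as the one in "reboot=b,,s".
            flags = [tok[0] for tok in value.split(",") if tok]
    return flags


def uses_smp_reboot(cmdline):
    """True if the command line requests the 's' (smp) reboot path."""
    return "s" in reboot_flags(cmdline)


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling uses_smp_reboot(read_cmdline()) on the running system tells you whether the s flag is in effect. With a made-up command line such as "ro root=LABEL=/ reboot=b,s", reboot_flags returns the b and s flags.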
Day 0 E1 kernel 2.6.9-5.0.3 is broken as well.
Well, the bug may exist in the 2.6.9-5.0.3 kernel, but for whatever reason it is
not triggered in my case with that kernel. We have three 6450s that all seem to
reboot fine with 2.6.9-5, 2.6.9-5.0.3, and 2.6.9-5.0.5 kernels as long as we use
the "reboot=b,s" option. I'm not saying that I've rebooted every one of them with
every kernel, but two of these systems are still running 2.6.9-5.0.3 and I just
remotely rebooted them Sunday without issues. The other is running 2.6.9-5.0.5
and was rebooted twice yesterday remotely.
With the 2.6.9-6.37 and 2.6.9-7 kernels I've not found any flags that will
successfully reboot any of these systems, although I've not exhausted every option
(things like reboot=s1,b).
I guess I'm not sure what you mean by "2.6.9-5.0.3 is broken as well". It sure
doesn't seem broken on my three servers.
After some investigation it seems the kexec patch that was included in 5.0.5
kernels but not in later kernels was somehow solving my problem.
I have attached a patch for 2.6.9-11 that changes the reboot.c code in the same
way that 2.6.9-5 does and this allows the system to reboot without issues.
Interestingly the code in 2.6.9-11 seems to be identical to the code in the
current 2.6.11 kernel.org tree and I'm actually reverting to different code.
I'm not sure what's correct or what the actual problem is. Any clues are
Created attachment 115141 [details]
Revert reboot.c to same as version 2.6.9-5.0.5
i can verify this problem - i have the same issue with our PE6450, with 4G of ram. it's running the most
up to date system firmware from Dell too (A14).
just letting you know this reboot problem (with the above SMP error message) still appears in RHEL4 QU2
with the 2.6.9-22.0.1 smp kernel.
the 6450 won't reboot with reboot=bios and won't reboot (with an error) with reboot=smp
i had a read of 102504 and noted the last entry there was a note saying dell l3 won't be supporting this
hardware and so it was 'closed'. i'm not sure if that means it's closed from dell's POV or whether this
issue is closed from the POV of the 6450 thread altogether.
to summarize across both issues:
o various combinations of reboot=s,b basically don't work
o 2 cpu servers seem to be able to reboot ok, 4 cpu ones don't
o it doesn't seem to be a memory issue
o it's frustrating because the server can't be trusted if it can't be rebooted remotely
o dell's OMSA software provides a workaround (presumably via watchdog?) to allow it to reboot after a
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.