From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; X11; Linux i686) Opera 7.54 [en]

Description of problem:
We have RHEL4 AS installed on a Dell 6450 4-way system with 8 GB of RAM. This system has been unable to reboot properly since the release of RHEL3 (Bugzilla 102504). With RHEL4 the system still does not reboot unless you pass the reboot=b,s option (both b and s are required; RHEL3 didn't work even with these options). We were happy that our system was finally able to reboot properly again with RHEL4.

In troubleshooting another issue (apparently a memory leak/OOM issue) we decided to install the Beta U1 update for RHEL4. With this version the system will no longer reboot. This appears to be a problem with the reboot=s parameter. Instead of rebooting, the system gives a "Badness in smp_call_function" error and simply hangs. Now we're back to a system that won't reboot without manual intervention.

The 2.6.9-5.ELsmp kernels all work fine, including 5.0.5, as long as we use the reboot=b,s option. We tested 2.6.9-6.37.ELsmp, 2.6.9-6.40.ELsmp, and 2.6.9-7.ELsmp, and they all hang with the badness error and stack trace.

Thanks, Tom

Version-Release number of selected component (if applicable):
kernel-2.6.9-7.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Boot a 4-way Dell 6450 with a 2.6.9-6.37.ELsmp or newer kernel
2. Attempt to reboot the system with the reboot command

Actual Results: System hangs with "Badness in smp_call_function"

Expected Results: System should reboot

Additional info:
This is the stack trace, copied by hand so possibly with an error, but I attempted to be accurate.
Badness in smp_call_function at arch/i386/kernel/smp.c:557
 [<c0116b47>] smp_call_function+0x50/0xc5
 [<c011d9c7>] scheduler_tick+0x146/0x3e5
 [<c011d863>] rebalance_tick+0x99/0xb7
 [<c0116bfc>] smp_send_stop+0x13/0x1c
 [<c0115254>] machine_restart+0xba/0x12b
 [<c0116c4a>] smp_call_function_interrupt+0x3a/0x78
 [<c02c7e32>] call_function_interrupt+0x1a/0x20
 [<c0104018>] default_idle+0x0/0x2c
 [<c0104041>] default_idle+0x29/0x2c
 [<c010409d>] cpu_idle+0x26/0x3b
 [<c0384784>] start_kernel+0x194/0x198

This happens with reboot=b,s or reboot=s. It does not happen with no reboot= statement or reboot=b. The system hangs instead of rebooting in all cases.
Day 0 E1 kernel 2.6.9-5.0.3 is broken as well.
Well, the bug may exist in the 2.6.9-5.0.3 kernel, but for whatever reason it is not triggered in my case with that kernel. We have three 6450s that all seem to reboot fine with the 2.6.9-5, 2.6.9-5.0.3, and 2.6.9-5.0.5 kernels as long as we use the "reboot=b,s" option. I'm not saying that I've rebooted every one of them with every kernel, but two of these systems are still running 2.6.9-5.0.3 and I rebooted them remotely on Sunday without issues. The other is running 2.6.9-5.0.5 and was rebooted remotely twice yesterday. With the 2.6.9-6.37 and 2.6.9-7 kernels I've not found any flags that will successfully reboot any of these systems, although I've not exhausted every option (things like reboot=s1,b). I guess I'm not sure what you mean by "2.6.9-5.0.3 is broken as well". It sure doesn't seem broken on my three servers. Later, Tom
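For reference, a rough sketch of how a reboot= value such as b,s or s1,b breaks down into flags. It assumes the i386 parser of that kernel generation looks only at the first letter of each comma-separated token (w, c, b, h, s) and reads up to two digits after s as a CPU number. The function name and dictionary keys are made up for illustration and are not kernel identifiers.

```python
def parse_reboot_option(value):
    """Split a reboot= value (e.g. "b,s" or "s1,b") into illustrative flags.

    Assumption: only the first character of each comma-separated token is
    significant, so "bios" is treated like "b" and "smp" like "s".
    """
    flags = {"mode": None, "through_bios": False, "smp": False, "cpu": None}
    for token in value.split(","):
        if not token:
            continue
        key = token[0]
        if key == "w":        # warm reboot
            flags["mode"] = "warm"
        elif key == "c":      # cold reboot
            flags["mode"] = "cold"
        elif key == "b":      # reboot by jumping through the BIOS
            flags["through_bios"] = True
        elif key == "h":      # hard reboot, not through the BIOS
            flags["through_bios"] = False
        elif key == "s":      # SMP reboot, optionally on a given CPU
            flags["smp"] = True
            digits = ""
            for ch in token[1:3]:          # at most two digits after "s"
                if ch not in "0123456789":
                    break
                digits += ch
            if digits:
                flags["cpu"] = int(digits)
    return flags


if __name__ == "__main__":
    for option in ("b,s", "s", "b", "s1,b"):
        print(option, parse_reboot_option(option))
```

Under these assumptions, reboot=b,s sets both the BIOS and SMP flags, while reboot=s1,b additionally asks for the reset to run on CPU 1.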
After some investigation it seems the kexec patch that was included in 5.0.5 kernels but not in later kernels was somehow solving my problem. I have attached a patch for 2.6.9-11 that changes the reboot.c code in the same way that 2.6.9-5 does and this allows the system to reboot without issues. Interestingly the code in 2.6.9-11 seems to be identical to the code in the current 2.6.11 kernel.org tree and I'm actually reverting to different code. I'm not sure what's correct or what the actual problem is. Any clues are appreciated. Later, Tom
Created attachment 115141 [details] Revert reboot.c to same as version 2.6.9-5.0.5
i can verify this problem - i have the same issue with our PE6450, with 4G of ram. it's running the most up to date system firmware from Dell too (A14). -jason
just letting you know this reboot problem (with the above SMP error message) still appears in RHEL4 QU2 with the 2.6.9-22.0.1 smp kernel. the 6450 won't reboot with reboot=bios, and won't reboot (it fails with an error) with reboot=smp -jason
i had a read of 102504 and noted the last entry there was a note saying dell l3 won't be supporting this hardware and so it was 'closed'. i'm not sure if that means it's closed from dell's POV or whether this issue is closed from the POV of the 6450 thread altogether.

to summarize across both issues:
o various combinations of reboot=s,b basically don't work
o 2 cpu servers seem to be able to reboot ok, 4 cpu ones don't
o it doesn't seem to be a memory issue
o it's frustrating because the server can't be trusted if it can't be rebooted remotely
o dell's OMSA software provides a workaround (presumably via watchdog?) to allow it to reboot after a hang

-jason
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested a review is now End of Life. Please see https://access.redhat.com/support/policy/updates/errata/ for details. If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.