Bug 156905 - System report badness, hangs on reboot on Dell 6450
Summary: System report badness, hangs on reboot on Dell 6450
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-05-05 03:22 UTC by Tom Sightler
Modified: 2012-06-20 13:17 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 13:17:13 UTC
Target Upstream Version:
Embargoed:


Attachments
Revert reboot.c to same as version 2.6.9-5.0.5 (2.42 KB, patch)
2005-06-03 21:24 UTC, Tom Sightler

Description Tom Sightler 2005-05-05 03:22:03 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; X11; Linux i686) Opera 7.54  [en]

Description of problem:
We have RHEL4 AS installed on a Dell 6450 4-way system with 8 GB of RAM.  This
system has been unable to reboot properly since the release of RHEL3 (Bugzilla
102504).  With RHEL4 the system still does not reboot unless you pass the
reboot=b,s option (both b and s are required; RHEL3 didn't work even with these
options).  We were happy that our systems were finally able to reboot properly
again with RHEL4.

While troubleshooting another issue (apparently a memory leak/OOM issue) we decided
to install the Beta U1 update for RHEL4.  With this version the system once again
fails to reboot.  This appears to be a problem with the reboot=s parameter.
Instead of rebooting, the system gives a "Badness in smp_call_function" error and
simply hangs.  Now we're back to a system that won't reboot without manual
intervention.

The 2.6.9-5.ELsmp kernels all work fine, including 5.0.5, as long as we use the 
reboot=b,s option.
We tested the 2.6.9-6.37.ELsmp, 2.6.9-6.40.ELsmp, and 2.6.9-7.ELsmp and they all 
hang with the badness error and stack trace.

Thanks,
Tom


Version-Release number of selected component (if applicable):
kernel-2.6.9-7.ELsmp

How reproducible:
Always

Steps to Reproduce:
1.  Boot a 4-way Dell 6450 with a 2.6.9-6.37.ELsmp or newer kernel
2.  Attempt to reboot the system with the reboot command
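
Before reproducing, it can help to confirm which reboot= option the running kernel was actually booted with. The following is a minimal sketch, assuming a Linux system that exposes its boot arguments in /proc/cmdline; the function names and the sample command line in the comment are hypothetical, chosen only for illustration.

```python
from typing import List


def reboot_options_from_cmdline(cmdline: str) -> List[str]:
    """Return the value of every reboot= argument found on a kernel command line."""
    prefix = "reboot="
    return [arg[len(prefix):] for arg in cmdline.split() if arg.startswith(prefix)]


def running_reboot_options(path: str = "/proc/cmdline") -> List[str]:
    """Read the running kernel's command line and extract its reboot= values."""
    with open(path) as f:
        return reboot_options_from_cmdline(f.read())


# Hypothetical example: for a command line such as
#   "ro root=LABEL=/ reboot=b,s"
# reboot_options_from_cmdline() returns ["b,s"].
```

An empty list means no reboot= argument was passed, which corresponds to the "no reboot= statement" case in this report.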

  

Actual Results:  System hangs with "Badness in smp_call_function"

Expected Results:  System should reboot

Additional info:

This is the stack trace, copied by hand so possibly with an error, but I attempted 
to be accurate.

Badness in smp_call_function at arch/i386/kernel/smp.c:557
[<c0116b47>] smp_call_function+0x50/0xc5
[<c011d9c7>] scheduler_tick+0x146/0x3e5
[<c011d863>] rebalance_tick+0x99/0xb7
[<c0116bfc>] smp_send_stop+0x13/0x1c
[<c0115254>] machine_restart+0xba/0x12b
[<c0116c4a>] smp_call_function_interrupt+0x3a/0x78
[<c02c7e32>] call_function_interrupt+0x1a/0x20
[<c0104018>] default_idle+0x0/0x2c
[<c0104041>] default_idle+0x29/0x2c
[<c010409d>] cpu_idle+0x26/0x3b
[<c0384784>] start_kernel+0x194/0x198

The badness message appears with reboot=b,s or reboot=s.  It does not appear with no
reboot= statement or with reboot=b.  In all cases the system hangs instead of rebooting.
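
For readers unfamiliar with the reboot= syntax used throughout this report, here is a rough sketch of how the comma-separated flags are interpreted. It is written from memory of the i386 boot-option handling of that era and is an assumption, not a copy of the kernel source: each item is identified by its first letter (w = warm, c = cold, b = via BIOS, h = hard reset, s = SMP, optionally followed by a CPU number). The function and dictionary key names are hypothetical.

```python
def parse_reboot_option(value):
    """Interpret a reboot= value such as "b,s" or "s1,b" (illustrative sketch only)."""
    settings = {"mode": None, "thru_bios": None, "smp": False, "cpu": None}
    for token in value.split(","):
        if not token:
            continue
        key = token[0]  # only the first letter of each item is significant here
        if key == "w":
            settings["mode"] = "warm"
        elif key == "c":
            settings["mode"] = "cold"
        elif key == "b":
            settings["thru_bios"] = True
        elif key == "h":
            settings["thru_bios"] = False
        elif key == "s":
            settings["smp"] = True
            # Up to two digits after 's' select the CPU that performs the reset.
            digits = ""
            for ch in token[1:3]:
                if not ch.isdigit():
                    break
                digits += ch
            if digits:
                settings["cpu"] = int(digits)
    return settings
```

Under this reading, reboot=b,s requests a BIOS reboot with the SMP flag set, and spelled-out forms such as reboot=bios or reboot=smp would be treated the same as b and s.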

Comment 2 Jeff Burke 2005-05-10 13:30:45 UTC
Day 0 E1 kernel 2.6.9-5.0.3 is broken as well.

Comment 3 Tom Sightler 2005-05-10 14:09:56 UTC
Well, the bug may exist in the 2.6.9-5.0.3 kernel, but for whatever reason it is 
not triggered in my case with that kernel.  We have three 6450's that all seem to 
reboot fine with 2.6.9-5, 2.6.9-5.0.3, and 2.6.9-5.0.5 kernels as long as we use 
the "reboot=b,s" option.  I'm not saying that I've rebooted every one of them with 
every kernel, but two of these systems are still running 2.6.9-5.0.3 and I just 
remotely rebooted them Sunday without issues.  The other is running 2.6.9-5.0.5 
and was rebooted twice yesterday remotely.

With the 2.6.9-6.37 and 2.6.9-7 kernels I've not found any flags that will
successfully reboot any of these systems, although I've not exhausted every option
(things like reboot=s1,b).

I guess I'm not sure what you mean by "2.6.9-5.0.3 is broken as well".  It sure
doesn't seem broken on my three servers.

Later,
Tom


Comment 4 Tom Sightler 2005-06-03 21:22:17 UTC
After some investigation it seems the kexec patch that was included in 5.0.5
kernels but not in later kernels was somehow solving my problem.

I have attached a patch for 2.6.9-11 that changes the reboot.c code in the same
way that 2.6.9-5 does and this allows the system to reboot without issues.

Interestingly the code in 2.6.9-11 seems to be identical to the code in the
current 2.6.11 kernel.org tree and I'm actually reverting to different code. 
I'm not sure what's correct or what the actual problem is.  Any clues are
appreciated.

Later,
Tom


Comment 5 Tom Sightler 2005-06-03 21:24:05 UTC
Created attachment 115141
Revert reboot.c to same as version 2.6.9-5.0.5

Comment 6 jason andrade 2005-09-15 06:42:33 UTC
i can verify this problem - i have the same issue with our PE6450, with 4G of ram.  it's running the most 
up to date system firmware from Dell too (A14).

-jason

Comment 7 jason andrade 2005-11-13 23:44:09 UTC
just letting you know this reboot problem (with the above SMP error message) still appears in RHEL4 QU2 
with the 2.6.9-22.0.1 smp kernel.

the 6450 won't reboot with reboot=bios and won't reboot (with an error) with reboot=smp

-jason

Comment 8 jason andrade 2005-11-14 00:06:53 UTC
i had a read of 102504 and noted the last entry there was a note saying dell l3 won't be supporting this 
hardware and so it was 'closed'.  i'm not sure if that means it's closed from dell's POV or whether this
issue is closed from the POV of the 6450 thread altogether.

to summarize across both issues:

o various combinations of reboot=s,b basically don't work
o 2 cpu servers seem to be able to reboot ok, 4 cpu ones don't
o it doesn't seem to be a memory issue
o it's frustrating because the server can't be trusted if it can't be rebooted remotely
o dell's OMSA software provides a workaround (presumably via watchdog?) to allow it to reboot after a 
hang

-jason

Comment 11 Jiri Pallich 2012-06-20 13:17:13 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you asked us to review has reached End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.

