I'm experiencing the exact same symptoms described in Bug 127689 except I'm
running Fedora 5. My system is a PowerEdge 6450 with 4 x Xeon 900MHz
processors, 2GB RAM, and a PERC 2/DC card attached to a JBOD running on
As described in the RHEL 3 bug, adding "reboot=b,s" HAS fixed the problem. But
I thought this should be brought out into the open anyway.
If any other information is needed to verify and monkey with this, just let me
know.
+++ This bug was initially created as a clone of Bug #127689 +++
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; Linux i686; U) Opera 7.51 [en]
Description of problem:
We have three Dell PowerEdge 6450 servers that were upgraded from RHEL
AS 2.1 to 3.0 several months ago. After the upgrade it was
discovered that all of these servers would hang when attempting to
reboot. After some research we discovered several reports on the web
about the same issue, and the fix seemed to be adding "reboot=b,s" to
the boot command line. This did fix the issue for two of the
three servers; however, the third server continued to fail to reboot.
The only difference between the servers that would reboot and the
server that wouldn't is that the one that fails has 4 CPUs,
while the others only have 2 CPUs.
I continued to try several different variations of the "reboot="
option, such as "reboot=b,s0", "reboot=b,s1", etc., hoping that
perhaps Linux was simply selecting the incorrect processor to perform
the reboot; however, no option that I tried corrected this issue. We
also tried several other combinations with other "reboot=" options,
such as w, c, and h. Nothing has succeeded in getting this issue
resolved.
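For readers unfamiliar with the option, here is a sketch of how a comma-separated "reboot=" string such as "b,s1" might be broken into flags. It is a user-space approximation, not the kernel's actual code. It assumes that the i386 kernels of this era used single-letter codes: w = warm, c = cold, b = reboot through the BIOS, h = hard reset, and s = SMP reboot with an optional CPU number. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's reboot settings. */
struct reboot_opts {
    bool thru_bios;  /* 'b' sets it, 'h' clears it */
    int  mode;       /* 0x1234 for warm ('w'), 0 for cold ('c') */
    bool smp;        /* an 's' code was given */
    int  cpu;        /* CPU number after 's', or -1 if none */
};

/* Parse a comma-separated reboot= value such as "b,s" or "b,s1".
 * Unknown letters are ignored. */
void parse_reboot(const char *str, struct reboot_opts *o)
{
    assert(str != NULL && o != NULL);
    o->thru_bios = false;
    o->mode = 0;
    o->smp = false;
    o->cpu = -1;

    for (;;) {
        switch (*str) {
        case 'w': o->mode = 0x1234; break;
        case 'c': o->mode = 0; break;
        case 'b': o->thru_bios = true; break;
        case 'h': o->thru_bios = false; break;
        case 's':
            o->smp = true;
            /* Up to two digits name the CPU that should do the reset. */
            if (isdigit((unsigned char)str[1])) {
                o->cpu = str[1] - '0';
                if (isdigit((unsigned char)str[2]))
                    o->cpu = o->cpu * 10 + (str[2] - '0');
            }
            break;
        default:
            break;
        }
        str = strchr(str, ',');   /* advance to the next code, if any */
        if (str == NULL)
            break;
        str++;                    /* skip the comma itself */
    }
}
```

Under this reading, "b,s1" asks for a BIOS reboot initiated from CPU 1, which is why trying different CPU numbers looked like a plausible test. The order of the codes would not matter, so "b,s" and "s,b" would be equivalent.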
For additional testing I tried the following kernels; their
success or failure is listed below:
Red Hat AS 2.1 -- 2.4.9-e.38 -- Works
Red Hat 9 -- 2.4.20-31.9 -- Fails
Fedora Core 1 -- 2.4.22-1.2197.nptl -- Works
Red Hat AS 3 -- 2.4.21-15.EL UP -- Works
I tested several variants of the Red Hat AS kernels: all SMP versions,
from 2.4.21-4.EL through the latest 2.4.21-15.0.3.EL, failed;
however, all UP kernels rebooted without issues.
There are other reports of the issue that can be turned up with a
quick search on Google; some have success with "reboot=b,s", others do
not. I'm very suspicious that the people who do not have success are
people with 4 CPUs.
Please let me know what other information needs to be provided.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot a Dell PowerEdge 6450 with four processors using any AS 3 kernel
2. Type 'reboot' at command line
Actual Results: System will hang at "System Rebooting..."
Expected Results: System should reboot
We have worked around this issue by installing Dell Server
Administrator, which can detect a hung OS and use the system's embedded
service processor to power cycle the system. Interestingly, it
detects this state as a hung OS and performs the recovery. It's a
crude workaround that shouldn't be required and adds an extra five
minutes to an already long reboot process (these systems POST very
slowly), but at least it allows us to reboot the server remotely even
with this kernel bug.
-- Additional comment from email@example.com on 2004-07-12 14:23 EST --
I don't have one of these machines to work with, so I'll have to
work through you.
One question re: the Dell Server Administrator. Is it possible for
it to report the PC of each processor? If this is a kernel-specific
problem, I would first like to rule out the possibility that the
IPI sent out by the rebooting cpu is not being received by one of
the other cpus. If any of the processors for whatever reason is
sitting in a spin_lock_irq(), then it won't ever respond to the IPI,
and the rebooting system would block forever in machine_restart()
and act as you describe. If you can get the PC of each cpu, it's
possible that one of the cpus may show that it is operating in an
address range that can be identified as a spin lock text area.
If not, will you be able to run debug RHEL3 kernels that I create?
I'd like to add a bunch of printk's in the machine_restart() function
to figure out what's going on.
-- Additional comment from firstname.lastname@example.org on 2004-07-12 21:01 EST --
Unfortunately I don't think that Dell Server Admin can get at that
level of information, at least via any user-accessible method that I
know of.
I guess that leaves us with the option of running a debug kernel,
which I can do, but only during limited times, as the system is a
production Oracle box. That being said, we plan to upgrade the other
two systems to 4 CPUs this week, and I'm anticipating that after we do
that they will experience the same issue. If that turns out to be
the case, I can probably move the services of one of the servers to
one of our lab servers temporarily, which would free up a system to
test with. In the meantime I can schedule times to test the reboot
functionality on the existing server, but that probably means only
one good test a day.
I'm almost sure that the original beta kernels for RHEL 3 didn't have
this problem. I may see if I still have one of those lying around
just to test the reboot functionality as it might give us another
data point that is closer to the current kernel than the RH9 or FC1
kernels. Then we could diff the two to see what changed.
-- Additional comment from email@example.com on 2004-07-13 13:09 EST --
Ok -- if you want to test an earlier RHEL3 kernel version, I can
make it available for you.
-- Additional comment from firstname.lastname@example.org on 2004-07-14 15:27 EST --
This is a duplicate of bug 102504
(haven't tried with the betas)
-- Additional comment from email@example.com on 2004-07-14 15:41 EST --
Thanks, Greg -- closing this as a duplicate.
*** This bug has been marked as a duplicate of 102504 ***
-- Additional comment from firstname.lastname@example.org on 2004-07-14 16:43 EST --
How do I get access to that bug? I can view it but cannot add
comments or add myself to the CC: list. It appears to be restricted
to group members.
I missed it during my search because it was filed against the Beta.
-- Additional comment from email@example.com on 2004-07-14 16:55 EST --
You are already on its cc: list, so you'll receive all
subsequent input into the case.
As to the restriction, it does appear to be restricted to
Red Hat development, but since you are now on the cc: list,
you are allowed to view it. I don't personally know how
to change that behavior, but I can add your comments.
-- Additional comment from firstname.lastname@example.org on 2004-07-23 15:37 EST --
I am still unable to post comments on Bug 102504, presumably because
it is for the Beta (I get the message "You are not permitted to edit
bugs in product Red Hat Enterprise Linux Beta").
I am interested to know what steps I should take next to assist with
resolving this issue. We are upgrading two of our 6450s from 2 to 4
CPUs tonight. Currently both of these systems will reboot with the
"reboot=s,b" parameter, but our 4 CPU system will not. We are
anticipating that after the upgrade we will then have 3 systems that
fail to reboot.
Is there a debug kernel we need to try?
-- Additional comment from email@example.com on 2004-08-21 11:07 EST --
I, too, am experiencing this problem. I have several Dell 6450s with
4 processors in each that fail to recycle after outputting the
'restarting system' message. They are running RH Enterprise Linux AS
3 Update 2. Is there a solution for this problem, perhaps in bug
#102504, which I cannot at present access?
-- Additional comment from firstname.lastname@example.org on 2004-08-23 08:28 EST --
-- Additional comment from email@example.com on 2004-09-16 18:57 EST --
Has this been resolved in update 3?
I have to install a 6450 with 4 cpus at a customer location soon. If
this is still an issue, I'll just install RHEL2.1...
-- Additional comment from firstname.lastname@example.org on 2004-09-16 22:17 EST --
I don't believe the issue is resolved; it certainly doesn't seem to
be for me. On top of this, I've had random lockups on multiple
servers after upgrading to the 2.4.21-20.EL kernels in U3 and am in
the process of reverting to the previous kernels.
You can easily work around the reboot issue with the Dell Server
Administrator Auto Recovery feature, but I can't argue with running
RHEL 2.1 unless you really need some of the RHEL 3 features. I ran
2.1 for quite a while on my 6450s and they were solid. Since
upgrading to RHEL 3 over nine months ago we've had nothing but
trouble, with every kernel release having some bug that seems to make
it worse than the last one. I sometimes wish it were easy to go back.
-- Additional comment from email@example.com on 2004-10-21 11:34 EST --
I took a drive with AS 3.2 installed out of a Dell PE 1650 and
put it in my 6450. The 6450 would reboot with the 1650 drive. The
1650 would NOT reboot with the 6450 drive.
-- Additional comment from firstname.lastname@example.org on 2004-12-07 11:48 EST --
In addition to seeing the previously reported behavior on 6450s, I'm
also seeing this on our 1600s. All with 4 processors.
This is a "resolved duplicate"? Someone please tell Red Hat's support
staff so they can tell me the fix.
-- Additional comment from email@example.com on 2004-12-07 11:58 EST --
This bugzilla was closed as a duplicate of another open bugzilla.
Unfortunately the problem at hand is not resolved.
-- Additional comment from firstname.lastname@example.org on 2005-02-09 11:54 EST --
PING: me too. Please unclassify the tracker for this.
-- Additional comment from email@example.com on 2006-01-23 23:17 EST --
Have you guys fixed this problem? I am running CentOS 3.6,
and it is doing the same thing to me with 2 CPUs, and I just added 4.
-- Additional comment from firstname.lastname@example.org on 2006-02-21 14:04 EST --
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
-- Additional comment from email@example.com on 2006-04-22 05:03 EST --
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.9.EL).
-- Additional comment from firstname.lastname@example.org on 2006-04-22 12:08 EST --
(In reply to comment #22)
> A fix for this problem has just been committed to the RHEL3 U8
> patch pool this evening (in kernel version 2.4.21-40.9.EL).
How did you fix it?
-- Additional comment from email@example.com on 2006-04-23 00:14 EST --
Created an attachment (id=128122)
fix committed to RHEL3 U8 for this bug
Hi, Greg. The attached patch is what was committed to U8. It simply
adds "black list" entries for the Dell PowerEdge 6400 and 6450 systems
that make reboots go through the BIOS (via setting "reboot_thru_bios").
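The idea behind the committed fix can be sketched as a table lookup. At boot, the machine's DMI vendor and product strings are compared against a short list, and a match forces BIOS reboots, the same effect as passing "reboot=b" by hand. This is a hypothetical user-space illustration, not the contents of attachment 128122. The struct and function names are invented. The DMI strings ("Dell Computer Corporation", "PowerEdge 6400", "PowerEdge 6450") and the use of substring matching are also assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Identity strings as they might be read from the BIOS DMI tables. */
struct dmi_ident {
    const char *sys_vendor;
    const char *product_name;
};

/* One quirk entry; the strings are illustrative assumptions. */
struct reboot_quirk {
    const char *vendor;   /* substring expected in sys_vendor */
    const char *product;  /* substring expected in product_name */
};

static const struct reboot_quirk bios_reboot_quirks[] = {
    { "Dell Computer Corporation", "PowerEdge 6400" },
    { "Dell Computer Corporation", "PowerEdge 6450" },
};

/* True if 'field' is present and contains 'needle'. */
static bool field_matches(const char *field, const char *needle)
{
    return field != NULL && strstr(field, needle) != NULL;
}

/* If the machine matches a quirk entry, force BIOS reboots by setting
 * *reboot_thru_bios to 1 and return true; otherwise leave it alone. */
bool apply_reboot_quirks(const struct dmi_ident *id, int *reboot_thru_bios)
{
    assert(id != NULL && reboot_thru_bios != NULL);
    size_t n = sizeof bios_reboot_quirks / sizeof bios_reboot_quirks[0];
    for (size_t i = 0; i < n; i++) {
        const struct reboot_quirk *q = &bios_reboot_quirks[i];
        if (field_matches(id->sys_vendor, q->vendor) &&
            field_matches(id->product_name, q->product)) {
            *reboot_thru_bios = 1;
            return true;
        }
    }
    return false;
}
```

Under this scheme, a model that is not in the table keeps the default reboot method. Other affected machines, such as the 1600-series systems reported earlier in this thread, would need their own entries.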
-- Additional comment from firstname.lastname@example.org on 2006-04-28 17:43 EST --
Adding a couple dozen bugs to CanFix list so I can complete the stupid advisory.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check that you only have one version of device-mapper & lvm2
installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.
(this is a mass-close of kernel bugs in NEEDINFO state)
As indicated previously there has been no update on the progress of this bug
therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue
still occurs for you and I will try to assist in its resolution. Thank you for
taking the time to report the initial bug.
If you believe that this bug was closed in error, please feel free to reopen
it.