From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; Linux i686; U) Opera 7.51 [en]
Description of problem:
We have three Dell PowerEdge 6450 servers that were upgraded from RHEL
AS 2.1 to 3.0 several months ago. After the upgrade it was
discovered that all of these servers would hang when attempting to
reboot. After some research we discovered several reports on the web
about the same issue and the fix seemed to be to add "reboot=b,s" to
the boot command line. This indeed did fix the issue for two of the
three servers, however, the third server continued to fail to reboot.
The only difference between the servers that would reboot and the
one that wouldn't is that the server that fails has 4 CPU's
while the others only have 2 CPU's.
I continued to try several different variations of the "reboot="
option such as "reboot=b,s0", "reboot=b,s1", etc., hoping that
perhaps Linux was simply selecting the incorrect processor to perform
the reboot, however, no option that I tried corrected this issue. We
also tried several other combinations with other "reboot=" options
such as w, c, and h. Nothing has succeeded in getting this issue
resolved.
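
For readers unfamiliar with the option syntax, here is a minimal Python sketch of how a comma-separated "reboot=" value such as "b,s" or "b,s1" breaks down. It is a user-space model, not kernel code. The letter meanings (w = warm, c = cold, b = through the BIOS, h = hard, s = SMP with an optional CPU number) reflect my understanding of the i386 options and should be checked against the kernel source; the names RebootOptions and parse_reboot_option are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RebootOptions:
    """Hypothetical container for what a reboot= string selects."""
    mode: Optional[str] = None        # "warm" (w) or "cold" (c); None if absent
    thru_bios: Optional[bool] = None  # True for b, False for h; None if absent
    smp: bool = False                 # True when an s token is present
    cpu: Optional[int] = None         # CPU number after s, e.g. "s1" -> 1


def parse_reboot_option(value: str) -> RebootOptions:
    """Parse a comma-separated reboot= value such as "b,s" or "b,s1"."""
    opts = RebootOptions()
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        flag, rest = token[0], token[1:]
        if flag == "w":
            opts.mode = "warm"
        elif flag == "c":
            opts.mode = "cold"
        elif flag == "b":
            opts.thru_bios = True
        elif flag == "h":
            opts.thru_bios = False
        elif flag == "s":
            opts.smp = True
            if rest.isdecimal():
                opts.cpu = int(rest)
        # Assumption: unknown letters are silently ignored in this sketch.
    return opts
```

In this model "b,s" and "s,b" parse to the same settings, and "b,s0" versus "b,s1" differ only in the selected CPU number.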
For additional testing I tried the following kernels and list their
success or failure:
Redhat AS 2.1 -- 2.4.9-e.38 -- Works
Redhat 9 -- 2.4.20-31.9 -- Fails
Fedora Core 1 -- 2.4.22-1.2197.nptl -- Works
Redhat AS 3 -- 2.4.21-15.EL UP -- Works
I tested several variants of the Redhat AS kernels; all SMP versions
failed, from 2.4.21-4.EL through the latest 2.4.21-15.0.3.EL,
however, all UP kernels rebooted without issues.
There are other reports of the issue that can be turned up with a
quick search on Google, some have success with "reboot=b,s" others do
not. I'm very suspicious that the people who do not have success are
people with 4 CPU's.
Please let me know what other information needs to be provided.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot Dell PowerEdge 6450 with four processors with any AS 3 kernel
2. Type 'reboot' at command line
Actual Results: System will hang at "System Rebooting..."
Expected Results: System should reboot
We have worked around this issue by installing Dell Server
Administrator, which can detect a hung OS and use the system's embedded
service processor to power cycle the system. Interestingly it
detects this state as a hung OS and performs the recovery. It's a
crude workaround that shouldn't be required and adds an extra five
minutes to an already long reboot process (these systems POST very
slowly) but at least it allows us to reboot the server remotely even
with this kernel bug.
I don't have one of these machines to work with, so I'll have to
work through you.
One question re: the Dell Server Administrator. Is it possible for
it to report the PC of each processor? If this is a kernel-specific
problem, I would first like to rule out the possibility that the
IPI sent out by the rebooting cpu is not being received by one of
the other cpus. If any of the processors for whatever reason are
sitting in a spin_lock_irq(), then they won't ever respond to the IPI,
and the rebooting system would block forever in machine_restart()
and act as you describe. If you can get the PC of each cpu, it's
possible that one of the cpus may show that it is operating in an
address range that can be identified as a spin lock text area.
If not, will you be able to run debug RHEL3 kernels that I create?
I'd like to add a bunch of printk's in the machine_restart() function
to figure out what's going on.
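
To make the suspected failure mode concrete, here is a small Python model of the scenario described above. It is only a sketch and not how the kernel implements it: one CPU sends a stop IPI to the others and waits for every one of them to respond, and a CPU with interrupts disabled (for example, one spinning in spin_lock_irq()) never responds. The Cpu class, unresponsive_cpus() and the max_polls bound are all hypothetical. The bound exists only so the model terminates; the wait described above has no such timeout, which is why the real system hangs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cpu:
    """Hypothetical model of one processor when the reboot is requested."""
    cpu_id: int
    irqs_disabled: bool = False  # e.g. spinning inside spin_lock_irq()
    acked: bool = False

    def deliver_ipi(self) -> None:
        # A CPU with interrupts disabled never services the IPI.
        if not self.irqs_disabled:
            self.acked = True


def unresponsive_cpus(cpus: List[Cpu], rebooting_cpu: int,
                      max_polls: int = 1000) -> List[int]:
    """Send a stop IPI from rebooting_cpu; return ids that never respond.

    An empty list means every other CPU responded and the reboot could
    proceed. A non-empty list names the CPUs that would cause the hang.
    """
    others = [c for c in cpus if c.cpu_id != rebooting_cpu]
    for _ in range(max_polls):
        for cpu in others:
            cpu.deliver_ipi()
        if all(c.acked for c in others):
            return []
    return [c.cpu_id for c in others if not c.acked]
```

With four modeled CPUs where CPU 2 has interrupts disabled, unresponsive_cpus() called with rebooting_cpu=0 returns [2]. That is the kind of answer the program counter of each CPU, or printk output from a debug kernel, would be used to find on the real machine.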
Unfortunately I don't think that Dell Server Admin can get at that
level of information, at least via any user-accessible method that I
know of.
I guess that leaves us with the option of running a debug kernel,
which I can do, but only during limited times as the system is a
production Oracle box. That being said, we plan to upgrade the other
two systems to 4 CPU's this week and I'm anticipating that after we do
that they will experience the same issue. If that turns out to be
the case I can probably move the services of one of the servers to
one of our lab servers temporarily which would free up a system to
test with. In the meantime I can schedule times to test the reboot
functionality on the existing server, but that probably means only
one good test a day.
I'm almost sure that the original beta kernels for RHEL 3 didn't have
this problem. I may see if I still have one of those lying around
just to test the reboot functionality as it might give us another
data point that is closer to the current kernel than the RH9 or FC1
kernels. Then we could run some diff to see what changed.
Ok -- if you want to test an earlier RHEL3 kernel version, I can
make it available for you.
This is a duplicate of bug 102504
(haven't tried with the betas)
Thanks, Greg -- closing this as a duplicate.
*** This bug has been marked as a duplicate of 102504 ***
How do I get access to that bug? I can view it but cannot add
comments or add myself to the CC: list. It appears to be restricted
to group members.
I missed it during my search because it was filed against the Beta.
You are already on its cc: list, so you'll receive all
subsequent input into the case.
As to the restriction, it does appear to be restricted to
Red Hat development, but since you are now on the cc: list,
you are allowed to view it. I don't personally know how
to change that behavior, but I can add your comments.
I still am unable to post comments on Bug 102504, presumably because
it is for the Beta (I get the message "You are not permitted to edit
bugs in product Red Hat Enterprise Linux Beta").
I am interested to know what steps I should take next to assist with
resolving this issue. We are upgrading two of our 6450's from 2 to 4
CPU's tonight. Currently both of these systems will reboot with the
"reboot=s,b" parameter but our 4 CPU system will not. We are
anticipating that after the upgrade we will then have 3 systems that
fail to reboot.
Is there a debug kernel we need to try?
I, too, am experiencing this problem. I have several Dell 6450s with
4 processors in each that fail to recycle after outputting the
'restarting system' message. They are running RH Enterprise Linux AS
3 Update 2. Is there a solution for this problem, perhaps in bug
#102504, which I cannot access at present?
Has this been resolved in update 3?
I have to install a 6450 with 4 cpus at a customer location soon. If
this is still an issue, I'll just install RHEL2.1...
I don't believe the issue is resolved; it certainly doesn't seem to
be for me. On top of this I've had random lockups on multiple
servers after upgrading to the 2.4.21-20.EL kernels in U3 and am in
the process of reverting to the previous kernels.
You can easily work around the reboot issue with the Dell Server
Administrator Auto Recovery feature, but I can't argue with running
RHEL 2.1 unless you really need some of the RHEL 3 features. I ran
2.1 for quite a while on my 6450's and they were solid. Since
upgrading to RHEL 3 over nine months ago we've had nothing but
trouble, with every kernel release having some bug that seems to make
it worse than the last one. I sometimes wish it was easy to go back.
I took a drive that had AS 3.2 installed in a Dell PE 1650 and
installed it in my 6450. The 6450 would reboot with the 1650 drive. The
1650 would NOT reboot with the 6450 drive.
In addition to seeing the previously reported behavior on 6450s, I'm
also seeing this on our 1600s. All with 4 processors.
This is a "resolved duplicate"? Someone please tell Redhat's support
staff so they can tell me the fix.
This bugzilla was closed as a duplicate of another open bugzilla.
Unfortunately the problem at hand is not resolved.
PING: metoo. Please unclassify the tracker for this.
Have you guys fixed this problem? I am running CentOS 3.6
and it is doing the same thing to me with 2 CPUs, and I just added 4.
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.9.EL).
(In reply to comment #22)
> A fix for this problem has just been committed to the RHEL3 U8
> patch pool this evening (in kernel version 2.4.21-40.9.EL).
How did you fix it?
Created attachment 128122
fix committed to RHEL3 U8 for this bug
Hi, Greg. The attached patch is what was committed to U8. It simply
adds "black list" entries for the Dell PowerEdge 6400 and 6450 systems
that make reboots go through the BIOS (via setting "reboot_thru_bios").
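
For anyone wondering what a "black list" entry amounts to, here is a rough Python model of a table-driven quirk lookup of the kind described above. It is not the attached patch and not kernel code. The DMI field name, the match strings and every identifier in it (Quirks, BlacklistEntry, set_bios_reboot, apply_blacklist) are assumptions made for illustration; the real entries are in the attachment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Quirks:
    """Hypothetical set of per-machine workarounds."""
    reboot_thru_bios: bool = False


@dataclass
class BlacklistEntry:
    ident: str
    matches: Dict[str, str]           # DMI field -> substring that must appear
    apply: Callable[[Quirks], None]   # action to run when every field matches


def set_bios_reboot(quirks: Quirks) -> None:
    # Equivalent in effect to booting with "reboot=b".
    quirks.reboot_thru_bios = True


# Assumption: these match strings are illustrative guesses, not the strings
# used in the committed fix.
BLACKLIST: List[BlacklistEntry] = [
    BlacklistEntry("Dell PowerEdge 6400",
                   {"product_name": "PowerEdge 6400"}, set_bios_reboot),
    BlacklistEntry("Dell PowerEdge 6450",
                   {"product_name": "PowerEdge 6450"}, set_bios_reboot),
]


def apply_blacklist(dmi: Dict[str, str],
                    blacklist: Optional[List[BlacklistEntry]] = None) -> Quirks:
    """Return the quirks selected by the machine's DMI strings."""
    quirks = Quirks()
    for entry in (BLACKLIST if blacklist is None else blacklist):
        if all(sub in dmi.get(field, "")
               for field, sub in entry.matches.items()):
            entry.apply(quirks)
    return quirks
```

The point of this design is that the affected systems get the BIOS reboot path automatically, without each administrator adding a "reboot=" option to the boot command line, while other machines are left alone.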
Adding a couple dozen bugs to CanFix list so I can complete the stupid advisory.