Bug 127400 - Grub hangs during serial console boot
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: grub
Version: 3.0
Hardware: i686 Linux
Priority: medium  Severity: medium
Assigned To: Peter Jones
Reported: 2004-07-07 14:39 EDT by Trevin Beattie
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-03-30 15:41:57 EST
Description Trevin Beattie 2004-07-07 14:39:53 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20031210

Description of problem:
I've encountered a strange problem that is mostly a minor annoyance.
I have a PowerEdge 1750 server that is primarily accessed remotely, so
the BIOS is configured to use the serial port as a console.  Red Hat
Enterprise Linux WS 3 Update 2 was installed via NFS, again using the
serial port as the primary console during setup.

When the machine is booted, Grub writes out "GRUB Loading stage2..."
followed by "Press any key to continue." on both the serial port and
the VGA console.  Pressing a key brings up the Grub boot menu, from
which point you can continue loading.  If no key is pressed within a
short period of time, the prompt is repeated.

A few times when I wasn't paying attention, I ended up with about 7 of
these prompts, then some blank lines.  When I try pressing a key at
that point, nothing happens.  It doesn't work from either the serial
port or the VGA console.

Usually I would give up at that point, walk down the hall to the
server room, and power-cycle the box.  This last time, I just left the
machine sitting while looking for info on this problem.  After several
minutes, it suddenly decided to start booting.


Version-Release number of selected component (if applicable):
grub-0.93-4

How reproducible:
Sometimes

Steps to Reproduce:
1. Install RHEL using the serial port as the primary console.
2. Reboot, and wait for "GRUB Loading stage2..."
3. Wait for a bunch of "Press any key to continue." messages, followed
by blank lines and a pause.
4. Now try to press any key.

    

Actual Results:  Nothing happens.  Loading does not continue; at least
not for a few minutes.


Expected Results:  Should have brought up the Grub boot menu.

Actually, I would much prefer that stage2 just time out and
automatically boot the default kernel if no key is pressed.


Additional info:

Dell PowerEdge 1750, dual Broadcom BCM5704 NetXtreme ethernet
controllers.  RHEL WS3 Update 2.

/boot/grub/grub.conf contains the following extra parameters:

serial --unit=0 --speed=9600
terminal --timeout=10 serial console

and the kernel command line includes "console=ttyS0,9600" at the end.
Comment 1 Trevin Beattie 2004-07-19 13:01:11 EDT
I recently tried this with RHEL WS3 Update 1.  The problem exists
there as well.  It's fairly consistent, but I haven't determined
exactly how long a wait is required before grub stops responding.
Comment 2 Brian Crumrine 2004-08-06 02:52:47 EDT
We just set up RHEL Update 2 on a new 1750 and experienced the same
problem until we had the idea to match the bit rate between the Dell
console redirection and the grub configuration (and everything else).
We evidently had some kind of synchronization problem, or some
interaction between the two consoles.

Once everything was using the same bit rate, we didn't have to touch 
a thing and boot happened perfectly and consoles worked as they 
should (there was a little glitch with kudzu not accepting keyboard 
input).  Of course, you may have set the bit rate in the Dell BIOS - 
we just went with the default there.

So our grub configuration looks like this:
serial --unit=0 --speed=115200
terminal --timeout=10 serial console

with the kernel line addition being:
console=ttyS0,115200

One thing you didn't mention is also required: a change to the
inittab file, because those two changes alone won't direct the login
tty to the serial port.  Add something like:
0:12345:respawn:/sbin/agetty ttyS0 115200

We also added ttyS0 to the /etc/securetty file to allow root to log in
on the serial console - not required, but our machine is in a locked
cabinet.
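
For anyone who wants to check the "same bit rate everywhere" condition
described above without reading each file by eye, here is a minimal
sketch in Python.  It is only an illustration: the function names
serial_speeds and speeds_match are made up for this example, it assumes
agetty is invoked with the port before the speed (as in the inittab
lines quoted in this bug), and it cannot see the BIOS console
redirection setting, which still has to be checked by hand.

```python
import re


def serial_speeds(grub_conf: str, inittab: str) -> dict:
    """Collect the serial speeds found in grub.conf and inittab text.

    Hypothetical helper for illustration; not part of grub or RHEL.
    """
    speeds = {}

    # GRUB's own serial setup, e.g. "serial --unit=0 --speed=9600"
    m = re.search(r"^\s*serial\b.*--speed=(\d+)", grub_conf, re.MULTILINE)
    if m:
        speeds["grub serial"] = int(m.group(1))

    # Kernel command line argument, e.g. "console=ttyS0,9600"
    m = re.search(r"console=ttyS\d+,(\d+)", grub_conf)
    if m:
        speeds["kernel console"] = int(m.group(1))

    # Login getty, e.g. "co:2345:respawn:/sbin/agetty ttyS0 9600 vt100"
    # Assumes the port comes before the speed; commented lines are skipped.
    m = re.search(r"^[^#\n]*agetty\b.*?\bttyS\d+\s+(\d+)", inittab, re.MULTILINE)
    if m:
        speeds["inittab agetty"] = int(m.group(1))

    return speeds


def speeds_match(speeds: dict) -> bool:
    """True when every speed that was found is the same value."""
    return len(set(speeds.values())) <= 1


if __name__ == "__main__":
    # Sample text copied from the settings quoted in this comment.
    grub_text = (
        "serial --unit=0 --speed=115200\n"
        "terminal --timeout=10 serial console\n"
        "console=ttyS0,115200\n"
    )
    inittab_text = "0:12345:respawn:/sbin/agetty ttyS0 115200\n"
    found = serial_speeds(grub_text, inittab_text)
    print(found, "consistent" if speeds_match(found) else "MISMATCH")
```

On a real system you would read /boot/grub/grub.conf and /etc/inittab
and pass their contents in place of the sample strings.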

Hope this helps.
Brian
Comment 3 Trevin Beattie 2004-08-09 19:00:59 EDT
There are a couple of problems with that suggestion:

1. I don't see any option in the Dell 1750 BIOS to change the baud
rate.  Since we are getting valid characters at 9600 baud, I would
assume that is the rate at which the BIOS is set.

2. The hang occurs before loading the kernel, so whatever we have in
inittab (which, BTW, is "co:2345:respawn:/sbin/agetty ttyS0 9600
vt100") is irrelevant at that point.
Comment 4 Karl Burkett 2004-10-06 10:40:50 EDT
Eureka:  I too was having the same problem.  My grub.conf
configuration (to keep things simple) is much like Trevin Beattie's. 
Brian's comments, though correct as far as they go, do not
have anything to do with the problem. He's just matching the baud
rates at given times in the boot process, so I agree with Trevin's
last comment that the baud rates are not part of the problem. I'd
suggest, if I may, that the serial terminal connected to the serial
port may be configured to listen at 115200, which would be why it
works for Brian.

So, I went back to first principles and examined the BIOS settings for
console redirection:  On the 1750, there are three settings:
  First to enable console redirection to serial port 1.
  Second to pick the terminal type.
  Third (and most important), Redirect after boot should be
"Disabled".  I did have it enabled and was having the described problems.

Why did this work? I suspect that there was a fight for control going
on in the hardware. "Boot", in this instance, refers to the end
of the BIOS bootup.  At that point grub tries to take command of
the serial resource, but the BIOS won't let it take control, so a
long argument begins over who is going to control the serial
resource.  This lasts until some timeout happens and the system
continues to boot as expected.

Hope this helps.

From:
burkett@rice.edu
Comment 5 Brian Crumrine 2004-10-06 11:14:44 EDT
We did have what sounds like the same problem as originally described
by Trevin until we matched the bit rates.  Since the kernel option was
different than the Dell BIOS setting, we would not see the kernel
boot, and kudzu, etc. because of the mismatch, so it looked like it
would lock up for a while (while it was booting and trying to
configure a new-found serial console).  

We moved everything to 115200 early on in our setup and only used 9600
briefly, so that could have been it.  I don't know if we ever touched
the redirect after boot option - I suspect it was left at the default
which was probably Disabled.

We are setting up another 1750 in a couple of days, so I will be able
to try a 9600 boot and some of the Dell BIOS options and see if we get
anything different.
Comment 6 Trevin Beattie 2004-10-06 12:27:32 EDT
Confirmed.  Turning off "redirect after boot" in the BIOS solved the
conflict.  Grub no longer hangs on our systems now.
Comment 7 Hugh Sutton-Gee 2005-03-09 16:34:25 EST
Also confirmed.
This was happening on our Sun v20z's. Grub would just hang at:
"stage2 ..."

Turning off the console redirect setting in the BIOS solved the problem.
Comment 8 Peter Jones 2005-03-30 15:41:57 EST
Thanks for working this out, and providing the solution.
Comment 9 kloczek 2006-02-01 10:34:03 EST
(In reply to comment #8)
> Thanks for working this out, and providing the solution.

But disabling serial console redirection on the SP on a v20z isn't a solution :>
Disabling redirection makes it impossible to control the grub boot process remotely.

I also have the same problem as Hugh Sutton-Gee on a v20z, but with grub from Fedora
devel (grub-0.97-2).
