Red Hat Bugzilla – Bug 144050
Kernel boot hangs at boot after initrd message (before decompressing kernel)
Last modified: 2007-11-30 17:10:57 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Description of problem:
kernel-2.6.9-1.724_FC3 hangs at boot, after selection from grub
screen. Last message displayed is the initrd message (it never gets
to decompressing kernel). Previous kernels (e.g.,
kernel-2.6.9-1.681_FC3) boot fine.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. yum upgrade kernel
3. Select kernel-2.6.9-1.724_FC3 from the GRUB menu.
Actual Results: hang
Expected Results: Decompressing kernel and normal boot.
Boot disk is IDE RAID-1.
Just a clarification, the boot disk is IDE software RAID-1.
Do you have 'quiet' on the boot command line? If so, can you remove
it and find out where it's hanging?
No, I removed the quiet parameter first thing. It just hangs right
after the initrd messages. I reboot with Ctrl-Alt-Del, select the
previous kernel, and all is well, so it doesn't look like a GRUB issue.
The two GRUB entries are:
title Fedora Core (2.6.9-1.724_FC3)
kernel /vmlinuz-2.6.9-1.724_FC3 ro root=/dev/VolGroup00/LogVol00
title Fedora Core (2.6.9-1.681_FC3)
kernel /vmlinuz-2.6.9-1.681_FC3 ro root=/dev/VolGroup00/LogVol00 noapic pci=usepirqmask quiet
I needed the "noapic pci=usepirqmask" because of bug 131404 that was
causing DMA timeouts on the prior kernels. The fix is in the
2.6.9-1.724 kernel, so I don't need those parameters on that kernel
(in case you were wondering). The initrd files look reasonable:
-rw-r--r-- 1 root root 1600271 Dec 11 16:59
-rw-r--r-- 1 root root 1598498 Jan 3 21:00
as do the kernels:
-rw-r--r-- 1 root root 1732455 Nov 18 15:23 /boot/vmlinuz-2.6.9-1.681_FC3
-rw-r--r-- 1 root root 1727262 Jan 2 15:53 /boot/vmlinuz-2.6.9-1.724_FC3
This is an ASUS K8V SE Deluxe motherboard, with an AMD64 3200+ CPU
(512 KB cache). The drives are RAID-1 (md0 is /boot):
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hda2 hdg2
2097024 blocks [2/2] [UU]
md2 : active raid1 hda3 hdg3
75977920 blocks [2/2] [UU]
md0 : active raid1 hda1 hdg1
102208 blocks [2/2] [UU]
unused devices: <none>
Did the updates-testing kernels also hang for you?
You can still grab them (-698 and -715) from
OK, I grabbed the -698 and -715 kernels (from the x86_64 directory).
The 698 kernel boots OK, but the 715 kernel hangs the same as 724. I
hope that helps narrow the changes some.
Well, if it helps any, I've eliminated patches 207, 208, 209, 1148,
1150, 1151, 1890, 1950, 1951, 1952, and 10000 as the source of the
problem. Compiling the kernel without these patches doesn't help; the
-724 kernel still hangs on boot.
If anyone has a specific suspect patch in mind, let me know.
Does the 2.6.10-1.735 kernel at
any better?
No, same problem.
I just want to chime in. I have the same problem with my AMD64 x86
platform (MSI K8T Neo motherboard based on the VIA K8T800 chipset).
I'm adding myself to the CC list so that I can follow the bug.
Created attachment 109586
New 2.6.10-737 config based on 2.6.9-681 config
Created attachment 109588
Diff between 2.6.9-681 config and newly created 2.6.10-737 config
OK, I have a resolution, but not a specific cause. After
incrementally eliminating all the patches in -724 (going back to
vanilla 2.6.9) and still having the no-boot problem on my AMD64, I
figured it must be the kernel config file.
I downloaded the new 2.6.10-737 kernel (which also wouldn't boot), and
built it using the 2.6.9-681 config file (taking mostly defaults for
the new configuration parameters). It boots just fine now.
Attached are the 2.6.10-737 config I used, and the diff from the
2.6.9-681 config. Hopefully someone will notice a configuration
option that was causing the grief.
Created attachment 109589
Diff between newly created 2.6.10-737 config and Fedora original 2.6.10-737 config
Another me too: I have the same problem with 2.6.10-737 (also
2.6.10-1.1075_FC4 from devel) on my Athlon XP 2000; Gigabyte GA7VAXP
mobo, VIA KT400 chipset. I also had kernel-2.6.9-1.681_FC3 working.
I traced the boot hang in kernels released after -681 to the
CONFIG_EDD option.
In 2.6.9-681's config:
# CONFIG_EDD is not set
In 2.6.10-737's config:
It looks like the function will need a blacklist for incompatible systems.
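The isolation above came down to one line in each kernel config. A minimal Python sketch for reading that line, assuming the config is available as plain text (Fedora kernel packages typically install it as /boot/config-<version>, but treat that path as an assumption). The helper name config_edd_state is hypothetical and not part of any kernel tooling.

```python
import re
from typing import Optional


def config_edd_state(config_text: str) -> Optional[str]:
    """Hypothetical helper: report how CONFIG_EDD is set in a kernel .config.

    Returns "y" (built in) or "m" (module), or None when the option is
    disabled ("# CONFIG_EDD is not set") or missing altogether.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # fullmatch keeps longer option names that merely start with
        # CONFIG_EDD from being mistaken for this one.
        match = re.fullmatch(r"CONFIG_EDD=([ym])", line)
        if match:
            return match.group(1)
        if line == "# CONFIG_EDD is not set":
            return None
    return None
```

For example, config_edd_state(open("/boot/config-2.6.9-1.681_FC3").read()) would return None for a config containing the "# CONFIG_EDD is not set" line quoted from the -681 config.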
Excellent, thanks for chasing this down. You should be able to boot
with the edd=off boot parameter. I'll add Matt Domsch to the CC list of
this bug, as he's the upstream maintainer of this code.
Yup, confirming edd=off allows an unmodified kernel to boot.
Just a little follow-up information... dmidecode shows that (extracted):
Vendor: American Megatrends Inc.
Release Date: 11/29/2004
EDD is supported
But attempting to 'modprobe edd' manually after boot returns:
BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
EDD information not available.
It's not surprising that the boot failed. Perhaps in a case like this
EDD could disable itself, since it's not going to be of any use.
Motherboard is ASUS K8V SE Deluxe, AMD64 3200+, latest available BIOS.
Mace, can you try booting with 'edd=skipmbr' rather than 'edd=off'
to help debug it a little further? edd=off disables EDD completely,
while edd=skipmbr only skips reading the boot sector (MBR) of each
disk and leaves the EDD BIOS calls in place.
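The difference between the two options amounts to a small decision table. The Python sketch below is illustrative only: the real parsing happens in the kernel's real-mode setup code, and the names EddMode, edd_mode and edd_steps are hypothetical. It assumes EDD support is configured, so the default with no edd= option is to do both the BIOS queries and the MBR reads, and it treats any other edd= value as leaving that default in place.

```python
from enum import Enum
from typing import List


class EddMode(Enum):
    ON = "on"            # EDD BIOS calls plus an MBR read per disk
    SKIPMBR = "skipmbr"  # EDD BIOS calls only
    OFF = "off"          # no EDD probing at all


def edd_mode(cmdline: str) -> EddMode:
    """Hypothetical helper: derive the EDD mode from a kernel command line."""
    mode = EddMode.ON  # assumed default when EDD support is configured
    for token in cmdline.split():
        if token == "edd=off":
            mode = EddMode.OFF
        elif token == "edd=skipmbr":
            mode = EddMode.SKIPMBR
    return mode


def edd_steps(mode: EddMode) -> List[str]:
    """Boot-time EDD work done in each mode."""
    if mode is EddMode.OFF:
        return []
    steps = ["EDD BIOS queries"]
    if mode is EddMode.ON:
        steps.append("read MBR of each BIOS disk")
    return steps
```

For example, edd_steps(edd_mode("ro root=/dev/VolGroup00/LogVol00 edd=skipmbr")) returns ["EDD BIOS queries"], which is why skipmbr isolates the MBR reads from the BIOS queries.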
Interesting. Yes, edd=skipmbr works too. Now a "modprobe edd" reports:
BIOS EDD facility v0.16 2004-Jun-25, 3 devices found
Since there's something to report now, as you request on your web
page, I'll attach the output of:
find /sys/firmware/edd -type f -not -name raw_data -print -exec cat {} \;
find /sys/firmware/edd -type f -name raw_data -print -exec hexdump -C {} \;
Created attachment 109692
EDD data collection
OK, so it's not the BIOS query code, but the MBR reading that is
causing problems for you. Good to know. Some people have reported
30 second pauses while reading the MBRs of each disk. Are you sure
you didn't just need to wait longer?
Some things I notice from the data immediately. You've got disks on
Promise adapters. Nearly all EDD failure reports so far have involved
Promise controllers (one was on an ACARD).
Do you actually have 3 hard disks? The data shows one 80GB IDE disk
attached to the onboard controller as the boot disk, one 80GB IDE disk
attached somewhere else (probably the onboard controller), and one
250GB disk attached via USB.
The second 80GB disk is showing up at the wrong PCI address (0:2.0
rather than its real location, which I can't tell from this data).
That shouldn't matter for this purpose, though it does show your BIOS
is at least somewhat buggy.
The USB controller disk is showing up at the wrong PCI address too
(0:0.0 rather than the correct 00:0d. or 00:10.).
If you unplug the USB-connected disk and remove 'edd=skipmbr', does
it work? That would narrow down the faulty BIOS component to the USB
Can you try with the secondary IDE controller either enabled or
disabled? One report of success came when, in BIOS setup, the
second controller was enabled but had nothing attached. If disabled,
FWIW, the EDD code that reads the MBR uses bog-standard int13 fn02
(READ SECTORS) calls to read the first sector of each BIOS-reported
disk. Most boot loaders only read the first disk using int13, so
it's generally only a problem when reading disk >0.
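For reference, an int13 fn02 (AH=02h) call that reads the first sector of a disk sets AL=1 (one sector), CH=0 and CL=1 (cylinder 0, sector 1), DH=0 (head 0), and DL to the BIOS drive number, where 80h is the first hard disk; ES:BX points at the destination buffer. The Python sketch below only models those register values for each BIOS-reported disk. It performs no BIOS calls, the names are hypothetical, and disk_count stands in for however many disks the BIOS reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Int13Read:
    """Register values for one int13 AH=02h (READ SECTORS) call."""
    ah: int  # function number: 02h = read sectors
    al: int  # number of sectors to read
    ch: int  # cylinder number, low 8 bits
    cl: int  # sector number (bits 0-5), cylinder high bits (bits 6-7)
    dh: int  # head number
    dl: int  # BIOS drive number; 80h is the first hard disk

    @property
    def ax(self) -> int:
        return (self.ah << 8) | self.al


def mbr_reads(disk_count: int) -> List[Int13Read]:
    """One read of cylinder 0, head 0, sector 1 (the MBR) per BIOS disk."""
    return [Int13Read(ah=0x02, al=1, ch=0, cl=1, dh=0, dl=0x80 + n)
            for n in range(disk_count)]


def reads_beyond_first_disk(reads: List[Int13Read]) -> List[Int13Read]:
    """Reads that a boot loader which only reads disk 80h never issues."""
    return [r for r in reads if r.dl != 0x80]
```

With three BIOS disks, mbr_reads(3) yields DL values 80h, 81h and 82h; the last two are the reads that most boot loaders never exercise.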
Blacklists, as has been suggested, are really difficult to implement
that early on in real mode kernel startup. I'd really hate to have
to write a DMI parser in real mode assembly, but would entertain a
patch if someone else wanted to write such. :-)
A data point.
I'm having boot problems with this kernel, and my system has a Promise
IDE RAID controller. I'm not sure it makes sense to remove it, since
it provides RAID for the 4 IDE disks installed in my system. Could
EDD be disabled through a boot-line parameter (i.e., overriding the
compiled-in EDD option)?
I let it sit for 60 seconds (by the clock over the PC) with no
response on each boot. Since it normally boots almost instantly, that
seemed long enough.
Yes, I have three hard disks: two internal 80GB drives on different
controllers, arranged as RAID-1 (including the boot partition), and an
external (USB 2.0 Hi-Speed) 250GB drive used for archival backup and
temporary storage. However, from the BIOS perspective, I think it
considers the two 80GB drives and the CD/DVD drive bootable. This
motherboard has multiple onboard controllers.
I have a PCI (non-RAID) Promise controller as well. Disabling
controllers will make the CD/DVD drive and/or the CDRW drive and/or
one of the RAID drives unusable, so while that might be an interesting
test, it's not really feasible (the system is a low usage web server).
Regardless of which component is at fault, the root problem is that
the EDD code is blocking the boot. Rather than a blacklist, now that
I understand the function a little better, a fall-back mechanism would
seem a more practical approach. For example, if the MBR can't be read
within some timeout (30 seconds seems excessive, even if successful),
fall back to the non-EDD boot code. The same behavior should apply to
any error condition that would prevent a boot (e.g., invalid data from
the BIOS).
It would be very useful, if possible, to add some status messages to
the process. In a normal boot they will fly by and never be seen.
But when things go wrong they are invaluable. Something along these lines:
"Now attempting EDD boot", followed by either a "success" or "fallback
to non-EDD". If fallbacks aren't implemented, then a message like the
following would be useful:
"If this is the last response you see, boot with kernel parameter
edd=off or edd=skipmbr"
It would have saved about a week of effort hunting down the cause, in
my case.
I'm not entirely clear what value reading the MBR has, when GRUB has
already booted and provided the boot disk (nth BIOS mapped drive).
Perhaps making skipmbr the default would be a better solution?
Disabling EDD also solves my boot problems on my older 32-bit Athlon
XP. I also have a secondary Promise RAID/IDE controller on my mobo,
with some secondary drives attached there -- no usb drives or scsi or
anything though. If it would be helpful, I would be happy to post the
debug information, or otherwise play around with things.
In my case I also don't think I am mistaking a slow read as a lockup
-- I left it for a few minutes.
I have an older Pentium 4 machine with an MSI mainboard based on the SiS 645 chipset. I had the same
problem as described here where I could not boot any FC3 update kernel since 2.6.9-681. In each case,
including the latest 2.6.11 update kernel, it would hang right after grub's initrd statement. However, once
I added the edd=skipmbr trick given here to my kernel boot params, all was well and happy. Thanks
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which
may contain a fix for your problem. Please update to this new kernel, and
report whether or not it fixes your problem.
If you have updated to Fedora Core 4 since this bug was opened, and the problem
still occurs with the latest updates for that release, please change the version
field of this bug to 'fc4'.
I upgraded the kernel to 2.6.12-1.1372_FC3, and tried booting without
edd=skipmbr. The boot got to the same point, but output a string of garbage
(about 40 random characters), then hung again. Rebooted with edd=skipmbr and it
came up fine.
I have a Dell PowerEdge 2300 which is not using hardware RAID, but DOES have two
Promise controllers in it. Its boot/root device is a single SCSI HD on the
builtin AIC7XXX. Any kernel I use after 2.6.9-1.681_FC3 does NOT boot (as
described above) _unless_ I use "edd=skipmbr". This includes the recommended
kernel-2.6.12-1.1372_FC3. This is _not_ fixed in the 1372 kernel. However, I
do have a workaround for the moment, so I may finally be able to upgrade this box
from FC3 to FC4. Over the past month or so I have left it at the "hung" point for
a very long time in the hope that it might eventually continue; I believe I left
it for upwards of an hour, so a drive timeout is NOT the cause. HTH.
Created attachment 117556
Here is a patch which may help anyone who experiences a boot failure
unless they use "edd=skipmbr" on the kernel command line. This patch
does a "Get Disk Type" call to the BIOS before doing the "Read Sectors" call.
Per Ralf Brown's Interrupt List, this may be necessary for some BIOSes.
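Per Ralf Brown's Interrupt List, Get Disk Type is int13 AH=15h with the drive number in DL. The Python sketch below only models the call order the patch description implies, one Get Disk Type before each Read Sectors (AH=02h). It issues no BIOS calls, and since the comment doesn't say what the patch does with the Get Disk Type result, the sketch doesn't inspect it. The function name planned_int13_calls is hypothetical.

```python
from typing import List, Tuple

GET_DISK_TYPE = 0x15  # int13 AH=15h
READ_SECTORS = 0x02   # int13 AH=02h


def planned_int13_calls(disk_count: int,
                        probe_type_first: bool = True) -> List[Tuple[int, int]]:
    """(AH, DL) pairs in issue order, one MBR read per BIOS hard disk."""
    calls = []
    for n in range(disk_count):
        drive = 0x80 + n  # BIOS hard disk numbers start at 80h
        if probe_type_first:
            calls.append((GET_DISK_TYPE, drive))  # the extra call the patch adds
        calls.append((READ_SECTORS, drive))
    return calls
```

Passing probe_type_first=False gives the sequence without the patch, which makes the one-extra-call-per-disk difference easy to compare.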
If you are able to build a kernel with this patch and report back success
without using "edd=skipmbr", I'd very much like to hear.
Before I tested this patch, ASUS issued a BIOS update (1007 for the K8V SE
Deluxe motherboard) that corrected this problem. They don't document that
they've changed anything in this area, but after applying the BIOS update, I can
now boot normally without using "edd=skipmbr", using the stock kernel.
Matt, this seems to have fallen by the wayside. FC3 is going to reach end of
life in a month or two, and this bug will be closed then. Might be worth trying
to get that diff upstream if you still think it's needed.
We can always migrate this to an FC4 bug later if Peter is still around for testing.
This is a mass-update to all currently open Fedora Core 3 kernel bugs.
Fedora Core 3 support has transitioned to the Fedora Legacy project.
Due to the limited resources of this project, typically only
updates for new security issues are released.
As this bug isn't security related, it has been migrated to a
Fedora Core 4 bug. Please upgrade to this newer release, and
test if this bug is still present there.
This bug has been placed in NEEDINFO_REPORTER state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
Unable to reproduce in FC4 with current motherboard BIOS; closing.