From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
kernel-2.6.9-1.724_FC3 hangs at boot, after selection from grub screen. Last message displayed is the initrd message (it never gets to decompressing kernel). Previous kernels (e.g., kernel-2.6.9-1.681_FC3) boot fine.

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.724_FC3

How reproducible:
Always

Steps to Reproduce:
1. yum upgrade kernel
2. reboot
3. select kernel-2.6.9-1.724_FC3 from grub menu
4. hang

Actual Results:
hang

Expected Results:
Decompressing kernel and normal boot.

Additional info:
Boot disk is IDE RAID-1.
Just a clarification, the boot disk is IDE software RAID-1.
Do you have 'quiet' on the boot command line? If so, can you remove it and find out where it's hanging?
No, I removed the quiet parameter first thing. It just hangs right after the initrd messages. I ctrl-alt-del reboot, select the previous kernel and all is well, so it doesn't look like a grub issue.

The two grub entries are:

    title Fedora Core (2.6.9-1.724_FC3)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-1.724_FC3 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.9-1.724_FC3.img
    title Fedora Core (2.6.9-1.681_FC3)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-1.681_FC3 ro root=/dev/VolGroup00/LogVol00 noapic pci=usepirqmask quiet
        initrd /initrd-2.6.9-1.681_FC3.img

I needed the "noapic pci=usepirqmask" because of bug 131404 that was causing DMA timeouts on the prior kernels. The fix is in the 2.6.9-1.724 kernel, so I don't need those parameters on that kernel (in case you were wondering).

The initrd files look reasonable:

    -rw-r--r-- 1 root root 1600271 Dec 11 16:59 /boot/initrd-2.6.9-1.681_FC3.img
    -rw-r--r-- 1 root root 1598498 Jan 3 21:00 /boot/initrd-2.6.9-1.724_FC3.img

as do the kernels:

    -rw-r--r-- 1 root root 1732455 Nov 18 15:23 /boot/vmlinuz-2.6.9-1.681_FC3
    -rw-r--r-- 1 root root 1727262 Jan 2 15:53 /boot/vmlinuz-2.6.9-1.724_FC3

This is an ASUS K8V SE Deluxe motherboard, with an AMD64 3200+ CPU (512 KB cache). The drives are RAID-1 (md0 is /boot):

    # cat /proc/mdstat
    Personalities : [raid1]
    md1 : active raid1 hda2[0] hdg2[1]
          2097024 blocks [2/2] [UU]
    md2 : active raid1 hda3[0] hdg3[1]
          75977920 blocks [2/2] [UU]
    md0 : active raid1 hda1[0] hdg1[1]
          102208 blocks [2/2] [UU]
    unused devices: <none>
Did the updates-testing kernels also hang for you? You can still grab them (-698 and -715) from http://download.fedora.redhat.com/pub/fedora/linux/core/updates/testing/3/i386/
OK, I grabbed the -698 and -715 kernels (from the x86_64 directory). The 698 kernel boots OK, but the 715 kernel hangs the same as 724. I hope that helps narrow the changes some.
Well, if it helps any, I've eliminated patches 207, 208, 209, 1148, 1150, 1151, 1890, 1950, 1951, 1952, and 10000 as the source of the problem. Compiling the kernel without these patches doesn't help; the -724 kernel still hangs on boot. If anyone has a specific suspect patch in mind, let me know.
Does the 2.6.10-1.735 kernel at http://people.redhat.com/davej/kernels/Fedora/FC3/RPMS.kernel/ work any better?
No, same problem.
I just want to chime in. I have the same problem with my AMD64 x86 platform. (MSI K8T neo motherboard based on the VIA K8T800 chip set) I'm adding myself to the cc list so that I can follow the debug.
Created attachment 109586 [details] New 2.6.10-737 config based on 2.6.9-681 config
Created attachment 109588 [details] Diff between 2.6.9-681 config and newly created 2.6.10-737 config
OK, I have a resolution, but not a specific cause. After incrementally eliminating all the patches in -724 (going back to vanilla 2.6.9) and still having the no-boot problem on my AMD64, I figured it must be the kernel config file. I downloaded the new 2.6.10-737 kernel (which also wouldn't boot), and built it using the 2.6.9-681 config file (taking mostly defaults for the new configuration parameters). It boots just fine now. Attached are the 2.6.10-737 config I used, and the diff from the 2.6.9-681 config. Hopefully someone will notice a configuration option that was causing the grief.
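For anyone repeating this kind of config bisection, a plain diff of two kernel configs is noisy because a disabled option is written as a comment. The following is only a sketch of one way to list the options whose setting actually changed; the function name config_changes and the file paths in the usage note are hypothetical, not something used in this bug.

```shell
# Print every CONFIG_ option whose setting differs between two kernel
# config files. "# CONFIG_FOO is not set" is treated as CONFIG_FOO=n.
# Usage: config_changes OLD_CONFIG NEW_CONFIG
config_changes() {
    awk '
        # Split one config line into the globals key and val.
        function parse(line) {
            key = ""; val = ""
            if (line ~ /^# CONFIG_[A-Za-z0-9_]+ is not set/) {
                split(line, f, " "); key = f[2]; val = "n"
            } else if (line ~ /^CONFIG_[A-Za-z0-9_]+=/) {
                eq = index(line, "=")
                key = substr(line, 1, eq - 1)
                val = substr(line, eq + 1)
            }
        }
        # Lines from the first file go into old[], the rest into new[].
        FILENAME == ARGV[1] { parse($0); if (key != "") old[key] = val; next }
        { parse($0); if (key != "") new[key] = val }
        END {
            for (k in new) {
                o = (k in old) ? old[k] : "absent"
                if (o != new[k]) print k ": " o " -> " new[k]
            }
            for (k in old) if (!(k in new)) print k ": " old[k] " -> absent"
        }
    ' "$1" "$2" | sort
}
```

Called as, say, `config_changes config-old config-new`, it prints one line per changed option in the form `CONFIG_NAME: old -> new`, which is a shorter list to read through than the raw diff.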
Created attachment 109589 [details] Diff between newly created 2.6.10-737 config and Fedora original 2.6.10-737 config
Another me too: I have the same problem with 2.6.10-737 (also 2.6.10-1.1075_FC4 from devel) on my Athlon XP 2000; Gigabyte GA7VAXP mobo, VIA KT400 chipset. I also had kernel-2.6.9-1.681_FC3 working.
I isolated the problem causing the boot hang in post -681 released kernels to the CONFIG_EDD option.

In 2.6.9-681's config:

    # CONFIG_EDD is not set

In 2.6.10-737's config:

    CONFIG_EDD=m

It looks like the function will need a blacklist for incompatible systems.
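To check whether a given kernel is built with this option, the config file can be inspected directly. This is a minimal sketch; the helper name edd_setting is made up for illustration, and the default path assumes the Fedora convention of installing each kernel's config as /boot/config-<version>.

```shell
# Report how CONFIG_EDD is set in a kernel config file.
# Usage: edd_setting [CONFIG_FILE]
# Assumption: with no argument, the running kernel's config is expected
# at /boot/config-$(uname -r).
edd_setting() {
    cfg=${1:-/boot/config-$(uname -r)}
    if grep -q '^CONFIG_EDD=y' "$cfg"; then
        echo built-in
    elif grep -q '^CONFIG_EDD=m' "$cfg"; then
        echo module
    else
        echo disabled
    fi
}
```

Against the two configs quoted above, this would print "disabled" for 2.6.9-681 and "module" for 2.6.10-737.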
Excellent, thanks for chasing this down. You should be able to boot with the edd=off boot parameter. I'll add Matt Domsch to the cc of this bug, as he's the upstream maintainer of this code.
Yup, confirming edd=off allows an unmodified kernel to boot.
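To make the workaround stick across reboots, the parameter has to go on the kernel line of the grub entry. The sketch below shows one way to do that as a filter over grub.conf text; add_kernel_param is a hypothetical helper, it appends to every kernel line it sees (including older kernels that don't need it), and it does not check whether the parameter is already present.

```shell
# Append a boot parameter to each "kernel" line of grub.conf-style text.
# Reads the config on stdin and writes the modified text to stdout.
# Usage: add_kernel_param PARAM < grub.conf > grub.conf.new
# Limitation: PARAM must not contain "/" or other sed metacharacters.
add_kernel_param() {
    sed "/^[[:space:]]*kernel[[:space:]]/s/\$/ $1/"
}
```

Writing to a separate file and reviewing it before replacing the real grub.conf seems the safer way to use this, since a broken boot entry is exactly the problem being worked around.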
Just a little follow-up information... dmidecode shows that (extracted):

    Vendor: American Megatrends Inc.
    Version: 1005.006
    Release Date: 11/29/2004
    EDD is supported

But attempting to 'modprobe edd' manually after boot returns:

    BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
    EDD information not available.

Not surprising that boot failed. Perhaps in a case like this, EDD can self-disable, since it's not going to be of any use. Motherboard is ASUS K8V SE Deluxe, AMD64 3200+, latest available BIOS.
Mace, can you try booting with 'edd=skipmbr' rather than 'edd=off', to help debug it a little further? =off disables EDD completely, whereas =skipmbr only skips reading the boot sector of each disk and leaves the EDD BIOS calls in place. Thanks, Matt
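When testing several variants like this, it is easy to lose track of which one the running kernel was actually booted with. A small sketch for reading it back from the kernel command line follows; edd_mode is a hypothetical name, and "default" here simply means no edd= parameter was given.

```shell
# Extract the value of the edd= parameter from a kernel command line.
# Prints the first edd= value found, or "default" if there is none.
# Usage: edd_mode "$(cat /proc/cmdline)"
edd_mode() {
    mode=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^edd=//p' | head -n 1)
    echo "${mode:-default}"
}
```

For a command line such as "ro root=/dev/VolGroup00/LogVol00 edd=skipmbr" this prints "skipmbr", so the result can be noted alongside the output of 'modprobe edd' for each test boot.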
Interesting. Yes, edd=skipmbr works too. Now a "modprobe edd" reports:

    BIOS EDD facility v0.16 2004-Jun-25, 3 devices found

Since there's something to report now, as you request on your web page, I'll attach the output of:

    find /sys/firmware/edd -type f -not -name raw_data -print -exec cat \{\} \;
    find /sys/firmware/edd -type f -name raw_data -print -exec hexdump -C \{\} \;
    lspci -vv
    lsmod
    cat /proc/scsi/scsi
    dmidecode
Created attachment 109692 [details] EDD data collection
OK, so it's not the BIOS query code, but the MBR reading that is causing problems for you. Good to know. Some people have reported 30 second pauses while reading the MBRs of each disk. Are you sure you didn't just need to wait longer?

Some things I notice from the data immediately. You've got disks on Promise adapters. Nearly all EDD failure reports so far have been with Promise (one was on an ACARD).

Do you actually have 3 hard disks? One 80GB IDE disk attached to the onboard controller as the boot disk, one 80GB IDE disk attached somewhere else (probably onboard controller), and one 250GB disk attached via USB. The second 80GB disk is showing up at the wrong PCI address (0:2.0 rather than where it really is, though I can't tell where that is from this data). It shouldn't matter for this purpose, though it does show your BIOS is at least somewhat buggy. The USB controller disk is showing up at the wrong PCI address too (0:0.0 rather than the correct 00:0d.[0123] or 00:10.[01234]).

If you unplug the USB-connected disk, and remove 'edd=skipmbr', does it work? That would narrow down the faulty BIOS component to the USB adapters.

Can you try with the secondary IDE controller either enabled or disabled? One report of success happened when, in BIOS setup, the second controller was enabled, just with nothing attached. If disabled, it hung.

FWIW, the EDD code that reads the MBR uses bog-standard int13 fn02 (READ SECTORS) calls to read the first sector of each BIOS-reported disk. Most boot loaders only read the first disk using int13, so it's generally only a problem when reading disk >0.

Blacklists, as has been suggested, are really difficult to implement that early on in real mode kernel startup. I'd really hate to have to write a DMI parser in real mode assembly, but would entertain a patch if someone else wanted to write such. :-)
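As a rough cross-check once the system is up, the first sector of each disk can also be read from userspace. This is not the int13 path described above (it goes through the Linux block driver, not the BIOS), so it only shows whether the sector itself is readable and looks like a boot sector. The sketch assumes the standard layout in which bytes 510 and 511 of the first sector hold the boot signature 0x55 0xAA; boot_signature is a hypothetical helper name.

```shell
# Print, as hex, the two bytes at offset 510 of a disk or image file.
# A sector with a valid boot signature yields "55aa".
# Usage: boot_signature /dev/hda    (reading a raw device needs root)
boot_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}
```

If this returns promptly for every disk, including those on the add-in controller, the drives and the Linux drivers are probably fine, and the hang is more likely specific to the BIOS read path.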
A data point. I'm having boot problems with this kernel and my system has a Promise IDE RAID controller. I'm not sure it makes sense to remove it, since it provides RAID service to the 4 IDE disks installed on my system. Would it be possible to disable EDD through a boot line parameter (i.e., override the compiled-in EDD option)?
I let it sit for 60 seconds (clock over the PC), with no response on each boot. Since it normally boots almost instantly, that seemed long enough.

Yes, I have three HD; two internal 80GB on different controllers arranged as RAID-1 (including the boot partition), and an external (USB 2.0 HiSpeed) 250GB used for archival backup and temporary storage. However, from the BIOS perspective, I think it considers the two 80GB drives and the CD/DVD player bootable. This motherboard has multiple onboard controllers: http://usa.asus.com/prog/spec.asp?m=K8V%20SE%20Deluxe&langs=09 I have a PCI (non-RAID) Promise controller as well. Disabling controllers will make the CD/DVD drive and/or the CDRW drive and/or one of the RAID drives unusable, so while that might be an interesting test, it's not really feasible (the system is a low usage web server).

Regardless of which component is at fault, the root problem is that the EDD code is blocking the boot. Rather than a blacklist, now that I understand the function a little better, a fall-back mechanism would seem a more practical approach. For example, in the event of a timeout (30 seconds seems excessive, even if successful), if the MBR can't be read, fall back to the non-EDD boot code. The same behavior should be used for any error condition that would prevent a boot (e.g., invalid data from the BIOS).

It would be very useful, if possible, to add some status messages to the process. In a normal boot they will fly by and never be seen. But when things go wrong they are invaluable. Something along the lines of: "Now attempting EDD boot", followed by either a "success" or "fallback to non-EDD". If fallbacks aren't implemented, then a message like the following would be useful: "If this is the last response you see, boot with kernel parameter edd=off or edd=skipmbr". It would have saved about a week of effort hunting down the cause, in this case.
I'm not entirely clear what value reading the MBR has, when GRUB has already booted and provided the boot disk (nth BIOS mapped drive). Perhaps making skipmbr the default would be a better solution?
Disabling EDD also solves my boot problems on my older 32-bit Athlon XP. I also have a secondary Promise RAID/IDE controller on my mobo, with some secondary drives attached there -- no usb drives or scsi or anything though. If it would be helpful, I would be happy to post the debug information, or otherwise play around with things. In my case I also don't think I am mistaking a slow read as a lockup -- I left it for a few minutes.
I have an older Pentium 4 machine with an MSI mainboard based on the SiS 645 chipset. I had the same problem as described here where I could not boot any FC3 update kernel since 2.6.9-681. In each case, including the latest 2.6.11 update kernel, it would hang right after grub's initrd statement. However, once I added the edd=skipmbr trick given here to my kernel boot params, all was well and happy. Thanks bugzilla!!!
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
I upgraded the kernel to 2.6.12-1.1372_FC3, and tried booting without edd=skipmbr. The boot got to the same point, but output a string of garbage (about 40 random characters), then hung again. Rebooted with edd=skipmbr and it came up fine.
I have a Dell PowerEdge 2300 which is not using hardware RAID, but DOES have two Promise controllers in it. Its boot/root device is a single SCSI HD on the builtin AIC7XXX. Any kernel I use after 2.6.9-1.681_FC3 does NOT boot (as described above) _unless_ I use "edd=skipmbr". This includes the recommended kernel-2.6.12-1.1372_FC3. This is _not_ fixed in the 1372 kernel. However, I do have a workaround for the moment so I may finally be able to upgrade this box from FC3 to FC4. Over the past month or so I have left it at the "hung" point for a very long time, in the hope that it might finally continue. I believe I left it for upwards of an hour, so a drive-timeout issue is NOT the case. hth.
Created attachment 117556 [details] edd-get-disk-type-before-read.patch I've uploaded a patch which may help anyone who is experiencing a boot failure unless they use "edd=skipmbr" on the kernel command line. This patch does a "Get Disk Type" call to the BIOS, before doing the "Read Sectors" call. Per Ralf Brown's Interrupt List, this may be necessary for some BIOSs. If you are able to build a kernel with this patch and report back success without using "edd=skipmbr", I'd very much like to hear. Thanks, Matt
Before I tested this patch, ASUS issued a BIOS update (1007 for the K8V SE Deluxe motherboard) that corrected this problem. They don't document that they've changed anything in this area, but after applying the BIOS update, I can now boot normally without using "edd=skipmbr", using the stock kernel-2.6.12-1.1372_FC3.
Matt, this seems to have fallen by the wayside. FC3 is going to reach end of life in a month or two, and this bug will be closed then. Might be worth trying to get that diff upstream if you still think it's needed. We can always migrate this to an FC4 bug later if Peter is still around for testing.
This is a mass-update to all currently open Fedora Core 3 kernel bugs. Fedora Core 3 support has transitioned to the Fedora Legacy project. Due to the limited resources of this project, typically only updates for new security issues are released. As this bug isn't security related, it has been migrated to a Fedora Core 4 bug. Please upgrade to this newer release, and test if this bug is still present there. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. Thank you.
Unable to reproduce in FC4 with current motherboard BIOS; closing.