Bug 241338
Summary: | ide0=noprobe kills the kernel | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Gerd Hoffmann <kraxel>
Component: | kernel | Assignee: | Michal Schmidt <mschmidt>
Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner>
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 5.0 | CC: | alan, ddomingo, ddutile, xen-maint
Target Milestone: | --- | |
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | RHBA-2008-0314 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2008-05-21 14:43:19 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 251292, 391221, 425461 | |
Attachments: | | |
Description
Gerd Hoffmann
2007-05-25 09:31:54 UTC
Created attachment 155438 [details]
boot log
other machine, different versions (some upcoming rhel-5.1 xen bits). Guest boots up fine, up to the point where it would mount the root filesystem. Close to the place where the other kernel crashed it complains about interrupts being enabled though:

[ ... ]
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists. Total pages: 127971
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=tty1 console=ttyS0,115200 ide0=noprobe ide1=noprobe
ide_setup: ide0=noprobe
ide_setup: ide1=noprobe
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 16384 bytes)
start_kernel(): bug: interrupts were enabled early
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Console: colour VGA+ 80x25
[ ... ]

So this looks like a lucky game to me. When an interrupt comes in early it takes down the machine, if not it boots up fine ...

Created attachment 159196 [details]
band aid fix
pci_find_device() enables irqs as side effect, probably due to device list
locking using a rwsem. So avoid calling it. Patch cripples pre-PCI ide
controllers.
call chain:
ide_setup()
init_ide_data()
init_hwif_default()
ide_default_io_base()
pci_find_device()
So any ide=foo on the kernel command line triggers this.
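
To make the reported side effect concrete, here is a minimal userspace sketch of the call chain above. It is a toy model, not kernel code: the `model_*` names and the `irqs_enabled` flag are invented for illustration. It assumes the cause named later in this report, namely that the read-lock path pairs `spin_lock_irq()` with `spin_unlock_irq()`, and the unlock enables interrupts regardless of their state before the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of the CPU interrupt flag; not kernel code. */
static bool irqs_enabled = false;      /* early boot: interrupts still off */

/* Stand-ins for spin_lock_irq()/spin_unlock_irq(): no state is saved,
 * so the unlock always leaves interrupts enabled. */
static void model_spin_lock_irq(void)   { irqs_enabled = false; }
static void model_spin_unlock_irq(void) { irqs_enabled = true; }

/* Hypothetical stand-in for a down_read() built on the pair above. */
static void model_down_read(void)
{
    model_spin_lock_irq();
    /* ... reader bookkeeping would happen here ... */
    model_spin_unlock_irq();
}

/* Hypothetical stand-ins for the call chain quoted in the report. */
static void model_pci_find_device(void)     { model_down_read(); }
static void model_ide_default_io_base(void) { model_pci_find_device(); }
static void model_init_hwif_default(void)   { model_ide_default_io_base(); }
static void model_init_ide_data(void)       { model_init_hwif_default(); }
static void model_ide_setup(void)           { model_init_ide_data(); }

/* Returns true when the early-boot check would have reason to complain. */
static bool model_boot_with_ide_option(void)
{
    irqs_enabled = false;   /* state when the command line is parsed */
    model_ide_setup();      /* triggered by an option such as ide0=noprobe */
    return irqs_enabled;    /* the boot code expects this to still be false */
}

/* Usage: the option alone is enough to flip the flag in this model. */
static void model_demo(void)
{
    assert(model_boot_with_ide_option());
}
```

In this model the flag flips no matter which ide option is parsed, which matches the observation that any ide=foo triggers the problem.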
NAK

This is a revert for older systems. Fix pci_find_device not to enable IRQs by mistake

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Created attachment 162064 [details]
different approach to fix it ...
Don't bother taking the rwsem (and enable irqs as side effect) to walk the list
if the list is empty anyway in pci_find_device().
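
The idea in this attachment can be sketched roughly as follows. This is a toy userspace model under stated assumptions, not the attached patch: the `model_*` names, the `check_empty_first` switch and the singly linked list are all invented for illustration, and taking the read lock is reduced to its one relevant side effect (interrupts end up enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; all names here are hypothetical, not kernel symbols. */
struct model_dev {
    struct model_dev *next;
    int id;
};

static bool irqs_enabled = false;                   /* early boot: irqs off */
static struct model_dev *model_pci_devices = NULL;  /* empty before PCI init */

/* Taking the read lock is modeled only by its side effect. */
static void model_down_read(void) { irqs_enabled = true; }
static void model_up_read(void)   { }

/* With check_empty_first set, an empty list returns before any locking. */
static struct model_dev *model_find_device(int id, bool check_empty_first)
{
    struct model_dev *d;

    if (check_empty_first && model_pci_devices == NULL)
        return NULL;            /* nothing to find, lock never taken */

    model_down_read();
    for (d = model_pci_devices; d != NULL; d = d->next)
        if (d->id == id)
            break;
    model_up_read();
    return d;
}

/* Usage: same lookup result either way, different interrupt state after. */
static void model_demo(void)
{
    irqs_enabled = false;
    assert(model_find_device(1, false) == NULL && irqs_enabled);

    irqs_enabled = false;
    assert(model_find_device(1, true) == NULL && !irqs_enabled);
}
```

The early return only helps while the list is empty, which is the situation during command-line parsing; once devices are registered the lock is taken as before.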
Bit of a hack but solves the problem and with minimal risk - ok by me

ddutile asked for details why irqs get enabled for the patch comment. I think it is in lib/rwsem-spinlock.c, function __down_read(), spin_unlock_irq() call. Probably happens on x86_64 only because i386 doesn't use the generic, spinlock-based rw semaphores. Which likely also is the reason it went unnoticed so far because on modern, 64bit capable hardware you'll rarely have a need to specify ide=something on the kernel command line ...

Created attachment 175821 [details]
another fix, more like upstream
Backport of the upstream fix, introduces no_pci_devices() function.

Upstream fixed it in git commit ed4aaadb1a7913f509f05d3e67840541a180713f ('fix jvc cdrom drive lockup'). It introduced a new exported function no_pci_devices().
I made a scratch build with this patch included:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=938718
Gerd, would you test if it fixes your problem?

Works fine for me.

in 2.6.18-60.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

added to RHEL5.2 release notes under "Kernel-Related Updates":
<quote>
The kernel parameter ide0=noprobe no longer causes a kernel panic. This was fixed through the introduction of a new function, no_pci_devices().
</quote>
please advise if any further revisions are required. thanks!

no_pci_devices() is an implementation detail. This should be enough:
<quote>
The kernel parameter ide0=noprobe no longer causes a kernel panic.
</quote>

thanks Michal, revising as requested.

Confirmed the bugfix is in the -85.el5 kernel. I wasn't able to reproduce the problem with the -53 kernel on any of the xen guests I tried.

Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at which point no further additions or revisions will be entertained.
a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html
please use the aforementioned link to verify if your bugzilla is already in the release notes (if it needs to be). each item in the release notes contains a link to its original bug; as such, you can search through the release notes by bug number.
Cheers,
Don

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2008-0314.html