Bug 241338

Summary: ide0=noprobe kills the kernel
Product: Red Hat Enterprise Linux 5
Reporter: Gerd Hoffmann <kraxel>
Component: kernel
Assignee: Michal Schmidt <mschmidt>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: medium
Version: 5.0
CC: alan, ddomingo, ddutile, xen-maint
Hardware: All   
OS: Linux   
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 14:43:19 UTC
Bug Blocks: 251292, 391221, 425461    
Attachments:
Description                          Flags
boot log                             none
band aid fix                         none
different approach to fix it ...     none
another fix, more like upstream      none

Description Gerd Hoffmann 2007-05-25 09:31:54 UTC
Description of problem:
ide0=noprobe kills the kernel

Version-Release number of selected component (if applicable):
2.6.18-8.1.4.el5

How reproducible:
boot the kernel with ide0=noprobe on the command line and see
it crash quite early.  Needs earlyprintk to actually see something ;)

Actual results:
kernel gets a GPF and panics.

Expected results:
kernel boots up with ide0 disabled.

Additional info:
This happened within a Xen HVM machine, while trying to get the
ide driver out of the way, so we can use paravirtual drivers
after booting.

Comment 1 Gerd Hoffmann 2007-05-25 09:31:54 UTC
Created attachment 155438
boot log

Comment 2 Gerd Hoffmann 2007-07-03 14:16:56 UTC
Tried on another machine with different versions (some upcoming RHEL 5.1 Xen bits).

The guest boots up fine, up to the point where it would mount the root
filesystem. Close to the point where the other kernel crashed, though, it
complains about interrupts being enabled:

[ ... ]
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 127971
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=tty1 console=ttyS0,115200 ide0=noprobe ide1=noprobe
ide_setup: ide0=noprobe
ide_setup: ide1=noprobe
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 16384 bytes)
start_kernel(): bug: interrupts were enabled early
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Console: colour VGA+ 80x25
[ ... ]

So this looks like a matter of luck to me: if an interrupt comes in early, it
takes down the machine; if not, the guest boots up fine.


Comment 4 Gerd Hoffmann 2007-07-13 15:03:12 UTC
Created attachment 159196
band aid fix

pci_find_device() enables IRQs as a side effect, probably because the device
list is locked with an rwsem. So avoid calling it. The patch cripples pre-PCI
IDE controllers.

call chain:

ide_setup()
  init_ide_data()
    init_hwif_default()
      ide_default_io_base()
        pci_find_device()

So any ide=foo on the kernel command line triggers this.

Comment 5 Alan Cox 2007-07-23 17:19:31 UTC
NAK

This would be a regression for older (pre-PCI) systems.

Fix pci_find_device() so it does not enable IRQs by mistake.


Comment 6 RHEL Program Management 2007-07-31 13:45:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Gerd Hoffmann 2007-08-22 14:30:07 UTC
Created attachment 162064
different approach to fix it ...

In pci_find_device(), don't bother taking the rwsem (and enabling IRQs as a
side effect) to walk the device list if the list is empty anyway.
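A minimal userspace sketch of this approach, assuming the lookup can be reduced to a walk over a singly linked list and that taking the generic rwsem is what leaves interrupts enabled. All sim_* names, SIM_ANY_ID and the two-argument signature are hypothetical stand-ins, not the kernel API or the attached patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names and types are illustrative only. */

#define SIM_ANY_ID (-1) /* wildcard, standing in for PCI_ANY_ID */

struct sim_pci_dev {
    int vendor;
    int device;
    struct sim_pci_dev *next;
};

/* Global device list; still empty when ide_setup() runs. */
struct sim_pci_dev *sim_pci_devices = NULL;

/* Simulated interrupt flag; early boot runs with interrupts disabled. */
bool sim_irqs_enabled = false;

/* Stand-ins for down_read()/up_read() on the generic, spinlock-based
 * rwsem: the spin_unlock_irq() inside leaves interrupts enabled. */
void sim_down_read(void) { sim_irqs_enabled = true; }
void sim_up_read(void) { }

/* Simplified pci_find_device()-style lookup: returns the first device
 * matching vendor/device (SIM_ANY_ID matches anything), or NULL. */
struct sim_pci_dev *sim_pci_find_device(int vendor, int device)
{
    struct sim_pci_dev *dev;

    /* The fix: with an empty list there is nothing to walk, so return
     * before taking the rwsem and touching the interrupt state. */
    if (sim_pci_devices == NULL)
        return NULL;

    sim_down_read();
    for (dev = sim_pci_devices; dev != NULL; dev = dev->next) {
        if ((vendor == SIM_ANY_ID || dev->vendor == vendor) &&
            (device == SIM_ANY_ID || dev->device == device))
            break;
    }
    sim_up_read();
    return dev;
}
```

With an empty list the lookup returns NULL and the simulated interrupt flag stays off; once a device is on the list, the locked walk runs and the flag flips. The unlocked emptiness check relies on nothing adding PCI devices concurrently this early in boot.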

Comment 8 Alan Cox 2007-08-22 15:12:28 UTC
Bit of a hack, but it solves the problem with minimal risk. OK by me.


Comment 9 Gerd Hoffmann 2007-08-23 07:52:26 UTC
ddutile asked for details on why IRQs get enabled, for the patch comment.

I think it is the spin_unlock_irq() call in __down_read() in
lib/rwsem-spinlock.c; spin_unlock_irq() re-enables interrupts unconditionally,
even when they were disabled on entry. It probably happens on x86_64 only,
because i386 doesn't use the generic, spinlock-based rw semaphores. That is
likely also why it went unnoticed so far: on modern, 64-bit-capable hardware
you rarely need to specify ide=something on the kernel command line.
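The mechanism can be illustrated with a small userspace model, assuming a boolean may stand in for the CPU interrupt flag and that the uncontended __down_read() path is simply lock, grant, unlock. The sim_* functions are hypothetical; they mimic the behaviour of spin_unlock_irq() (enable unconditionally) and spin_unlock_irqrestore() (restore the saved state). The save/restore variant is shown only for contrast, not as the fix that was applied here:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: this boolean stands in for the CPU's
 * interrupt flag. start_kernel() runs with interrupts off. */
bool sim_irqs_enabled = false;

/* spin_lock_irq()/spin_unlock_irq() semantics: the unlock
 * re-enables interrupts unconditionally. */
void sim_spin_lock_irq(void) { sim_irqs_enabled = false; }
void sim_spin_unlock_irq(void) { sim_irqs_enabled = true; }

/* spin_lock_irqsave()/spin_unlock_irqrestore() semantics: the
 * caller's interrupt state is saved and then restored. */
void sim_spin_lock_irqsave(bool *flags)
{
    *flags = sim_irqs_enabled;
    sim_irqs_enabled = false;
}
void sim_spin_unlock_irqrestore(bool flags) { sim_irqs_enabled = flags; }

/* Uncontended path of a spinlock-based __down_read(): take wait_lock
 * with spin_lock_irq(), grant the read lock, drop it with
 * spin_unlock_irq(). */
void sim_down_read(int *activity)
{
    sim_spin_lock_irq();
    (*activity)++;          /* read lock granted */
    sim_spin_unlock_irq();  /* leaves interrupts enabled */
}

/* The same critical section using the save/restore variants. */
void sim_down_read_irqsave(int *activity)
{
    bool flags;

    sim_spin_lock_irqsave(&flags);
    (*activity)++;
    sim_spin_unlock_irqrestore(flags);
}
```

Calling sim_down_read() with the flag cleared leaves it set, which is consistent with the "interrupts were enabled early" warning quoted in comment 2; the save/restore variant leaves it cleared.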


Comment 10 Michal Schmidt 2007-08-28 10:27:02 UTC
Created attachment 175821
another fix, more like upstream

Backport of the upstream fix. Upstream fixed this in git commit
ed4aaadb1a7913f509f05d3e67840541a180713f ('fix jvc cdrom drive lockup'), which
introduced a new exported function, no_pci_devices().

I made a scratch build with this patch included:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=938718

Gerd, would you test if it fixes your problem?
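A hedged sketch of how a no_pci_devices()-style helper changes the caller, assuming (as comment 4 suggests) that the pre-fix ide_default_io_base() used the pci_find_device() result only to decide whether the extra legacy IDE ports are safe to use. All sim_* names are hypothetical, the port numbers are the conventional legacy IDE addresses used for illustration, and none of this is the upstream or RHEL source:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the upstream or RHEL source. */
struct sim_pci_dev { struct sim_pci_dev *next; };

struct sim_pci_dev *sim_pci_devices = NULL; /* empty early in boot */
bool sim_irqs_enabled = false;              /* early boot: IRQs off */

/* Old path: a locked list walk; the generic rwsem leaves interrupts
 * enabled behind it. Returns the first device, or NULL. */
struct sim_pci_dev *sim_pci_find_any_device(void)
{
    sim_irqs_enabled = true; /* side effect of down_read() */
    return sim_pci_devices;
}

/* Helper modelled on no_pci_devices(): a lock-free emptiness check. */
bool sim_no_pci_devices(void)
{
    return sim_pci_devices == NULL;
}

/* Default I/O base for IDE interface `index`; the extra legacy ports
 * are only offered when no PCI devices exist. */
static unsigned long sim_io_base(int index, bool no_pci)
{
    if (no_pci) {
        switch (index) {
        case 2: return 0x1e8;
        case 3: return 0x168;
        case 4: return 0x1e0;
        case 5: return 0x160;
        }
    }
    switch (index) {
    case 0: return 0x1f0;  /* primary */
    case 1: return 0x170;  /* secondary */
    default: return 0;
    }
}

/* Before: asks "are there no PCI devices?" via the locked lookup. */
unsigned long sim_ide_default_io_base_old(int index)
{
    return sim_io_base(index, sim_pci_find_any_device() == NULL);
}

/* After: asks the same question via the lock-free helper. */
unsigned long sim_ide_default_io_base_new(int index)
{
    return sim_io_base(index, sim_no_pci_devices());
}
```

In this model both versions return the same addresses, but only the old one touches the simulated interrupt flag. A dedicated emptiness helper also leaves pci_find_device() itself unchanged, unlike the approach from comment 7.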

Comment 11 Gerd Hoffmann 2007-08-28 11:03:31 UTC
Works fine for me.

Comment 13 Don Zickus 2007-12-14 18:41:44 UTC
Included in 2.6.18-60.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 15 Don Domingo 2008-02-07 05:13:16 UTC
added to RHEL5.2 release notes under "Kernel-Related Updates":

<quote>
The kernel parameter ide0=noprobe no longer causes a kernel panic. This was
fixed through the introduction of a new function, no_pci_devices().
</quote>

please advise if any further revisions are required. thanks!

Comment 16 Michal Schmidt 2008-02-07 10:26:45 UTC
no_pci_devices() is an implementation detail. This should be enough:

<quote>
The kernel parameter ide0=noprobe no longer causes a kernel panic.
</quote>


Comment 17 Don Domingo 2008-02-07 23:02:59 UTC
thanks Michal, revising as requested. 

Comment 18 Mike Gahagan 2008-03-18 20:58:55 UTC
Confirmed the bugfix is in the -85.el5 kernel. I wasn't able to reproduce the
problem with the -53 kernel on any of the xen guests I tried.


Comment 19 Don Domingo 2008-04-02 02:10:03 UTC
Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don

Comment 21 errata-xmlrpc 2008-05-21 14:43:19 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html