Bug 241338 - ide0=noprobe kills the kernel
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Michal Schmidt
QA Contact: Martin Jenner
Blocks: 251292 RHEL5u2_relnotes 425461
Reported: 2007-05-25 05:31 EDT by Gerd Hoffmann
Modified: 2008-05-21 10:43 EDT
CC List: 4 users

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 10:43:19 EDT


Attachments
boot log (3.49 KB, text/plain)
2007-05-25 05:31 EDT, Gerd Hoffmann
band aid fix (594 bytes, patch)
2007-07-13 11:03 EDT, Gerd Hoffmann
different approach to fix it ... (617 bytes, patch)
2007-08-22 10:30 EDT, Gerd Hoffmann
another fix, more like upstream (2.17 KB, patch)
2007-08-28 06:27 EDT, Michal Schmidt

Description Gerd Hoffmann 2007-05-25 05:31:54 EDT
Description of problem:
ide0=noprobe kills the kernel

Version-Release number of selected component (if applicable):
2.6.18-8.1.4.el5

How reproducible:
Boot the kernel with ide0=noprobe on the command line and watch it crash
quite early. You need earlyprintk to actually see anything ;)

Actual results:
The kernel takes a general protection fault (GPF) and panics.

Expected results:
kernel boots up with ide0 disabled.

Additional info:
This happened in a Xen HVM guest while I was trying to get the IDE
driver out of the way, so that paravirtual drivers can be used after
booting.
Comment 1 Gerd Hoffmann 2007-05-25 05:31:54 EDT
Created attachment 155438
boot log
Comment 2 Gerd Hoffmann 2007-07-03 10:16:56 EDT
Different machine, different versions (some upcoming RHEL 5.1 Xen bits).

The guest boots fine up to the point where it would mount the root filesystem.
Close to the place where the other kernel crashed, though, it complains about
interrupts being enabled:

[ ... ]
SMP: Allowing 1 CPUs, 0 hotplug CPUs
Built 1 zonelists.  Total pages: 127971
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=tty1 console=ttyS0,115200 ide0=noprobe ide1=noprobe
ide_setup: ide0=noprobe
ide_setup: ide1=noprobe
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 16384 bytes)
start_kernel(): bug: interrupts were enabled early
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Console: colour VGA+ 80x25
[ ... ]

So this looks like a game of luck to me. If an interrupt comes in during that
early window, it takes down the machine; if not, the machine boots fine ...
Comment 4 Gerd Hoffmann 2007-07-13 11:03:12 EDT
Created attachment 159196
band aid fix

pci_find_device() enables IRQs as a side effect, probably because the device
list is locked with an rwsem. So avoid calling it. The patch cripples pre-PCI
IDE controllers.

call chain:

ide_setup()
  init_ide_data()
    init_hwif_default()
      ide_default_io_base()
        pci_find_device()

So any ide=foo on the kernel command line triggers this.
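The call chain above can be modelled in userspace to show how one unconditional "enable interrupts" at the bottom leaks all the way back to early boot. This is a minimal sketch, not kernel code: the CPU interrupt flag is assumed to be a global bool, and every function with a _sim suffix is a hypothetical stand-in for the kernel function of the same name, keeping only the control flow (port selection is omitted). The final check mirrors the "interrupts were enabled early" warning from Comment 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the CPU interrupt flag: early boot runs
 * with interrupts disabled. */
static bool irqs_enabled = false;

/* Assumed to behave like a lock whose acquire/release path ends by
 * unconditionally re-enabling interrupts (see Comment 9). */
static void lock_pci_list_sim(void)   { irqs_enabled = false; }
static void unlock_pci_list_sim(void) { irqs_enabled = true; }

static void *pci_find_device_sim(void)
{
	lock_pci_list_sim();
	/* ... walk the PCI device list, still empty this early ... */
	unlock_pci_list_sim();       /* side effect: interrupts are now on */
	return NULL;
}

static unsigned long ide_default_io_base_sim(int index)
{
	(void)index;
	/* Port selection omitted; only the PCI lookup matters here. */
	(void)pci_find_device_sim();
	return 0;
}

static void init_hwif_default_sim(int index)
{
	(void)ide_default_io_base_sim(index);
}

static void init_ide_data_sim(void)
{
	for (int i = 0; i < 2; i++)
		init_hwif_default_sim(i);
}

/* Any ide... option on the command line reaches this path. */
static void ide_setup_sim(const char *opt)
{
	(void)opt;
	init_ide_data_sim();
}

/* Returns false if interrupts were found enabled too early. */
static bool start_kernel_sim(const char *ide_opt)
{
	assert(!irqs_enabled);       /* boot starts with interrupts off */
	if (ide_opt != NULL)
		ide_setup_sim(ide_opt);
	if (irqs_enabled) {
		printf("start_kernel(): bug: interrupts were enabled early\n");
		return false;
	}
	return true;
}
```

Called first with NULL, start_kernel_sim() returns true; called afterwards with "ide0=noprobe", it prints the warning and returns false. Whether the real kernel then crashes depends on whether an interrupt actually arrives in that window, which matches the observation in Comment 2.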
Comment 5 Alan Cox 2007-07-23 13:19:31 EDT
NAK

This is a revert for older systems.

Fix pci_find_device not to enable IRQs by mistake
Comment 6 RHEL Product and Program Management 2007-07-31 09:45:47 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Gerd Hoffmann 2007-08-22 10:30:07 EDT
Created attachment 162064
different approach to fix it ...

In pci_find_device(), don't bother taking the rwsem (and enabling IRQs as a
side effect) to walk the device list if the list is empty anyway.
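The idea of this patch can be sketched as a lookup that returns early when the list is empty, before the lock is ever taken. This is a userspace approximation under stated assumptions: the device list, its node type and the down_read_sim()/up_read_sim() helpers are hypothetical, and down_read_sim() simply assumes, per Comment 9, that acquiring the generic rwsem leaves interrupts enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the CPU interrupt flag. */
static bool irqs_enabled = false;

/* Hypothetical device list node. */
struct dev_sim {
	int vendor;
	struct dev_sim *next;
};

/* Empty during early boot, before PCI is probed. */
static struct dev_sim *dev_list_sim = NULL;

/* Assumed side effect: taking the read lock leaves interrupts on. */
static void down_read_sim(void) { irqs_enabled = true; }
/* Release leaves the interrupt state as it is. */
static void up_read_sim(void)   { }

static struct dev_sim *find_device_sim(int vendor)
{
	struct dev_sim *d;

	/* Nothing to search: skip the lock so the caller's interrupt
	 * state is left untouched. */
	if (dev_list_sim == NULL)
		return NULL;

	down_read_sim();
	for (d = dev_list_sim; d != NULL; d = d->next)
		if (d->vendor == vendor)
			break;
	up_read_sim();
	return d;  /* NULL if no match */
}
```

The early return only helps while the list is empty; once devices are registered, the lookup still takes the lock and still has the same side effect, so the underlying rwsem behaviour is left unchanged.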
Comment 8 Alan Cox 2007-08-22 11:12:28 EDT
Bit of a hack, but it solves the problem with minimal risk. OK by me.
Comment 9 Gerd Hoffmann 2007-08-23 03:52:26 EDT
dduile asked, for the patch comment, for details on why IRQs get enabled.

I think it is the spin_unlock_irq() call in __down_read() in
lib/rwsem-spinlock.c. This probably happens only on x86_64, because i386 doesn't
use the generic, spinlock-based rw semaphores. That is likely also why it went
unnoticed so far: on modern, 64-bit capable hardware you rarely need to specify
ide=something on the kernel command line ...
Comment 10 Michal Schmidt 2007-08-28 06:27:02 EDT
Created attachment 175821
another fix, more like upstream

Backport of the upstream fix. Upstream fixed this in git commit
ed4aaadb1a7913f509f05d3e67840541a180713f ('fix jvc cdrom drive lockup'), which
introduced a new exported function, no_pci_devices().

I made a scratch build with this patch included:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=938718

Gerd, would you test if it fixes your problem?
Comment 11 Gerd Hoffmann 2007-08-28 07:03:31 EDT
Works fine for me.
Comment 13 Don Zickus 2007-12-14 13:41:44 EST
in 2.6.18-60.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 15 Don Domingo 2008-02-07 00:13:16 EST
Added to the RHEL 5.2 release notes under "Kernel-Related Updates":

<quote>
The kernel parameter ide0=noprobe no longer causes a kernel panic. This was
fixed through the introduction of a new function, no_pci_devices().
</quote>

Please advise if any further revisions are required. Thanks!
Comment 16 Michal Schmidt 2008-02-07 05:26:45 EST
no_pci_devices() is an implementation detail. This should be enough:

<quote>
The kernel parameter ide0=noprobe no longer causes a kernel panic.
</quote>
Comment 17 Don Domingo 2008-02-07 18:02:59 EST
thanks Michal, revising as requested. 
Comment 18 Mike Gahagan 2008-03-18 16:58:55 EDT
Confirmed the bugfix is in the -85.el5 kernel. I wasn't able to reproduce the
problem with the -53 kernel on any of the xen guests I tried.
Comment 19 Don Domingo 2008-04-01 22:10:03 EDT
Hi,
the RHEL5.2 release notes will be dropped to translation on April 15, 2008, at
which point no further additions or revisions will be entertained.

a mockup of the RHEL5.2 release notes can be viewed at the following link:
http://intranet.corp.redhat.com/ic/intranet/RHEL5u2relnotesmockup.html

please use the aforementioned link to verify if your bugzilla is already in the
release notes (if it needs to be). each item in the release notes contains a
link to its original bug; as such, you can search through the release notes by
bug number.

Cheers,
Don
Comment 21 errata-xmlrpc 2008-05-21 10:43:19 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
