Bug 160858 - [Stratus RHEL 4.5 bug] kernel oops on SCSI HBA hot-unplug
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Hardware: i686 Linux
Priority: medium  Severity: high
Assigned To: Kimball Murray
QA Contact: Brian Brock
Duplicates: 160861 160862
Reported: 2005-06-17 16:51 EDT by Dan Duval
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2006-11-10 12:19:57 EST

Description Dan Duval 2005-06-17 16:51:00 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
While working on hardening the SCSI stack against surprise HBA
removal, we've come across an issue you might be interested in.

This issue is relevant any time a SCSI HBA is hot-unplugged while
I/O is going on to any of that HBA's devices.  Since more and more
platforms (including our own) are supporting PCI hotplug, we felt
that we should bring this problem to your attention as early as possible.

In the vanilla Linux kernel, the routine scsi_remove_host()
(in drivers/scsi/hosts.c) begins as follows:

        scsi_host_cancel(shost, 0);
        scsi_forget_host(shost);

In the patch for kernel 2.6.9, Mike Anderson changed this to the
following:

        scsi_forget_host(shost);
        scsi_host_cancel(shost, 0);

The problem is that at least some of the functionality of
scsi_host_cancel() *must* come before at least some of the
functionality of scsi_forget_host().  The reason is the
following call chain:

    scsi_forget_host() ->

        /* Loop over host's associated devices... */
        scsi_remove_device() ->

            device_del() ->


This removes the SCSI device's sdev_gendev from the list rooted
at the HBA's host_gendev.children.  The result is that the HBA's
list of children ends up empty.

Later, we see the following call chain:

    scsi_host_cancel() ->

        device_for_each_child(..., scsi_device_cancel_cb)

The intent is to loop over the HBA's children and cancel
all outstanding I/O requests for each child (== SCSI device).
The list of children is, however, now empty, so no call to
scsi_device_cancel_cb() is ever made.

The practical effect of this is that the requests never get
canceled, so their timers are still active.  Thirty seconds
later, some request times out and gets queued to the error-
handling thread and that thread is awakened.  By this time,
however, we have torn down nearly all of the data structures
associated with the HBA.  In our testing, the error handler
ended up trying to requeue the request to the block layer,
whose elevator data structure had been freed.  Instant OOPS.
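
The ordering problem described above can be illustrated with a small user-space model. This is only a sketch: the toy_* types and functions are hypothetical stand-ins, not kernel code, and the "children list" is reduced to a per-device flag. It shows that a cancel pass which walks only the children list reaches no devices once the forget pass has unlinked them, so their requests stay pending.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user-space model; none of these names exist in the kernel. */
#define MAX_DEVS 4

struct toy_dev {
    int on_children_list;  /* stands in for membership in host_gendev.children */
    int pending_io;        /* outstanding requests whose timers are still live */
};

struct toy_host {
    struct toy_dev devs[MAX_DEVS];
    size_t ndevs;
};

/* Models scsi_forget_host(): each device is unlinked from the children
 * list, but its outstanding I/O is left untouched. */
void toy_forget_host(struct toy_host *h)
{
    for (size_t i = 0; i < h->ndevs; i++)
        h->devs[i].on_children_list = 0;
}

/* Models scsi_host_cancel(): walks only the children list and cancels
 * each child's I/O. Returns the number of devices it reached. */
size_t toy_host_cancel(struct toy_host *h)
{
    size_t visited = 0;

    for (size_t i = 0; i < h->ndevs; i++) {
        if (!h->devs[i].on_children_list)
            continue;              /* no longer a child, so never seen */
        h->devs[i].pending_io = 0;
        visited++;
    }
    return visited;
}

/* Tears down a two-device host in the chosen order and returns the
 * number of requests still pending afterwards. */
int toy_remove_host(int forget_first)
{
    struct toy_host h = { .ndevs = 2 };
    int left = 0;

    assert(h.ndevs <= MAX_DEVS);
    for (size_t i = 0; i < h.ndevs; i++) {
        h.devs[i].on_children_list = 1;
        h.devs[i].pending_io = 3;
    }
    if (forget_first) {
        toy_forget_host(&h);
        toy_host_cancel(&h);
    } else {
        toy_host_cancel(&h);
        toy_forget_host(&h);
    }
    for (size_t i = 0; i < h.ndevs; i++)
        left += h.devs[i].pending_io;
    return left;
}
```

In this model, toy_remove_host(1) leaves all six requests pending, while toy_remove_host(0) leaves none.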

We reproduced this under RHEL AS4 by actually hot-unplugging
a QLogic qla1280 HBA while doing I/O to it.  A read-through of
the code suggests that, if hardware that allows hot-unplug is
not available, the problem can also be triggered by rmmod'ing
a LLD while I/O is going to any of its disks.

In kernel 2.6.10, the problem has been eliminated by modifying
scsi_host_cancel() so that it doesn't use the "children" list
to iterate over the devices, but rather uses the list rooted
at shost->__devices and linked via sdev->siblings.  This list is,
of course, not modified by device_del().
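
The idea behind the 2.6.10 change can be sketched the same way. The model below is an assumption-laden illustration, not the upstream code: each hypothetical toy device is threaded onto two independent lists, one standing in for the driver-model children list and one for the host's own device list. Emptying the first does not affect a cancel pass that walks the second.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: two independent singly linked lists over the same
 * devices, loosely mirroring "children" versus shost->__devices. */
struct toy_dev {
    struct toy_dev *next_child;    /* driver-model children list */
    struct toy_dev *next_sibling;  /* host's own device list */
    int pending_io;
};

struct toy_host {
    struct toy_dev *children;
    struct toy_dev *devices;
};

/* Models device_del() on every device: only the children list is emptied. */
void toy_del_all_children(struct toy_host *h)
{
    h->children = NULL;
}

/* Cancel by walking the children list (the behaviour that fails). */
size_t toy_cancel_via_children(struct toy_host *h)
{
    size_t n = 0;

    for (struct toy_dev *d = h->children; d; d = d->next_child) {
        d->pending_io = 0;
        n++;
    }
    return n;
}

/* Cancel by walking the host's own list (the 2.6.10-style approach). */
size_t toy_cancel_via_siblings(struct toy_host *h)
{
    size_t n = 0;

    for (struct toy_dev *d = h->devices; d; d = d->next_sibling) {
        d->pending_io = 0;
        n++;
    }
    return n;
}

/* Builds a two-device host, deletes the children, then cancels.
 * Returns how many devices the cancel pass reached. */
size_t toy_cancelled_after_del(int use_siblings)
{
    struct toy_dev b = { NULL, NULL, 1 };
    struct toy_dev a = { &b, &b, 1 };
    struct toy_host h = { &a, &a };

    assert(h.children && h.devices);
    toy_del_all_children(&h);
    return use_siblings ? toy_cancel_via_siblings(&h)
                        : toy_cancel_via_children(&h);
}
```

Here toy_cancelled_after_del(0) reaches no devices, whereas toy_cancelled_after_del(1) still reaches both.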

This problem may be related to:


but I don't believe they are duplicates.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Initiate some I/O to a SCSI device (via, e.g., "dd").

2. Hot-unplug the SCSI HBA or, failing that, rmmod the
   lower-level driver's kernel module.

Actual Results:  A kernel oops resulted.

Expected Results:  The system should degrade gracefully.  Outstanding I/O
requests should be failed back to the application(s).

Additional info:
Comment 1 Dan Duval 2005-06-20 10:13:35 EDT
*** Bug 160861 has been marked as a duplicate of this bug. ***
Comment 2 Dan Duval 2005-06-20 10:14:37 EDT
*** Bug 160862 has been marked as a duplicate of this bug. ***
Comment 3 Rik van Riel 2005-06-27 17:04:36 EDT
Kimball, do you happen to have a patch for this problem ? ;)
Comment 4 Kimball Murray 2005-06-28 10:08:41 EDT
Rik, I spoke with Dan a moment ago.  He says that this has been fixed upstream
in a different way than his original patch.  He will try and get a new patch to
us soon.  Once I have that I'll adjust/check it against the current U2 snapshot
Comment 5 Andrius Benokraitis 2005-06-29 11:58:03 EDT
Kimball, should we reassign this bug to you in the meantime?
Comment 6 Kimball Murray 2005-06-29 13:10:59 EDT
Yes, you can assign it to me.
Comment 7 Dan Duval 2005-06-30 10:37:46 EDT
After further investigation of this problem, we have found that the
mechanism by which it arises is a little different from what we've
seen on the "vanilla" 2.6.9 kernel, as was described in the original
bug report.  Nonetheless, the result is the same: I/O requests that
are outstanding in the LLDD do not get cancelled when the HBA goes away.

The following is an amended description of what's going on, followed
by a suggested fix.

As of the AS4 U1 kernel, here is the call chain that provokes the failure:

    scsi_remove_host() ->

        scsi_forget_host() ->

            /* Loop over the host's devices... */
            scsi_remove_device()

scsi_remove_device() has the effect of setting the device's state
(sdev->sdev_state) to SDEV_CANCEL, then to SDEV_DEL, and the devices
are all in the latter state when scsi_forget_host() returns.
scsi_remove_host() next calls scsi_host_cancel() to cancel the
outstanding I/O requests, leading to the following sequence:

    scsi_host_cancel() ->

        /* This macro, defined in include/scsi/scsi_device.h,
         * is supposed to loop over the host's devices.
         */
        shost_for_each_device() ->

            scsi_device_cancel()

shost_for_each_device() repeatedly invokes a helper function, called
__scsi_iterate_devices(), to do its work.  On each invocation,
__scsi_iterate_devices() finds the "next" device in the host's
list, and does a

    scsi_device_get()

on that device to get a reference to it.  Unfortunately,
scsi_device_get() will not take out a reference on a device that
is in the SDEV_DEL or SDEV_CANCEL state.  So, scsi_device_get()
returns an error code to __scsi_iterate_devices(), which causes
that routine to skip that device.  The net result is that
scsi_device_cancel() never gets called, and I/O requests
outstanding to that device are not cancelled, but remain active.
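
The skip described above can be modelled in user space. This is a hedged sketch rather than the kernel's code: the toy_* names are hypothetical, and only the behaviour stated in this comment is modelled, namely that no reference is handed out for a device in the cancel or delete state, so the iterating cancel pass never touches it.

```c
#include <assert.h>
#include <stddef.h>

/* The state names echo the report; everything else is a hypothetical toy. */
enum toy_state { TOY_RUNNING, TOY_CANCEL, TOY_DEL };

struct toy_dev {
    enum toy_state state;
    int refs;
    int pending_io;
};

/* Models scsi_device_get() as described: a device in the CANCEL or DEL
 * state yields no reference. Returns 0 on success, -1 on refusal. */
int toy_device_get(struct toy_dev *d)
{
    if (d->state == TOY_CANCEL || d->state == TOY_DEL)
        return -1;
    d->refs++;
    return 0;
}

/* Drops a reference taken by toy_device_get(). */
void toy_device_put(struct toy_dev *d)
{
    assert(d->refs > 0);
    d->refs--;
}

/* Models the iteration inside scsi_host_cancel(): any device whose get
 * fails is skipped. Returns the number of devices actually cancelled. */
size_t toy_host_cancel(struct toy_dev *devs, size_t n)
{
    size_t cancelled = 0;

    for (size_t i = 0; i < n; i++) {
        if (toy_device_get(&devs[i]) != 0)
            continue;              /* skipped: its I/O stays outstanding */
        devs[i].pending_io = 0;
        cancelled++;
        toy_device_put(&devs[i]);
    }
    return cancelled;
}
```

Given one device in TOY_DEL and one in TOY_RUNNING, only the running device has its pending I/O cleared.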

The suggested fix is simply to reverse the order of the calls
in scsi_remove_host(), so that now it reads as:

    scsi_host_cancel(shost, 0);
    scsi_forget_host(shost);

(This is at about line 76 in drivers/scsi/hosts.c)

This change just allows the cancellations to proceed before
scsi_forget_host() can meddle with the devices' states.
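
As a rough illustration of why the order matters in this variant of the failure, the following toy model (hypothetical names throughout, not the actual hosts.c code) tears down a host in either order and counts the requests left pending. It models only the state check, and says nothing about other side effects the reordering might have in the real SCSI stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types; they only mimic the behaviour described here. */
enum toy_state { TOY_RUNNING, TOY_DEL };
enum toy_order { TOY_FORGET_FIRST, TOY_CANCEL_FIRST };

struct toy_dev {
    enum toy_state state;
    int pending_io;
};

/* Models scsi_forget_host(): every device ends up in the DEL state. */
void toy_forget_host(struct toy_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        devs[i].state = TOY_DEL;
}

/* Models scsi_host_cancel(): devices already in DEL are skipped. */
void toy_host_cancel(struct toy_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].state != TOY_DEL)
            devs[i].pending_io = 0;
}

/* Tears down a host with three busy devices in the given order and
 * returns the number of requests left pending. */
int toy_remove_host(enum toy_order order)
{
    struct toy_dev devs[3] = {
        { TOY_RUNNING, 2 }, { TOY_RUNNING, 2 }, { TOY_RUNNING, 2 },
    };
    size_t n = sizeof devs / sizeof devs[0];
    int left = 0;

    if (order == TOY_CANCEL_FIRST) {
        toy_host_cancel(devs, n);
        toy_forget_host(devs, n);
    } else {
        toy_forget_host(devs, n);
        toy_host_cancel(devs, n);
    }
    for (size_t i = 0; i < n; i++) {
        assert(devs[i].state == TOY_DEL);  /* both orders finish teardown */
        left += devs[i].pending_io;
    }
    return left;
}
```

With TOY_FORGET_FIRST all six requests remain pending; with TOY_CANCEL_FIRST none do.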

We have tested this fix in our lab, with good results.  Recent
discussions on the linux-scsi mailing list indicate that this
is the approach others have adopted as well.
Comment 8 Kimball Murray 2005-07-01 09:33:31 EDT
I did a quick repro just to capture a stack trace from the Oops.  It behaved as
described.  About 30 seconds after the HBA is removed, SCSI error handling wakes
up and steps into a pit.  Here is the output:

scsi(1): Resetting Cmnd=0xd51e5500, Handle=0x00000003, action=0x0
Unable to handle kernel paging request at virtual address e081800a
 printing eip:                                            
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: ftmod(U) fosil(U) ipmi_msghandler(U) autofs4(U) 
i2c_dev(U) i2
c_core(U) md5(U) ipv6(U) sunrpc(U) dm_mod(U) button(U) battery(U) ac(U) 
) ohci_hcd(U) e1000(U) e100(U) mii(U) ext3(U) jbd(U) raid1(U) qla1280(U) 
U) scsi_mod(U)
CPU:    0
EIP:    0060:[<e0825abf>]    Tainted: PF     VLI
EFLAGS: 00010086   (2.6.9-11.21ELstrat) 
EIP is at qla1280_debounce_register+0x3/0x34 [qla1280]
eax: e081800a   ebx: dfa92000   ecx: e081800a   edx: d51e5628
esi: dfa92258   edi: df849f44   ebp: df849fa4   esp: df849ef4
ds: 007b   es: 007b   ss: 0068
Process scsi_eh_1 (pid: 216, threadinfo=df849000 task=c1728e30)
Stack: d51e5628 e082343c df849f20 00000003 d51e5628 c0320a80 00000000 
       c1407d60 ded02c80 c0320bf0 df849f7c c02cd938 c17293b0 df821eb4 
       00000082 df821eb4 c011cd1e 00000000 00000000 00000001 dead4ead 
Call Trace:
 [<e082343c>] qla1280_error_action+0xba/0x3b5 [qla1280]
 [<c02cd938>] schedule+0x844/0x87a
 [<c011cd1e>] try_to_wake_up+0x225/0x230
 [<e083d0c1>] scsi_try_to_abort_cmd+0x3f/0x58 [scsi_mod]
 [<e083d1f3>] scsi_eh_abort_cmds+0x52/0xbc [scsi_mod]
 [<e083ddc6>] scsi_unjam_host+0x147/0x16b [scsi_mod]
 [<e083defc>] scsi_error_handler+0x112/0x15a [scsi_mod]
 [<e083ddea>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041f1>] kernel_thread_helper+0x5/0xb
Code: b6 83 88 08 00 00 24 ef 88 83 88 08 00 00 eb 12 68 6d 6a 82 e0 e8 d4 
c4 8f
 df 89 d8 e8 6d f4 ff ff 58 5b 89 f8 5e 5f c3 52 89 c1 <0f> b7 01 0f b7 c0 
66 89
 44 24 02 0f b7 01 0f b7 c0 66 89 04 24 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

After reversing the order of the two calls in hosts.c as described above, I was
able to remove the HBA cleanly every time.
Comment 9 Andrius Benokraitis 2005-07-06 16:19:46 EDT
Kimball, has this been submitted for upstream acceptance? What are the next
steps? I have this down for inclusion into RHEL4U3, so there is no immediate
urgency on this.
Comment 10 Kimball Murray 2005-07-06 16:41:59 EDT
Dan Duval last told me that for U2 we have a work-around for this.  For U3, we
need to propose something better than the suggested patch in this bug report. 
Several RH engineers pointed out that it would break some things, which is why
it hasn't been done upstream.
Comment 11 Andrius Benokraitis 2005-07-06 17:03:51 EDT
Great, thanks for the update... will be opening an IT to help track this through
U3 in the meantime...
Comment 12 Andrius Benokraitis 2005-09-13 14:20:18 EDT
Kimball, any updates on this? I need to know ASAP whether there indeed is a fix
that works and is upstream before proposing it for RHEL4U3. If not, we need to
propose it for U4 or later. Comments?
Comment 13 Linda Wang 2005-10-12 13:12:39 EDT
Per Kimball's email reply: "... bug 160858 will not be fixed in the U3 time
frame. It is still under discussion upstream, and it will involve a substantial
architectural change to SCSI to fix it. So, in the meantime, Stratus will
continue to provide its own driver that will work around this bug, as we have in U2."
Comment 14 Andrius Benokraitis 2005-10-12 14:10:08 EDT
Deferring to RHEL4 U4.
Comment 17 Andrius Benokraitis 2006-06-30 00:50:34 EDT
Deferring to RHEL 4.5. 

Action for Stratus: Kimball/Dan, what's the status on this? I have not
officially requested this for 4.5 yet.
Comment 20 Andrius Benokraitis 2006-11-07 15:04:52 EST
Dan @ Stratus: Last call for this request, what is the status of this request
from you all?
Comment 21 Andrius Benokraitis 2006-11-10 12:19:57 EST
Closing at the request of Chas at Stratus:

"After consultation here at Stratus, we believe this should be closed/dropped.
This is too long-term to continue to spend cycles on at present."
