From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7 Description of problem: scsi_scan.c:scsi_sequential_lun_scan(), when it encounters a LUN whose Inquiry returns with the PQ bits set to 0x3, will leave the partially initialized sdev structure around. If a LLDD then attempts to unload, an oops with a stack trace similar to the following: [<a000000100019980>] show_stack+0x80/0xa0 sp=e000000071fc7810 bsp=e000000071fc1420 [<a00000010003d1e0>] die+0x1c0/0x2e0 sp=e000000071fc79e0 bsp=e000000071fc13e0 [<a00000010003d470>] ia64_fault+0x110/0x1060 sp=e000000071fc79e0 bsp=e000000071fc1388 [<a000000100010180>] ia64_leave_kernel+0x0/0x260 sp=e000000071fc7bf0 bsp=e000000071fc1388 [<a00000010020de90>] sysfs_hash_and_remove+0x10/0x2a0 sp=e000000071fc7dc0 bsp=e000000071fc1348 [<a000000100210830>] sysfs_remove_link+0x30/0x60 sp=e000000071fc7dc0 bsp=e000000071fc1320 [<a0000001003d7e20>] class_device_del+0x340/0x380 sp=e000000071fc7dc0 bsp=e000000071fc12d8 [<a0000001003d7e80>] class_device_unregister+0x20/0x60 sp=e000000071fc7dc0 bsp=e000000071fc12b8 [<a0000002008a84b0>] scsi_remove_device+0xb0/0x2e0 [scsi_mod] sp=e000000071fc7dc0 bsp=e000000071fc1288 [<a0000002008a42a0>] scsi_forget_host+0xe0/0x1e0 [scsi_mod] sp=e000000071fc7dc0 bsp=e000000071fc1258 [<a000000200896210>] scsi_remove_host+0x30/0xe0 [scsi_mod] sp=e000000071fc7dc0 bsp=e000000071fc1238 [<a00000020099e8d0>] lpfc_pci_remove_one+0x410/0x860 [lpfc] sp=e000000071fc7dc0 bsp=e000000071fc11c8 [<a000000100331720>] pci_device_remove+0xc0/0xe0 sp=e000000071fc7dc0 bsp=e000000071fc11a0 [<a0000001003d6080>] device_release_driver+0x120/0x140 sp=e000000071fc7dc0 bsp=e000000071fc1170 [<a0000001003d6160>] bus_remove_driver+0xc0/0x1c0 sp=e000000071fc7dc0 bsp=e000000071fc1138 [<a0000001003d6a50>] driver_unregister+0x30/0xc0 sp=e000000071fc7dc0 bsp=e000000071fc1118 [<a000000100331c50>] pci_unregister_driver+0x30/0x160 sp=e000000071fc7dc0 bsp=e000000071fc10d8 [<a0000002009a7610>] lpfc_exit+0x30/0x70 [lpfc] sp=e000000071fc7dc0 bsp=e000000071fc10c0 [<a0000001000f7290>] sys_delete_module+0x370/0x620 sp=e000000071fc7dc0 bsp=e000000071fc1040 [<a000000100010000>] ia64_ret_from_syscall+0x0/0x20 sp=e000000071fc7e30 bsp=e000000071fc1040 [<a000000000010640>] 0xa000000000010640 sp=e000000071fc8000 bsp=e000000071fc1040 The following patch (which I'm sure won't match your sources) corrects the issue: --- a/drivers/scsi/scsi_scan.c 2005-12-12 18:49:19.000000000 -0500 +++ ./scsi_scan.c 2006-01-26 11:28:55.000000000 -0500 @@ -816,6 +816,8 @@ static void scsi_sequential_lun_scan(str uint id, int bflags, int lun0_res, int scsi_level, int rescan) { unsigned int sparse_lun, lun, max_dev_lun; + struct scsi_device *sdev; + int res; SCSI_LOG_SCAN_BUS(3, printk(KERN_INFO "scsi scan: Sequential scan of" " host %d channel %d id %d\n", shost->host_no, @@ -880,11 +882,14 @@ static void scsi_sequential_lun_scan(str * until we reach the max, or no LUN is found and we are not * sparse_lun. */ - for (lun = 1; lun < max_dev_lun; ++lun) - if ((scsi_probe_and_add_lun(shost, channel, id, lun, - NULL, NULL, rescan) != SCSI_SCAN_LUN_PRESENT) && - !sparse_lun) + for (lun = 1; lun < max_dev_lun; ++lun) { + res = scsi_probe_and_add_lun(shost, channel, id, lun, + NULL, &sdev, rescan); + if (res == SCSI_SCAN_LUN_IGNORED) + scsi_scan_remove(sdev); + if ((res != SCSI_SCAN_LUN_PRESENT) && !sparse_lun) return; + } } /** Version-Release number of selected component (if applicable): kernel 2.6.9-24 and higher How reproducible: Always Steps to Reproduce: 1. Must have a device in which Inquiries to unconfigured luns return w/ PQ bits = 0x03 2.after scanning and seeing luns, rmmod the LLDD 3. Actual Results: oops!! Expected Results: no oops :) Additional info:
Hi James, I think we have this fixed in the newest kernel. Could you try this kernel real quick http://people.redhat.com/~jbaron/rhel4/ I just looked at the source and I am pretty sure it is fixed. For pq=3 and sdevp=NULL we remove the device for the caller. I do not have a box that returns pq=3, but I will download pat's scsi_debug patch again, and try out the sequential scan path to verify this.
I've looked at the patch, and believe it should be ok. However, it seems messy. If nothing else, the code could be much clearer. Right now, you have to remember that if lun0 scan by scsi_scan_target(), sdevp is set and you need to leave the sdev, while if called by a report_luns or sequential scan, sdevp is Null. Also, it's leaving the "if (res == SCSI_SCAN_LUN_IGNORED) scsi_scan_remove(sdev)" snippets in scsi_scan_target() and __scsi_add_device(). So there's essentially 2 pieces of logic doing the same thing. As it should work - I'm trying to have some test put together to ensure it. However, it may take a couple of days. Also - I take it that this fix is currently planned to be part of U3, right ? Thanks -- james
In the current code if you the caller needs the sdev pointer then it removes the device when it is done with it. This is why the second call to scsi_scan_target and __scsi_add_device do this. And if the caller does not need the sdev pointer so it passes NULL then the removal is handled for it. We should have done the starget stuff, like upstream but did not have time. I guess we could have just made the sdev pointer a mandatory argument to scsi_probe_and_add_lun but it seemed like a waste to have all the callers check this and remove it if necessary. Did you have something else in mind?
We can keep it as is. We put it through some tests and everything looks good. Thank You. -- james
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html