Emulex resolved the following problem for Morgan Stanley.

For some types of storage device (Clariion is one example) an extra, non-existent target device is configured. This can also cause real LUNs to not be configured. The problem is a side effect of a change made in U4.

There is a one-line fix, in scan_scsis_single:

      if (lun != 0 && (scsi_result[0] >> 5) == 1) {
              scsi_release_request(SRpnt);
    +         scsi_release_commandblocks(SDpnt);
              return 0;
      }

I have reproduced the problem. It will require 2 hours to prepare the patch and test it. Additional regression testing will be needed, but the risk is low.

Detailed analysis from Emulex:

We found the "ghost" issue on RHEL3U4 with Morgan Stanley. It's an issue in scsi_scan.c when the device returns a PQ = 1. Attached is a summary of the bug, and I've also attached some debugging that we did internally, including a 1 line fix that resolves it.

Note: one thing I didn't point out is, once you hit this PQ error, if the next target exists, it will actually have an incorrect LUN 0 created (bad inquiry data, state hosed, etc) as it's using the old command data. Things don't straighten out until LUN 1 is probed on the next target. I'd assume it's in U5 as well.

We believe we understand the "Ghost Target" issue (issue #3) and understand how this could be causing the non-detection of luns as well (issue #2). This is a midlayer bug, but we happen to have a workaround in the driver already that can be utilized. We will be working with Red Hat and the upstream kernel to resolve the bug. Please test with the workaround suggested and let us know the results.

Update 4/7/05:

The array in question is returning a Peripheral Qualifier (PQ) value of 001b (not present, but could be). The SCSI midlayer has a bug whereby it exits the scan when it sees this value, but does not free command blocks. When it attempts to scan the next target/lun, it reuses the command blocks, which still hold the old address information.
The midlayer thinks it's scanning the next tgt/lun combination, but the request to the driver is scanning the old address that returned the PQ value of 1. Additionally, the midlayer makes exceptions for Lun 0 if it returns a PQ value of 1, allowing it to be added to the system (thus the potential for a ghost target).

Note: If the array returns a PQ value of 011b (not present), then the midlayer takes a similar code path, but it frees the command blocks. As such, the addressing information for the next scan will always be correct. As the PQ value returned for an unconfigured lun is device-specific (and both are allowed by SCSI spec), some devices may exhibit the "ghost" target, while others will not.

Work Around:

The lpfc driver has a parameter that will cause us to replace Peripheral Qualifier values of 001b with 011b. To enable this feature, turn on the following option for the lpfc driver in modules.conf:

    options lpfc lpfc_inq_pqb_filter=1

Note: We instituted this feature as we had encountered lun skip issues in the past and had noticed that the Qlogic adapter was silently replacing PQ values before handing the results to the driver. We default this parameter to 0 (off) so that traffic remains un-modified unless instructed by the admin.

Formally Fixing the Issue:

We will be communicating this problem to Red Hat and potentially to the 2.4 kernel maintainers.

Detailed Example of the bug:

Configuration: 1 target (tgt #0), luns 0 and 2

Midlayer starts scan:
    Allocates temporary device struct. No command blocks allocated.

Midlayer scans Target 0 lun 0:
    As no command blocks, allocate and initialize to tgt 0 lun 0
    Inquiry sent, returns PQ value of 0 (present)
    Device struct linked into system, new temporary struct allocated (no command blocks for it allocated).

Midlayer scans Target 0 lun 1:
    As no command blocks, allocate and initialize to tgt 0 lun 1
    Inquiry sent, returns PQ value of 1 (not present, but could be)
    As not lun 0 and PQ=1, exit scan (bug here)

Midlayer scans Target 0 lun 2:
    As command blocks exist, don't allocate new (thus they still have the old address info)
    Inquiry sent. Note: Driver sees address tgt 0 lun 1 in the command blocks. As such, it sends it to tgt 0 lun 1, which responds with PQ=1 again.
    As not lun 0 and PQ=1, exit scan (bug here)

... This continues until the midlayer cycles to the next target id

Midlayer scans Target 1 lun 0:
    As command blocks exist, don't allocate new (thus they still have the old address info)
    Inquiry sent. Note: Driver sees address tgt 0 lun 1 in the command blocks. As such, it sends it to tgt 0 lun 1, which responds with PQ=1 again.
    As the midlayer believes it is lun 0 and PQ=1, device struct is linked into system, and a new temporary struct is allocated (with no command blocks).

Midlayer scans Target 1 lun 1:
    As no command blocks, allocate and initialize to tgt 1 lun 1 (everything is valid at this point and normal discovery resumes)

The result of the above is:
    Target 0 lun 0 is found
    Target 0 lun 2 is not found as we never sent it an i/o
    Target 1 lun 0 is erroneously found.

============================================

The mid-layer starts its bus scan. It calls scan_scsis which goes into a loop scanning all channels, targets, and luns. For each unique channel:target:lun it calls scan_scsis_single. At this level the mid-layer is doing the correct thing. For each channel and target it's looping from lun 0 to 255. The problem is somewhere between scan_scsis_single and queuecommand. When scan_scsis_single is called with anything greater than 0:0:3 lpfc_queuecommand is ALWAYS being called with 0:0:2.

Here is a section from the log:

    JIMP: scan_scsis_single - channel: 0, dev: 0, lun: 3, lun0_scsi_level: 4, max_dev_lun: 256, sparse_lun: 1
    JIMP: lpfc_queuecommand - cmd: 0x12, channel: 0, target: 0, lun: 2
    lpfc0:0205:DIi:Create SCSI LUN 2 on Target 0
    lpfc0:0729:FPw:FCP cmd x12 failed, x0 x2, status: x1 result: x3f Data: xd x5
    lpfc0:0730:FPw:FCP command failed: RSP Data: x8 x0 x3f x0 x0 x0
    lpfc0:0716:FPi:FCP Read Underrun, expected 256, residual 63 Data: x100 x12 x0

Notice that scan_scsis_single is being called with lun 3 but lpfc_queuecommand is being called with lun 2.

Now, it gets real interesting when the mid-layer tries to scan 0:1:0. Here is the logging for that:

    JIMP: scan_scsis_single - channel: 0, dev: 1, lun: 0, lun0_scsi_level: 3, max_dev_lun: 1, sparse_lun: 0
    JIMP: lpfc_queuecommand - cmd: 0x12, channel: 0, target: 0, lun: 2
    lpfc0:0205:DIi:Create SCSI LUN 2 on Target 0
    lpfc0:0729:FPw:FCP cmd x12 failed, x0 x2, status: x1 result: x3f Data: x10a x102
    lpfc0:0730:FPw:FCP command failed: RSP Data: x8 x0 x3f x0 x0 x0
    lpfc0:0716:FPi:FCP Read Underrun, expected 256, residual 63 Data: x100 x12 x0
    Vendor: DGC Model: Rev: 0207
    Type: Direct-Access ANSI SCSI revision: 04

Notice the midlayer is trying to scan channel 0, target 1, lun 0 but the inquiry is sent to lpfc_queuecommand with channel 0, target 0, lun 2. Because the inquiry to 0:0:2 completes successfully it assumes there is a target 1 and creates an sd device.

So the mid-layer is obviously using stale data somewhere. In looking at the mid-layer diffs I noticed a small change made between RHEL3U3 and RHEL3U4 in scsi_build_commandblocks. The change was:

    + /*
    +  * Only init things once.
    +  */
    + if (SDpnt->has_cmdblocks)
    +         return;

This causes scsi_build_commandblocks to exit at the top of the routine if the scsi device has existing scsi_cmnd blocks, which means we won't create new scsi_cmnd blocks with the correct lun, target, channel.

We seem to have hit a condition in the mid-layer where the has_cmdblocks bit is not properly cleared. As a result the scsi_cmnd blocks being used are still populated with 0:0:2. When we eventually "discover" the ghost target as seen above this condition clears.

Looking at scan_scsis_single there is only one place where we can bail out without clearing the has_cmdblocks bit after issuing an INQUIRY. That's where we check the peripheral qualifier bit:

      if (lun != 0 && (scsi_result[0] >> 5) == 1) {
              scsi_release_request(SRpnt);
              return 0;
      }

I think this should really be:

      if (lun != 0 && (scsi_result[0] >> 5) == 1) {
              scsi_release_request(SRpnt);
    +         scsi_release_commandblocks(SDpnt);
              return 0;
      }

So setting the lpfc_inq_pqb_filter parameter to 1 really works by accident in this case by forcing us down a different code path. I tested a custom kernel with the one line addition above and it solves the problem.
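
For anyone who wants to see the sequence without building a kernel, the scan behaviour described in this comment can be modeled in a small user-space C sketch. This is not the midlayer code. The names temp_dev, cmdblock, array_inquiry, scan_single and scan_bus are made up for illustration, the array is hard-coded to the example configuration (target 0 with luns 0 and 2, no target 1), and the sketch assumes, per the analysis above, that every exit path other than the lun != 0 / PQ=1 check releases the command blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TGT 2
#define MAX_LUN 4

/* Peripheral Qualifier values (INQUIRY byte 0, bits 7..5). */
enum { PQ_PRESENT = 0, PQ_NOT_PRESENT = 1, PQ_NOT_SUPPORTED = 3 };

/* Hypothetical stand-ins for the command blocks and the temporary device. */
struct cmdblock { int tgt, lun; };
struct temp_dev { bool has_cmdblocks; struct cmdblock cmd; };
struct scan_result { bool found[MAX_TGT][MAX_LUN]; };

/* Simulated array: target 0 has luns 0 and 2, other luns answer PQ=1.
 * Returns -1 when nothing answers (no such target). */
int array_inquiry(int tgt, int lun)
{
    if (tgt != 0)
        return -1;
    return (lun == 0 || lun == 2) ? PQ_PRESENT : PQ_NOT_PRESENT;
}

/* Model of one probe. Returns true if a device gets linked into the system. */
bool scan_single(struct temp_dev *dev, int tgt, int lun,
                 bool apply_fix, bool pq_filter)
{
    /* "Only init things once": existing command blocks keep their address. */
    if (!dev->has_cmdblocks) {
        dev->cmd.tgt = tgt;
        dev->cmd.lun = lun;
        dev->has_cmdblocks = true;
    }

    /* The driver addresses the INQUIRY from the command block contents. */
    int pq = array_inquiry(dev->cmd.tgt, dev->cmd.lun);

    /* Model of the driver workaround: report 001b as 011b. */
    if (pq_filter && pq == PQ_NOT_PRESENT)
        pq = PQ_NOT_SUPPORTED;

    /* Assumed: failure and PQ=011b paths release the command blocks. */
    if (pq < 0 || pq == PQ_NOT_SUPPORTED) {
        dev->has_cmdblocks = false;
        return false;
    }

    /* The path in question: without the fix the stale blocks survive. */
    if (lun != 0 && pq == PQ_NOT_PRESENT) {
        if (apply_fix)
            dev->has_cmdblocks = false;
        return false;
    }

    /* Device linked in; the next temporary struct has no command blocks. */
    dev->has_cmdblocks = false;
    return true;
}

/* Probe every target/lun in order, as the bus scan loop does. */
void scan_bus(bool apply_fix, bool pq_filter, struct scan_result *out)
{
    struct temp_dev dev = { .has_cmdblocks = false };

    memset(out, 0, sizeof(*out));
    for (int tgt = 0; tgt < MAX_TGT; tgt++)
        for (int lun = 0; lun < MAX_LUN; lun++)
            out->found[tgt][lun] =
                scan_single(&dev, tgt, lun, apply_fix, pq_filter);
}
```

In this model, scan_bus(false, false, &r) finds 0:0, misses 0:2 and reports a ghost at 1:0, which matches the result listed in the detailed example. Passing apply_fix = true, or pq_filter = true without the fix, finds 0:0 and 0:2 and no ghost. The filter only helps because it steers the probe onto a path that already releases the command blocks.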
A fix for this problem was committed to the RHEL3 U6 patch pool this evening (in kernel version 2.4.21-32.7.EL).
I did a test to confirm this fix. I used Emulex attached to Clariion. Saw the problem on 2.4.21-32.ELsmp. Saw it was fixed on 2.4.21-32.8.ELsmp.
FWIW, we saw a related problem, with the same fix. Wish I'd found this bug before I debugged it independently :) In our case, we have a raid in RDAC mode, where luns 1-3 belong to one controller, and luns 4-6 belong to another controller. When sending the INQUIRY command to lun 1, owned by the opposite controller, we got back the offline status. As above, the commandblocks would be re-used, and the original offline lun would continue to be queried, even when we should have moved on to the luns which -are- available on this controller. So, if lun 1 was offline, no luns were found on that channel. So in our case, we did not see ghost luns, but rather had missing luns. Glad to see you've fixed it, eagerly awaiting U6 :) -Eric
Yes. It is kernel version 2.4.21-32.7.EL, and later.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-663.html