Description of Problem:
A bug fixed in the 7.1 errata has regressed in 7.2 regarding masked LUNs. The system sees all LUNs regardless of whether they are masked and then attempts to access them. This results in a read capacity failure on the masked LUN, although operation of the non-masked LUNs remains unaffected.

In linux-2.4/drivers/scsi.c we have:

line 501: SDpnt->online = TRUE;
line 622: if (((scsi_result[0] >> 5) & 7) == 1 ) SDpnt->online = FALSE;
line 623: else SDpnt->online = TRUE;
line 755: SDpnt->online = TRUE;
line 771: SDpnt->online = TRUE;

Apparently, online is being forced to TRUE for some other bug fix, which is causing this older bug to recur.

Version-Release number of selected component (if applicable):

How Reproducible: Very

Steps to Reproduce:
1. Set up a masked LUN
2. Reboot
3.

Actual Results:

Expected Results:

Additional Information:
Doug, please look into this...
Unfortunately, this is currently a no-win situation. If we reinstate the previous behavior, there are devices that break, and if we leave it alone, the device here breaks. The problem is that the bits in the INQUIRY data that we are trying to use to determine whether a device is online or offline aren't used consistently among vendors (different vendors of disk devices and raid devices use those bits to mean different things). For example, there is at least one raid vendor that marks their raid devices as offline until they are fully ready for use; however, they need the SCSI subsystem to add them as disk units so that ioctls can be sent to the devices (otherwise they might never be able to get them online). If I remember correctly, the failure here is because the AMI firmware adds a blank entry on every lun where they *might* add an array device in the future, but where there isn't one now. I personally find this behaviour, ummm, sub-optimal. If you guys will get me the vendor strings related to this particular problem, I'll see about maybe making a change in the BLIST flags to handle this problem.
Created attachment 33922 [details] /proc/scsi/scsi from system
Created attachment 33923 [details] end of /var/log/messages showing read capacity failures
The ordering of the devices needs to be preserved and found again for scsi_reserve situations, so that if the shared scsi device is/was sdb, it stays sdb. Currently, if there is a failure and a restart, the reserved device is removed and sdc, sdd, etc. get bumped down into the sdb slot.
OK, the problem here is different than I originally thought. The two lines that set SDpnt->online = TRUE; come *after* we have already linked the current device into the sd chain and have moved to a new device. In other words, lines 755 and 771 are setting the online state of the *next* device to be scanned, not the one that we have already marked offline. That means that the problem is we are not marking this device offline. Well, that's a simple answer. We aren't marking it offline because it is LUN 0 and unless I'm mistaken (which is possible because I haven't looked this up in the SCSI spec in a long time), LUN 0 has to be valid before other LUNs are valid. So, if you make LUN 0 an on-line device, then mask LUN 1, then make LUN 2 a valid device, it *should* work as you are expecting. Please try that out and report back here. Whether it works or not will tell me more specifics about how it is failing for you now.
I'm not sure this addresses our problem. In our case we have a LUN 0 owned by Windows, which I think fits under the definition of valid (?). So here LUN 0 is valid and all the other LUNs are valid except since 0 is Windows-owned we should not be trying to access it. We do and this results in the read capacity failure. Perhaps I've misunderstood your meaning of the word valid?
(Note: Dell/EMC/Oracle combo is a primary target of Pensacola, hence our urgency on this issue!)

In the case where LUN 0 is masked off from us (and bits in the peripheral qualifier field of the INQUIRY show as such), EMC still needs LUN 0 to be marked online so that they can send management commands via sg to LUN 0 to manage the whole array. If LUN 0 is marked offline, they can't send those commands. But Linux really shouldn't be sending read/write commands to any LUN when it's masked off. On the Dell PV650 and 660 at least, and other EMC storage arrays, I/Os to masked LUNs fail, while other SCSI commands for management passed via sg succeed. But that's vendor-specific. We can't be certain that *all* vendors who mask off LUN 0, but which Linux leaves online, will reject read/write commands sent to the masked LUN.

At present, EMC distributes in their Attach Kit a patch:

--- linux-2.4/drivers/scsi/scsi_scan.c.orig Sun Sep 2 16:06:57 2001
+++ linux-2.4/drivers/scsi/scsi_scan.c Sun Sep 2 16:09:32 2001
@@ -601,8 +601,8 @@
 SDpnt->removable = (0x80 & scsi_result[1]) >> 7;
 /* Use the peripheral qualifier field to determine online/offline */
- if (((scsi_result[0] >> 5) & 7) == 1) SDpnt->online = FALSE;
- else SDpnt->online = TRUE;
+ if (lun) {if (((scsi_result[0] >> 5) & 7) == 1) SDpnt->online = FALSE;
+ else SDpnt->online = TRUE;}
 SDpnt->lockable = SDpnt->removable;
 SDpnt->changed = 0;
 SDpnt->access_count = 0;

which effectively leaves LUN 0 marked online always, so they can send commands. Their Attach Kit instructions tell people to recompile their kernel to get this functionality (and so their management apps work).

Maybe a whitelist flag for "KEEP_LUN0_ONLINE_ALWAYS" for the various products is the right way to go (much as I hate keeping whitelists). It just seems that different vendors choose to use the peripheral qualifier bits differently.

-Matt
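To make the effect of the Attach Kit patch easier to compare against the stock check, here is a minimal standalone sketch of the two rules. It is not kernel code: the function names `online_stock` and `online_patched` and the `current` parameter are hypothetical, and it assumes LUN 0 simply keeps whatever online state it already had when the patched branch is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stock rule: a peripheral qualifier of 001b (bits 7..5 of INQUIRY
 * byte 0) marks the device offline; anything else marks it online.
 */
static bool online_stock(const uint8_t *scsi_result)
{
    assert(scsi_result != NULL);
    return ((scsi_result[0] >> 5) & 7) != 1;
}

/*
 * Patched rule (hypothetical model of the "if (lun)" guard): LUN 0
 * keeps its current state, every other LUN follows the stock rule.
 */
static bool online_patched(unsigned lun, const uint8_t *scsi_result,
                           bool current)
{
    if (lun == 0)
        return current;
    return online_stock(scsi_result);
}
```

With byte 0 equal to 0x20, `online_stock` reports offline, while `online_patched` for LUN 0 returns whatever was passed in as `current`, which is how a masked LUN 0 ends up staying online.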
OK. Time for a few clarifications. First, the original problem report in here was that a device that was *supposed* to be offline was in fact getting marked as online and then linux was trying to access it. That was incorrect. My reading of the code suggested that the problem was that the device was actually reporting itself as online, but unavailable. The linux SCSI subsystem knows about offline; it has no concept of inaccessible. It was therefore trying to send the READ_CAPACITY command to the lun 0 volume and the PV660 was returning an error. Technically, this is a harmless error. It is purely cosmetic and doesn't harm the Windows-owned array in any way. This happens because the lun0 device is in fact marked on-line. Now, the EMC attach kit patch marks *all* lun 0 devices as being online. This will not help the current READ_CAPACITY problem. If anything, it makes it worse because now we aren't marking lun0 on the PV660 online because it reports itself as being online, we are marking it online due to a forced override. That can't possibly be considered helping the situation. So, to respond to Matt's assertion, yes, different vendors use the peripheral qualifier bits differently and that is the cause of this problem. At one point in time I added a test to scsi_scan.c to catch the problem on the PV660 units, and evidently that has been lost (I think it went into the mainstream kernel, and so came out of our specific patches, and then someone else removed it from the mainstream kernel because it interfered with some other device, and was hence lost). So, I think fixing the PV660 now is going to require a PV660-specific hack, using a new whitelist flag, and a new test to mark devices offline. In order to make that work, I need the actual byte value of the SCSI INQUIRY byte 0 return value from an on-line and off-line (masked) PV660 array volume.
Gary, please provide for PV660 (which I believe is already attached in /proc/scsi/scsi - those with PSEUDO being masked off, those without being masked on), and get with EMC ASAP to see what other devices they want on the whitelist. Need /proc/scsi/scsi info for all of them.
Ok, we need to re-focus on this bug now that we've cleared away the EMC association with it. Basically, the issue is that LUNs used to be assumed to be off until the above code turned them on as necessary. Since that point, the philosophy has been changed to make all LUNs on regardless of whether you own them or not. This means, generally, that the code needs to be changed to selectively mark off the LUNs that shouldn't be on, instead. This is what needs to be fixed, and those LUNs need to be marked off before any reading is attempted, to avert any read capacity failure. Is this enough info to get this working? I don't think /proc/scsi/scsi information would be any help in implementing this fix, but let me know if you need anything to facilitate your work. Thanks.
Gary, what you wrote is partly true. The linux SCSI subsystem has *always* assumed that a device was on and then only selectively turned devices off. That hasn't changed. When this problem cropped up before (which it did 2 or 3 releases ago), it was the same issue that it is now. Specifically, the PV660 uses a non-standard bit pattern in the INQUIRY return data byte 0 to indicate a masked LUN. I'm not saying the pattern is wrong. That byte is defined in a vague enough manner that it doesn't make any sense to assign a right/wrong label. It's just unique compared to the rest of the devices that we scan. Now, last time I corrected this problem I added a new test to disable the device if it matched the PV660's bit pattern. That was sent to the mainstream kernel and included, and therefore removed from our RPMs; then it was evidently yanked back out of the kernel because it did bad things on other devices that want to use the bits in byte 0 differently. So, now, in order to fix the problem (again), I need to know that bit pattern from byte 0 (which isn't in /proc/scsi/scsi; I need the raw INQUIRY response data) so that I can re-add the test to disable the device, but this time I will key it to the PV660 PSEUDO device name as well so that it doesn't interfere with other devices (aka, I'll add a blacklist flag and then make the test check that blacklist flag). Actually, if you can guarantee me that the PV660 PSEUDO device is *always* masked off, I'll just do a blacklist and not worry about the bit pattern of byte 0. Let me know how you want it handled, and if you want me to check the bit pattern on byte 0, then please include the hex value of byte 0 of the INQUIRY response data so I can know what to check for.
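The idea of keying the byte 0 test to a device-specific blacklist flag can be sketched roughly as follows. This is an illustration only, not the actual fix: the flag `QUIRK_OFFLINE_IF_MASKED`, the table `quirk_table`, and the vendor/model strings `"EXAMPLE"` and `"PSEUDO"` are all placeholders (the real strings would come from the INQUIRY data), and the masked byte 0 value is passed in as a parameter because it was not yet known at this point in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag; not a real kernel BLIST_* value. */
#define QUIRK_OFFLINE_IF_MASKED 0x01u

struct quirk_entry {
    const char *vendor;  /* prefix of the INQUIRY vendor string */
    const char *model;   /* prefix of the INQUIRY product string */
    unsigned flags;
};

/* Placeholder strings; the real ones must come from the device. */
static const struct quirk_entry quirk_table[] = {
    { "EXAMPLE", "PSEUDO", QUIRK_OFFLINE_IF_MASKED },
};

/* Return the flags of the first entry whose prefixes both match. */
static unsigned quirk_flags(const char *vendor, const char *model)
{
    size_t i;

    assert(vendor != NULL && model != NULL);
    for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
        const struct quirk_entry *q = &quirk_table[i];

        if (strncmp(vendor, q->vendor, strlen(q->vendor)) == 0 &&
            strncmp(model, q->model, strlen(q->model)) == 0)
            return q->flags;
    }
    return 0;
}

/*
 * Mark offline only when the device is flagged AND byte 0 matches the
 * masked pattern, so devices that use those bits differently are
 * left alone.
 */
static bool should_mark_offline(const char *vendor, const char *model,
                                uint8_t byte0, uint8_t masked_byte0)
{
    if (!(quirk_flags(vendor, model) & QUIRK_OFFLINE_IF_MASKED))
        return false;
    return byte0 == masked_byte0;
}
```

The simpler alternative mentioned above (blacklist only, no byte 0 check) would amount to dropping the final comparison and returning true whenever the flag is set.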
From jacob_cherian: The SCSI driver should not be attempting to do a read capacity on a masked LUN. I think the error happens because the SCSI driver treats all LUNs that do not return 0x7f (011/1 1111) for the Peripheral Qualifier/Peripheral Device Type fields in the Standard Inquiry data as LUNs that are accessible by the host. LUNs may return 0x20 (PQ/PDT - 001/0 0000) in the case of a SAN-attached storage device where the LUN exists but cannot be accessed by the server. According to SPC-2, this status is to be returned when the device server is capable of supporting the specified peripheral device type on the specific LU#, but currently the device with the LU# is not connected to the device server. This is consistent with the view of the storage device as seen from a SAN-attached server. LUNs should return 0x00 (PQ/PDT) in the case of LUNs that are visible. The fix would be to have the SCSI driver look for this value instead of checking for ! (0x7F) to decide whether a LUN is valid. Thanks, Jacob Cherian Storage Systems ESG, Dell Computer Corporation Ph: (512) 723 3247
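For reference, the byte 0 values quoted above split into a 3-bit peripheral qualifier (PQ) and a 5-bit peripheral device type (PDT). The sketch below decodes them; the helper names are hypothetical, and `lun_accepts_io` is one possible reading of the proposed fix that tests only for PQ 000b rather than for the whole byte being 0x00, which is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Peripheral qualifier values used in this discussion. */
enum {
    PQ_CONNECTED     = 0, /* 000b: device is connected to this LUN */
    PQ_NOT_CONNECTED = 1, /* 001b: LUN supported, device not connected */
    PQ_NOT_SUPPORTED = 3  /* 011b: no device can be supported here */
};

/* Upper three bits of INQUIRY byte 0. */
static unsigned inquiry_pq(uint8_t byte0)
{
    return (byte0 >> 5) & 0x07;
}

/* Lower five bits of INQUIRY byte 0. */
static unsigned inquiry_pdt(uint8_t byte0)
{
    return byte0 & 0x1f;
}

/*
 * Hypothetical policy helper: only send I/O such as READ CAPACITY to
 * a LUN whose qualifier says a device is actually connected.
 */
static int lun_accepts_io(const uint8_t *inquiry)
{
    assert(inquiry != NULL);
    return inquiry_pq(inquiry[0]) == PQ_CONNECTED;
}
```

Under this reading, 0x7f decodes to PQ 011b with PDT 0x1f, 0x20 decodes to PQ 001b with PDT 0x00, and only a byte with PQ 000b would be treated as a LUN the host may read from.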
Closing...fixed with ghost LUN fix.