Bug 54333

Summary: read capacity failed on masked lun
Product: [Retired] Red Hat Linux Reporter: Gary Lerhaupt <gary_lerhaupt>
Component: kernel    Assignee: Doug Ledford <dledford>
Status: CLOSED WORKSFORME QA Contact: Brock Organ <borgan>
Severity: high Docs Contact:
Priority: high    
Version: 7.3    CC: copeland, jacob_cherian, john_hull, johnsonm, matt_domsch, mferris, michael_e_brown, ykang
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2002-04-01 22:31:12 UTC
Attachments:
Description                                                  Flags
/proc/scsi/scsi from system                                  none
end of /var/log/messages showing read capacity failures      none

Description Gary Lerhaupt 2001-10-04 14:54:50 UTC
Description of Problem:
A masked-LUN bug that was fixed in a 7.1 errata is broken again in 7.2.
The system sees all LUNs, regardless of whether they are masked, and then
attempts to access them.  This results in a read capacity failure on the masked
LUN, although operation of the non-masked LUNs remains unaffected.

In linux-2.4/drivers/scsi/scsi_scan.c we have:
line 501: SDpnt->online = TRUE;

line 622: if (((scsi_result[0] >> 5) & 7) == 1 ) SDpnt->online = FALSE;
line 623: else SDpnt->online = TRUE;

line 755: SDpnt->online = TRUE;
line 771: SDpnt->online = TRUE;

Apparently, online is being forced to true for some other bug fix, which is
resulting in the recurrence of this older bug.
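For reference, the peripheral qualifier test quoted at line 622 can be restated as a standalone C sketch. Only the shift-and-mask expression comes from the source quoted above; the function names are hypothetical and this is not the actual scsi_scan.c code.

```c
#include <stdbool.h>

/* Byte 0 of standard INQUIRY data: bits 7-5 hold the peripheral
 * qualifier, bits 4-0 hold the peripheral device type. */
unsigned peripheral_qualifier(unsigned char byte0)
{
    return (byte0 >> 5) & 0x7;
}

unsigned peripheral_device_type(unsigned char byte0)
{
    return byte0 & 0x1f;
}

/* Hypothetical restatement of the quoted test: a qualifier of 001b
 * marks the device offline; any other qualifier leaves it online. */
bool online_from_inquiry(unsigned char byte0)
{
    return peripheral_qualifier(byte0) != 1;
}
```

For example, a byte 0 of 0x20 decodes to qualifier 1 and device type 0, so this test marks the device offline.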

Version-Release number of selected component (if applicable):


How Reproducible:
Very

Steps to Reproduce:
1. Set-up a masked LUN
2. Reboot

Actual Results:


Expected Results:


Additional Information:

Comment 1 Michael K. Johnson 2001-10-11 15:23:36 UTC
Doug, please look into this...

Comment 2 Doug Ledford 2001-10-11 17:32:56 UTC
Unfortunately, this is a no-win situation currently.  If we re-instate the
previous behavior then there are devices that break, and if we leave it alone,
then the device here breaks.  The problem is that the bits in the INQUIRY data
that we are trying to use to determine if a device is online or offline aren't
used consistently amongst different vendors (meaning that different vendors of
disk devices and raid devices use those bits differently in order to mean
different things).  For example, there is at least one raid vendor that marks
their raid devices as offline until they are fully ready for use, however they
need the SCSI subsystem to add them as disk units so that ioctls can be sent to
the devices (otherwise they might never be able to get them online).  If I
remember correctly, the failure here is because the AMI firmware adds a blank
entry on every lun where they *might* add an array device in the future, but
where there isn't one now.  I personally find this behaviour, ummm, sub-optimal.
 If you guys will get me the vendor strings related to this particular problem,
I'll see about maybe making a change in the BLIST flags to handle this problem.

Comment 3 Gary Lerhaupt 2001-10-11 19:46:24 UTC
Created attachment 33922
/proc/scsi/scsi from system

Comment 4 Gary Lerhaupt 2001-10-11 19:48:23 UTC
Created attachment 33923
end of /var/log/messages showing read capacity failures

Comment 5 Eido Inoue 2001-10-12 20:11:19 UTC
The ordering of the devices needs to be preserved and found again in
scsi_reserve situations, so that if the shared SCSI device is/was sdb, it stays
sdb.  Currently, if there is a failure and a restart, the reserved device is
removed and sdc, sdd, etc. get bumped down into the sdb slot.


Comment 6 Doug Ledford 2002-02-13 21:29:51 UTC
OK, the problem here is different than I originally thought.  The two lines that
set SDpnt->online = TRUE; come *after* we have already linked the current device
into the sd chain and have moved to a new device.  In other words, lines 755 and
771 are setting the online state of the *next* device to be scanned, not the one
that we have already marked offline.  That means that the problem is we are not
marking this device offline.  Well, that's a simple answer.  We aren't marking
it offline because it is LUN 0 and unless I'm mistaken (which is possible
because I haven't looked this up in the SCSI spec in a long time), LUN 0 has to
be valid before other LUNs are valid.  So, if you make LUN 0 an on-line device,
then mask LUN 1, then make LUN 2 a valid device, it *should* work as you are
expecting.  Please try that out and report back here.  Whether it works or not
will tell me more specifics about how it is failing for you now.

Comment 7 Gary Lerhaupt 2002-02-15 15:34:37 UTC
I'm not sure this addresses our problem.  In our case we have a LUN 0 owned by
Windows, which I think fits under the definition of valid (?).  So here LUN 0
is valid, as are all the other LUNs; however, since LUN 0 is Windows-owned, we
should not be trying to access it.  We do, and this results in the read capacity
failure.  Perhaps I've misunderstood your meaning of the word valid?

Comment 8 Matt Domsch 2002-03-07 22:04:55 UTC
(Note: Dell/EMC/Oracle combo is a primary target of Pensacola, hence our 
urgency on this issue!)

In the case where LUN 0 is masked off from us (and the peripheral qualifier
bits in the INQUIRY data say so), EMC still needs LUN 0 to be marked online so
that they can send management commands via sg to LUN 0 to manage the whole
array.  If LUN 0 is marked offline, they can't send those commands.

But Linux really shouldn't be sending read/write commands to any LUN when it's
masked off.  On the Dell PV650 and 660 at least, and on other EMC storage arrays,
I/Os to masked LUNs fail, while other SCSI commands for management passed via
sg succeed.  But that's vendor-specific.  We can't be certain that *all*
vendors who mask off LUN 0, but whose LUN 0 Linux leaves online, will reject
read/write commands sent to the masked LUN.

At present, EMC distributes in their Attach Kit a patch:
--- linux-2.4/drivers/scsi/scsi_scan.c.orig	Sun Sep  2 16:06:57 2001
+++ linux-2.4/drivers/scsi/scsi_scan.c	Sun Sep  2 16:09:32 2001
@@ -601,8 +601,8 @@
 
 	SDpnt->removable = (0x80 & scsi_result[1]) >> 7;
 	/* Use the peripheral qualifier field to determine online/offline */
-	if (((scsi_result[0] >> 5) & 7) == 1) 	SDpnt->online = FALSE;
-	else SDpnt->online = TRUE;
+	if (lun) {if (((scsi_result[0] >> 5) & 7) == 1) 	SDpnt->online = FALSE;
+	else SDpnt->online = TRUE;}
 	SDpnt->lockable = SDpnt->removable;
 	SDpnt->changed = 0;
 	SDpnt->access_count = 0;

which effectively leaves LUN 0 marked online always, so they can send 
commands.  Their Attach Kit instructions tell people to recompile their kernel 
to get this functionality (and so their management apps work).
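To compare the stock rule with the Attach Kit rule, here is a minimal sketch of both as pure functions. The names are hypothetical. Skipping the test in the patch leaves SDpnt->online at whatever value it already had; the sketch assumes that value is TRUE, following the "effectively leaves LUN 0 marked online always" description above.

```c
#include <stdbool.h>

/* Stock rule from the quoted hunk: qualifier 001b means offline. */
bool stock_online(unsigned char byte0)
{
    return ((byte0 >> 5) & 7) != 1;
}

/* Attach Kit rule: the qualifier test only runs when lun != 0.
 * Assumption: for LUN 0, online was already TRUE before this point,
 * so skipping the test leaves it online. */
bool emc_online(unsigned lun, unsigned char byte0)
{
    if (lun == 0)
        return true;
    return stock_online(byte0);
}
```

For a LUN 0 whose byte 0 carries qualifier 001b (for example 0x20), stock_online() reports offline while emc_online() reports online; for any other LUN the two rules agree.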

Maybe a whitelist flag for "KEEP_LUN0_ONLINE_ALWAYS" for the various products 
is the right way to go (much as I hate keeping whitelists).  It just seems that 
different vendors choose to use the peripheral qualifier bits differently.
-Matt


Comment 9 Doug Ledford 2002-03-08 21:58:15 UTC
OK.  Time for a few clarifications.

First, the original problem report in here was that a device that was
*supposed* to be offline was in fact getting marked as online and then linux
was trying to access it.  That was incorrect.  My reading of the code suggested
that the problem was that the device was actually reporting itself as online,
but unavailable.  The linux SCSI subsystem knows about offline, it has no
concept of inaccessible.  It was therefore trying to send the READ_CAPACITY
command to the lun 0 volume and the PV660 was returning an error.  Technically,
this is a harmless error.  It is purely cosmetic and doesn't harm the Windows
owned array in any way.  This happens because the lun0 device is in fact marked
on-line.

Now, the EMC attach kit patch marks *all* lun 0 devices as being online.  This
will not help the current READ_CAPACITY problem.  If anything, it makes it worse,
because instead of marking lun0 on the PV660 online because it reports itself as
being online, we would be marking it online due to a forced override.  That can't
possibly be considered helping the situation.

So, to respond to Matt's assertion, yes different vendors use the peripheral
qualifier bits differently and that is the cause of this problem.  At one point
in time I added a test to scsi_scan.c to catch the problem on the PV660 units,
and evidently that has been lost (I think it went into the mainstream kernel,
and so came out of our specific patches, and then someone else removed it from
the mainstream kernel because it interfered with some other device, and was
hence lost).

So, I think fixing the PV660 now is going to require a PV660 specific hack,
using a new whitelist flag, and a new test to mark devices offline.  In order to
make that work, I need the actual byte value of the SCSI INQUIRY byte 0 return
value from an on-line and off-line (masked) PV660 array volume.
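Byte 0 is not shown in /proc/scsi/scsi. As a minimal sketch, assuming the raw standard INQUIRY response has already been captured into a buffer (how it is captured is out of scope here), the values Doug asks for can be pulled out as follows. The struct and function names are hypothetical; the offsets follow the standard INQUIRY data layout in SPC (vendor identification in bytes 8-15, product identification in bytes 16-31).

```c
#include <stddef.h>
#include <string.h>

struct inquiry_summary {
    unsigned char byte0;   /* raw PQ/PDT byte, the value Doug needs */
    unsigned qualifier;    /* bits 7-5 of byte 0 */
    unsigned device_type;  /* bits 4-0 of byte 0 */
    char vendor[9];        /* 8 ASCII bytes + NUL */
    char product[17];      /* 16 ASCII bytes + NUL */
};

/* Returns 0 on success, -1 if the buffer is too short to contain the
 * vendor and product fields. Trailing space padding is trimmed. */
int summarize_inquiry(const unsigned char *buf, size_t len,
                      struct inquiry_summary *out)
{
    if (len < 32)
        return -1;
    out->byte0 = buf[0];
    out->qualifier = (buf[0] >> 5) & 0x7;
    out->device_type = buf[0] & 0x1f;
    memcpy(out->vendor, buf + 8, 8);
    out->vendor[8] = '\0';
    memcpy(out->product, buf + 16, 16);
    out->product[16] = '\0';
    for (int i = 7; i >= 0 && out->vendor[i] == ' '; i--)
        out->vendor[i] = '\0';
    for (int i = 15; i >= 0 && out->product[i] == ' '; i--)
        out->product[i] = '\0';
    return 0;
}
```

Printing byte0 in hex for one masked and one unmasked volume would give exactly the pair of values requested here.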

Comment 10 Matt Domsch 2002-03-08 22:24:19 UTC
Gary, please provide this for the PV660 (I believe it is already in the attached
/proc/scsi/scsi: the entries with PSEUDO are masked off, those without are
masked on), and get with EMC ASAP to see what other devices they want on the
whitelist.  Need /proc/scsi/scsi info for all of them.

Comment 11 Gary Lerhaupt 2002-03-27 20:36:46 UTC
Ok, need to re-focus on this bug now that we've cleared away the EMC
association to it.  Basically, the issue is that LUNs used to be assumed to be
off until the above code turned them on as necessary.  Since that point, the
philosophy has been changed to make all LUNs on regardless of whether you own
them or not.  This means, generally, that code needs to be changed to
selectively mark off the LUNs that shouldn't be on, instead.

This is what needs to be fixed, and those LUNs need to be marked off before any
reading is attempted to avert any read capacity failure.  Is this enough info
to get this working?  I don't think /proc/scsi/scsi information would be any
help in implementing this fix, but let me know if you need anything to
facilitate your work.  Thanks.

Comment 12 Doug Ledford 2002-03-28 17:22:59 UTC
Gary, what you wrote is partly true.  The linux SCSI subsystem has *always*
assumed that a device was on and then only selectively turned them off.  That
hasn't changed.  When this problem cropped up before (which it did 2 or 3
releases ago), it was the same issue that it is now.  Specifically, the PV660
uses a non-standard bit pattern in the INQUIRY return data byte 0 to indicate a
masked LUN.  I'm not saying the pattern is wrong.  That byte is defined in a
vague enough manner that it doesn't make any sense to assign a right/wrong
label.  It's just unique compared to the rest of the devices that we scan.  Now,
last time I corrected this problem I added a new test to disable the device if
it matched the PV660's bit pattern.  That was sent to the mainstream kernel and
included, and therefore removed from our RPMs, then it was evidently yanked back
out of the kernel because it did bad things on other devices that want to use
the bits in byte 0 differently.  So, now, in order to fix the problem (again), I
need to know that bit pattern from byte 0 (which isn't in /proc/scsi/scsi, I
need the raw INQUIRY response data) so that I can re-add the test to disable the
device, but this time I will key it to the PV660 PSEUDO device name as well so
that it doesn't interfere with other devices (aka, I'll add a blacklist flag and
then make the test check that blacklist flag).  Actually, if you can guarantee
me that the PV660 PSEUDO device is *always* masked off, I'll just do a blacklist
and not worry about the bit pattern of byte 0.  Let me know how you want it
handled and if you want me to check the bit pattern on byte 0 then please
include the hex value of byte 0 of the INQUIRY response data so I can know what
to check for.
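One way to structure the vendor-keyed test described above is a small quirk table consulted after the generic qualifier check. This is a hypothetical sketch: the flag names, struct, and prefix matching are illustrative and are not the kernel's BLIST flags, and the table is supplied by the caller because the PV660 vendor string, model string, and byte 0 value are exactly what is still being requested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags, not real kernel values. */
#define FLAG_ALWAYS_MASKED 0x1 /* entry is always treated as masked    */
#define FLAG_MASK_BY_BYTE0 0x2 /* masked when byte 0 == masked_byte0   */

struct device_quirk {
    const char *vendor;         /* prefix of the INQUIRY vendor id  */
    const char *model;          /* prefix of the INQUIRY product id */
    unsigned flags;
    unsigned char masked_byte0; /* only used with FLAG_MASK_BY_BYTE0 */
};

static const struct device_quirk *find_quirk(const struct device_quirk *table,
                                             size_t n, const char *vendor,
                                             const char *model)
{
    for (size_t i = 0; i < n; i++) {
        if (strncmp(vendor, table[i].vendor, strlen(table[i].vendor)) == 0 &&
            strncmp(model, table[i].model, strlen(table[i].model)) == 0)
            return &table[i];
    }
    return NULL;
}

/* Generic qualifier test first; the vendor-keyed override only applies
 * to devices that have an entry in the table. */
bool lun_online(const struct device_quirk *table, size_t n,
                const char *vendor, const char *model, unsigned char byte0)
{
    if (((byte0 >> 5) & 7) == 1)
        return false;
    const struct device_quirk *q = find_quirk(table, n, vendor, model);
    if (q == NULL)
        return true;
    if (q->flags & FLAG_ALWAYS_MASKED)
        return false;
    if ((q->flags & FLAG_MASK_BY_BYTE0) && byte0 == q->masked_byte0)
        return false;
    return true;
}
```

Keeping the generic check first means devices without a table entry get the same result as the plain qualifier test, which is the point of keying the new test to the device name.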

Comment 13 Gary Lerhaupt 2002-04-01 22:31:06 UTC
From jacob_cherian:

The SCSI driver should not be attempting to do a read capacity on a masked LUN.
I think the error happens because the SCSI driver treats all LUNs that do not
return 0x7f (011/1 1111) for the Peripheral Qualifier/Peripheral Device Type
fields in the Standard Inquiry data as LUNs that are accessible by the host.

LUNs may return 0x20 (PQ/PDT - 001/0 0000) in the case of a SAN-attached
storage device where the LUN exists but cannot be accessed by the server. 
According to SPC-2, this status is to be returned when the device server is 
capable of supporting the specified peripheral device type on the specific LU#, 
but currently the device with the LU# is not connected to the device server. 
This is consistent with the view of the storage device as seen from a SAN 
attached server.

LUNs should return 0x00 (PQ/PDT) in the case of LUNs that are visible.  The fix
would be to have the SCSI driver look for this value instead of checking for
!(0x7F) to decide whether a LUN is valid.
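The difference between the current check and the proposed one can be sketched as two predicates on byte 0. The function names are hypothetical. Jacob's wording compares the whole byte with 0x00; this sketch checks only the qualifier bits (000b) so that non-disk device types are not rejected, which is a generalization of his proposal rather than what he wrote.

```c
#include <stdbool.h>

/* Current behavior as described: anything other than 0x7f
 * (PQ 011b, PDT 1Fh) is treated as accessible. */
bool accessible_current(unsigned char byte0)
{
    return byte0 != 0x7f;
}

/* Proposed behavior, generalized: only a peripheral qualifier of 000b
 * (device connected to this LU) counts as accessible. */
bool accessible_proposed(unsigned char byte0)
{
    return ((byte0 >> 5) & 0x7) == 0;
}
```

With 0x20, accessible_current() returns true and accessible_proposed() returns false; that disagreement is the case Jacob identifies as the source of the read capacity error.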

Thanks,

Jacob Cherian
Storage Systems 
ESG, Dell Computer Corporation
Ph: (512) 723 3247


Comment 14 John A. Hull 2002-07-23 22:37:48 UTC
Closing...fixed with ghost LUN fix.