Bug 1975368

Summary: Pure Storage iSCSI Driver: LUN with id >255 can't be connected properly when flat addressing is used
Product: Red Hat OpenStack Reporter: Takashi Kajinami <tkajinam>
Component: openstack-cinder Assignee: Gorka Eguileor <geguileo>
Status: NEW --- QA Contact: Evelina Shames <eshames>
Severity: medium Docs Contact: RHOS Documentation Team <rhos-docs>
Priority: medium    
Version: 13.0 (Queens) CC: abishop, eharney, geguileo
Target Milestone: --- Keywords: OtherQA, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2021-06-23 14:10:58 UTC
Description of problem:

By default, Pure Storage uses peripheral addressing for LUN IDs < 256 and flat addressing for LUN IDs >= 256.

When peripheral addressing is used, the LUN IDs presented by the iSCSI portals on the storage array look exactly the same as the raw LUN IDs.
 0x0001 > LUN  1
 0x000a > LUN 10
 0x0021 > LUN 33

On the other hand, when flat addressing is used, the LUN IDs appear to have an additional 0x4000 added.
This is because Pure Storage uses 0x4000 as a flag to indicate that flat addressing is in use.
 0x4100 > LUN 256
 0x4104 > LUN 260
 0x4110 > LUN 272
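The mapping above can be sketched in a few lines of Python. This is only an illustration of the bit layout described in this report, not code from cinder or os-brick; the function name decode_lun_field is hypothetical, and the peripheral branch assumes a bus identifier of zero.

```python
from typing import Tuple


def decode_lun_field(value: int) -> Tuple[str, int]:
    """Split a 16-bit LUN field into (addressing method, raw LUN ID).

    Hypothetical helper: the top two bits select the addressing method,
    00b for peripheral and 01b for flat, as described in this report.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    method = value >> 14  # top two bits of the 16-bit field
    if method == 0b00:
        bus = (value >> 8) & 0x3F
        if bus != 0:
            # Assumption: only bus identifier 0 is handled in this sketch.
            raise ValueError("non-zero bus identifier is not handled")
        return "peripheral", value & 0xFF
    if method == 0b01:
        # Flat addressing: the remaining 14 bits carry the LUN ID.
        return "flat", value & 0x3FFF
    raise ValueError("addressing method not covered by this sketch")


for field in (0x0001, 0x000A, 0x0021, 0x4100, 0x4104, 0x4110):
    print(hex(field), decode_lun_field(field))
```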

However, the problem is that RHEL doesn't treat that 0x4000 flag separately; it treats it as part of the raw LUN ID.
This means that when RHEL scans a volume with LUN ID >= 256, it can't use the raw LUN ID and instead needs to add 16384 (0x4000) to it so that the device is detected as expected.

$ echo '0 0 220' | sudo tee -a /sys/class/scsi_host/host<host>/scan
 -> This works because LUN ID < 256

$ echo '0 0 261' | sudo tee -a /sys/class/scsi_host/host<host>/scan
 -> This doesn't work because LUN ID >= 256

$ echo '0 0 16645' | sudo tee -a /sys/class/scsi_host/host<host>/scan
 -> This works because 261 + 16384 = 16645
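The arithmetic in the three scan commands above can be reproduced with a small Python sketch. It assumes the Pure Storage default described in this report (peripheral below 256, flat from 256 upwards); the names scan_lun and scan_line are hypothetical and are not os-brick functions.

```python
FLAT_ADDRESSING_FLAG = 0x4000  # 16384, flat addressing method bits (01b)


def scan_lun(raw_lun: int) -> int:
    """Return the LUN value the host scan expects for a raw LUN ID.

    Assumption: the array uses peripheral addressing for LUN IDs < 256
    and flat addressing for LUN IDs >= 256, as described in this report.
    """
    if not 0 <= raw_lun <= 0x3FFF:
        raise ValueError("flat addressing only carries a 14-bit LUN ID")
    if raw_lun < 256:
        return raw_lun  # peripheral addressing: value is unchanged
    return raw_lun | FLAT_ADDRESSING_FLAG


def scan_line(raw_lun: int, channel: int = 0, target: int = 0) -> str:
    """Build the 'channel target lun' string written to the scan file."""
    return f"{channel} {target} {scan_lun(raw_lun)}"


print(scan_line(220))  # 0 0 220
print(scan_line(261))  # 0 0 16645
```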

Currently the Pure Storage driver and os-brick are not aware of this behavior.
The Pure Storage driver returns the raw LUN ID (like 261) even when the LUN ID is greater than 255, os-brick uses that raw LUN ID for the scan, and as a result RHEL can't detect the SCSI device.

According to a discussion with Pure Storage support, this problem can likely be solved by setting the host personality to oracle-vm-server (this makes all presented LUNs use the peripheral addressing method). It would still be useful if cinder or os-brick could detect flat addressing automatically and use the proper LUN ID when scanning SCSI devices.


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create a volume
2. Attach a volume to an instance

Actual results:
Volume attachment fails because no iSCSI device is found when the LUN ID > 255 and flat addressing is used.

Expected results:
Volume attachment succeeds even if LUN ID > 255 and flat addressing is used.


Additional info:

Comment 3 Alan Bishop 2021-06-24 17:54:38 UTC
I realize a customer is tripping over this issue, and from reading the links in comment #2 it seems to be somewhat controversial. I found another post from two and a half years ago with comments from the same people: https://github.com/hreinecke/sg3_utils/issues/31

The sticking point for me is that the firm statement from the subject matter expert (SME) in the third link suggests the SCSI standards really don't lend themselves to inferring the target's addressing mode. It seems like many factors can be involved:

- The target's behavior
- The initiator's HBA
- The Linux kernel
- The userspace tools (e.g. sg3_utils)

It doesn't seem a good idea to me for cinder and os-brick to jump into the middle and magically resolve a problem the other communities continue to struggle with. If the issue can be handled by tuning the cinder Pure driver's configuration (setting the host personality to oracle-vm-server) then that seems prudent. This is my own opinion, and other RH cinder team members may have a different view.

Comment 5 Gorka Eguileor 2023-02-10 16:35:57 UTC
As was said in previous comments, the device doesn't appear because the scan is failing. Since we are using the manual scan feature in Open-iSCSI to prevent races, the device doesn't appear without a successful scan.

Just making os-brick detect that LUN > 255 and then assuming that the storage system is using flat space addressing is probably not the right thing to do.

If the storage system is using SAM-2 commands, then os-brick needs to pass 256 to the scan command, whereas if it's using SCSI-3, then it needs to pass 16640.

The solution would be to add a parameter to the connection information that tells the addressing mode when a conversion is needed (defaults to no conversion), and add the code to handle different values in os-brick.

Comment 6 Gorka Eguileor 2023-02-21 19:30:00 UTC
I've looked a bit more into it, and my last comment is not correct.

SAM: Transparent, 64 bits
SAM-2:
 - LUN < 256 uses peripheral addressing, which is equivalent to transparent since the two most significant bits are 00b
 - LUN >= 256 uses flat addressing, which has an offset of 16384 because the two most significant bits are 01b
SAM-3:
 - LUN < 256: the storage array can choose between peripheral addressing and flat addressing (same offset)
 - LUN >= 256: flat addressing

And these are just some of the addressing modes, since peripheral can do multi-level.
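As a rough sketch of how these basic modes could translate a raw LUN ID into the value used for scanning, consider the Python below. The mode names ("transparent", "sam2", "sam3_flat") and the function lun_for_addressing are hypothetical and chosen for illustration only; they are not taken from the proposed patches, and multi-level peripheral addressing is not covered.

```python
FLAT_OFFSET = 16384  # 0x4000: two most significant bits set to 01b


def lun_for_addressing(lun: int, mode: str = "transparent") -> int:
    """Convert a raw LUN ID according to a (hypothetical) addressing mode.

    transparent: SAM, the LUN ID is used as-is.
    sam2:        peripheral below 256, flat (offset 16384) from 256 upwards.
    sam3_flat:   SAM-3 array that chose flat addressing for every LUN.
    """
    if not 0 <= lun < FLAT_OFFSET:
        raise ValueError("LUN ID must fit in 14 bits for this sketch")
    if mode == "transparent":
        return lun
    if mode == "sam2":
        return lun if lun < 256 else lun + FLAT_OFFSET
    if mode == "sam3_flat":
        return lun + FLAT_OFFSET
    raise ValueError(f"unknown addressing mode: {mode}")


for mode in ("transparent", "sam2", "sam3_flat"):
    print(mode, lun_for_addressing(10, mode), lun_for_addressing(260, mode))
```

Defaulting to "transparent" keeps the current behavior for backends that do not need any conversion.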

I have proposed a WIP patch to add support for some of the basic addressing modes in os-brick, as well as another patch to the Pure iSCSI and FC drivers that leverages it.
Pure will be testing this manually upstream, since the usual testing doesn't have that many LUNs mapped to the same host.