Bug 1657468
| Field | Value |
|---|---|
| Summary | Different behaviors when creating a storage pool via NPIV in RHEL 7.5 and RHEL 7.6 |
| Product | Red Hat Enterprise Linux 7 |
| Component | libvirt |
| Version | 7.6 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | high |
| Reporter | Adam Xu <xingya.xu> |
| Assignee | John Ferlan <jferlan> |
| QA Contact | yisun |
| CC | gveitmic, hhan, jdenemar, jferlan, meili, mkalinin, sirao, xingya.xu, xuzhang, yalzhang |
| Target Milestone | rc |
| Keywords | ZStream |
| Fixed In Version | libvirt-4.5.0-11.el7 |
| Doc Type | No Doc Update |
| Clones | 1687715 (view as bug list) |
| Bug Blocks | 1687715 |
| Type | Bug |
| Last Closed | 2019-08-06 13:14:02 UTC |
Description (Adam Xu, 2018-12-08 16:27:29 UTC)
yisun:

Hi Adam, for the issue you met, please help to check the following:

1. What does "multipath -ll" show? In my 7.6 env, things are as follows (scsi_host13 is the vHBA):

```
# lsscsi
....
[13:0:0:0]  disk  IBM  1726-4xx FAStT  0617  /dev/sdd
[13:0:0:1]  disk  IBM  1726-4xx FAStT  0617  /dev/sde
[13:0:1:0]  disk  IBM  1726-4xx FAStT  0617  /dev/sdf
[13:0:1:1]  disk  IBM  1726-4xx FAStT  0617  /dev/sdg

# multipath -ll
mpathd (3600a0b80005b10ca00005e115729093f) dm-14 IBM ,1726-4xx FAStT
size=10G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=active
| `- 13:0:0:1 sde 8:64 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 13:0:1:1 sdg 8:96 active ghost running
mpathc (3600a0b80005b0acc00004f875728fe8e) dm-13 IBM ,1726-4xx FAStT
size=10G features='1 queue_if_no_path' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=active
| `- 13:0:1:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 13:0:0:0 sdd 8:48 active ghost running

# virsh vol-list test
Name                 Path
------------------------------------------------------------------------------
unit:0:0:1           /dev/disk/by-path/pci-0000:95:00.0-vport-0x2101001b32a90000-fc-0x203500a0b85b0acc-lun-1
unit:0:1:0           /dev/disk/by-path/pci-0000:95:00.0-vport-0x2101001b32a90000-fc-0x203400a0b85b0acc-lun-0
```

So only the paths that are "ready running" show up in the vol list when they are just multipath devices pointing to the same backend LUN.

2. For your step 4, it's an expected change. When starting an NPIV pool with a wwnn/wwpn indicated, libvirt tries to create a vHBA with that wwnn/wwpn, so an existing vHBA with the same wwnn/wwpn blocks the process and an error is reported. You don't need to reboot the KVM server; you can "nodedev-destroy" the existing vHBA and then define/start the pool, as sketched below.
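A minimal sequence for that recovery path, using the names from this thread (scsi_host13 from the nodedev-create step, and the vhbapool_host1.xml definition quoted later in the thread; the pool name vhbapool_host1 is an assumption based on the file name):

```
# remove the transient vHBA that nodedev-create made, freeing its wwnn/wwpn
virsh nodedev-destroy scsi_host13

# the pool can now create the vHBA itself with the same wwnn/wwpn
virsh pool-define vhbapool_host1.xml
virsh pool-start vhbapool_host1
```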
Adam Xu (comment 3):

In the RHEL 7.6 server, it shows:

```
# multipath -ll
mpatha (360002ac000000000000000270001e23d) dm-3 3PARdata,VV
size=256G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 13:0:0:0 sdb 8:16  active ready running
  |- 13:0:1:0 sdc 8:32  active ready running
  |- 13:0:2:0 sdd 8:48  active ready running
  |- 13:0:3:0 sde 8:64  active ready running
  |- 14:0:0:0 sdf 8:80  active ready running
  |- 14:0:1:0 sdg 8:96  active ready running
  |- 14:0:2:0 sdh 8:112 active ready running
  `- 14:0:3:0 sdi 8:128 active ready running
```

In another RHEL 7.5 server, it shows:

```
[root@kvm2 ~]# multipath -ll
mpatha (360002ac0000000000000005b0001e23d) dm-3 3PARdata,VV
size=256G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 13:0:0:0 sdb 8:16  active ready running
  |- 13:0:1:0 sdd 8:48  active ready running
  |- 13:0:2:0 sdf 8:80  active ready running
  |- 13:0:3:0 sdi 8:128 active ready running
  |- 14:0:0:0 sdc 8:32  active ready running
  |- 14:0:1:0 sde 8:64  active ready running
  |- 14:0:2:0 sdg 8:96  active ready running
  `- 14:0:3:0 sdh 8:112 active ready running
```

Almost the same in 7.5 and 7.6. But "virsh vol-list poolname" in RHEL 7.6 shows:

```
Name                 Path
------------------------------------------------------------------------------
unit:0:0:0           /dev/disk/by-path/pci-0000:05:00.0-vport-0x5001a4a000000001-fc-0x21010002ac01e23d-lun-0
```

while in another 7.5 it shows:

```
Name                 Path
------------------------------------------------------------------------------
unit:0:0:0           /dev/disk/by-path/pci-0000:04:00.0-vport-0x5001a4ac1102f84d-fc-0x21010002ac01e23d-lun-0
unit:0:1:0           /dev/disk/by-path/pci-0000:04:00.0-vport-0x5001a4ac1102f84d-fc-0x22010002ac01e23d-lun-0
unit:0:2:0           /dev/disk/by-path/pci-0000:04:00.0-vport-0x5001a4ac1102f84d-fc-0x23010002ac01e23d-lun-0
unit:0:3:0           /dev/disk/by-path/pci-0000:04:00.0-vport-0x5001a4ac1102f84d-fc-0x20010002ac01e23d-lun-0
```

Because of this change, a VM using the vHBA fails to boot, with an error like "unit:0:3:0 missing".

yisun:

(In reply to Adam Xu from comment #3)
> In the RHEL 7.6 server, it shows
> # multipath -ll

This looks like an intentional change, but I do not have an exactly matching environment to debug it. Could you please attach the libvirtd log by following these steps?

1. Turn on the libvirtd debug log by editing the conf:
   # vim /etc/libvirt/libvirtd.conf
   log_outputs="1:file:/var/log/libvirtd-debug.log"
   log_level=1
2. Restart libvirtd to enable the debug log:
   # service libvirtd restart
   Redirecting to /bin/systemctl restart libvirtd.service
3. Clear your env to make sure there is no existing NPIV pool or vHBA.
4. Clear the debug log:
   # echo "" > /var/log/libvirtd-debug.log
5. Create/start the NPIV pool again.
6. Upload the debug log /var/log/libvirtd-debug.log here in Bugzilla.
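For convenience, steps 1, 2, and 4 as one copy-paste block (a sketch; stock RHEL paths assumed):

```
# enable libvirtd debug logging (steps 1-2)
cat >> /etc/libvirt/libvirtd.conf <<'EOF'
log_outputs="1:file:/var/log/libvirtd-debug.log"
log_level=1
EOF
systemctl restart libvirtd

# step 4: start from an empty log before reproducing
> /var/log/libvirtd-debug.log
```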
For example, in my env the log for the ghost-running LUNs is something like:

```
2018-12-11 09:45:54.673+0000: 22615: debug : virStorageBackendSCSIFindLUs:4125 : Found possible LU '16:0:0:0'
2018-12-11 09:45:54.673+0000: 22615: debug : processLU:4054 : Processing LU 16:0:0:0
2018-12-11 09:45:54.673+0000: 22615: debug : getDeviceType:4025 : Device type is 0
2018-12-11 09:45:54.673+0000: 22615: debug : processLU:4071 : 16:0:0:0 is a Direct-Access LUN
2018-12-11 09:45:54.673+0000: 22615: debug : getNewStyleBlockDevice:3852 : Looking for block device in '/sys/bus/scsi/devices/16:0:0:0/block'
2018-12-11 09:45:54.673+0000: 22615: debug : getNewStyleBlockDevice:3861 : Block device is 'sdd'
2018-12-11 09:45:54.673+0000: 22615: debug : virStorageBackendSCSINewLun:3788 : Trying to create volume for '/dev/sdd'
2018-12-11 09:45:55.205+0000: 22615: warning : virStorageBackendDetectBlockVolFormatFD:1463 : ignoring failed saferead of file '/dev/disk/by-path/pci-0000:95:00.0-vport-0x2101001b32a90000-fc-0x203500a0b85b0acc-lun-0'
2018-12-11 09:45:55.205+0000: 22615: debug : virFileClose:111 : Closed fd 24
2018-12-11 09:45:55.205+0000: 22615: debug : processLU:4082 : Failed to create new storage volume for 16:0:0:0
```

Adam Xu (comment 5):

Mine is something like this:

```
2018-12-12 02:34:46.205+0000: 17898: debug : virFileClose:111 : Closed fd 22
2018-12-12 02:34:46.205+0000: 17898: debug : virFileClose:111 : Closed fd 25
2018-12-12 02:34:46.205+0000: 17898: debug : virStorageBackendSCSIFindLUs:4125 : Found possible LU '10:0:0:0'
2018-12-12 02:34:46.205+0000: 17898: debug : processLU:4054 : Processing LU 10:0:0:0
2018-12-12 02:34:46.205+0000: 17898: debug : getDeviceType:4025 : Device type is 0
2018-12-12 02:34:46.205+0000: 17898: debug : processLU:4071 : 10:0:0:0 is a Direct-Access LUN
2018-12-12 02:34:46.205+0000: 17898: debug : getNewStyleBlockDevice:3852 : Looking for block device in '/sys/bus/scsi/devices/10:0:0:0/block'
2018-12-12 02:34:46.205+0000: 17898: debug : getNewStyleBlockDevice:3861 : Block device is 'sdb'
2018-12-12 02:34:46.206+0000: 17898: debug : virStorageBackendSCSINewLun:3788 : Trying to create volume for '/dev/sdb'
2018-12-12 02:34:46.207+0000: 17898: debug : virStorageBackendDetectBlockVolFormatFD:1485 : cannot determine the target format for '/dev/disk/by-path/pci-0000:04:00.1-vport-0x5001a4a000000007-fc-0x20010002ac01e23d-lun-0'
2018-12-12 02:34:46.207+0000: 17898: debug : virFileClose:111 : Closed fd 22
2018-12-12 02:34:46.207+0000: 17898: debug : virCommandRunAsync:2476 : About to run /lib/udev/scsi_id --replace-whitespace --whitelisted --device /dev/disk/by-path/pci-0000:04:00.1-vport-0x5001a4a000000007-fc-0x20010002ac01e23d-lun-0
2018-12-12 02:34:46.208+0000: 17898: debug : virFileClose:111 : Closed fd 22
2018-12-12 02:34:46.208+0000: 17898: debug : virFileClose:111 : Closed fd 25
2018-12-12 02:34:46.208+0000: 17898: debug : virFileClose:111 : Closed fd 27
2018-12-12 02:34:46.208+0000: 17898: debug : virCommandRunAsync:2479 : Command result 0, with PID 18044
2018-12-12 02:34:46.227+0000: 17898: debug : virCommandRun:2327 : Result status 0, stdout: '360002ac000000000000005860001e23d
' stderr: '2018-12-12 02:34:46.216+0000: 18044: debug : virFileClose:111 : Closed fd 25
2018-12-12 02:34:46.216+0000: 18044: debug : virFileClose:111 : Closed fd 27
2018-12-12 02:34:46.216+0000: 18044: debug : virFileClose:111 : Closed fd 22
```

The full log can be got from MS OneDrive: https://1drv.ms/u/s!AgdSC5Ad4nVqx1oDlZljFFOP2pwa

Adam Xu:

I forgot to add something. In the above example I created one pool, and when I run "lsblk" there are four extra block devices:

```
sdb               8:16   0    60G  0 disk
sdc               8:32   0    60G  0 disk
sdd               8:48   0    60G  0 disk
sde               8:64   0    60G  0 disk
```

yisun:

(In reply to Adam Xu from comment #5)
> [debug log from comment 5 quoted in full]
This log suggests the vol was created. BTW, I cannot open the URL of the log. @John, did we change the logic of the SCSI pool to remove duplicated multipath devices? Thanks.

Adam Xu (comment 8):

I put it in Google Drive: https://drive.google.com/open?id=1xAnnzjYUUGOessg7oPnzWnG92LObdRWP
Hope it works.

Since there are fewer units in the pool with libvirt 4.5, will the performance of the multipath device be lower than before? Taking my VM for example, there were 8 sdX devices before and there are only 2 sdX devices left now.

yisun:

(In reply to Adam Xu from comment #8)

Thanks a lot, the log can be accessed.

Hi John, I saw the log has an error at:
```
2018-12-12 02:34:46.227+0000: 17898: debug : virStorageBackendSCSIFindLUs:4125 : Found possible LU '10:0:1:0'
2018-12-12 02:34:46.227+0000: 17898: debug : processLU:4054 : Processing LU 10:0:1:0
2018-12-12 02:34:46.227+0000: 17898: debug : getDeviceType:4025 : Device type is 0
2018-12-12 02:34:46.227+0000: 17898: debug : processLU:4071 : 10:0:1:0 is a Direct-Access LUN
2018-12-12 02:34:46.227+0000: 17898: debug : getNewStyleBlockDevice:3852 : Looking for block device in '/sys/bus/scsi/devices/10:0:1:0/block'
2018-12-12 02:34:46.227+0000: 17898: debug : getNewStyleBlockDevice:3861 : Block device is 'sdc'
2018-12-12 02:34:46.227+0000: 17898: debug : virStorageBackendSCSINewLun:3788 : Trying to create volume for '/dev/sdc'
2018-12-12 02:34:46.228+0000: 17898: debug : virStorageBackendDetectBlockVolFormatFD:1485 : cannot determine the target format for '/dev/disk/by-path/pci-0000:04:00.1-vport-0x5001a4a000000007-fc-0x21010002ac01e23d-lun-0'
2018-12-12 02:34:46.228+0000: 17898: debug : virFileClose:111 : Closed fd 22
2018-12-12 02:34:46.228+0000: 17898: debug : virCommandRunAsync:2476 : About to run /lib/udev/scsi_id --replace-whitespace --whitelisted --device /dev/disk/by-path/pci-0000:04:00.1-vport-0x5001a4a000000007-fc-0x21010002ac01e23d-lun-0
2018-12-12 02:34:46.229+0000: 17898: debug : virFileClose:111 : Closed fd 22
2018-12-12 02:34:46.229+0000: 17898: debug : virFileClose:111 : Closed fd 25
2018-12-12 02:34:46.229+0000: 17898: debug : virFileClose:111 : Closed fd 27
2018-12-12 02:34:46.229+0000: 17898: debug : virCommandRunAsync:2479 : Command result 0, with PID 18045
2018-12-12 02:34:46.240+0000: 17898: debug : virCommandRun:2327 : Result status 0, stdout: '360002ac000000000000005860001e23d
' stderr: '2018-12-12 02:34:46.237+0000: 18045: debug : virFileClose:111 : Closed fd 25
2018-12-12 02:34:46.237+0000: 18045: debug : virFileClose:111 : Closed fd 27
2018-12-12 02:34:46.237+0000: 18045: debug : virFileClose:111 : Closed fd 22
'
2018-12-12 02:34:46.240+0000: 17898: debug : virFileClose:111 : Closed fd 26
2018-12-12 02:34:46.241+0000: 17898: debug : virFileClose:111 : Closed fd 23
2018-12-12 02:34:46.241+0000: 17898: info : virObjectNew:248 : OBJECT_NEW: obj=0x7f2b941e1030 classname=virStorageVolObj
2018-12-12 02:34:46.241+0000: 17898: error : virHashAddOrUpdateEntry:341 : internal error: Duplicate key
2018-12-12 02:34:46.241+0000: 17898: info : virObjectUnref:344 : OBJECT_UNREF: obj=0x7f2b941e1030
2018-12-12 02:34:46.241+0000: 17898: info : virObjectUnref:346 : OBJECT_DISPOSE: obj=0x7f2b941e1030
2018-12-12 02:34:46.241+0000: 17898: debug : processLU:4087 : Created new storage volume for 10:0:1:0 successfully
```

In commit be1bb6c9 you added virHashAddEntry(volumes->objsKey, voldef->key, volobj) in virStoragePoolObjAddVol(). Since Adam's LUNs in comment 0 under /dev/disk/by-path/ actually share the same backend device (they are just multipath paths), "/lib/udev/scsi_id --replace-whitespace --whitelisted --device <each of these devices>" returns the same SCSI id, which is assigned to vol->key. That causes a "Duplicate key" error when the second/third/... vol is created. So I assume this was an intentional change by you; if so, this could be a NOTABUG. But one thing confusing me: when the "Duplicate key" error is reported, all the functions return -1, so it seems the code should never reach "processLU:4087 : Created new storage volume for 10:0:1:0 successfully". Maybe something is wrong? Thanks.
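The collision can be shown directly by running the same scsi_id invocation libvirt runs against two of the by-path entries; the paths and the serial below are taken from the debug logs above:

```
# two different target ports, same backend 3PAR LUN
/lib/udev/scsi_id --replace-whitespace --whitelisted \
  --device /dev/disk/by-path/pci-0000:04:00.1-vport-0x5001a4a000000007-fc-0x20010002ac01e23d-lun-0
/lib/udev/scsi_id --replace-whitespace --whitelisted \
  --device /dev/disk/by-path/pci-0000:04:00.1-vport-0x5001a4a000000007-fc-0x21010002ac01e23d-lun-0
# both print 360002ac000000000000005860001e23d, so vol->key collides
```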
John Ferlan:

First off, let me go back to the problem statement. Not that this is the problem here, but there is perhaps a "slight misconception" over the steps to create vHBAs. In particular, the described steps:

> # virsh nodedev-create vhba_host1.xml
>
> 3. show wwnn and wwpn id of the vhba
> # virsh nodedev-dumpxml scsi_host13
> ...
> <wwnn>5001a4a695e4124d</wwnn>
> <wwpn>5001a4ad261be7ed</wwpn>
> ...
>
> 4. create storage pool:
> # cat vhbapool_host1.xml
> ...
> <source>
>   <adapter type='fc_host' wwnn='5001a4a695e4124d' wwpn='5001a4ad261be7ed'/>
> </source>
> ...
>
> # virsh pool-define vhbapool_host1.xml
>
> it will succeed in rhel 7.5. but failed in rhel 7.6. errors like "the wwnn and wwpn id have been used by a hba device"
> I have to reboot the kvm server and run that command again, this time, it succeed.

The description on https://wiki.libvirt.org/page/NPIV_in_libvirt indicates creating a vHBA is "either directly using the node device driver or indirectly via a libvirt storage pool". What you attempted to do was create the "same" vHBA when creating the storage pool. The "reboot" of the KVM server would have been unnecessary if you had done a 'virsh nodedev-destroy scsi_hostXX', where XX is replaced by the scsi_hostXX created by the nodedev-create command (in your case scsi_host13). This method is an example of a dynamic or transient vHBA. The "reason" for the storage pool definition is to remove that transience and create a more persistent definition through a storage pool (and storage pools can be transient too, if using virsh pool-create instead of virsh pool-define). So the reason your reboot worked was that the transient scsi_host13 was removed. But I digress; back to the question...

Commit be1bb6c9 was to move volumes from a forward linked list into a hash table for "faster" search capabilities, so it's not an intentional change as it relates to vHBA LUN additions. Volumes are "stored" under 3 different lookup keys (key, name, path), with the goal being to use any of the keys to access the volume. The theory is that each should be unique, but that doesn't seem to be the case here, since your research shows a flaw in at least how vHBA LUN keys are generated using the /lib/udev/scsi_id command. Perhaps the key needs to be constructed differently when the LUN search is being done for a vHBA. I have a couple of ideas (although they get ugly fast). I found that scsi_id also has an --export option, which for a vHBA and vHBA LUNs will print an "ID_TARGET_PORT=#" string. That value could be folded into the serial/key string. Let's see what I can put together.
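A rough sketch of that idea; the ID_TARGET_PORT line is per John's description, and the concrete values shown are illustrative rather than captured output:

```
# --export prints KEY=value pairs instead of a bare serial
/lib/udev/scsi_id --export --whitelisted \
  --device /dev/disk/by-path/pci-0000:04:00.1-vport-0x5001a4a000000007-fc-0x20010002ac01e23d-lun-0
# ID_SERIAL=360002ac000000000000005860001e23d
# ID_TARGET_PORT=1        <== differs for each target port to the same LUN
# combining the serial with the target port would give each path a unique volume key
```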
Adam Xu (comment 11):

Hi John, thank you for your reply. In fact, the "slight misconception" of creating a vHBA comes from here:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-NPIV_storage.html

In chapters 13.7.1 and 13.7.2, creating the storage pool uses the same wwnn and wwpn that come from the 13.7.1 step, so I thought these two steps were meant to be related. In fact, we can define the storage pool directly after we generate the wwnn and wwpn, right?

In RHEL 7.5 and earlier, this tutorial gives no error; only in RHEL 7.6 does it report an error like "the wwnn and wwpn id have been used by a hba device". Now I know that I just need to delete the node device before I create the storage pool. Thank you and yisun.

Last question: the VM has the multipath package installed; it has 2 LUN devices now while it had 8 LUN devices in the past. Will the performance in RHEL 7.6 be lower than RHEL 7.5, in theory?

John Ferlan:

(In reply to Adam Xu from comment #11)
> In chapters 13.7.1 and 13.7.2, creating the storage pool uses the same wwnn
> and wwpn that come from the 13.7.1 step, so I thought these two steps were
> meant to be related.

The RHEL docs were created from the wiki I listed above, and I suppose I can see how they can be confusing.

> In RHEL 7.5 and earlier, this tutorial gives no error; only in RHEL 7.6 does
> it report an error like "the wwnn and wwpn id have been used by a hba device"

Strange, I don't recall anything changing that would cause that, but I'll look. It's a separate issue; let me focus on the missing/unreported LUNs first. I can reproduce what was seen, but my test environment is "flaky" right now. I'm hoping to post something upstream shortly.

> Will the performance in RHEL 7.6 be lower than RHEL 7.5, in theory?

The performance has nothing to do with the LUNs themselves; it's more a number-of-LUNs thing. If you had 100 or 1000 LUNs, it would potentially take compute time to walk that list in order to find any particular LUN, whereas with a hash table the lookup is much faster, since there are at most, I think, 6-10 LUNs in any one hash bucket. I haven't done any real characterization; it's mostly an algorithmic observation.

John Ferlan:

Posted a patch upstream for this:
https://www.redhat.com/archives/libvir-list/2018-December/msg00562.html

A second round of patches was posted:
https://www.redhat.com/archives/libvir-list/2019-January/msg00657.html

With a couple of review mods, these are now pushed upstream:

```
commit 850cfd75beb7872b20439eccda0bcf7b68cab525
Author: John Ferlan <jferlan>
Date:   Fri Jan 18 08:33:10 2019 -0500

    storage: Fetch a unique key for vHBA/NPIV LUNs

    ...

    Commit be1bb6c95 changed the way volumes were stored from a forward
    linked list to a hash table. In doing so, it required that each vol
    object would have 3 unique values as keys into tables - key, name,
    and path. Due to how vHBA/NPIV LUNs are created/used this resulted
    in a failure to utilize all the LUN's found during processing.

    During virStorageBackendSCSINewLun processing fetch the key (or
    serial value) for NPIV LUN's using virStorageFileGetNPIVKey which
    will formulate a more unique key based on the serial value and the
    port for the LUN.

    Signed-off-by: John Ferlan <jferlan>
    ACKed-by: Michal Privoznik <mprivozn>
    Reviewed-by: Ján Tomko <jtomko>

$ git describe 850cfd75beb7872b20439eccda0bcf7b68cab525
v5.0.0-203-g850cfd75be
$
```
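On a fixed build, volumes that share a backend LUN should therefore report distinct keys. A hypothetical spot check against the verification pool below (vol names taken from step 5; the exact key format is internal to virStorageFileGetNPIVKey, so only the fact that the two keys differ matters):

```
# keys for two paths to the same LUN should now differ
virsh vol-key --pool vp unit:0:0:0
virsh vol-key --pool vp unit:0:1:0
```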
yisun:

Tested with libvirt-4.5.0-12.virtcov.el7.x86_64; result: PASSED.

Test steps:

1. Have a pool XML as follows:

```
[root@dell-per730-58 ~]# cat pool
<pool type='scsi'>
  <name>vp</name>
  <source>
    <adapter type='fc_host' wwnn='20000000c99e2b80' wwpn='1000000000000002' parent='scsi_host11'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
    <permissions>
      <mode>0700</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>
```

2. Start the pool:

```
[root@dell-per730-58 ~]# virsh pool-define pool
Pool vp defined from pool

[root@dell-per730-58 ~]# virsh pool-start vp
Pool vp started
```

3. Check the newly connected LUNs:

```
[root@dell-per730-58 ~]# lsscsi
...
[120:0:0:0]  disk  IBM  2145  0000  /dev/sdd
[120:0:0:1]  disk  IBM  2145  0000  /dev/sde
[120:0:1:0]  disk  IBM  2145  0000  /dev/sdf
[120:0:1:1]  disk  IBM  2145  0000  /dev/sdg
```

4. Check their multipath devices; sdg & sde and sdd & sdf have the same backends:

```
[root@dell-per730-58 ~]# multipath -ll
mpathe (360050763008084e6e000000000000062) dm-6 IBM ,2145
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 120:0:1:1 sdg 8:96 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 120:0:0:1 sde 8:64 active ready running
mpathd (360050763008084e6e000000000000066) dm-5 IBM ,2145
size=15G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 120:0:1:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 120:0:0:0 sdd 8:48 active ready running
```

5. vol-list the pool; all devices should be listed even if they have the same backends:

```
[root@dell-per730-58 ~]# virsh vol-list vp
Name                 Path
------------------------------------------------------------------------------
unit:0:0:0           /dev/disk/by-path/pci-0000:06:00.0-vport-0x1000000000000002-fc-0x50050768030939b6-lun-0
unit:0:0:1           /dev/disk/by-path/pci-0000:06:00.0-vport-0x1000000000000002-fc-0x50050768030939b6-lun-1
unit:0:1:0           /dev/disk/by-path/pci-0000:06:00.0-vport-0x1000000000000002-fc-0x50050768030939b7-lun-0
unit:0:1:1           /dev/disk/by-path/pci-0000:06:00.0-vport-0x1000000000000002-fc-0x50050768030939b7-lun-1
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2294