Bug 1276198
Summary: | Fail to create pool with a virtual HBA in NPIV | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Han Han <hhan> | ||||||
Component: | libvirt | Assignee: | John Ferlan <jferlan> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 7.2 | CC: | dyuan, jferlan, pzhang, rbalakri, sherold, xuzhang, yanyang, yisun | ||||||
Target Milestone: | rc | Keywords: | Regression | ||||||
Target Release: | --- | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | libvirt-1.3.1-1.el7 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2016-11-03 18:29:03 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1288337, 1305606, 1313485 | ||||||||
Attachments: |
|
The /tmp/libvirt/libvirtd.log that you created has some good information that may have been useful to provide, but of course you'd have to know how to decipher it! Anyway, what libvirt does in order to find LU's s to scan the directory: /sys/bus/scsi/devices/ Looking for $hostnumber:$bus:$target:$lun directories. Since I tried after you, my example would be directories that start with 6 since my vHBA is scsi_host6, which were: 6:0:0:0 6:0:1:0 6:0:2:0 6:0:3:0 In each directory it opens the 'type' file looking to make sure the returned value is either "0x00" (DISK) or "0x05" (ROM). If so it declares it's a "Direct-Access LUN" Then it looks in any of this for the "block" file (or in older systems if it starts with "block")... Essentially it's looking to read that file in order to find the name of the device associated with the lun. In my case 6:0:2:0 is used "sdd" and 6:0:3:0 uses "sde". Once it has those it can scan the target path /dev/disk/by-path for the first entry using the block device name - in this case: ll /dev/disk/by-path ... lrwxrwxrwx. 1 root root 9 Oct 30 04:38 pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0 -> ../../sdd lrwxrwxrwx. 1 root root 9 Oct 30 04:38 pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sde This provides what libvirt determines is the "stable path" to the device. Libvirt then needs to get information about the size and format from the file. In the log file you'll see a call to virStorageBackendDetectBlockVolFormatFD. When it goes to read() the file, there's an error EIO So all that said - I personally haven't encountered multipath and from what I can tell from the code, I'm not sure libvirt can handle it. If you look at the host in the directory /home/jferlan, you'll find a couple of files - if you run "detectformat" - it essentially recreates what libvirt does. For the block devices created - the first one cannot determine the format of the block device and the second well as you know fails. I'll have to do some more digging to see what I can come up... I'll leave things as they are now and hit back on this when I get up in the morning. Thank for your reply. I stopped multipathd.service and clean the environment as following. # systemctl stop multipathd.service # dmsetup remove_all # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 232.9G 0 disk ├─sda1 8:1 0 500M 0 part /boot ├─sda2 8:2 0 48.8G 0 part ├─sda4 8:4 0 1K 0 part ├─sda5 8:5 0 50G 0 part / ├─sda6 8:6 0 27.5G 0 part └─sda7 8:7 0 48.8G 0 part sdb 8:16 0 20G 0 disk sdc 8:32 0 20G 0 disk sr0 11:0 1 3G 0 rom Then try to reproduce the bug: # virsh nodedev-create newhba.xml Node device scsi_host11 created from newhba.xml # virsh pool-create by-path.xml error: Failed to create pool from by-path.xml error: cannot read beginning of file '/dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0': Input/output error # ll /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 lrwxrwxrwx. 1 root root 9 Oct 30 12:09 /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sde So I think the bug isn't related to multipath. One more issue may about the bug: When the pool xml with managed='yes',a unknown error appears. # rpm -q libvirt libgudev1 systemd qemu-kvm-rhev libvirt-1.2.17-13.el7.x86_64 libgudev1-219-19.el7.x86_64 systemd-219-19.el7.x86_64 qemu-kvm-rhev-2.3.0-31.el7.x86_64 1. Create a nodedev just as comment1. # virsh nodedev-list --cap scsi_host scsi_host0 scsi_host1 scsi_host2 scsi_host3 scsi_host4 scsi_host5 # virsh nodedev-create newhba.xml Node device scsi_host12 created from newhba.xml 2. Create a pool with the same wwnn:wwpn in scsi_host12 and managed='yes'. # cat managed.xml <pool type="scsi"> <name>host9</name> <source> <adapter type='fc_host' wwnn='2101001b32a9da4e' wwpn='2101001b32a90001' managed='yes'/> </source> <target> <path>/dev/disk/by-path</path> </target> </pool> # virsh pool-create managed.xml error: Failed to create pool from managed.xml error: An error occurred, but the cause is unknown Get the log: # grep -Ei '(virStorage|udev)' /tmp/libvirt/libvirtd.log > unknown_error.log In log, I get the error like that in comment1: 2015-10-30 05:46:06.108+0000: 1944: debug : virStorageBackendSCSIFindLUs:491 : Found possible LU '12:0:3:0' 2015-10-30 05:46:06.109+0000: 1944: debug : virStorageBackendSCSINewLun:203 : Trying to create volume for '/dev/sde' 2015-10-30 05:46:06.109+0000: 1944: error : virStorageBackendDetectBlockVolFormatFD:1372 : cannot read beginning of file '/dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0': Input/output error 3. The next pool-create is success: # virsh pool-create managed.xml Pool test created from managed.xml # virsh vol-list test Name Path ------------------------------------------------------------------------------ unit:0:2:0 /dev/disk/by-path/pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0 I also find the after the step2 error occurs, the nodedev created in step1 is destroyed. Created attachment 1087814 [details]
the unkown error log
By the way, if set managed='no' in comment6, we will get the Input/output error the same with the error in comment1. I'm still trying to find a root the reason/cause for the issue. At this point, I believe it has something to do with the *physical* volume format or configuration. From the docs about managed for storage pools: "An optional attribute to instruct the SCSI storage backend to manage destroying the vHBA when the pool is destroyed. For configurations that do not provide an already created vHBA from a 'virsh nodedev-create', libvirt will set this property to "yes". For configurations that have already created a vHBA via 'virsh nodedev-create' and are using the wwnn/wwpn from that vHBA and optionally the scsi_host parent, setting this attribute to "yes" will allow libvirt to destroy the node device when the pool is destroyed. If this attribute is set to "no" or not defined in the XML, then libvirt will not destroy the vHBA. Since 1.2.11" So back to the configuration... Running that lsblk command again In fact something is different about both volumes as the one that can be found/read (0:2:0) Looking just now on that host I see: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 232.9G 0 disk ├─sda1 8:1 0 500M 0 part /boot ├─sda2 8:2 0 48.8G 0 part │ ├─rhel_dell--pet105--04-root 253:1 0 41G 0 lvm │ └─rhel_dell--pet105--04-swap 253:2 0 7.9G 0 lvm ├─sda4 8:4 0 1K 0 part ├─sda5 8:5 0 50G 0 part / ├─sda6 8:6 0 27.5G 0 part └─sda7 8:7 0 48.8G 0 part sdb 8:16 0 20G 0 disk └─3600a0b80005adb0b0000ab2d4cae9254 253:0 0 20G 0 mpath sdc 8:32 0 20G 0 disk └─3600a0b80005adb0b0000ab2d4cae9254 253:0 0 20G 0 mpath sr0 11:0 1 3G 0 rom # ls -al /dev/disk/by-path/pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0 lrwxrwxrwx. 1 root root 9 Oct 30 15:46 /dev/disk/by-path/pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0 -> ../../sdb # ls -al /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 lrwxrwxrwx. 1 root root 9 Oct 30 15:46 /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sdc Running my program which detects the format (/home/jferlan/detectformat) more or less like libvirt does: # ./detectformat /dev/sdb Open '/dev/sdb' for stat stat st_blocks=0 st_blksize=4096 st_size=0 statfs f_bsize=4096 f_blocks=989063 f_bfree=989063 f_bavail=989063 lseek end=21481278464 (20977811K, 20486M, 20G) lseek start=0 # ./detectformat /dev/sdc Open '/dev/sdc' for stat stat st_blocks=0 st_blksize=4096 st_size=0 statfs f_bsize=4096 f_blocks=989063 f_bfree=989063 f_bavail=989063 lseek end=21481278464 (20977811K, 20486M, 20G) lseek start=0 part_table_type = VIR_STORAGE_POOL_DISK_LVM2 offset=536 length=8 bytes=4096 part_table_type = VIR_STORAGE_POOL_DISK_GPT offset=512 length=8 bytes=4096 part_table_type = VIR_STORAGE_POOL_DISK_DVH offset=0 length=4 bytes=4096 part_table_type = VIR_STORAGE_POOL_DISK_MAC offset=0 length=2 bytes=4096 part_table_type = VIR_STORAGE_POOL_DISK_BSD offset=64 length=4 bytes=4096 part_table_type = VIR_STORAGE_POOL_DISK_SUN offset=508 length=2 bytes=4096 part_table_type = VIR_STORAGE_POOL_DISK_PC98 offset=0 length=8 bytes=4096 part_table_type = VIR_STORAGE_POOL_DISK_DOS offset=510 length=2 bytes=4096 If you then use "od -x /dev/sdb" - it will fail with the open error If you then use "od -x /dev/sdc | less" - it succeeds, but the format is not what I'd expect. I also note it seems someone is doing something on the machine right now, so I won't disrupt that. I will note that the 'lsblk' returns 'mpath' as the type for those two 3600* entries in the output - that's different and something I'd need more time to look at. As a comparison, here's an environment I use: sda 8:0 1 136.5G 0 disk ├─sda1 8:1 1 24.4G 0 part / ├─sda2 8:2 1 8.9G 0 part │ └─fedora_virtlab102-swap 253:0 0 8.9G 0 lvm [SWAP] └─sda3 8:3 1 103.2G 0 part ├─libvirt--volumes-rharwood--ninja 253:2 0 8G 0 lvm └─libvirt--volumes-rharwood--distcc 253:3 0 8G 0 lvm sdh 8:112 0 93.1G 0 disk sdi 8:128 0 93.1G 0 disk └─rhallisey-openstack 253:1 0 90G 0 lvm sr0 11:0 1 1024M 0 rom # ll /dev/disk/by-path/pci-0000:10:00.1-fc-0x207800c0ffd79b2a-lun-13 lrwxrwxrwx. 1 root root 9 Oct 29 19:05 /dev/disk/by-path/pci-0000:10:00.1-fc-0x207800c0ffd79b2a-lun-13 -> ../../sdh # ll /dev/disk/by-path/pci-0000:10:00.0-fc-0x217800c0ffd79b2a-lun-13 lrwxrwxrwx. 1 root root 9 Oct 29 18:33 /dev/disk/by-path/pci-0000:10:00.0-fc-0x217800c0ffd79b2a-lun-13 -> ../../sdi # # ./detectformat /dev/sdh Open '/dev/sdh' for stat stat st_blocks=0 st_blksize=4096 st_size=0 statfs f_bsize=4096 f_blocks=2309967 f_bfree=2309967 f_bavail=2309967 lseek end=100000595968 (97656832K, 95368M, 93G) lseek start=0 part_table_type = VIR_STORAGE_POOL_DISK_LVM2 offset=536 length=8 bytes=1024 part_table_type = VIR_STORAGE_POOL_DISK_LVM2 #./detectformat /dev/sdi Open '/dev/sdi' for stat stat st_blocks=0 st_blksize=4096 st_size=0 statfs f_bsize=4096 f_blocks=2309967 f_bfree=2309967 f_bavail=2309967 lseek end=100000595968 (97656832K, 95368M, 93G) lseek start=0 part_table_type = VIR_STORAGE_POOL_DISK_LVM2 offset=536 length=8 bytes=1024 part_table_type = VIR_STORAGE_POOL_DISK_LVM2 # od -x /dev/sdh | less 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0001000 414c 4542 4f4c 454e 0001 0000 0000 0000 0001020 89f8 f320 0020 0000 564c 324d 3020 3130 0001040 4145 7231 634d 7667 3944 6d36 434e 3369 0001060 7a65 5435 4f6f 4144 785a 6b45 634c 704e 0001100 0000 4880 0017 0000 0000 0010 0000 0000 ... # od -x /dev/sdi | less 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0001000 414c 4542 4f4c 454e 0001 0000 0000 0000 0001020 89f8 f320 0020 0000 564c 324d 3020 3130 0001040 4145 7231 634d 7667 3944 6d36 434e 3369 0001060 7a65 5435 4f6f 4144 785a 6b45 634c 704e For both you see a patter on the "1020" line - that matches the pattern for LVM2 found in the detectformat program. When I ran od -x /dev/sdd last night - I found something like: 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 0002000 ... Note the difference of 0002000 vs 0001000. Since the buffer we use to read was only 1024 bytes, it never read the file header. That's why I changed my code to use 4096, but I hadn't fully dug into the format yet... What I do know is *that's* what is causing the issue. The fact that you succeeded sometimes while failed others could have to do with the timing of the device showing up via udev vs. the time it took for libvirt to scan what was available. When you listed that volume, it may have been interesting to see the output of a "virsh vol-list $pool --details" - that would have show the Name, Path, Type, Capacity, and Allocation. Finally, I can tell you've been busy - do a "virsh nodedev-list scsi_host" - see all those stray "scsi_host##"'s - that's what managed='no' gets you. I also see that a "virsh nodedev-dumpxml scsi_host5" shows all the vports in use So after a bit of IRC conversation and some more "debugging" - it seems as though the block device file format is a bit of a red herring (e.g. no big deal); however, not being able to read the block device is a problem - well at least partially. The target format only matters insomuch as being able to display it - for this case there's less concern over what the format is. Although perhaps libvirt should perhaps spit out a VIR_WARN regarding the inability to determine the format. So the "issue" ends up being that as of bug 1171933, the code that handles stable path checking has undergone change in order to better detect good and bad configurations. In this particular case, prior to that set of patches, this particular volume would just be ignored since libvirt wasn't able to open the disk block device. With those changes, the error is now recognized and propagated back to the user/startup code. It's a simple enough adjustment in order to "ignore" such a device while still logging it as before. I have a fix that works and will post something upstream shortly. Posted a patch upstream: http://www.redhat.com/archives/libvir-list/2015-October/msg00919.html Series has been pushed, see: commit a523770c32f5beff7faa63b5b685d9a2a9a04878 Author: John Ferlan <jferlan> Date: Fri Oct 30 10:29:46 2015 -0400 git describe a523770c32f5beff7faa63b5b685d9a2a9a04878 v1.3.0-49-ga523770 Checked on libvirt-1.3.2-1.el7.x86_64, find a problem. Run comment1 looply with following scripts: #!/bin/bash -x for i in {14..58};do echo '2101001b32a90000:2100001b32a9da4e' > /sys/class/fc_host/host5/vport_create sed -i "s/host[0-9]*/host$i/g" by-path.xml virsh pool-create by-path.xml if [ $? -ne 0 ];then exit fi sleep 2 virsh pool-destroy host$i sleep 2 virsh nodedev-destroy scsi_host$i done A problem appeared when run into the 3rd loop: + for i in '{14..58}' + echo 2101001b32a90000:2100001b32a9da4e + sed -i 's/host[0-9]*/host16/g' by-path.xml + virsh pool-create by-path.xml Pool host16 created from by-path.xml + '[' 0 -ne 0 ']' + sleep 2 + virsh pool-destroy host16 Pool host16 destroyed + sleep 2 + virsh nodedev-destroy scsi_host16 error: Failed to destroy node device 'scsi_host16' error: internal error: Device is not a fibre channel HBA + for i in '{14..58}' + echo 2101001b32a90000:2100001b32a9da4e ./run.sh: line 3: echo: write error: Name not unique on network + sed -i 's/host[0-9]*/host17/g' by-path.xml + virsh pool-create by-path.xml error: Failed to create pool from by-path.xml error: Could not open '/sys/class/scsi_host/host17/scan' to trigger host scan: No such file or directory + '[' 1 -ne 0 ']' + exit It seems strange that the scsi_host16 cannot be destroyed. And if you change those sleep 2 to sleep 5 or sleep 10? Does that change the results? The point is - the underlying infrastructure (e.g. all that must happen asynchronously) is robust enough for a script that's thrashing create/delete processing. Are there any interesting 'entrails' in the libvirtd.log or host's messages log? Even for volume population - there's no guarantee that the volume appears immediately. A few pool-refresh's later it may show up. In any case, the multiple create/delete/destroy is a different problem than the original problem. I'm not saying it's not an issue, but it's not overly surprising either as there's quite a bit of infrastructure changes that needs to take place for which libvirt has no control (device create/delete, file link create/delete, systemd interactions, udev interactions, etc.) I can reproduce the bug. Verify it on libvirt-1.3.3-2.el7.x86_64 Let's do the loop of create_vHBA --> create_vHBA_pool --> destroy_vHBA_pool --> delete_vHBA: #!/bin/bash -x for i in host{37..100};do echo '2101001b32a90000:2100001b32a9da4e' > /sys/class/fc_host/host4/vport_create sed -i "s/host[0-9]*/$i/g" pool.xml virsh pool-create pool.xml if [ $? -ne 0 ];then echo error exit fi virsh nodedev-list --tree|grep $i -A 28 virsh pool-destroy $i echo '2101001b32a90000:2100001b32a9da4e' > /sys/class/fc_host/host4/vport_delete done No error happens in the loop. Bug fixed. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2577.html |
Created attachment 1087363 [details] Script and result Description of problem: As subject Version-Release number of selected component (if applicable): libvirt-1.2.17-13.el7.x86_64 kernel-3.10.0-326.el7.x86_64 qemu-kvm-rhev-2.3.0-23.el7.x86_64 How reproducible: 100% Steps to Reproduce: 1. Show HBA in NPIV: # ls /sys/class/fc_host/ host1 host4 2. create a vHBA: <device> <parent>scsi_host4</parent> <capability type='scsi_host'> <capability type='fc_host'> <wwnn>2101001b32a9da4e</wwnn> <wwpn>2101001b32a90005</wwpn> </capability> </capability> </device> 3. Show the newly created details: # virsh nodedev-list --tree|grep host153 -A 28 | +- scsi_host153 | | | | | +- scsi_target153_0_0 | | | | | | | +- scsi_153_0_0_0 | | | | | | | +- scsi_generic_sg8 | | | | | +- scsi_target153_0_1 | | | | | | | +- scsi_153_0_1_0 | | | | | | | +- scsi_generic_sg9 | | | | | +- scsi_target153_0_2 | | | | | | | +- scsi_153_0_2_0 | | | | | | | +- block_sdf_3600a0b80005ad1d700002ddc4fa32c87 | | | +- scsi_generic_sg10 | | | | | +- scsi_target153_0_3 | | | | | +- scsi_153_0_3_0 | | | | | +- block_sdg_3600a0b80005ad1d700002ddc4fa32c87 | | +- scsi_generic_sg11 4. create a pool with the vHBA # cat by-path.xml <pool type="scsi"> <name>host153</name> <source> <adapter name="host153"/> </source> <target> <path>/dev/disk/by-path</path> </target> </pool> # virsh pool-create by-path.xml error: Failed to create pool from by-path.xml error: cannot read beginning of file '/dev/disk/by-path/pci-0000:18:00.1-fc-0x203700a0b85b5dd4-lun-0': Input/output error # ll /dev/disk/by-path/pci-0000:18:00.1-fc-0x203700a0b85b5dd4-lun-0 lrwxrwxrwx. 1 root root 9 Oct 29 10:52 /dev/disk/by-path/pci-0000:18:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sdg # multipath -ll 3600a0b80005ad1d700002ddc4fa32c87 dm-2 IBM ,1726-4xx FAStT size=10G features='0' hwhandler='1 rdac' wp=rw |-+- policy='service-time 0' prio=6 status=active | `- 153:0:2:0 sdf 8:80 active ready running `-+- policy='service-time 0' prio=1 status=enabled `- 153:0:3:0 sdg 8:96 active ghost running 3600a0b80005ada7600006b16546bd5bc dm-0 IBM ,1726-4xx FAStT size=30G features='0' hwhandler='1 rdac' wp=rw |-+- policy='service-time 0' prio=6 status=active | `- 4:0:3:0 sdd 8:48 active ready running `-+- policy='service-time 0' prio=1 status=enabled `- 4:0:2:0 sdb 8:16 active ghost running 3600a0b80005ada7600008edb54efd406 dm-1 IBM ,1726-4xx FAStT size=50G features='0' hwhandler='1 rdac' wp=rw |-+- policy='service-time 0' prio=6 status=active | `- 4:0:3:1 sde 8:64 active ready running `-+- policy='service-time 0' prio=1 status=enabled `- 4:0:2:1 sdc 8:32 active ghost running So it seems that libvirt trys to read the active ghost running disk and causes the Input/output error. 5. Change by-path to by-id in the pool xml, sometimes pool-create succeeds. Actual results: As step4~5 Expected results: pool-create success, no errors. Additional info: When downgrade to kernel-3.10.0-316.el7.x86_64.rpm,libvirt-1.2.17-8.el7.x86_64, seems no change in the result. But when switch to RHEL6.7, pool-create is successful and can list the volume, whether by-path or by-id. In libvirtd log, I got errors like which is in step4: 2015-10-28 10:06:37.929+0000: 2691: error : virStorageBackendDetectBlockVolFormatFD:1332 : cannot read beginning of file '/dev/disk/by-id/scsi-3600a0b80005ad1d700002dde4fa32ca8': Input/output error 2015-10-28 10:06:37.929+0000: 2691: error : virStorageBackendSCSINewLun:274 : internal error Failed to update volume for '/dev/sde' 2015-10-28 10:06:37.929+0000: 2691: error : virStorageBackendSCSINewLun:274 : internal error Failed to update volume for '/dev/sde' You can use the script in attachment to run the steps in loop.