1276198 – Fail to create pool with a virtual HBA in NPIV

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1276198 - Fail to create pool with a virtual HBA in NPIV

Summary: Fail to create pool with a virtual HBA in NPIV

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	7.2
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	John Ferlan
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1288337 1305606 1313485
TreeView+	depends on / blocked

Reported:	2015-10-29 04:04 UTC by Han Han
Modified:	2016-11-03 18:29 UTC (History)
CC List:	8 users (show)
Fixed In Version:	libvirt-1.3.1-1.el7
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-11-03 18:29:03 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Script and result (1.32 KB, application/x-gzip) 2015-10-29 04:04 UTC, Han Han	no flags	Details
the unkown error log (23.74 KB, text/plain) 2015-10-30 05:58 UTC, Han Han	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2016:2577	0	normal	SHIPPED_LIVE	Moderate: libvirt security, bug fix, and enhancement update	2016-11-03 12:07:06 UTC

Description Han Han 2015-10-29 04:04:29 UTC

Created attachment 1087363 [details]
Script and result

Description of problem:
As subject

Version-Release number of selected component (if applicable):
libvirt-1.2.17-13.el7.x86_64
kernel-3.10.0-326.el7.x86_64
qemu-kvm-rhev-2.3.0-23.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Show HBA in NPIV:
# ls /sys/class/fc_host/
host1  host4

2. create a vHBA:
<device>
  <parent>scsi_host4</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>2101001b32a9da4e</wwnn>
      <wwpn>2101001b32a90005</wwpn>
    </capability>
  </capability>
</device>

3. Show the newly created details:
# virsh nodedev-list --tree|grep host153 -A 28
  |           +- scsi_host153
  |           |   |
  |           |   +- scsi_target153_0_0
  |           |   |   |
  |           |   |   +- scsi_153_0_0_0
  |           |   |       |
  |           |   |       +- scsi_generic_sg8
  |           |   |        
  |           |   +- scsi_target153_0_1
  |           |   |   |
  |           |   |   +- scsi_153_0_1_0
  |           |   |       |
  |           |   |       +- scsi_generic_sg9
  |           |   |        
  |           |   +- scsi_target153_0_2
  |           |   |   |
  |           |   |   +- scsi_153_0_2_0
  |           |   |       |
  |           |   |       +- block_sdf_3600a0b80005ad1d700002ddc4fa32c87
  |           |   |       +- scsi_generic_sg10
  |           |   |        
  |           |   +- scsi_target153_0_3
  |           |       |
  |           |       +- scsi_153_0_3_0
  |           |           |
  |           |           +- block_sdg_3600a0b80005ad1d700002ddc4fa32c87
  |           |           +- scsi_generic_sg11

4. create a pool with the vHBA
# cat by-path.xml
<pool type="scsi">
<name>host153</name>
<source>
<adapter name="host153"/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>

# virsh pool-create by-path.xml
error: Failed to create pool from by-path.xml
error: cannot read beginning of file '/dev/disk/by-path/pci-0000:18:00.1-fc-0x203700a0b85b5dd4-lun-0': Input/output error

# ll /dev/disk/by-path/pci-0000:18:00.1-fc-0x203700a0b85b5dd4-lun-0
lrwxrwxrwx. 1 root root 9 Oct 29 10:52 /dev/disk/by-path/pci-0000:18:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sdg

# multipath -ll
3600a0b80005ad1d700002ddc4fa32c87 dm-2 IBM     ,1726-4xx  FAStT
size=10G features='0' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=active
| `- 153:0:2:0 sdf 8:80 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 153:0:3:0 sdg 8:96 active ghost running
3600a0b80005ada7600006b16546bd5bc dm-0 IBM     ,1726-4xx  FAStT
size=30G features='0' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=active
| `- 4:0:3:0   sdd 8:48 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 4:0:2:0   sdb 8:16 active ghost running
3600a0b80005ada7600008edb54efd406 dm-1 IBM     ,1726-4xx  FAStT
size=50G features='0' hwhandler='1 rdac' wp=rw
|-+- policy='service-time 0' prio=6 status=active
| `- 4:0:3:1   sde 8:64 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 4:0:2:1   sdc 8:32 active ghost running

So it seems that libvirt trys to read the active ghost running disk and causes the Input/output error.

5. Change by-path to by-id in the pool xml, sometimes pool-create succeeds.
 

Actual results:
As step4~5

Expected results:
pool-create success, no errors.

Additional info:
When downgrade to kernel-3.10.0-316.el7.x86_64.rpm,libvirt-1.2.17-8.el7.x86_64, seems no change in the result.
But when switch to RHEL6.7, pool-create is successful and can list the volume, whether by-path or by-id. In libvirtd log, I got errors like
which is in step4:
2015-10-28 10:06:37.929+0000: 2691: error : virStorageBackendDetectBlockVolFormatFD:1332 : cannot read beginning of file
'/dev/disk/by-id/scsi-3600a0b80005ad1d700002dde4fa32ca8': Input/output error                  
2015-10-28 10:06:37.929+0000: 2691: error : virStorageBackendSCSINewLun:274 : internal error Failed to update volume for '/dev/sde'
2015-10-28 10:06:37.929+0000: 2691: error : virStorageBackendSCSINewLun:274 : internal error Failed to update volume for '/dev/sde'

You can use the script in attachment to run the steps in loop.

Comment 4 John Ferlan 2015-10-29 23:09:17 UTC

The /tmp/libvirt/libvirtd.log that you created has some good information that may have been useful to provide, but of course you'd have to know how to decipher it!

Anyway, what libvirt does in order to find LU's s to scan the directory:

/sys/bus/scsi/devices/

Looking for $hostnumber:$bus:$target:$lun directories. Since I tried after you, my example would be directories that start with 6 since my vHBA is scsi_host6, which were:

6:0:0:0
6:0:1:0
6:0:2:0
6:0:3:0

In each directory it opens the 'type' file looking to make sure the returned value is either "0x00" (DISK) or "0x05" (ROM).

If so it declares it's a "Direct-Access LUN"

Then it looks in any of this for the "block" file (or in older systems if it starts with "block")... Essentially it's looking to read that file in order to find the name of the device associated with the lun.

In my case 6:0:2:0 is used "sdd" and 6:0:3:0 uses "sde".

Once it has those it can scan the target path /dev/disk/by-path for the first entry using the block device name - in this case:

ll /dev/disk/by-path
...
lrwxrwxrwx. 1 root root 9 Oct 30 04:38 pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0 -> ../../sdd
lrwxrwxrwx. 1 root root 9 Oct 30 04:38 pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sde

This provides what libvirt determines is the "stable path" to the device. Libvirt then needs to get information about the size and format from the file. In the log file you'll see a call to virStorageBackendDetectBlockVolFormatFD.

When it goes to read() the file, there's an error EIO

So all that said - I personally haven't encountered multipath and from what I can tell from the code, I'm not sure libvirt can handle it.

If you look at the host in the directory /home/jferlan, you'll find a couple of files - if you run "detectformat" - it essentially recreates what libvirt does. For the block devices created - the first one cannot determine the format of the block device and the second well as you know fails.

I'll have to do some more digging to see what I can come up... I'll leave things as they are now and hit back on this when I get up in the morning.

Comment 5 Han Han 2015-10-30 04:12:11 UTC

Thank for your reply.
I stopped multipathd.service and clean the environment as following.
# systemctl stop multipathd.service
# dmsetup remove_all
# lsblk 
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 232.9G  0 disk 
├─sda1   8:1    0   500M  0 part /boot
├─sda2   8:2    0  48.8G  0 part 
├─sda4   8:4    0     1K  0 part 
├─sda5   8:5    0    50G  0 part /
├─sda6   8:6    0  27.5G  0 part 
└─sda7   8:7    0  48.8G  0 part 
sdb      8:16   0    20G  0 disk 
sdc      8:32   0    20G  0 disk 
sr0     11:0    1     3G  0 rom 

Then try to reproduce the bug:
# virsh nodedev-create newhba.xml 
Node device scsi_host11 created from newhba.xml
# virsh pool-create by-path.xml
error: Failed to create pool from by-path.xml
error: cannot read beginning of file '/dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0': Input/output error
# ll /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0
lrwxrwxrwx. 1 root root 9 Oct 30 12:09 /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sde

So I think the bug isn't related to multipath.

Comment 6 Han Han 2015-10-30 05:55:08 UTC

One more issue may about the bug:
When the pool xml with managed='yes',a unknown error appears.
# rpm -q libvirt libgudev1 systemd  qemu-kvm-rhev
libvirt-1.2.17-13.el7.x86_64
libgudev1-219-19.el7.x86_64
systemd-219-19.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64

1. Create a nodedev just as comment1.
# virsh nodedev-list --cap scsi_host                                      
scsi_host0
scsi_host1
scsi_host2
scsi_host3
scsi_host4
scsi_host5
# virsh nodedev-create newhba.xml 
Node device scsi_host12 created from newhba.xml

2. Create a pool with the same wwnn:wwpn in scsi_host12 and managed='yes'.
# cat managed.xml 
<pool type="scsi">
<name>host9</name>
<source>
<adapter type='fc_host' wwnn='2101001b32a9da4e' wwpn='2101001b32a90001' managed='yes'/>
</source>
<target>
<path>/dev/disk/by-path</path>
</target>
</pool>
# virsh pool-create managed.xml
error: Failed to create pool from managed.xml
error: An error occurred, but the cause is unknown

Get the log:
# grep -Ei '(virStorage|udev)' /tmp/libvirt/libvirtd.log > unknown_error.log
In log, I get the error like that in comment1:
2015-10-30 05:46:06.108+0000: 1944: debug : virStorageBackendSCSIFindLUs:491 : Found possible LU '12:0:3:0'
2015-10-30 05:46:06.109+0000: 1944: debug : virStorageBackendSCSINewLun:203 : Trying to create volume for '/dev/sde'
2015-10-30 05:46:06.109+0000: 1944: error : virStorageBackendDetectBlockVolFormatFD:1372 : cannot read beginning of file '/dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0': Input/output error

3. The next pool-create is success:
# virsh pool-create managed.xml
Pool test created from managed.xml
# virsh vol-list test
 Name                 Path                                    
------------------------------------------------------------------------------
 unit:0:2:0           /dev/disk/by-path/pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0

I also find the after the step2 error occurs, the nodedev created in step1 is destroyed.

Comment 7 Han Han 2015-10-30 05:58:22 UTC

Created attachment 1087814 [details]
the unkown error log

Comment 8 Han Han 2015-10-30 06:02:27 UTC

By the way, if set managed='no' in comment6, we will get the Input/output error the same with the error in comment1.

Comment 9 John Ferlan 2015-10-30 10:51:06 UTC

I'm still trying to find a root the reason/cause for the issue. At this point, I believe it has something to do with the *physical* volume format or configuration.  

From the docs about managed for storage pools:

"An optional attribute to instruct the SCSI storage backend to manage destroying the vHBA when the pool is destroyed. For configurations that do not provide an already created vHBA from a 'virsh nodedev-create', libvirt will set this property to "yes". For configurations that have already created a vHBA via 'virsh nodedev-create' and are using the wwnn/wwpn from that vHBA and optionally the scsi_host parent, setting this attribute to "yes" will allow libvirt to destroy the node device when the pool is destroyed. If this attribute is set to "no" or not defined in the XML, then libvirt will not destroy the vHBA. Since 1.2.11"

So back to the configuration... Running that lsblk command again
In fact something is different about both volumes as the one that can be found/read (0:2:0) 

Looking just now on that host I see:

NAME                                MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                                   8:0    0 232.9G  0 disk  
├─sda1                                8:1    0   500M  0 part  /boot
├─sda2                                8:2    0  48.8G  0 part  
│ ├─rhel_dell--pet105--04-root      253:1    0    41G  0 lvm   
│ └─rhel_dell--pet105--04-swap      253:2    0   7.9G  0 lvm   
├─sda4                                8:4    0     1K  0 part  
├─sda5                                8:5    0    50G  0 part  /
├─sda6                                8:6    0  27.5G  0 part  
└─sda7                                8:7    0  48.8G  0 part  
sdb                                   8:16   0    20G  0 disk  
└─3600a0b80005adb0b0000ab2d4cae9254 253:0    0    20G  0 mpath 
sdc                                   8:32   0    20G  0 disk  
└─3600a0b80005adb0b0000ab2d4cae9254 253:0    0    20G  0 mpath 
sr0                                  11:0    1     3G  0 rom   
# ls -al /dev/disk/by-path/pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0
lrwxrwxrwx. 1 root root 9 Oct 30 15:46 /dev/disk/by-path/pci-0000:04:00.1-fc-0x203600a0b85b5dd4-lun-0 -> ../../sdb
# ls -al /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0
lrwxrwxrwx. 1 root root 9 Oct 30 15:46 /dev/disk/by-path/pci-0000:04:00.1-fc-0x203700a0b85b5dd4-lun-0 -> ../../sdc

Running my program which detects the format (/home/jferlan/detectformat) more or less like libvirt does:

# ./detectformat /dev/sdb
Open '/dev/sdb' for stat
stat st_blocks=0 st_blksize=4096 st_size=0
statfs f_bsize=4096 f_blocks=989063 f_bfree=989063 f_bavail=989063
lseek end=21481278464 (20977811K, 20486M, 20G)
lseek start=0
# ./detectformat /dev/sdc
Open '/dev/sdc' for stat
stat st_blocks=0 st_blksize=4096 st_size=0
statfs f_bsize=4096 f_blocks=989063 f_bfree=989063 f_bavail=989063
lseek end=21481278464 (20977811K, 20486M, 20G)
lseek start=0
part_table_type = VIR_STORAGE_POOL_DISK_LVM2
offset=536 length=8 bytes=4096
part_table_type = VIR_STORAGE_POOL_DISK_GPT
offset=512 length=8 bytes=4096
part_table_type = VIR_STORAGE_POOL_DISK_DVH
offset=0 length=4 bytes=4096
part_table_type = VIR_STORAGE_POOL_DISK_MAC
offset=0 length=2 bytes=4096
part_table_type = VIR_STORAGE_POOL_DISK_BSD
offset=64 length=4 bytes=4096
part_table_type = VIR_STORAGE_POOL_DISK_SUN
offset=508 length=2 bytes=4096
part_table_type = VIR_STORAGE_POOL_DISK_PC98
offset=0 length=8 bytes=4096
part_table_type = VIR_STORAGE_POOL_DISK_DOS
offset=510 length=2 bytes=4096

If you then use "od -x /dev/sdb" - it will fail with the open error
If you then use "od -x /dev/sdc | less" - it succeeds, but the format is not what I'd expect.  I also note it seems someone is doing something on the machine right now, so I won't disrupt that.

I will note that the 'lsblk' returns 'mpath' as the type for those two 3600* entries in the output - that's different and something I'd need more time to look at.

As a comparison, here's an environment I use:

sda                                     8:0    1 136.5G  0 disk 
├─sda1                                  8:1    1  24.4G  0 part /
├─sda2                                  8:2    1   8.9G  0 part 
│ └─fedora_virtlab102-swap            253:0    0   8.9G  0 lvm  [SWAP]
└─sda3                                  8:3    1 103.2G  0 part 
  ├─libvirt--volumes-rharwood--ninja  253:2    0     8G  0 lvm  
  └─libvirt--volumes-rharwood--distcc 253:3    0     8G  0 lvm  
sdh                                     8:112  0  93.1G  0 disk 
sdi                                     8:128  0  93.1G  0 disk 
└─rhallisey-openstack                 253:1    0    90G  0 lvm  
sr0                                    11:0    1  1024M  0 rom  

# ll /dev/disk/by-path/pci-0000:10:00.1-fc-0x207800c0ffd79b2a-lun-13
lrwxrwxrwx. 1 root root 9 Oct 29 19:05 /dev/disk/by-path/pci-0000:10:00.1-fc-0x207800c0ffd79b2a-lun-13 -> ../../sdh
# ll /dev/disk/by-path/pci-0000:10:00.0-fc-0x217800c0ffd79b2a-lun-13
lrwxrwxrwx. 1 root root 9 Oct 29 18:33 /dev/disk/by-path/pci-0000:10:00.0-fc-0x217800c0ffd79b2a-lun-13 -> ../../sdi
# 

# ./detectformat /dev/sdh
Open '/dev/sdh' for stat
stat st_blocks=0 st_blksize=4096 st_size=0
statfs f_bsize=4096 f_blocks=2309967 f_bfree=2309967 f_bavail=2309967
lseek end=100000595968 (97656832K, 95368M, 93G)
lseek start=0
part_table_type = VIR_STORAGE_POOL_DISK_LVM2
offset=536 length=8 bytes=1024
part_table_type = VIR_STORAGE_POOL_DISK_LVM2
#./detectformat /dev/sdi
Open '/dev/sdi' for stat
stat st_blocks=0 st_blksize=4096 st_size=0
statfs f_bsize=4096 f_blocks=2309967 f_bfree=2309967 f_bavail=2309967
lseek end=100000595968 (97656832K, 95368M, 93G)
lseek start=0
part_table_type = VIR_STORAGE_POOL_DISK_LVM2
offset=536 length=8 bytes=1024
part_table_type = VIR_STORAGE_POOL_DISK_LVM2

# od -x /dev/sdh | less
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000 414c 4542 4f4c 454e 0001 0000 0000 0000
0001020 89f8 f320 0020 0000 564c 324d 3020 3130
0001040 4145 7231 634d 7667 3944 6d36 434e 3369
0001060 7a65 5435 4f6f 4144 785a 6b45 634c 704e
0001100 0000 4880 0017 0000 0000 0010 0000 0000
...

# od -x /dev/sdi | less
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000 414c 4542 4f4c 454e 0001 0000 0000 0000
0001020 89f8 f320 0020 0000 564c 324d 3020 3130
0001040 4145 7231 634d 7667 3944 6d36 434e 3369
0001060 7a65 5435 4f6f 4144 785a 6b45 634c 704e


For both you see a patter on the "1020" line - that matches the pattern for LVM2 found in the detectformat program.

When I ran od -x /dev/sdd last night - I found something like:

0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0002000 ...

Note the difference of 0002000 vs 0001000.  Since the buffer we use to read was only 1024 bytes, it never read the file header. That's why I changed my code to use 4096, but I hadn't fully dug into the format yet... 

What I do know is *that's* what is causing the issue. The fact that you succeeded sometimes while failed others could have to do with the timing of the device showing up via udev vs. the time it took for libvirt to scan what was available.  When you listed that volume, it may have been interesting to see the output of a "virsh vol-list $pool --details" - that would have show the Name, Path, Type, Capacity, and Allocation.

Finally, I can tell you've been busy - do a "virsh nodedev-list scsi_host" - see all those stray "scsi_host##"'s - that's what managed='no' gets you.  I also see that a "virsh nodedev-dumpxml scsi_host5" shows all the vports in use

Comment 10 John Ferlan 2015-10-30 14:22:09 UTC

So after a bit of IRC conversation and some more "debugging" - it seems as though the block device file format is a bit of a red herring (e.g. no big deal); however, not being able to read the block device is a problem - well at least partially.  The target format only matters insomuch as being able to display it - for this case there's less concern over what the format is. Although perhaps libvirt should perhaps spit out a VIR_WARN regarding the inability to determine the format.

So the "issue" ends up being that as of bug 1171933, the code that handles stable path checking has undergone change in order to better detect good and bad configurations.  In this particular case, prior to that set of patches, this particular volume would just be ignored since libvirt wasn't able to open the disk block device.  With those changes, the error is now recognized and propagated back to the user/startup code.

It's a simple enough adjustment in order to "ignore" such a device while still logging it as before. 

I have a fix that works and will post something upstream shortly.

Comment 12 John Ferlan 2015-10-30 15:16:41 UTC

Posted a patch upstream:

http://www.redhat.com/archives/libvir-list/2015-October/msg00919.html

Comment 14 John Ferlan 2015-12-09 21:40:45 UTC

Series has been pushed, see:

commit a523770c32f5beff7faa63b5b685d9a2a9a04878
Author: John Ferlan <jferlan>
Date:   Fri Oct 30 10:29:46 2015 -0400


git describe a523770c32f5beff7faa63b5b685d9a2a9a04878
v1.3.0-49-ga523770

Comment 16 Han Han 2016-02-25 07:56:58 UTC

Checked on libvirt-1.3.2-1.el7.x86_64, find a problem.
Run comment1 looply with following scripts:
#!/bin/bash -x
for i in {14..58};do
    echo '2101001b32a90000:2100001b32a9da4e' > /sys/class/fc_host/host5/vport_create
    sed -i "s/host[0-9]*/host$i/g" by-path.xml
    virsh pool-create by-path.xml
    if [ $? -ne 0 ];then
	exit
    fi
    sleep 2
    virsh pool-destroy host$i
    sleep 2
    virsh nodedev-destroy scsi_host$i
done

A problem appeared when run into the 3rd loop:
+ for i in '{14..58}'
+ echo 2101001b32a90000:2100001b32a9da4e
+ sed -i 's/host[0-9]*/host16/g' by-path.xml
+ virsh pool-create by-path.xml
Pool host16 created from by-path.xml

+ '[' 0 -ne 0 ']'
+ sleep 2
+ virsh pool-destroy host16
Pool host16 destroyed

+ sleep 2
+ virsh nodedev-destroy scsi_host16
error: Failed to destroy node device 'scsi_host16'
error: internal error: Device is not a fibre channel HBA

+ for i in '{14..58}'
+ echo 2101001b32a90000:2100001b32a9da4e
./run.sh: line 3: echo: write error: Name not unique on network
+ sed -i 's/host[0-9]*/host17/g' by-path.xml
+ virsh pool-create by-path.xml
error: Failed to create pool from by-path.xml
error: Could not open '/sys/class/scsi_host/host17/scan' to trigger host scan: No such file or directory

+ '[' 1 -ne 0 ']'
+ exit

It seems strange that the scsi_host16 cannot be destroyed.

Comment 17 John Ferlan 2016-02-25 14:45:07 UTC

And if you change those sleep 2 to sleep 5 or sleep 10?  Does that change the results?

The point is - the underlying infrastructure (e.g. all that must happen asynchronously) is robust enough for a script that's thrashing create/delete processing.  Are there any interesting 'entrails' in the libvirtd.log or
host's messages log?

Even for volume population - there's no guarantee that the volume appears immediately. A few pool-refresh's later it may show up.

In any case, the multiple create/delete/destroy is a different problem than the original problem. I'm not saying it's not an issue, but it's not overly surprising either as there's quite a bit of infrastructure changes that needs to take place for which libvirt has no control (device create/delete, file link create/delete, systemd interactions, udev interactions, etc.)

Comment 18 Han Han 2016-05-03 05:48:27 UTC

I can reproduce the bug.
Verify it on libvirt-1.3.3-2.el7.x86_64
Let's do the loop of create_vHBA --> create_vHBA_pool --> destroy_vHBA_pool --> delete_vHBA:
#!/bin/bash -x
for i in host{37..100};do
echo '2101001b32a90000:2100001b32a9da4e' > /sys/class/fc_host/host4/vport_create
sed -i "s/host[0-9]*/$i/g" pool.xml
virsh pool-create pool.xml
if [ $? -ne 0 ];then
echo error
exit
fi
virsh nodedev-list --tree|grep $i -A 28
virsh pool-destroy $i
echo '2101001b32a90000:2100001b32a9da4e' > /sys/class/fc_host/host4/vport_delete
done

No error happens in the loop. Bug fixed.

Comment 20 errata-xmlrpc 2016-11-03 18:29:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

Note You need to log in before you can comment on or make changes to this bug.