1472277 – An attempt to start a vHBA storage pool backed by an already pre-created vHBA returns unknown cause error

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	7.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	John Ferlan
QA Contact:	yisun
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-07-18 10:49 UTC by Erik Skultety
Modified:	2018-04-10 10:54 UTC
CC List:	4 users
Fixed In Version:	libvirt-3.7.0-1.el7
Doc Type:	No Doc Update
Doc Text:	undefined
Clone Of:
Environment:
Last Closed:	2018-04-10 10:52:40 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHEA-2018:0704	0	None	None	None	2018-04-10 10:54:52 UTC

Description Erik Skultety 2017-07-18 10:49:50 UTC

Description of problem:
When trying to start a storage pool where the underlying vHBA device already exists, libvirt returns an unknown cause error.

Version-Release number of selected component (if applicable):
libvirt-3.2.0 and onward

How reproducible:
always

Steps to Reproduce:
1. pre-create a vHBA device via nodedev driver (XML example below)

<device>
  <name>scsi_host26</name>
  <path>/sys/devices/pci0000:00/.../host8/vport-8:0-11/host26</path>
  <parent>scsi_host8</parent>
  <capability type='scsi_host'>
    <host>26</host>
    <unique_id>26</unique_id>
    <capability type='fc_host'>
      <wwnn>20000000c99e2b81</wwnn>
      <wwpn>1000000000000001</wwpn>
      <fabric_wwn>2001547feeb71cc1</fabric_wwn>
    </capability>
  </capability>
</device>

2. define a scsi storage pool

# virsh pool-dumpxml npiv

<pool type='scsi'>
  <name>npiv</name>
  <uuid>03f2bb35-3c32-40ce-aa2b-46cdc9ce0e70</uuid>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <adapter type='fc_host' parent='scsi_host8' managed='no' wwnn='20000000c99e2b81' wwpn='1000000000000001'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>

3. start the pool

# virsh pool-start npiv

Actual results:
error: Failed to start pool npiv
error: An error occurred, but the cause is unknown

Expected results:
The pool is started successfully.
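
For anyone re-running the reproducer with different names, the pool definition from step 2 could be generated by a small helper. This is a minimal sketch: the function name `make_npiv_pool_xml` is hypothetical, the generated XML omits the `uuid`, `capacity`, `allocation` and `available` elements on the assumption that libvirt fills them in, and the `virsh` calls are left as comments because they need a host with an NPIV-capable HBA.

```shell
#!/bin/sh
# Hypothetical helper (not part of libvirt): print a SCSI pool definition
# backed by an fc_host adapter.
# Arguments: pool name, parent HBA, WWNN, WWPN.
make_npiv_pool_xml() {
    name=$1
    parent=$2
    wwnn=$3
    wwpn=$4
    cat <<EOF
<pool type='scsi'>
  <name>${name}</name>
  <source>
    <adapter type='fc_host' parent='${parent}' managed='no' wwnn='${wwnn}' wwpn='${wwpn}'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>
EOF
}

# Example use with the values from this report (commented out on purpose):
#   make_npiv_pool_xml npiv scsi_host8 20000000c99e2b81 1000000000000001 > npiv.xml
#   virsh pool-define npiv.xml
#   virsh pool-start npiv
```

The helper only writes XML to standard output, so it can be inspected before anything is defined in libvirt.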

Additional info:

Comment 1 John Ferlan 2017-07-18 15:25:40 UTC

hrmph... various refactors in the code broke this check.

I've posted a patch to resolve:

https://www.redhat.com/archives/libvir-list/2017-July/msg00662.html

as part of a series

https://www.redhat.com/archives/libvir-list/2017-July/msg00661.html

Comment 2 John Ferlan 2017-07-24 17:45:22 UTC

Review of the original changes resulted in the following patch:

https://www.redhat.com/archives/libvir-list/2017-July/msg00838.html

as part of a v3 of series:

https://www.redhat.com/archives/libvir-list/2017-July/msg00837.html

which now has been pushed:

$ git describe c4030331c8bd820c6825db2dcd23c8743a5b9297
v3.5.0-238-gc403033
$ git show c4030331c8bd820c6825db2dcd23c8743a5b9297
commit c4030331c8bd820c6825db2dcd23c8743a5b9297
Author: John Ferlan <jferlan>
Date:   Tue Jul 18 09:21:30 2017 -0400

    storage: Fix existing parent check for vHBA creation
    
...
    
    Commit id '106930aaa' altered the order of checking for an existing
    vHBA (e.g something created via nodedev-create functionality outside
    of the storage pool logic) which inadvertantly broke the code to
    decide whether to alter/force the fchost->managed field to be 'yes'
    because the storage pool will be managing the created vHBA in order
    to ensure when the storage pool is destroyed that the vHBA is also
    destroyed.
    
    This patch moves the check (and checkParent helper) for an existing
    vHBA back into the createVport in storage_backend_scsi. It also
    adjusts the checkParent logic to more closely follow the intentions
    prior to commit id '79ab0935'. The changes made by commit id '08c0ea16f'
    are only necessary to run the virStoragePoolFCRefreshThread when
    a vHBA was really created because there's a timing lag such that
    the refreshPool call made after a startPool from storagePoolCreate*
    wouldn't necessarily find LUNs, but the thread would. For an already
    existing vHBA, using the thread is unnecessary since the vHBA already
    exists and the lag to configure the LUNs wouldn't exist.
    
    Signed-off-by: John Ferlan <jferlan>
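
The "existing vHBA" check described in the commit message can be illustrated outside of libvirt. The sketch below is not the libvirt implementation (that lives in C in storage_backend_scsi); it only shows the idea by scanning the kernel's fc_host sysfs entries for a matching WWNN/WWPN pair. The function name `find_fc_host` and its optional third argument (an alternative directory to scan) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: print the fc_host entry whose node_name/port_name match the
# given WWNN/WWPN. The third argument is the directory to scan; it defaults
# to /sys/class/fc_host and can be overridden to try the function on a fake tree.
find_fc_host() {
    wwnn=$1
    wwpn=$2
    root=${3:-/sys/class/fc_host}
    for dir in "$root"/host*; do
        # Skip entries without readable attributes (also covers an unmatched glob).
        [ -r "$dir/node_name" ] && [ -r "$dir/port_name" ] || continue
        # sysfs reports both names with a 0x prefix, e.g. 0x20000000c99e2b81.
        if [ "$(cat "$dir/node_name")" = "0x$wwnn" ] &&
           [ "$(cat "$dir/port_name")" = "0x$wwpn" ]; then
            basename "$dir"
            return 0
        fi
    done
    return 1
}

# Decision in the spirit of the commit message: an existing vHBA is reused,
# otherwise one would have to be created and then managed by the pool.
if host=$(find_fc_host 20000000c99e2b81 1000000000000001); then
    echo "vHBA already exists as $host; no creation needed"
else
    echo "no matching vHBA; the pool would have to create one"
fi
```

Separating the lookup from the decision reflects the point made in the commit message: only a vHBA that was really created by the pool needs the managed='yes' handling and the refresh thread.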

Comment 4 yisun 2017-12-01 07:24:25 UTC

verified with:
libvirt-3.9.0-4.el7.x86_64
kernel-3.10.0-768.el7.x86_64
qemu-kvm-rhev-2.10.0-9.el7.x86_64

1. have an online HBA, scsi_host8
# virsh nodedev-dumpxml scsi_host8
<device>
  <name>scsi_host8</name>
  <path>/sys/devices/pci0000:00/0000:00:03.0/0000:08:00.1/host8</path>
  <parent>pci_0000_08_00_1</parent>
  <capability type='scsi_host'>
    <host>8</host>
    <unique_id>8</unique_id>
    <capability type='fc_host'>
      <wwnn>2001001b32a9da4e</wwnn>
      <wwpn>2101001b32a9da4e</wwpn>
      <fabric_wwn>2001547feeb71cc1</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>127</max_vports>
      <vports>1</vports>
    </capability>
  </capability>
</device>


2. create a vhba with wwnn:wwpn = 20000000c99e2b81:1000000000000001 and parent=scsi_host8
# cat vhba.xml 
<device>
  <parent>scsi_host8</parent>
  <capability type='scsi_host'>
    <capability type='fc_host'>
      <wwnn>20000000c99e2b81</wwnn>
      <wwpn>1000000000000001</wwpn>
    </capability>
  </capability>
</device>


# virsh nodedev-create vhba.xml 
Node device scsi_host9 created from vhba.xml


# virsh nodedev-dumpxml scsi_host9
<device>
  <name>scsi_host9</name>
  <path>/sys/devices/pci0000:00/0000:00:03.0/0000:08:00.1/host8/vport-8:0-0/host9</path>
  <parent>scsi_host8</parent>
  <capability type='scsi_host'>
    <host>9</host>
    <unique_id>9</unique_id>
    <capability type='fc_host'>
      <wwnn>20000000c99e2b81</wwnn>
      <wwpn>1000000000000001</wwpn>
      <fabric_wwn>2001547feeb71cc1</fabric_wwn>
    </capability>
  </capability>
</device>

# lsscsi | grep "\[9"
[9:0:0:0]    disk    IBM      2145             0000  /dev/sdf 
[9:0:0:1]    disk    IBM      2145             0000  /dev/sdg 
[9:0:1:0]    disk    IBM      2145             0000  /dev/sdh 
[9:0:1:1]    disk    IBM      2145             0000  /dev/sdi

3. prepare a scsi pool with same wwnn:wwpn
# cat vhba.pool 
<pool type='scsi'>
  <name>vhba</name>
  <capacity unit='bytes'>0</capacity>
  <allocation unit='bytes'>0</allocation>
  <available unit='bytes'>0</available>
  <source>
    <adapter type='fc_host' parent='scsi_host8' managed='no' wwnn='20000000c99e2b81' wwpn='1000000000000001'/>
  </source>
  <target>
    <path>/dev/disk/by-path</path>
  </target>
</pool>

4. try to create or start the pool
# virsh pool-create vhba.pool
error: Failed to create pool from vhba.pool
error: unsupported configuration: the wwnn/wwpn for 'host9' are assigned to an HBA

# virsh pool-define vhba.pool; virsh pool-start vhba
Pool vhba defined from vhba.pool

error: Failed to start pool vhba
error: unsupported configuration: the wwnn/wwpn for 'host9' are assigned to an HBA

5. destroy the vhba and start the pool again; it should succeed
# virsh nodedev-destroy scsi_host9
Destroyed node device 'scsi_host9'

# virsh pool-start vhba
Pool vhba started

6. check that a new vhba was created successfully
# virsh nodedev-dumpxml scsi_host14
<device>
  <name>scsi_host14</name>
  <path>/sys/devices/pci0000:00/0000:00:03.0/0000:08:00.1/host8/vport-8:0-6/host14</path>
  <parent>scsi_host8</parent>
  <capability type='scsi_host'>
    <host>14</host>
    <unique_id>14</unique_id>
    <capability type='fc_host'>
      <wwnn>20000000c99e2b81</wwnn>
      <wwpn>1000000000000001</wwpn>
      <fabric_wwn>2001547feeb71cc1</fabric_wwn>
    </capability>
  </capability>
</device>

7. destroy the pool and check that the vhba is destroyed
# virsh pool-destroy vhba
Pool vhba destroyed

# virsh nodedev-dumpxml scsi_host14
error: Could not find matching device 'scsi_host14'
error: Node device not found: no node device with matching name 'scsi_host14'
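
The comparison done by eye in steps 2 and 6 (the pre-created vHBA and the pool-created vHBA carry the same wwnn:wwpn) could be scripted. A minimal sketch follows, assuming the one-element-per-line layout that the `virsh nodedev-dumpxml` output above has; the function name `fc_host_ids` is hypothetical, and the sed expressions are not a general XML parser.

```shell
#!/bin/sh
# Sketch: read node device XML on stdin and print "wwnn:wwpn".
# Assumes <wwnn> and <wwpn> each sit on their own line, as in the dumps above.
fc_host_ids() {
    sed -n \
        -e 's:.*<wwnn>\(.*\)</wwnn>.*:\1:p' \
        -e 's:.*<wwpn>\(.*\)</wwpn>.*:\1:p' |
        paste -s -d: -
}

# Example use on a live host (commented out, needs libvirt and the device):
#   virsh nodedev-dumpxml scsi_host9 | fc_host_ids
```

Comparing the output for the pre-created device with the output for the device the pool created gives a quick check that both used the same identifiers.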

Comment 8 errata-xmlrpc 2018-04-10 10:52:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0704
