Bug 1476227 - libvirt hangs after failed to create a vHBA (npiv vport)
Summary: libvirt hangs after failed to create a vHBA (npiv vport)
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: John Ferlan
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-07-28 10:58 UTC by yisun
Modified: 2017-12-04 14:40 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-04 14:40:56 UTC
Target Upstream Version:
Embargoed:



Description yisun 2017-07-28 10:58:42 UTC
Description:
libvirt hangs after failing to create a vHBA (npiv vport)

How reproducible:
100%

Versions:
kernel-3.10.0-693.el7.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
libvirt-3.2.0-14.el7_4.2.x86_64


Steps:
1. Have an online HBA with vport_ops capability (a way to list vport-capable HBAs is sketched after the steps):
## virsh nodedev-dumpxml scsi_host7
<device>
  <name>scsi_host7</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:20:00.1/host7</path>
  <parent>pci_0000_20_00_1</parent>
  <capability type='scsi_host'>
    <host>7</host>
    <unique_id>1</unique_id>
    <capability type='fc_host'>
      <wwnn>20000000c99e2b81</wwnn>
      <wwpn>10000000c99e2b81</wwpn>
      <fabric_wwn>2001547feeb71cc1</fabric_wwn>
    </capability>
    <capability type='vport_ops'>
      <max_vports>255</max_vports>
      <vports>0</vports>
    </capability>
  </capability>
</device>


2. Prepare an XML file for vHBA creation, with the above HBA as parent:
## cat nodedev.xml
<device>
    <capability type="scsi_host">
        <capability type="fc_host">
            <wwnn>20000000c99e2b80</wwnn>
            <wwpn>1000000000000001</wwpn>
        </capability>
    </capability>
    <parent>scsi_host7</parent>
</device>

3. Try to create the vHBA (this fails in my environment; the reason is provided in the *Additional info* section):
## virsh nodedev-create nodedev.xml
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to create node device from nodedev.xml
error: internal error: connection closed due to keepalive timeout

4. Now libvirt hangs; a "virsh list" just hangs there until I press Ctrl+C:
## time virsh list
^C
real    4m26.726s
user    0m0.006s
sys    0m0.003s
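
(For reference, a quick way to confirm which HBAs on a host can back a vHBA is to filter the node device list by capability; 'vports' is the capability name libvirt documents for NPIV-capable HBAs, so this is a side check rather than part of the reproducer. Given the vport_ops capability shown in step 1, scsi_host7 would be expected in the output.)
## virsh nodedev-list --cap vports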


Expected result:
libvirt should not hang even if the vHBA creation fails

Actual result:
libvirt hangs



Additional info:
1. The vHBA cannot be created directly through the kernel sysfs interface either, as follows:
## echo "1000000000000001:20000000c99e2b80" > /sys/class/fc_host/host7/vport_create
-bash: echo: write error: Interrupted system call

2. And in the messages log, we can find the following errors:
## cat /var/log/messages
Jul 28 17:50:15 bootp-73-75-161 kernel: scsi host10: Emulex LPe12002-M8 8Gb 2-port PCIe Fibre Channel Adapter on PCI bus 20 device 01 irq 17 port 1
Jul 28 17:50:15 bootp-73-75-161 kernel: lpfc 0000:20:00.1: 1:(1):2528 Mailbox command x8d cannot issue Data: x0 x2
Jul 28 17:50:15 bootp-73-75-161 kernel: lpfc 0000:20:00.1: 1:(1):1818 VPort failed init, mbxCmd x8d READ_SPARM mbxStatus x0, rc = xff
Jul 28 17:50:15 bootp-73-75-161 kernel: lpfc 0000:20:00.1: 1:(1):1813 Create VPORT failed. Cannot get sparam
Jul 28 17:50:15 bootp-73-75-161 kernel: FC Virtual Port LLDD Create failed

3. Our HBA card is from Emulex Corporation
## lspci
...
20:00.0 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)
20:00.1 Fibre Channel: Emulex Corporation Saturn-X: LightPulse Fibre Channel Host Adapter (rev 03)

4. So I searched for its manual, and these appear to be driver errors (error codes 1813 and 1818, as in the messages log above):

ftp://ftp.software.ibm.com/systems/support/system_x_cluster/hbanyware-4.1a36a.pdf
...
elx_mes1813 Create VPORT failed. Cannot get sparam.
DESCRIPTION: The port could not be created because it could not be initialized possibly due to unavailable resources.
DATA: None
SEVERITY: Error
LOG: LOG_VPORT verbose
ACTION: Software driver error. If this problem
persists, report these errors to Technical Support.
...
elx_mes1818 VPort failed init, mbxCmd <mailbox command> READ_SPARM mbxStatus
<mailbox status>, rc = <status>
DESCRIPTION: A pending mailbox command issued to initialize port, failed.
DATA: (1) mbxCommand (2) mbxStatus (3) rc
SEVERITY: Error
LOG: LOG_VPORT verbose
ACTION: Software driver error. If this problem
persists, report these errors to Technical Support.
...

The strange thing is that this card worked well with older kernel versions; I'm not sure whether there is also a kernel issue (a quick cross-check is sketched below).
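
For reference, two quick cross-checks of the kernel side mentioned in items 1 and 4 (a sketch; it assumes the standard fc_host sysfs attributes and that the lpfc module reports a version field):
## cat /sys/class/fc_host/host7/max_npiv_vports /sys/class/fc_host/host7/npiv_vports_inuse
## modinfo lpfc | grep -i ^version
Comparing the modinfo output between the older kernel that worked and kernel-3.10.0-693 would show whether the lpfc driver version changed.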

Comment 3 John Ferlan 2017-08-02 21:07:17 UTC
Well, not being able to use the vport_create interface directly would seem to mean that something libvirt relies on is behaving badly.

The "hang" you ^C'd out of doesn't help at unless you attach to the libvirtd daemon in gdb and then provide the results of a 'bt' for all the threads. That'll at least give a shred of possibility at figuring out why libvirtd is very unhappy when the creation of a vport has issues. I'd "assume" it has to do with nodedev driver interaction with udev, but that's purely a guess.

In any case, the commands with the provided wwnn/wwpn worked for me with recent upstream:

# virsh version
Compiled against library: libvirt 3.6.0
Using library: libvirt 3.6.0
Using API: QEMU 3.6.0
Running hypervisor: QEMU 2.6.2


I also checked out a v3.2-maint release, rebuilt, and ran successfully.

# virsh version
Compiled against library: libvirt 3.2.1
Using library: libvirt 3.2.1
Using API: QEMU 3.2.1
Running hypervisor: QEMU 2.6.2

Example run:

# virsh nodedev-create bz1476227.xml
Node device scsi_host36 created from bz1476227.xml

# virsh nodedev-dumpxml scsi_host36
<device>
  <name>scsi_host36</name>
  <path>/sys/devices/pci0000:00/0000:00:04.0/0000:10:00.1/host4/vport-4:0-1/host36</path>
  <parent>scsi_host4</parent>
  <capability type='scsi_host'>
    <host>36</host>
    <unique_id>33</unique_id>
    <capability type='fc_host'>
      <wwnn>20000000c99e2b80</wwnn>
      <wwpn>1000000000000001</wwpn>
      <fabric_wwn>2002000573de9681</fabric_wwn>
    </capability>
  </capability>
</device>
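
For completeness, a vHBA created this way can be removed again with nodedev-destroy (using the scsi_host36 name from the example run above):

# virsh nodedev-destroy scsi_host36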

Comment 4 John Ferlan 2017-12-04 14:40:56 UTC
Closing this as WORKSFORME since I cannot reproduce it, and from the problem report it seems that this is not a libvirt problem but rather a kernel driver problem.

