Bug 1556828

Summary: [SR-IOV] - Can't start VM with SR-IOV vNIC
Product: Red Hat Enterprise Linux 7
Reporter: Michael Burman <mburman>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: yafu <yafu>
Severity: high
Docs Contact:
Priority: high
Version: 7.0
CC: bugs, danken, dyuan, jdenemar, jherrman, jiyan, jsuchane, mburman, michal.skrivanek, mprivozn, ratamir, salmy, spower, xuzhang, yalzhang
Target Milestone: pre-dev-freeze
Keywords: Upstream, ZStream
Target Release: 7.6
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: libvirt-4.3.0-1.el7
Doc Type: Bug Fix
Doc Text:
Prior to this update, if the "interface type='hostdev'" configuration was used for a guest virtual machine, booting the guest in some cases failed due to a validation error. With this update, the libvirt service has been fixed to ignore hostdev duplicates in the XML configuration. As a result, the guest boots correctly in the described scenario.
Story Points: ---
Clone Of:
Clones: 1557330, 1558655
Environment:
Last Closed: 2018-10-30 09:53:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1557330, 1558655
Attachments:
Description: sr-iov vm failed to run
Flags: none

Description Michael Burman 2018-03-15 10:25:21 UTC
Created attachment 1408394
sr-iov vm failed to run

Description of problem:
[SR-IOV] - Can't start VM with SR-IOV vNIC

2018-03-15 11:25:40,890+0200 ERROR (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7
2018-03-15 11:25:40,891+0200 INFO  (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') Changed state to Down: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7 (code=1) (vm:1677)

Version-Release number of selected component (if applicable):
4.2.2.2-0.1.el7
vdsm-4.20.20-1.el7ev.x86_64
libvirt-3.9.0-13.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Try to start a VM with an SR-IOV vNIC.

Actual results:
The VM fails to start with "XML error: non unique alias detected".

Expected results:
The VM starts.

Comment 1 Michael Burman 2018-03-15 10:30:14 UTC
The issue also reproduces with the newer libvirt-3.9.0-14.el7.x86_64.

2018-03-15 12:27:16,562+0200 ERROR (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7
2018-03-15 12:27:16,566+0200 INFO  (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') Changed state to Down: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af7 (code=1) (vm:1677)

Comment 2 Michal Skrivanek 2018-03-16 08:22:59 UTC
can you try with a build from https://bugzilla.redhat.com/show_bug.cgi?id=1554962#c3 ?

Comment 3 Michael Burman 2018-03-16 08:51:41 UTC
(In reply to Michal Skrivanek from comment #2)
> can you try with a build from
> https://bugzilla.redhat.com/show_bug.cgi?id=1554962#c3 ?

Hi Michal,
OK, I did what you asked and tested with the build from https://bugzilla.redhat.com/show_bug.cgi?id=1554962#c3. I get the same error and result:

 2018-03-16 10:48:34,210+0200 ERROR (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') The vm start process failed (vm:940)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 869, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2829, in _run
    dom = self._connection.defineXML(domxml)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3676, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirtError: XML error: non unique alias detected: ua-92597e9b-5bc2-41a9-ad1d-2d603ecaedfa
2018-03-16 10:48:34,214+0200 INFO  (vm/f9bd0e85) [virt.vm] (vmId='f9bd0e85-54e5-46ec-9ae7-5a61861120a9') Changed state to Down: XML error: non unique alias detected: ua-92597e9b-5bc2-41a9-ad1d-2d603ecaedfa (code=1) (vm:1677)

[root@puma22 ~]# rpm -qa | grep libvirt
libvirt-daemon-driver-interface-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-logical-3.9.0-15.el7ua.x86_64
libvirt-daemon-kvm-3.9.0-15.el7ua.x86_64
libvirt-python-3.9.0-1.el7.x86_64
libvirt-daemon-3.9.0-15.el7ua.x86_64
libvirt-daemon-config-nwfilter-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-scsi-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-nwfilter-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-iscsi-3.9.0-15.el7ua.x86_64
libvirt-client-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-network-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-secret-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-lxc-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-rbd-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-3.9.0-15.el7ua.x86_64
libvirt-lock-sanlock-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-core-3.9.0-15.el7ua.x86_64
libvirt-daemon-config-network-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-mpath-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-qemu-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-disk-3.9.0-15.el7ua.x86_64
libvirt-3.9.0-15.el7ua.x86_64
libvirt-libs-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-nodedev-3.9.0-15.el7ua.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-15.el7ua.x86_64

The packages above are from https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=15543641

Comment 4 Michal Privoznik 2018-03-16 11:40:56 UTC
That won't help. This is a genuine libvirt bug. The problem is that when libvirt parses the domain definition for <interface type='hostdev'/>, it creates a second entry in the internal domain representation, just as if it were <hostdev/>. So later, when user alias validation runs, it finds two devices with the same alias and throws an error. Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2018-March/msg00935.html
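
The mechanism described above can be modelled in a few lines of Python. This is only an illustrative sketch, not libvirt code: parse_domain, validate_aliases and the dictionary layout are hypothetical stand-ins for libvirt's internal C structures, and the domain XML is cut down to the one relevant device.

```python
import xml.etree.ElementTree as ET

DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <interface type='hostdev' managed='yes'>
      <alias name='ua-hostdev0'/>
    </interface>
  </devices>
</domain>
"""


def parse_domain(xml_text):
    """Hypothetical stand-in for the parser: returns a flat device list."""
    devices = []
    for iface in ET.fromstring(xml_text).iter("interface"):
        alias = iface.find("alias").get("name")
        devices.append({"kind": "interface", "alias": alias})
        if iface.get("type") == "hostdev":
            # The interface is also tracked as a hostdev entry that
            # shares the alias with its parent interface.
            devices.append(
                {"kind": "hostdev", "alias": alias, "parent": "interface"}
            )
    return devices


def validate_aliases(devices):
    """Naive uniqueness check over every entry (behaviour before the fix)."""
    seen = set()
    for dev in devices:
        if dev["alias"] in seen:
            raise ValueError("non unique alias detected: " + dev["alias"])
        seen.add(dev["alias"])


devices = parse_domain(DOMAIN_XML)
try:
    validate_aliases(devices)
    result = "ok"
except ValueError as err:
    result = str(err)
print(result)
```

The XML names the alias once, but the modelled device list holds it twice, so the naive check rejects a definition that the user wrote correctly.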

Comment 7 yafu 2018-03-19 09:20:00 UTC
Reproduced with libvirt-3.9.0-14.el7.x86_64.

Steps:
1.Prepare a hostdev interface with alias name:
 #cat interface.xml
 <interface type='hostdev' managed='yes'>
   <mac address='66:18:0b:c2:85:b1'/>
   <source>
     <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
   </source>
   <alias name='ua-hostdev0'/>
   <model type='virtio'/>
 </interface>

2.Coldplug the hostdev interface device to the guest:
 #virsh attach-device vm1 interface.xml --config
 Device attached successfully

3.Start the guest:
 #virsh start vm1
 error: Failed to start domain vm1
 error: XML error: non unique alias detected: ua-hostdev0
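
When triaging this error it can help to confirm that the XML handed to libvirt really contains each user alias only once. The following is a rough sketch using only the Python standard library; duplicate_user_aliases is a hypothetical helper, and it assumes user-defined aliases carry the "ua-" prefix as in the examples in this report.

```python
import xml.etree.ElementTree as ET
from collections import Counter

INTERFACE_XML = """
<interface type='hostdev' managed='yes'>
  <mac address='66:18:0b:c2:85:b1'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
  </source>
  <alias name='ua-hostdev0'/>
  <model type='virtio'/>
</interface>
"""


def duplicate_user_aliases(xml_text):
    """Return the sorted user aliases that occur more than once in the XML."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get("name")
        for el in root.iter("alias")
        if el.get("name", "").startswith("ua-")
    )
    return sorted(name for name, n in counts.items() if n > 1)


dupes = duplicate_user_aliases(INTERFACE_XML)
print(dupes)
```

An empty list for the XML used in the steps above indicates that the duplicate is not present in the input, which points at the parsing or validation side rather than at the configuration.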

Comment 11 Michal Privoznik 2018-03-20 14:38:41 UTC
I've just pushed the patch upstream:

commit 630c6e34957666f20a66167c7a512e65fc711aa0
Author:     Michal Privoznik <mprivozn>
AuthorDate: Fri Mar 16 12:33:12 2018 +0100
Commit:     Michal Privoznik <mprivozn>
CommitDate: Tue Mar 20 15:30:14 2018 +0100

    virDomainDeviceDefValidateAliasesIterator: Ignore some hostdevs
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1556828
    
    When defining a domain that has <interface type='hostdev'/> our
    parser creates two entries in virDomainDef: one for <interface/>
    and one for <hostdev/>. However, some info is shared between the
    two which makes user alias validation fail because alias belongs
    to the set of shared info.
    
    Signed-off-by: Michal Privoznik <mprivozn>
    Reviewed-by: Jiri Denemark <jdenemar>


v4.1.0-246-g630c6e3495
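
The idea in the commit message can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual C change: the device dictionaries and the "parent" key are hypothetical, and the real validator works on virDomainDef entries.

```python
def validate_aliases(devices):
    """Alias uniqueness check that ignores hostdevs backing an interface."""
    seen = set()
    for dev in devices:
        if dev["kind"] == "hostdev" and dev.get("parent") == "interface":
            # Assumption: this entry mirrors an <interface type='hostdev'/>
            # and shares its alias, so it is not counted a second time.
            continue
        alias = dev.get("alias")
        if alias is None:
            continue
        if alias in seen:
            raise ValueError("non unique alias detected: " + alias)
        seen.add(alias)


# One <interface type='hostdev'/> as modelled after parsing: two entries.
shared = [
    {"kind": "interface", "alias": "ua-hostdev0"},
    {"kind": "hostdev", "alias": "ua-hostdev0", "parent": "interface"},
]
validate_aliases(shared)  # passes: the mirrored hostdev is skipped

# Two independent devices with the same alias are still rejected.
clash = shared + [{"kind": "disk", "alias": "ua-hostdev0"}]
try:
    validate_aliases(clash)
    clash_result = "ok"
except ValueError as err:
    clash_result = str(err)
print(clash_result)
```

Skipping only the mirrored entry keeps the check strict for genuinely separate devices, so a real alias clash is still reported.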

Comment 14 yafu 2018-06-01 10:07:40 UTC
Verified with libvirt-4.3.0-1.el7.x86_64.

Test steps:
(1)Scenario 1: hotplug->unhotplug->hotplug hostdev interface with alias name:
1.Start a guest:
#virsh start iommu1
Domain iommu1 started

2.Prepare a hostdev interface with alias name:
#cat interface.xml
<interface type='hostdev' managed='yes'>
  <mac address='66:18:0b:c2:85:b8'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x3'/>
  </source>
  <alias name='ua-04c2decd-4e33-4023-84de-a2205c777af7'/>
  <model type='virtio'/>
</interface>

3.Hotplug the hostdev interface device to the guest:
#virsh attach-device iommu1 interface.xml
Device attached successfully

4.Check the hostdev interface in the live guest xml:
#virsh dumpxml iommu1 | grep -A10 interface
<interface type='hostdev' managed='yes'>
      <mac address='66:18:0b:c2:85:b8'/>
      <driver name='vfio'/>
    ...
      <alias name='ua-04c2decd-4e33-4023-84de-a2205c777af7'/>
    ...
</interface>

5.Hotunplug the hostdev interface from the guest:
#virsh detach-device iommu1 interface.xml
Device detached successfully

6.Check the hostdev interface in the live guest xml:
#virsh dumpxml iommu1  | grep hostdev
no output

7.Repeat steps 3-4; the hostdev interface attaches to the guest successfully.

(2)Scenario 2: coldplug->coldunplug hostdev interface with alias name:
1.Coldplug hostdev interface to guest:
#virsh attach-device iommu1 interface.xml --config
Device attached successfully

2.Start the guest:
#virsh start iommu1
Domain iommu1 started

3.Check the hostdev interface in the live xml:
#virsh dumpxml iommu1 | grep -A10 interface
<interface type='hostdev' managed='yes'>
      <mac address='66:18:0b:c2:85:b8'/>
      <driver name='vfio'/>
    ...
      <alias name='ua-04c2decd-4e33-4023-84de-a2205c777af7'/>
    ...
</interface>

4.Coldunplug the hostdev interface from the guest:
#virsh detach-device iommu1 interface.xml --config
Device detached successfully

5.Check the hostdev interface in the inactive domain xml:
#virsh dumpxml iommu1 --inactive | grep -i hostdev
no output

(3)Scenario 3: Hotplug hostdev interface with the same alias name:
1.Attach the hostdev interface device with alias name to the guest:
#virsh attach-device iommu1 interface.xml
Device attached successfully

2.Repeat step 1:
#virsh attach-device iommu1 interface.xml
error: Failed to attach device from /nic-xml/interface.xml
error: XML error: non unique alias detected: ua-04c2decd-4e33-4023-84de-a2205c777af

(4)Scenario 4: live update alias name:
#cat interface.xml
<interface type='hostdev' managed='yes'>
      <mac address='66:18:0b:c2:85:b8'/>
      <driver name='vfio'/>
    ...
      <alias name='ua-04c2decd-4e33-4023-84de-a2205c777a66'/>
    ...
</interface>

#virsh update-device iommu1 interface.xml
error: Failed to update device from interface.xml-2
error: Operation not supported: cannot change config of 'hostdev' network type

Comment 16 errata-xmlrpc 2018-10-30 09:53:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3113