Bug 1005682

Summary: The driver of vf is still pci-stub after detach it from the guest
Product: Red Hat Enterprise Linux 7 Reporter: Xuesong Zhang <xuzhang>
Component: libvirt    Assignee: Laine Stump <laine>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0    CC: acathrow, alex.williamson, chayang, dallan, dyuan, honzhang, jiahu, juzhang, lagarcia, laine, virt-bugs, xuzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-1.1.1-10.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 995935 Environment:
Last Closed: 2014-06-13 10:10:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
  libvirtd.log
  message.log
  audit.log

Comment 2 Laine Stump 2013-09-18 11:12:16 UTC
>    <iommuGroup number='11'>
>      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x0'/>
>      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x5'/>
>      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x6'/>
>      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x7'/>
>      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>      <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
>      <address domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
>   [...]


It seems messed up that there are so many devices in the same iommu_group as an SRIOV VF. On my RHEL7 system, which has an 82576 card (igb/igbvf drivers), each VF is in an iommu group that contains only itself.

Similarly, on the same system I also didn't see any incorrect binding of the VF device driver during nodedev-detach/reattach for either legacy (--driver kvm) or vfio (--driver vfio) assignment. This was the case both with an older kernel/libvirt:

  kernel-3.10.0-4.el7.x86_64
  libvirt-1.1.1-2.el7.x86_64

and the latest nightly as of yesterday:

  kernel-3.10.0-20.el7.x86_64
  libvirt-1.1.1-5.el7.x86_64

My suspicion is that there is something unusual about your hardware setup that is causing both the huge list of devices in a single iommu group and the failure to rebind to igbvf.

Comment 3 Laine Stump 2013-09-18 11:20:07 UTC
Hmm. Or maybe this is due to the particular slot the network card is plugged into?

Can you provide the output of "lspci" and also of:

 ls /sys/bus/pci/devices/0000:11:10.3/iommu_group/devices

Is there a different slot you can try plugging the card into?

Comment 4 Laine Stump 2013-09-18 11:20:52 UTC
Alex, do you have any other suggestions / advice?

Comment 5 Alex Williamson 2013-09-18 12:31:34 UTC
My guess is that this is one of the many Intel systems for which the root ports do not have an ACS capability.  This causes us to group all the devices behind the root port together and also tie together multifunction root ports.  We're working with Intel to come up with a white list for assuming a level of ACS on these ports to allow them to be separated.

Comment 6 Laine Stump 2013-09-20 10:36:02 UTC
Alex - do you think the lack of ACS on the root ports could also be the cause of the bound driver failing to change from "pci-stub" back to "igbvf" during the execution of "virsh nodedev-reattach" ?

Is there anything libvirt can do here? Or should it just be closed as CANTFIX?

Comment 7 Laine Stump 2013-09-20 10:38:35 UTC
Also, Alex's response cleared the needinfo for the reporter on Comment 3 (not sure if there is any way to avoid that, unfortunately).

Zhang - can you provide the info I asked for in Comment 3?

Comment 8 Alex Williamson 2013-09-20 14:16:12 UTC
(In reply to Laine Stump from comment #6)
> Alex - do you think the lack of ACS on the root ports could also be the
> cause of the bound driver failing to change from "pci-stub" back to "igbvf"
> during the execution of "virsh nodedev-reattach" ?

No, unless it's exercising an existing bug in libvirt.  What exactly does libvirt do to try to rebind the device back to the host driver?  To bind to pci-stub/vfio-pci, it must do something like:

echo $PCI_ADDRESS > /sys/bus/pci/devices/$PCI_ADDRESS/driver/unbind
echo $VID $DID > /sys/bus/pci/drivers/$NEW_DRIVER/new_id
echo $PCI_ADDRESS > /sys/bus/pci/drivers/$NEW_DRIVER/bind
echo $VID $DID > /sys/bus/pci/drivers/$NEW_DRIVER/remove_id

Then to re-bind to the host driver:

echo $PCI_ADDRESS > /sys/bus/pci/devices/$PCI_ADDRESS/driver/unbind
echo $PCI_ADDRESS > /sys/bus/pci/drivers_probe

That will cause the PCI core to rescan the available drivers for that device and bind it to the first match.  If there's more than one driver capable of binding to the device at that time, you may not end up with the driver you want.  One case where this may occur is if the steps in the first sequence race with the steps in the second sequence for devices with the same $VID/$DID.  Another case would be if libvirt doesn't remove_id for one of the drivers.
 
> Is there anything libvirt can do here? Or should it just be closed as
> CANTFIX?

Sure, libvirt could add a mutex so that it's never racing between the sequences above, libvirt could remember the host driver for a device and bind to it explicitly, libvirt could make sure it's not leaving any unintended IDs with the pci-stub/vfio-pci drivers, etc.
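
For illustration, a minimal sketch of the "remember the host driver and bind to it explicitly" idea (this is not libvirt code; PCI_ADDRESS and SAVED_DRIVER are placeholder shell variables):

# before handing the device to the stub driver, record the current host driver
SAVED_DRIVER=$(basename $(readlink /sys/bus/pci/devices/$PCI_ADDRESS/driver))

# ... bind to pci-stub/vfio-pci, assign the device to the guest, later detach it ...

# on re-attach, unbind from the stub and bind back to the recorded driver
echo $PCI_ADDRESS > /sys/bus/pci/devices/$PCI_ADDRESS/driver/unbind
echo $PCI_ADDRESS > /sys/bus/pci/drivers/$SAVED_DRIVER/bind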

Comment 9 Alex Williamson 2013-09-20 14:17:07 UTC
Re-adding needinfo for Zhang

Comment 10 Laine Stump 2013-09-23 10:52:37 UTC
(I had actually meant "if the problem with failed re-attach has the same root cause as whatever puts so many devices into a single iommu_group, then can libvirt do anything?" - but thanks for answering what would have been my next question anyway :-)


Here is what libvirt does to re-attach a device back to the host driver:

1) iff the device is currently bound to "pci-stub", "vfio-pci", or "pciback":
      echo $PCI_ADDRESS > /sys/bus/pci/devices/$PCI_ADDRESS/driver/unbind

2) iff /sys/bus/pci/devices/$PCI_ADDRESS/driver/remove_slot exists:
      echo $PCI_ADDRESS > /sys/bus/pci/devices/$PCI_ADDRESS/driver/remove_slot
   (this is apparently needed for xen)

3) iff /sys/bus/pci/devices/$PCI_ADDRESS/driver/remove_id exists *OR*
       there is no driver currently bound to the device, then:
       echo $PCI_ADDRESS > /sys/bus/pci/drivers_probe

So yeah, libvirt is doing what you say it should be doing to re-attach to the host (and the "bind to the stub driver" part always seems to be successful).

As for the mutex - although I see your point, it doesn't sound like the failure in this case is happening during an attempt to rebind two devices simultaneously. (I won't dwell on the fact that the need for a mutex is caused by an incredibly poor design of the device binding/unbinding mechanism - what if two different applications are trying to bind/unbind similar devices at the same time?)

As for the last two suggestions:

> libvirt could remember the host driver for a device and bind to it explicitly

That's an interesting idea, and I'm not sure why libvirt hasn't always done that. Possibly because the original author didn't have a convenient place to store that information across restarts of libvirtd?

> libvirt could make sure it's not leaving any unintended IDs with the 
> pci-stub/vfio-pci drivers

I have to say I don't understand what you're saying here. Can you "ELI5"? (explain it like I'm 5)

Comment 11 Laine Stump 2013-09-23 10:57:50 UTC
Zhang - I'm still interested in the info I asked for in Comment 3.

Also, when the device is still bound to pci-stub (even though you've run virsh detach-device), can you try doing the reprobe by hand?

echo 0000\:11\:10.3 > /sys/bus/pci/devices/0000\:11\:10.3/driver/unbind
echo 0000\:11\:10.3 > /sys/bus/pci/drivers_probe

(Did I get that right, Alex?)

Comment 12 Alex Williamson 2013-09-23 13:35:00 UTC
(In reply to Laine Stump from comment #10)
> > libvirt could make sure it's not leaving any unintended IDs with the 
> > pci-stub/vfio-pci drivers
> 
> I have to say I don't understand what you're saying here. Can you "ELI5"?
> (explain it like I'm 5)

My assumption is that if we're doing unbind + drivers_probe then we get re-bound to pci-stub either because we're racing another simultaneous operation, or because libvirt added the device's ID to pci-stub via new_id but never removed it.  In that case we'd have both the original host driver and pci-stub able to claim the device, and rebinding to pci-stub is a valid outcome as far as the kernel and drivers have been told.

(In reply to Laine Stump from comment #11)
> 
> echo 0000\:11\:10.3 > /sys/bus/pci/devices/0000\:11\:10.3/driver/unbind
> echo 0000\:11\:10.3 > /sys/bus/pci/drivers_probe
> 
> (Did I get that right, Alex?)

Yes, except the escapes aren't needed for the echo, ex:

echo 0000:11:10.3 > /sys/bus/pci/devices/0000\:11\:10.3/driver/unbind
echo 0000:11:10.3 > /sys/bus/pci/drivers_probe

Comment 13 Xuesong Zhang 2013-09-24 11:24:44 UTC
hi, Laine,

   The SR-IOV card I used does not support ACS, so Alex's guess in comment 5 is right. But I also tried this scenario on an SR-IOV host that does support ACS, and it can also be reproduced there, so this issue can be reproduced whether or not the SR-IOV card supports ACS.

   When I hot-plug/hot-unplug the VF with the following xml, the issue described in this bug does not occur:
# cat vf-hostdev.xml 
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address bus='0x03' slot='0x10' function='0x3'/>
  </source>
</hostdev>

   But when I hot-plug/hot-unplug the VF with the other xml type, the driver is still pci-stub after detaching the VF.
# cat vf.xml 
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:43:0d:0b'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x10' function='0x3'/>
  </source>
</interface>

   I guess maybe you used the 1st xml type for detaching the VF from the guest. If so, would you please try to detach the VF with the 2nd xml type? It can be reproduced 100% of the time on my SR-IOV host.

   I tested with the latest build, and it can still be reproduced:
   libvirt-1.1.1-5.el7.x86_64
   qemu-kvm-1.5.3-3.el7.x86_64
   kernel-3.10.0-23.el7.x86_64

   I have uploaded libvirtd.log, message.log, and audit.log for your reference.

Comment 14 Xuesong Zhang 2013-09-24 11:25:49 UTC
Created attachment 802187 [details]
libvirtd.log

Comment 15 Xuesong Zhang 2013-09-24 11:26:20 UTC
Created attachment 802188 [details]
message.log

Comment 16 Xuesong Zhang 2013-09-24 11:26:52 UTC
Created attachment 802189 [details]
audit.log

Comment 17 Laine Stump 2013-10-03 11:36:16 UTC
(In reply to Zhang Xuesong from comment #13)
>    I guess maybe you use the 1st type xml for detaching the VF from the
> guest. If yes, would you please try to detach the VF with the 2nd xml type,


No, all of my testing is with <interface type='hostdev'>, not <hostdev>


> it can be reproduced 100% in my SR-IOV host.

and to verify - this is also the case on the host that *does* properly support ACS, correct?

Comment 18 Xuesong Zhang 2013-10-08 03:05:27 UTC
(In reply to Laine Stump from comment #17)
> (In reply to Zhang Xuesong from comment #13)
> >    I guess maybe you use the 1st type xml for detaching the VF from the
> > guest. If yes, would you please try to detach the VF with the 2nd xml type,
> 
> 
> No, all of my testing is with <interface type='hostdev'>, not <hostdev>
> 
> 
> > it can be reproduced 100% in my SR-IOV host.
> 
> and to verify - this is also the case on the host that *does* properly
> support ACS, correct?

Yeah, this bug can be reproduced on the host which does support ACS. It can be reproduced whether or not the host supports ACS, so it seems unrelated to the ACS capability.

Comment 21 Laine Stump 2013-10-16 13:50:23 UTC
Some time with gdb shows that the problem is caused by an omission in the new "device removed" event handling code.

When device removal was handled synchronously, a detach-device of an <interface type='hostdev'> would result in the *hostdev* removal code being executed instead of the netdev code (with the hostdev code doing a small bit of extra work to revert the mac address and vlan of the device).

With the new asynchronous handling, the event handler code responds to qemu's "device removed" event by treating the device purely as a netdev, so none of the hostdev-specific things are cleaned up, i.e. the host driver isn't reattached to the device (and the device isn't removed from the "in-use" list, so other guests can't use it).

(The part that I don't understand is how my testing on my own RHEL7 host with <interface type='hostdev'> failed to reproduce the problem.)

I'm working on a patch for this now.

In the meantime, note that this cannot be the same problem as reported against RHEL6.5 in Bug 995935 - the RHEL6 libvirt handles device removal synchronously, since 1) qemu in RHEL6 doesn't have that event, and 2) libvirt in RHEL6 doesn't have the code to respond to that event.

Comment 22 Laine Stump 2013-10-21 16:17:09 UTC
A fix has been pushed upstream:

commit 7a600cf77fb649e3412d4bf4f9eefba046562880
Author: Laine Stump <laine>
Date:   Fri Oct 18 11:39:08 2013 +0300

    qemu: simplify calling qemuDomainHostdevNetConfigRestore

commit c5561644d8551d80d94b648b9ab828c9859e1667
Author: Laine Stump <laine>
Date:   Fri Oct 18 12:28:40 2013 +0300

    qemu: move qemuDomainRemoveNetDevice to avoid forward reference

commit 69e047ae214d92feea6e54dfe821b1498d0004a9
Author: Laine Stump <laine>
Date:   Fri Oct 18 12:34:53 2013 +0300

    qemu: fix removal of <interface type='hostdev'>

Comment 25 Hu Jianwei 2013-10-24 10:31:19 UTC
I can reproduce it on libvirt-1.1.1-9.el7.x86_64, but cannot reproduce it on libvirt-1.1.1-10.el7.x86_64.

Version:
libvirt-1.1.1-10.el7.x86_64
qemu-kvm-rhev-1.5.3-9.el7.x86_64
kernel-3.10.0-33.el7.x86_64

1. Check the driver status:
[root@sriov2 ~]# readlink -f /sys/bus/pci/devices/0000\:11\:10.1/driver/
/sys/bus/pci/drivers/igbvf
[root@sriov2 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 2     r7                             running

2. Attach the following xml to the guest.
[root@sriov2 ~]# cat VF-interface.xml
<interface type='hostdev' managed='yes'>
    <mac address='52:54:00:43:0d:0b'/>      
      <source>
        <address type='pci' domain='0x0000' bus='0x11' slot='0x10' function='0x1'/>
      </source>
</interface>
[root@sriov2 ~]# virsh attach-device r7 VF-interface.xml
Device attached successfully

3. Check the driver of the attached device.
[root@sriov2 ~]# virsh nodedev-dumpxml pci_0000_11_10_1
<device>
  <name>pci_0000_11_10_1</name>
  <path>/sys/devices/pci0000:00/0000:00:1c.6/0000:0c:00.0/0000:0d:04.0/0000:11:10.1</path>
  <parent>pci_0000_0d_04_0</parent>
  <driver>
    <name>pci-stub</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>17</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x10' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='11'>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x0'/>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x5'/>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x6'/>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x7'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x07' slot='0x01' function='0x0'/>
      <address domain='0x0000' bus='0x07' slot='0x05' function='0x0'/>
      <address domain='0x0000' bus='0x07' slot='0x07' function='0x0'/>
      <address domain='0x0000' bus='0x07' slot='0x09' function='0x0'/>
      <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x10' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x10' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x3'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x4'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x5'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x6'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x7'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x0'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x1'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x2'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x3'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x4'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x5'/>
      <address domain='0x0000' bus='0x12' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0d' slot='0x02' function='0x0'/>
      <address domain='0x0000' bus='0x0d' slot='0x04' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x3'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x4'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x5'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x6'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x7'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x2'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x3'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x4'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x5'/>
    </iommuGroup>
  </capability>
</device>

4. Detach the device from the guest; the driver comes back from pci-stub to igbvf.
[root@sriov2 ~]# virsh detach-device r7 VF-interface.xml
Device detached successfully

5. Check the driver info and that the interface can get an IP normally.
[root@sriov2 ~]# virsh nodedev-dumpxml pci_0000_11_10_1
<device>
  <name>pci_0000_11_10_1</name>
  <path>/sys/devices/pci0000:00/0000:00:1c.6/0000:0c:00.0/0000:0d:04.0/0000:11:10.1</path>
  <parent>pci_0000_0d_04_0</parent>
  <driver>
    <name>igbvf</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>17</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x10ca'>82576 Virtual Function</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <capability type='phys_function'>
      <address domain='0x0000' bus='0x10' slot='0x00' function='0x1'/>
    </capability>
    <iommuGroup number='11'>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x0'/>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x5'/>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x6'/>
      <address domain='0x0000' bus='0x00' slot='0x1c' function='0x7'/>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x07' slot='0x01' function='0x0'/>
      <address domain='0x0000' bus='0x07' slot='0x05' function='0x0'/>
      <address domain='0x0000' bus='0x07' slot='0x07' function='0x0'/>
      <address domain='0x0000' bus='0x07' slot='0x09' function='0x0'/>
      <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x10' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x10' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x3'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x4'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x5'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x6'/>
      <address domain='0x0000' bus='0x11' slot='0x10' function='0x7'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x0'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x1'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x2'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x3'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x4'/>
      <address domain='0x0000' bus='0x11' slot='0x11' function='0x5'/>
      <address domain='0x0000' bus='0x12' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0d' slot='0x02' function='0x0'/>
      <address domain='0x0000' bus='0x0d' slot='0x04' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
      <address domain='0x0000' bus='0x0e' slot='0x00' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x2'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x3'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x4'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x5'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x6'/>
      <address domain='0x0000' bus='0x0f' slot='0x10' function='0x7'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x0'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x1'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x2'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x3'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x4'/>
      <address domain='0x0000' bus='0x0f' slot='0x11' function='0x5'/>
    </iommuGroup>
  </capability>
</device>

[root@sriov2 ~]# readlink -f /sys/bus/pci/devices/0000\:11\:10.1/driver/
/sys/bus/pci/drivers/igbvf

We got the expected results, so the status is changed to verified.

Comment 26 Laine Stump 2013-10-28 12:19:48 UTC
Something strange is going on here - the libvirt build you're testing (libvirt-1.1.1-10.el7) was supposed to also include patches to prefer VFIO device assignment over legacy KVM device assignment (see Bug 1001738). However, according to the output of nodedev-dumpxml that you've pasted here, the legacy KVM driver is being used instead (evidenced by the fact that the device is bound to "pci-stub" rather than "vfio-pci"). (Possibly it's just that the vfio driver isn't installed, but in that case I would expect the "iommuGroup" element to be empty.)


Note that (again, indicated by the output of nodedev-dumpxml) your SRIOV card is plugged into a slot that does not have proper separation from the integrated host devices, so all its VFs have been placed in the same iommu_group as the integrated devices. This should lead to a failure of managed device assignment (i.e. this *exact test*) due to VFIO's insistence that *all* devices in an iommu_group be bound to either vfio-pci or pci-stub. To avoid failure of this test once the patch in Bug 1001738 is working properly on your hardware, you will need to add "<driver name='kvm'/>" to the xml used in the attach-device and detach-device commands, as in the sketch below.
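
For reference, a sketch of that xml (assuming the same VF address as in comment 25; the only change is the added <driver name='kvm'/> element):

<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:43:0d:0b'/>
  <driver name='kvm'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x11' slot='0x10' function='0x1'/>
  </source>
</interface>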

Can you verify that the VFIO module is loaded?

Comment 27 Xuesong Zhang 2013-11-26 10:02:19 UTC
(In reply to Laine Stump from comment #26)
> Something strange is going on here - the libvirt build you're testing
> (libvirt-1.1.1-10.el7) was supposed to also include patches to prefer VFIO
> device assignment over legacy KVM device assignment (see Bug 1001738).
> However, according to the output of nodedev-dumpxml that you've pasted here,
> the legacy KVM driver is being used instead (evidenced by the fact that the
> device is bound to "pci-stub" rather than "vfio-pci"). (Possibly it's just
> that the vfio driver isn't installed, but in that case I would expect the
> "iommuGroup" element to be empty.)
> 

I tried with libvirt build libvirt-1.1.1-12.el7.x86_64: while the VFIO module was not loaded, the iommuGroup element still contained the PCI addresses rather than being empty. Is that ok?

> 
> Note that (again, indicated by the output of nodedev-dumpxml) your SRIOV
> card is plugged into a slot that does not have proper separation from the
> integrated host devices, so all its VFs have been placed in the same
> iommu_group as the integrated devices. This should lead to a failure of
> managed device assignment (i.e. this *exact test*) due to VFIO's insistence
> that *all* devices in an iommu_group be bound to either the vfio-pci, or to
> pci-stub. To avoid failure of this test once the patch in Bug 1001738 is
> working properly on your hardware, you will need to add "<driver
> name='kvm'/>" to the xml used in the attach-device and detach-device
> commands.
> 
> Can you verify that the VFIO module is loaded?

hi, Laine,

Sorry for the late response. Yeah, the VFIO module is not loaded while the host is starting, and we did not load the VFIO module while verifying this bug.

Comment 28 Ludek Smid 2014-06-13 10:10:25 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.