Bug 1307114

Summary: Hot unplug of PCI devices with VFIO fails
Product: [Community] Virtualization Tools
Reporter: Ludovic Beliveau <ludovic.beliveau>
Component: libvirt
Assignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED NEXTRELEASE
QA Contact:
Severity: medium
Docs Contact:
Priority: unspecified
Version: unspecified
CC: jishao, jtomko, ludovic.beliveau, mprivozn, rbalakri, shyu
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-17 11:42:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ludovic Beliveau 2016-02-12 18:16:07 UTC
Description of problem:


Version-Release number of selected component (if applicable):

libvirt 2.2.0

How reproducible:

Always.

Steps to Reproduce:
1. boot vm with PCI device (with VFIO)
2. hot unplug PCI device when vm is running

Actual results:

error : virSecurityDACSetOwnershipInternal:304 : unable to set user and group to '0:0' on '/dev/vfio/45': No such file or directory

Expected results:

No errors

Additional info:

This is needed in openstack/nova in order to be able to suspend guests with PCI devices.

It would be possible to set "dynamic_ownership = 0" as a workaround but we still want DAC security to be enabled for the other devices.

On hot unplug of PCI devices with VFIO driver for QEMU, libvirt is
trying to restore the host device's label to its previous value (basically a chown
to the previous user/group).

However, for devices using the VFIO driver, when the device is unbound its node is
removed from the /dev/vfio file system, causing the label restore to fail.
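
The failure mode described above can be reproduced outside libvirt. The following is a minimal sketch in Python, not libvirt's implementation: the helper name restore_ownership is hypothetical, and a temporary file stands in for a group node such as /dev/vfio/45. Once the node has been removed, a chown on the old path fails with ENOENT, which is the "No such file or directory" seen in the log.

```python
import errno
import os
import tempfile


def restore_ownership(path, uid, gid):
    """Hypothetical stand-in for a DAC label restore: chown path to uid:gid.

    Returns 0 on success, or the errno value if the chown fails.
    """
    try:
        os.chown(path, uid, gid)
        return 0
    except OSError as exc:
        return exc.errno


# A temporary file stands in for a node such as /dev/vfio/45 (assumption).
fd, node = tempfile.mkstemp()
os.close(fd)

# Chown to our own uid/gid so the sketch does not need root privileges.
before = restore_ownership(node, os.getuid(), os.getgid())

# Simulate the node disappearing when the device is unbound from VFIO.
os.unlink(node)
after = restore_ownership(node, os.getuid(), os.getgid())

print("while node exists:", before)
print("after node removed:", errno.errorcode.get(after, after))
```

The second call is the situation libvirt ends up in during hot unplug: the restore runs after the node is already gone.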

Comment 1 Ludovic Beliveau 2016-02-12 18:17:14 UTC
I submitted the following bug fix to the libvirt mailing list:

Currently, on hot unplug of PCI devices with VFIO driver for QEMU, libvirt is
trying to restore the host device's label to its previous value (basically a chown
to the previous user/group).

However, for devices using the VFIO driver, when the device is unbound its node is
removed from the /dev/vfio file system, causing the label restore to fail.

The fix is to not restore the label for those PCI devices since they are going
to be torn down anyway.

Signed-off-by: Ludovic Beliveau <ludovic.beliveau>
---
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index f8db960..f5beabd 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -2996,6 +2996,8 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver,
     int ret = -1;
     qemuDomainObjPrivatePtr priv = vm->privateData;
     char *drivestr = NULL;
+    virDomainHostdevSubsysPCIPtr pcisrc = NULL;
+    bool is_vfio = false;
 
     VIR_DEBUG("Removing host device %s from domain %p %s",
               hostdev->info->alias, vm, vm->def->name);
@@ -3039,6 +3041,8 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver,
 
     switch ((virDomainHostdevSubsysType) hostdev->source.subsys.type) {
     case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI:
+        pcisrc = &hostdev->source.subsys.u.pci;
+        is_vfio = pcisrc->backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO;
         qemuDomainRemovePCIHostDevice(driver, vm, hostdev);
         /* QEMU might no longer need to lock as much memory, eg. we just
          * detached the last VFIO device, so adjust the limit here */
@@ -3058,7 +3062,8 @@ qemuDomainRemoveHostDevice(virQEMUDriverPtr driver,
     if (qemuTeardownHostdevCgroup(vm, hostdev) < 0)
         VIR_WARN("Failed to remove host device cgroup ACL");
 
-    if (virSecurityManagerRestoreHostdevLabel(driver->securityManager,
+    if (!is_vfio &&
+        virSecurityManagerRestoreHostdevLabel(driver->securityManager,
                                               vm->def, hostdev, NULL) < 0) {
         VIR_WARN("Failed to restore host device labelling");
     }
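
The condition introduced by the patch above can be summarized in a few lines. This is only a sketch in Python with hypothetical names (should_restore_label, and plain strings for the subsystem type and PCI backend); libvirt itself is written in C and exposes no such function. It mirrors the diff: the label restore is skipped only when the host device is a PCI device using the VFIO backend.

```python
def should_restore_label(subsys_type, pci_backend=None):
    """Mirror the patched condition in qemuDomainRemoveHostDevice.

    subsys_type and pci_backend are illustrative strings, not libvirt enums.
    The restore is skipped only for PCI host devices backed by VFIO.
    """
    is_vfio = subsys_type == "pci" and pci_backend == "vfio"
    return not is_vfio


for subsys, backend in [("pci", "vfio"), ("pci", "kvm"), ("usb", None)]:
    print(subsys, backend, should_restore_label(subsys, backend))
```

Non-PCI host devices and PCI devices on other backends keep the previous behavior, so DAC labelling stays in effect for them.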

Comment 2 Jingjing Shao 2016-03-17 03:43:21 UTC
I saw that the libvirt version in the description is 2.2.0.
Is that a mistake?

I tried to reproduce the issue on libvirt-1.2.17-13.el7.x86_64 and tested three scenarios, but none of them reproduced it. May I get more info about this issue?

The steps are below.

Scenario A: make the device unavailable in the guest, then hot-unplug the device on the host

1.[root@ibm-x3850x5-05 jishao]# virsh start r7
Domain r7 started

2.[root@ibm-x3850x5-05 jishao]# virsh dumpxml r7 | grep -i hostdev -A6
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:62:9d:13'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

3.Get into the guest

3.1[root@localhost ~]#  lspci -vvv
....
00:03.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
Subsystem: Intel Corporation Device 7a11
Physical Slot: 3
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at fc050000 (64-bit, non-prefetchable) [size=16K]
Region 3: Memory at fc054000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
Vector table: BAR=3 offset=00000000
PBA: BAR=3 offset=00002000
Capabilities: [a0] Express (v0) Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
Kernel driver in use: ixgbevf
......

3.2[root@localhost ~]#  ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.122.79 netmask 255.255.255.0 broadcast 192.168.122.255
inet6 fe80::5054:ff:fe35:6e8d prefixlen 64 scopeid 0x20<link>
ether 52:54:00:35:6e:8d txqueuelen 1000 (Ethernet)
RX packets 739 bytes 63534 (62.0 KiB)
RX errors 0 dropped 83 overruns 0 frame 0
TX packets 534 bytes 382772 (373.8 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 52:54:00:fe:61:07 txqueuelen 1000 (Ethernet)
RX packets 14 bytes 840 (840.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


3.3 [root@localhost ~]# ip l set eth0 down
[root@localhost ~]#

3.4
[root@localhost ~]# echo 1 > /sys/bus/pci/devices/0000\:00\:03.0/remove

[root@localhost ~]#  ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.122.79 netmask 255.255.255.0 broadcast 192.168.122.255
inet6 fe80::5054:ff:fe35:6e8d prefixlen 64 scopeid 0x20<link>
ether 52:54:00:35:6e:8d txqueuelen 1000 (Ethernet)
RX packets 771 bytes 66134 (64.5 KiB)
RX errors 0 dropped 83 overruns 0 frame 0
TX packets 561 bytes 387154 (378.0 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 
........

4.On the host
[root@ibm-x3850x5-05 jishao]# virsh dumpxml r7 | grep -i hostdev -A6
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:62:9d:13'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
....

5. On the host
[root@ibm-x3850x5-05 jishao]# cat vf1.xml
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:62:9d:13'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

[root@ibm-x3850x5-05 jishao]# virsh detach-device r7 vf1.xml
Device detached successfully

6.On the host
[root@ibm-x3850x5-05 jishao]# virsh dumpxml r7 | grep -i hostdev -A8
[root@ibm-x3850x5-05 jishao]#
[root@ibm-x3850x5-05 jishao]#

Scenario B: start the guest with the interface, unbind the device manually, then detach the interface

1.[root@ibm-x3850x5-05 ~]# lspci | grep 82599
86:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
86:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
86:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
86:10.1 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
86:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
86:10.3 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)

2.[root@ibm-x3850x5-05 ~]# virsh nodedev-dumpxml pci_0000_86_00_0
<device>
<name>pci_0000_86_00_0</name>
<path>/sys/devices/pci0000:80/0000:80:01.0/0000:86:00.0</path>
<parent>pci_0000_80_01_0</parent>
<driver>
<name>ixgbe</name>
</driver>
<capability type='pci'>
<domain>0</domain>
<bus>134</bus>
<slot>0</slot>
<function>0</function>
<product id='0x10fb'>82599ES 10-Gigabit SFI/SFP+ Network Connection</product>
<vendor id='0x8086'>Intel Corporation</vendor>
<capability type='virt_functions' maxCount='63'>
<address domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
<address domain='0x0000' bus='0x86' slot='0x10' function='0x2'/>
<address domain='0x0000' bus='0x86' slot='0x10' function='0x4'/>
<address domain='0x0000' bus='0x86' slot='0x10' function='0x6'/>
</capability>
<iommuGroup number='27'>
<address domain='0x0000' bus='0x86' slot='0x00' function='0x0'/>
</iommuGroup>
<numa node='1'/>
<pci-express>
<link validity='cap' port='0' speed='5' width='8'/>
<link validity='sta' speed='5' width='4'/>
</pci-express>
</capability>
</device>

3.add the information to the guest r7 xml, then start r7
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:62:9d:13'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>


[root@ibm-x3850x5-05 images]# virsh start r7
Domain r7 started

[root@ibm-x3850x5-05 images]# virsh dumpxml r7 | grep hostdev -A8 -i
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:62:9d:13'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>

4.get into the guest r7
[root@localhost ~]# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::5054:ff:fe62:9d13 prefixlen 64 scopeid 0x20<link>
ether 52:54:00:62:9d:13 txqueuelen 1000 (Ethernet)
RX packets 14 bytes 840 (840.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5 bytes 438 (438.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
.....

[root@localhost ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
link/ether 52:54:00:62:9d:13 brd ff:ff:ff:ff:ff:ff
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
link/ether 52:54:00:90:a7:02 brd ff:ff:ff:ff:ff:ff
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT qlen 500
link/ether 52:54:00:90:a7:02 brd ff:ff:ff:ff:ff:ff

5.on the host
[root@ibm-x3850x5-05 ~]# echo "0000:86:10.0" > /sys/bus/pci/devices/0000\:86\:10.0/driver/unbind


6.On the guest
[root@localhost ~]# ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 0 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
.....

[root@localhost ~]# ip l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
link/ether 52:54:00:90:a7:02 brd ff:ff:ff:ff:ff:ff
4: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT qlen 500
link/ether 52:54:00:90:a7:02 brd ff:ff:ff:ff:ff:ff

7.On the host
[root@ibm-x3850x5-05 ~]# virsh dumpxml r7 | grep hostdev -A8 -i
[root@ibm-x3850x5-05 ~]#
[root@ibm-x3850x5-05 ~]#
[root@ibm-x3850x5-05 ~]#



Scenario C: first unbind the device manually, then start the guest with the interface and detach the interface


1.[root@ibm-x3850x5-05 jishao]# echo "0000:86:10.2" > /sys/bus/pci/devices/0000\:86\:10.2/driver/unbind
[root@ibm-x3850x5-05 jishao]#

2.
[root@ibm-x3850x5-05 jishao]# virsh edit rhel7.1
Domain rhel7.1 XML configuration edited.


[root@ibm-x3850x5-05 jishao]# virsh dumpxml rhel7.1 | grep interface -A 8
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:62:9d:13'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x2'/>
--
</interface>
<serial type='pty'>
<target port='0'/>
</serial>
<console type='pty'>

[root@ibm-x3850x5-05 jishao]# virsh start rhel7.1
Domain rhel7.1 started

3.
[root@ibm-x3850x5-05 jishao]# cat vf1.xml
<interface type='hostdev' managed='yes'>
<mac address='52:54:00:62:9d:13'/>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x2'/>
</source>
</interface>

4.[root@ibm-x3850x5-05 jishao]# virsh detach-device rhel7.1 vf1.xml
Device detached successfully
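
Since the reported error only concerns /dev/vfio nodes, it can help to confirm that the device XML used in a scenario really selects the VFIO driver and to derive the host PCI address it refers to. The following is a sketch using Python's standard xml.etree module on the Scenario C device XML; the helper names uses_vfio and host_pci_address are hypothetical and not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Device XML as used in Scenario C (vf1.xml).
DEVICE_XML = """
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:62:9d:13'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x86' slot='0x10' function='0x2'/>
  </source>
</interface>
"""


def uses_vfio(device_xml):
    """Return True if the device XML has a <driver name='vfio'/> child."""
    root = ET.fromstring(device_xml.strip())
    driver = root.find("driver")
    return driver is not None and driver.get("name") == "vfio"


def host_pci_address(device_xml):
    """Return the host address as domain:bus:slot.function, or None."""
    addr = ET.fromstring(device_xml.strip()).find("source/address")
    if addr is None:
        return None
    # The attributes are hexadecimal strings such as '0x86'.
    domain, bus, slot, function = (
        int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
    )
    return "{:04x}:{:02x}:{:02x}.{:x}".format(domain, bus, slot, function)


print(uses_vfio(DEVICE_XML), host_pci_address(DEVICE_XML))
```

The address produced this way has the same form as the sysfs names used in the unbind commands above.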

Comment 3 Ludovic Beliveau 2016-03-17 11:29:58 UTC
This got fixed in upstream: 
http://libvirt.org/git/?p=libvirt.git;a=commit;h=8fbdff163456b6311cd459017b6087ff5081bd56

Thanks,
/ludovic

Comment 4 Ján Tomko 2016-03-17 11:42:34 UTC
(In reply to Jingjing Shao from comment #2)
> I saw that the libvirt version in the description is 2.2.0.
> Is that a mistake?
> 
> I tried to reproduce the issue on libvirt-1.2.17-13.el7.x86_64 and
> tested three scenarios, but none of them reproduced it. May I get
> more info about this issue?
> 

The issue is the following error in the libvirtd log:
error : virSecurityDACSetOwnershipInternal:304 : unable to set user and group to '0:0' on '/dev/vfio/45': No such file or directory

The fix removed logging of the error message. 
The error was ignored anyway so there is no other change in behavior.
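
Because the only symptom is a log line, verifying the fix amounts to checking the libvirtd log after a detach. The following is a sketch in Python that matches the error quoted above; the regular expression and the function name find_vfio_restore_errors are assumptions made for illustration, and reading the actual log file is left to the caller.

```python
import re

# Pattern modelled on the error line quoted in this report (assumption:
# only the line number, owner and group number vary between occurrences).
VFIO_CHOWN_ERROR = re.compile(
    r"virSecurityDACSetOwnershipInternal:\d+ : unable to set user and group "
    r"to '(?P<owner>\d+:\d+)' on '(?P<path>/dev/vfio/\d+)': "
    r"No such file or directory"
)


def find_vfio_restore_errors(lines):
    """Return (owner, path) pairs for each matching error line."""
    hits = []
    for line in lines:
        match = VFIO_CHOWN_ERROR.search(line)
        if match:
            hits.append((match.group("owner"), match.group("path")))
    return hits


SAMPLE_LOG = [
    "error : virSecurityDACSetOwnershipInternal:304 : unable to set user "
    "and group to '0:0' on '/dev/vfio/45': No such file or directory",
    "Device detached successfully",
]

print(find_vfio_restore_errors(SAMPLE_LOG))
```

An empty result after hot-unplugging a VFIO device would be consistent with the fix being present.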