Bug 971313 - qemu crashes due to selinux AVC when detaching a hostdev
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Assigned To: Laine Stump
QA Contact: Virtualization Bugs
Keywords: TestOnly
Depends On: 984112
Reported: 2013-06-06 05:16 EDT by hongming
Modified: 2014-06-17 20:51 EDT
CC: 8 users

Fixed In Version: libvirt-1.1.1-1.el7
Doc Type: Bug Fix
Last Closed: 2014-06-13 06:29:16 EDT
Type: Bug
honzhang: needinfo-


Attachments
libvirt debug log (624.54 KB, text/plain)
2013-06-06 05:18 EDT, hongming
guest qemu log (8.99 KB, text/plain)
2013-06-24 23:03 EDT, hongming
audit log (8.09 KB, text/x-log)
2013-06-26 05:57 EDT, hongming

Description hongming 2013-06-06 05:16:49 EDT
Description of problem:
The domain crashes when the same VF is attached to the guest again.

Version-Release number of selected component (if applicable):
libvirt-1.0.6-1.el7.x86_64
qemu-kvm-1.5.0-2.el7.x86_64
kernel-3.9.0-0.55.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
# lspci|grep 82576
0e:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0e:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0f:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0f:10.1 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0f:10.2 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
0f:10.3 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
.......

# cat vf.xml
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
       <address type='pci' domain='0x0000' bus='0x0f' slot='0x10' function='0x3'/>
      </source>
    </hostdev>


# virsh start rhel7
Domain rhel7 started

# virsh attach-device rhel7 vf.xml
Device attached successfully


# virsh detach-device rhel7 vf.xml
Device detached successfully


# virsh attach-device rhel7 vf.xml
error: Failed to attach device from vf.xml
error: Unable to read from monitor: Connection reset by peer


Actual results:
The domain crashes when the same VF is attached to the guest again.

Expected results:
The domain still works fine

Additional info:
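A minimal bash sketch for building vf.xml for any VF in the lspci listing above. hostdev_xml is a hypothetical helper, not part of libvirt; it assumes lspci's BB:SS.F address form (or DDDD:BB:SS.F from lspci -D) and defaults a missing PCI domain to 0000.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a <hostdev> element like vf.xml for a PCI
# address copied from lspci output. Accepts "BB:SS.F" (plain lspci) or
# "DDDD:BB:SS.F" (lspci -D); a missing domain is assumed to be 0000.
hostdev_xml() {
  local addr=$1 dom bus slot fn
  [[ $addr == *:*:* ]] || addr="0000:$addr"
  # Split DDDD:BB:SS.F on ':' and '.' into its four fields.
  IFS=':.' read -r dom bus slot fn <<< "$addr"
  cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address type='pci' domain='0x$dom' bus='0x$bus' slot='0x$slot' function='0x$fn'/>
  </source>
</hostdev>
EOF
}
```

For example, hostdev_xml 0f:10.3 > vf.xml produces a file equivalent to the vf.xml used in the steps above.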
Comment 1 hongming 2013-06-06 05:18:47 EDT
Created attachment 757573
libvirt debug log
Comment 3 Laine Stump 2013-06-24 18:28:33 EDT
> error: Unable to read from monitor: Connection reset by peer

Whenever you see this error, you need to gather the guest's qemu logfile, which is in /var/log/libvirt/qemu/$guestname.log ($guestname is rhel7 in this case).

Also, can you verify that the guest remains running properly *after* the device is detached the first time, right up until the 2nd attach attempt?
Comment 4 hongming 2013-06-24 23:03:13 EDT
Created attachment 764884
guest qemu log

Attached guest qemu log
Comment 5 hongming 2013-06-24 23:16:50 EDT
The guest's state becomes "shut off" after the device is detached the first time.

# virsh start rhel7
Domain rhel7 started

# virsh attach-device rhel7 vf.xml
Device attached successfully

# virsh detach-device rhel7 vf.xml
Device detached successfully

# virsh list
 Id    Name                           State
----------------------------------------------------
 4     rhel7                          running

After waiting a moment, the guest shuts off:

# virsh list
 Id    Name                           State
----------------------------------------------------
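Because the guest only dies some time after the detach, a reproduction loop has to wait before checking the guest's state. A minimal bash sketch, assuming a persistent domain (so virsh domstate still answers after the guest stops); DOMAIN, XML, CYCLES, SETTLE and hostdev_cycle are illustrative names, and the 5-second default delay is a guess rather than a measured value.

```bash
#!/usr/bin/env bash
# Illustrative defaults (hypothetical); override them in the environment.
DOMAIN=${DOMAIN:-rhel7}
XML=${XML:-vf.xml}
CYCLES=${CYCLES:-3}
SETTLE=${SETTLE:-5}   # seconds to wait after each detach (a guess)

# Attach and detach the hostdev CYCLES times; after each detach, wait
# SETTLE seconds and fail if the guest is no longer running.
hostdev_cycle() {
  local i state
  for ((i = 1; i <= CYCLES; i++)); do
    virsh attach-device "$DOMAIN" "$XML" || return 1
    virsh detach-device "$DOMAIN" "$XML" || return 1
    sleep "$SETTLE"
    state=$(virsh domstate "$DOMAIN" 2>/dev/null)
    if [[ $state != running ]]; then
      echo "cycle $i: $DOMAIN is '${state:-unknown}' after detach" >&2
      return 1
    fi
  done
  echo "$CYCLES cycles OK"
}
```

With the affected packages and SELinux enforcing, this would be expected to fail on the first cycle, matching the transcript above.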
Comment 6 Laine Stump 2013-06-25 13:29:41 EDT
(In reply to hongming from comment #4)
> 
> Attached guest qemu log

That's an interesting log, but it doesn't look like it is /var/log/libvirt/qemu/rhel7.log.

The qemu logfile would contain things such as the qemu command line that was used to start qemu, and any error messages that qemu generated after it was started.

The logfile you attached (attachment 764884) contains a lot of messages from libvirtd, followed by what looks like a very strange error message about the kernel being unable to write to a PCI device. I don't recognize it, so I'm asking the qemu people for assistance.
Comment 7 Laine Stump 2013-06-25 13:42:22 EDT
Can you try this with selinux disabled to see if the behavior is different?

Also check for any new AVCs in /var/log/audit/audit.log.
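The diagnostics requested so far (the guest's qemu log from Comment 3 and the AVCs from this comment) can be gathered in one step. A bash sketch; collect_diag is a hypothetical helper, and QEMU_LOG_DIR and AUDIT_LOG are hypothetical overrides that default to the paths named in this thread. It also records the SELinux mode, since the question here is whether SELinux is involved.

```bash
#!/usr/bin/env bash
# Hypothetical helper: collect_diag GUEST OUTDIR
# Copies the guest's qemu log, the AVC records from the audit log and the
# current SELinux mode into OUTDIR.
collect_diag() {
  local guest=$1 out=$2
  local qemu_log="${QEMU_LOG_DIR:-/var/log/libvirt/qemu}/$guest.log"
  local audit_log="${AUDIT_LOG:-/var/log/audit/audit.log}"
  mkdir -p -- "$out" || return 1
  # The qemu log holds the qemu command line and qemu's own error messages.
  cp -- "$qemu_log" "$out/" || return 1
  # Keep only AVC records; an empty avc.log means none were logged.
  grep -F 'type=AVC' -- "$audit_log" > "$out/avc.log"
  # Record the SELinux mode (Enforcing/Permissive/Disabled).
  getenforce > "$out/getenforce.txt" 2>&1
  return 0
}
```

Running collect_diag rhel7 /tmp/rhel7-diag right after the failure would leave everything in one directory for attaching to the bug.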
Comment 8 hongming 2013-06-26 05:56:11 EDT
If SELinux is disabled, it works fine; the bug can't be reproduced.

If SELinux is enforcing, it can be reproduced.
# getenforce 
Enforcing

# virsh start rhel7
Domain rhel7 started

# virsh attach-device rhel7 vf.xml
Device attached successfully

# virsh detach-device rhel7 vf.xml
Device detached successfully

# cat /var/log/audit/audit.log|grep avc
type=AVC msg=audit(1372230114.643:139): avc:  denied  { write } for  pid=1711 comm="qemu-kvm" path="/sys/devices/pci0000:00/0000:00:1c.6/0000:0c:00.0/0000:0d:02.0/0000:0f:10.3/config" dev="sysfs" ino=27333 scontext=system_u:system_r:svirt_t:s0:c150,c740 tcontext=system_u:object_r:sysfs_t:s0 tclass=file
Comment 9 hongming 2013-06-26 05:57:37 EDT
Created attachment 765495
audit log
Comment 10 Laine Stump 2013-07-01 12:42:00 EDT
This is the offending AVC:

type=AVC msg=audit(1372240465.252:542): avc:  denied  { write } for  pid=4129 comm="qemu-kvm" path="/sys/devices/pci0000:00/0000:00:1c.6/0000:0c:00.0/0000:0d:02.0/0000:0f:10.3/config" dev="sysfs" ino=27333 scontext=system_u:system_r:svirt_t:s0:c394,c836 tcontext=system_u:object_r:sysfs_t:s0 tclass=file

The fact that this works when the device is attached, but fails when the device is detached, implies that the selinux label on this resource is being "undone" too soon.
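Reading the AVC field by field supports this: scontext is the guest's svirt_t domain with its per-guest MCS categories (c394,c836), while tcontext is the generic sysfs_t, which fits the reading that the device's label had already been restored. A minimal bash sketch for pulling such fields out of one record; avc_field and avc_perms are hypothetical helpers that assume the key=value layout shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY (comm, path, scontext,
# tcontext, tclass, ...) from one AVC record, without surrounding quotes.
# Returns 1 if KEY is not present.
avc_field() {
  local record=$1 key=$2 tok
  local -a toks
  read -r -a toks <<< "$record"   # split on whitespace, no glob expansion
  for tok in "${toks[@]}"; do
    if [[ $tok == "$key="* ]]; then
      tok=${tok#"$key="}
      tok=${tok#\"}
      tok=${tok%\"}
      printf '%s\n' "$tok"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print the permission list between "denied {" and "}".
avc_perms() {
  sed -n 's/.*denied *{ \([^}]*\) }.*/\1/p' <<< "$1"
}
```

Fed the record above, avc_field prints qemu-kvm for comm and system_u:object_r:sysfs_t:s0 for tcontext, and avc_perms prints write.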
Comment 12 Laine Stump 2013-07-02 19:23:17 EDT
I have a theory about this that is a bit disturbing: when we send the device_del command to qemu, it returns almost immediately with success, but it hasn't *really* finished detaching the device. In the meantime, we happily proceed to reattach the device to the host driver, undo any cgroups that we had set up, and relabel everything in sysfs to prevent access by the qemu process. *But it may not be finished yet!*

So I think the solution to this problem is to implement a wait for the new qemu event that it produces when it is *really* finished with a device. (Does anyone remember the BZ number for the libvirt side of that?)
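The ordering described here (tear down host-side access only after confirming that qemu is really finished with the device) can be illustrated at the script level as a poll-with-timeout helper. This is not libvirt's fix, which per this thread belongs inside libvirt and waits for qemu's own notification; wait_for and the device_released check in the example comment are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait_for TIMEOUT INTERVAL CMD [ARGS...]
# Re-runs CMD every INTERVAL seconds until it succeeds (return 0) or until
# TIMEOUT whole seconds have elapsed (return 1). INTERVAL may be fractional
# with GNU sleep; TIMEOUT must be an integer because SECONDS is.
wait_for() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    ((SECONDS >= deadline)) && return 1
    sleep "$interval"
  done
  return 0
}

# Example ordering (not executed; device_released is hypothetical):
#   wait_for 10 0.5 device_released rhel7 || { echo "still in use" >&2; exit 1; }
#   ...only now restore the host driver, cgroup and SELinux labels...
```

The point of the timeout is that cleanup is never started on the assumption that an asynchronous removal has completed; it either sees confirmation or reports failure.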
Comment 13 Laine Stump 2013-07-31 15:45:13 EDT
I believe that this bug may have been fixed by the patch for Bug 984112, which is now available in the latest RHEL7 build, libvirt-1.1.1-1.el7. Can you please retest and see if that is the case?

If it is fixed, we should re-target this back to 7.0, then mark it as fixed in libvirt-1.1.1-1.el7.
Comment 14 Laine Stump 2013-08-01 11:18:52 EDT
QA testing for Bug 984112 ran into this same problem, so it seems it isn't yet fixed. See my comment in that bug.
Comment 15 Jiri Denemark 2013-08-02 09:41:58 EDT
They hit it when trying to reproduce the bug with an older package. In any case, this is supposed to be fixed by the patches for bug 984112 and it would be a bug in those patches if not. Thus, I'm moving this back to 7.0 with a TestOnly keyword.
Comment 16 Hu Jianwei 2013-08-07 01:33:43 EDT
This bug is blocked by bug 990987; I can verify it after bug 990987 is fixed.
Comment 17 Hu Jianwei 2013-12-19 21:37:08 EST
I can't reproduce it any more.

Version:
libvirt-1.1.1-15.el7.x86_64
qemu-kvm-rhev-1.5.3-21.el7.x86_64
kernel-3.10.0-61.el7.x86_64

1. Load the vfio_pci kernel module
[root@sriov1 ~]#  modprobe vfio_pci
[root@sriov1 ~]# lsmod|grep vfio
vfio_iommu_type1       17636  1 
vfio_pci               36474  1 
vfio                   20777  5 vfio_iommu_type1,vfio_pci
[root@sriov1 ~]#

2. Detach VF device from host
[root@sriov1 ~]# cat net.xml 
<hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x10' function='0x4'/>
      </source>
    </hostdev>
[root@sriov1 ~]# 

[root@sriov1 ~]# virsh nodedev-dumpxml pci_0000_03_10_4
<device>
  <name>pci_0000_03_10_4</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:10.4</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>igbvf</name>
  </driver>
...(snipped)

[root@sriov1 ~]# virsh nodedev-detach pci_0000_03_10_4
Device pci_0000_03_10_4 detached

[root@sriov1 ~]# virsh nodedev-dumpxml pci_0000_03_10_4
<device>
  <name>pci_0000_03_10_4</name>
  <path>/sys/devices/pci0000:00/0000:00:01.0/0000:03:10.4</path>
  <parent>pci_0000_00_01_0</parent>
  <driver>
    <name>vfio-pci</name>
  </driver>
...(snipped)

3. Attach and detach the VF to/from domain r7 three times
[root@sriov1 ~]# getenforce 
Enforcing
[root@sriov1 ~]# 

[root@sriov1 ~]# virsh attach-device r7 net.xml 
Device attached successfully

[root@sriov1 ~]# virsh detach-device r7 net.xml 
Device detached successfully

[root@sriov1 ~]# virsh attach-device r7 net.xml 
Device attached successfully

[root@sriov1 ~]# virsh detach-device r7 net.xml 
Device detached successfully

[root@sriov1 ~]# virsh attach-device r7 net.xml 
Device attached successfully

[root@sriov1 ~]# virsh detach-device r7 net.xml 
Device detached successfully

[root@sriov1 ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 4     r7                             running


We get the expected results, and bug 984112 has been verified.
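The nodedev-dumpxml output in Comment 17 shows the VF's driver changing from igbvf to vfio-pci after virsh nodedev-detach. The same thing can be checked directly in sysfs, where each PCI device's driver entry is a symlink to the bound driver. A bash sketch; nodedev_to_pci, pci_driver and the SYSFS_ROOT override are hypothetical names, the override existing only so the functions can be tried without real hardware.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a libvirt node-device name such as
# pci_0000_03_10_4 to the sysfs PCI address form 0000:03:10.4.
nodedev_to_pci() {
  local name=${1#pci_} dom bus slot fn
  IFS=_ read -r dom bus slot fn <<< "$name"
  printf '%s:%s:%s.%s\n' "$dom" "$bus" "$slot" "$fn"
}

# Hypothetical helper: print the driver bound to a PCI device by reading its
# sysfs "driver" symlink; prints "none" and returns 1 if nothing is bound.
# SYSFS_ROOT (hypothetical) defaults to /sys.
pci_driver() {
  local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
  if [[ -L $link ]]; then
    basename -- "$(readlink -- "$link")"
  else
    echo none
    return 1
  fi
}
```

On the test host, pci_driver "$(nodedev_to_pci pci_0000_03_10_4)" would be expected to print igbvf before the nodedev-detach step and vfio-pci after it, matching the nodedev-dumpxml output.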
Comment 18 Ludek Smid 2014-06-13 06:29:16 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
