Bug 644276

Summary: The kvm VM can't be started when Passthrough pci device with svirt
Product: Red Hat Enterprise Linux 5
Reporter: koka xiong <kxiong>
Component: selinux-policy
Assignee: Miroslav Grepl <mgrepl>
Status: CLOSED ERRATA
QA Contact: Milos Malik <mmalik>
Severity: high
Docs Contact:
Priority: high
Version: 5.6
CC: ajia, berrange, bsarathy, chrisw, dwalsh, eblake, eparis, jdenemar, llim, mjenner, mmalik, mzhan, sgrubb, virt-maint, xen-maint, yoyzhang
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: selinux-policy-2.4.6-298.el5
Doc Type: Bug Fix
Doc Text:
With SELinux running in the enforcing mode, using a pass-through PCI device with sVirt rendered KVM (Kernel-based Virtual Machine) unable to start a virtual machine. With this update, the "virt_use_sysfs" boolean has been updated to resolve this issue, and virtual machines no longer fail to start.
Last Closed: 2011-01-13 21:50:47 UTC
Attachments:
  the audit.log (flags: none)
  audit-coldplug.log (flags: none)
  audit-hotplug.log (flags: none)

Description koka xiong 2010-10-19 10:11:19 UTC
Created attachment 454316
the audit.log

Description of problem:
After a passthrough PCI device is added to a KVM guest with sVirt enabled, the guest cannot be started.

Version-Release number of selected component (if applicable):
libvirt-python-0.8.2-7.el5
libvirt-0.8.2-7.el5
kmod-kvm-83-205.el5
kvm-83-205.el5
kvm-qemu-img-83-205.el5


How reproducible:
always

Steps to Reproduce:
1. Make sure SELinux is enabled and in enforcing mode.
# getenforce
Enforcing
2. Prepare a VM that is not running.
3. Add the following lines to the domain XML:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x3f' slot='0x00' function='0x0'/>
      </source>
    </hostdev>

4. Start the VM.
  
Actual results:
The VM can't be started

Expected results:

Additional info:
# virsh dumpxml bootiso_test
<domain type='kvm'>
  <name>bootiso_test</name>
  <uuid>303257e8-752d-81db-72e3-06cbf3f3a5a7</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='rhel5.5.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/bootiso_test.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <disk type='file' device='floppy'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/fd.img'/>
      <target dev='fda' bus='fdc'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'/>
    <controller type='fdc' index='0'/>
    <interface type='network'>
      <mac address='54:52:00:55:16:a9'/>
      <source network='default'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
    </interface>
    <serial type='file'>
      <source path='/var/log/vm-serial.log'/>
      <target port='0'/>
    </serial>
    <console type='file'>
      <source path='/var/log/vm-serial.log'/>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
    <sound model='ac97'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x3f' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>


# tail -f /var/log/messages
Oct 19 13:44:03 dhcp-93-194 avahi-daemon[4036]: Registering new address record for 10.66.93.194 on eth0.
Oct 19 13:44:03 dhcp-93-194 NET[5505]: /sbin/dhclient-script : updated /etc/resolv.conf
Oct 19 13:44:03 dhcp-93-194 dhclient: bound to 10.66.93.194 -- renewal in 2799 seconds.
Oct 19 13:44:04 dhcp-93-194 avahi-daemon[4036]: New relevant interface eth0.IPv6 for mDNS.
Oct 19 13:44:04 dhcp-93-194 avahi-daemon[4036]: Joining mDNS multicast group on interface eth0.IPv6 with address fe80::7ae7:d1ff:fe7f:20ed.
Oct 19 13:44:04 dhcp-93-194 avahi-daemon[4036]: Registering new address record for fe80::7ae7:d1ff:fe7f:20ed on eth0.
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: reading /etc/resolv.conf
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: using nameserver 172.16.52.28#53
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: using nameserver 10.66.127.10#53
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: using nameserver 10.66.191.13#53

Comment 1 Daniel Berrangé 2010-10-19 10:36:37 UTC
Please check for any 'AVC' messages in /var/log/audit/audit.log

The most likely explanation is that the RHEL-5 kernel and/or SELinux policy is too old to support fine-grained labelling of files on sysfs. IIRC we only added that capability in Fedora 12/13.

Comment 2 Daniel Walsh 2010-10-19 13:26:25 UTC
The AVCs indicate it wants:

allow svirt_t self:capability { sys_rawio sys_admin };
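
To see the denials behind a rule like this one, the AVC records that mention the svirt_t domain can be pulled out of the audit log. The following is only a minimal sketch: the function name svirt_avcs is made up for illustration, the default path is the one mentioned in comment 1, and the "type=AVC" prefix assumes the standard auditd record format.

```shell
#!/bin/sh
# svirt_avcs is a hypothetical helper name, not part of any package.
# Usage: svirt_avcs [audit-log-path]
svirt_avcs() {
    log=${1:-/var/log/audit/audit.log}
    # Keep only AVC records, then only those that mention the svirt_t domain.
    grep 'type=AVC' "$log" | grep 'svirt_t'
}
```

Running svirt_avcs as root right after a failed "virsh start" should list the relevant denials, if any were logged.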

Comment 3 Daniel Walsh 2010-10-19 13:26:55 UTC
I would figure this is a kernel issue?

Comment 4 Eric Paris 2010-10-20 13:03:33 UTC
Where do you have these logs?  The order of the calls and the syscall in question might help me figure out what is triggering these denials.

(I also don't know how device assignment works, and knowing that would probably point me in the right direction too.)

Comment 5 Jiri Denemark 2010-10-21 09:50:05 UTC
*** Bug 638859 has been marked as a duplicate of this bug. ***

Comment 6 koka xiong 2010-10-22 05:09:55 UTC
Retested on Intel with
libvirt-python-0.8.2-8.el5
libvirt-0.8.2-8.el5
kmod-kvm-83-205.el5
kvm-83-205.el5
kvm-qemu-img-83-205.el5
Retested on an Intel machine that supports VT-d. This bug appeared because the AMD machine tested originally does not support VT-d, so closing this one as not a bug.

Comment 7 Jiri Denemark 2010-10-27 14:26:57 UTC
I don't think comment 6 is correct; it must have been a configuration error of some sort. I can still reproduce it on my machine. With SELinux in permissive mode, PCI passthrough works fine. In enforcing mode, I get the following error from qemu-kvm:

Failed to assign irq for "01:00.0": Operation not permitted
Perhaps you are assigning a device that shares an IRQ with another device?
Failed to initialize assigned device host=01:00.0

A similar error is shown when I start the guest without this PCI device and then try to hotplug it. The messages in audit.log are also similar in both cases.

Comment 8 Jiri Denemark 2010-10-27 14:27:54 UTC
Created attachment 455993
audit-coldplug.log

Comment 9 Jiri Denemark 2010-10-27 14:28:26 UTC
Created attachment 455995
audit-hotplug.log

Comment 10 Jiri Denemark 2010-10-27 14:30:49 UTC
*** Bug 638859 has been marked as a duplicate of this bug. ***

Comment 14 Eric Paris 2010-12-06 16:22:54 UTC
Jiri, can I ask what device you were passing through?

I see that something is checking CAP_SYS_ADMIN on read() calls and CAP_SYS_RAWIO on some ioctl call.  I'm starting to look where such checks might be coming from...

Comment 15 Eric Paris 2010-12-06 17:29:31 UTC
I'm reassigning this to 'kernel.'  Our best guess right now is that libvirt has been rebased to support this but the appropriate upstream kernel and kvm changes have not been backported.  Hopefully the right virt people will be able to help run this down.

Comment 16 Chris Wright 2010-12-06 18:00:04 UTC
(In reply to comment #14)
> Jiri, can I ask what device you were passing through?
> 
> I see that something is checking CAP_SYS_ADMIN on read() calls and CAP_SYS_RAWIO
> on some ioctl call.  I'm starting to look where such checks might be coming
> from...

CAP_SYS_RAWIO comes from the kvm_assign_irq ioctl; CAP_SYS_ADMIN comes from the core PCI sysfs config space read function.

Jiri, if you set user and group to root and set clear_emulator_capabilities = 0 in /etc/libvirt/qemu.conf, does this start working again (with SELinux in enforcing mode)?  That would help narrow this down to the libvirt rebase as opposed to any sVirt policy change.
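
For anyone repeating this test: in qemu.conf the three settings would look roughly like user = "root", group = "root" and clear_emulator_capabilities = 0. The sketch below shows one way to confirm which of them are active (uncommented) in a given file; the function name qemu_priv_settings is hypothetical and the pattern assumes the usual "key = value" layout of that file.

```shell
#!/bin/sh
# qemu_priv_settings is a hypothetical helper name.
# Prints the uncommented user, group and clear_emulator_capabilities lines.
# Usage: qemu_priv_settings [path-to-qemu.conf]
qemu_priv_settings() {
    conf=${1:-/etc/libvirt/qemu.conf}
    grep -E '^[[:space:]]*(user|group|clear_emulator_capabilities)[[:space:]]*=' "$conf"
}
```

libvirtd has to be restarted before changes to this file take effect.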

Comment 17 Chris Wright 2010-12-06 18:49:08 UTC
BTW, the capabilities look fine here:

$ rpm -q libvirt
libvirt-0.8.2-6.el5

$ grep -e [UG]id -e ^Cap /proc/$(pidof libvirtd)/status
Uid:	0	0	0	0
Gid:	0	0	0	0
CapInh:	0000000000000000
CapPrm:	00000000fffffeff
CapEff:	00000000fffffeff
$ sudo virsh start rhel54
Domain rhel54 started

$ grep -e [UG]id -e ^Cap /proc/$(pidof qemu-kvm)/status
Uid:	0	0	0	0
Gid:	0	0	0	0
CapInh:	0000000000000000
CapPrm:	00000000fffffeff
CapEff:	00000000fffffeff
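
To read these masks: each bit in CapPrm/CapEff stands for one capability, numbered as in the kernel's linux/capability.h (CAP_SETPCAP is 8, CAP_SYS_RAWIO is 17, CAP_SYS_ADMIN is 21). A minimal sketch that tests a single bit follows; has_cap is a name made up here, not an existing tool.

```shell
#!/bin/sh
# has_cap MASK BIT: print "yes" if capability number BIT is set in the
# hexadecimal MASK taken from /proc/<pid>/status, otherwise "no".
has_cap() {
    if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

has_cap 00000000fffffeff 17   # CAP_SYS_RAWIO
has_cap 00000000fffffeff 21   # CAP_SYS_ADMIN
has_cap 00000000fffffeff 8    # CAP_SETPCAP
```

For the mask shown above this prints yes, yes, no: the two capabilities named in comment 16 are present in the qemu-kvm process, which is consistent with the capabilities looking fine here.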

Comment 18 Eric Paris 2010-12-06 19:34:22 UTC
Talking to Chris on IRC, he tells me that in RHEL5 qemu must have CAP_SYS_ADMIN and CAP_SYS_RAWIO to make PCI passthrough work.  I'm reassigning to selinux-policy and suggesting that we make this a boolean (default to off).

In RHEL6 we do all of the restricted work in libvirt so qemu does not need these permissions.

Comment 19 Daniel Walsh 2010-12-06 19:59:00 UTC
Miroslav, add it to virt_use_sysfs.

Comment 20 Chris Wright 2010-12-06 20:08:42 UTC
I don't think we can default to off, since this is a feature that was working (although, I admit, I'm not sure what has changed to cause this regression since libvirt is not dropping privs, did policy change?).

Comment 22 Martin Jenner 2010-12-06 21:04:57 UTC
My issues are seen on an ibm-dx360 system using the following device:

[root@ibm-dx360m2-02 vt-d]# virsh nodedev-list --cap=net
net_00_1a_64_f1_22_42
net_00_1a_64_f1_22_43
net_02_1a_64_f1_22_46

[root@ibm-dx360m2-02 vt-d]# virsh nodedev-dumpxml net_00_1a_64_f1_22_43
<device>
  <name>net_00_1a_64_f1_22_43</name>
  <parent>pci_8086_10a7</parent>
  <capability type='net'>
    <interface>eth1</interface>
    <address>00:1a:64:f1:22:43</address>
    <capability type='80203'/>
  </capability>
</device>


[root@ibm-dx360m2-02 vt-d]# virsh nodedev-dumpxml pci_8086_10a7
<device>
  <name>pci_8086_10a7</name>
  <parent>pci_8086_3408</parent>
  <driver>
    <name>igb</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>11</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x10a7'>82575EB Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
  </capability>
</device>

I am adding the following to the device section of the virtual machine

[root@ibm-dx360m2-02 vt-d]# cat nodedev-device.xml
  <hostdev mode="subsystem" type="pci" managed="yes">
    <source>
    <address domain="0x0000" bus="0x0b" slot="0x00" function="0x1"/>
    </source>
  </hostdev>

Comment 23 Martin Jenner 2010-12-06 21:08:40 UTC
I confirmed/setup the VT-d capabilities on the system using the following

- appended intel_iommu=on to kernel line in /boot/grub/grub.conf

title Red Hat Enterprise Linux Server-base (2.6.18-232.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-232.el5 ro root=/dev/VolGroup00/LogVol00 intel_iommu=on
        initrd /initrd-2.6.18-232.el5.img

- rebooted the system and checked the following 

[root@ibm-dx360m2-01 qemu]# dmesg | grep IOMM
Intel-IOMMU: enabled
IOMMU fe710000: ver 1:0 cap c90780106f0462 ecap f020f6
IOMMU 0xfe710000: using Queued invalidation
IOMMU: Setting RMRR:
IOMMU: Setting identity map for device 0000:00:1a.0 [0x7d890000 - 0x7d910000]
IOMMU: Setting identity map for device 0000:00:1a.1 [0x7d890000 - 0x7d910000]
IOMMU: Setting identity map for device 0000:00:1a.7 [0x7d890000 - 0x7d910000]
IOMMU: Setting identity map for device 0000:00:1d.0 [0x7d890000 - 0x7d910000]
IOMMU: Setting identity map for device 0000:00:1d.1 [0x7d890000 - 0x7d910000]
IOMMU: Setting identity map for device 0000:00:1d.2 [0x7d890000 - 0x7d910000]
IOMMU: Setting identity map for device 0000:00:1d.7 [0x7d890000 - 0x7d910000]
IOMMU: Prepare 0-16MiB unity mapping for LPC
IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]

[root@ibm-dx360m2-01 qemu]# dmesg | grep DMAR
ACPI: DMAR (v001 IBM    THURLEY  0x00000001 IBM  0x01000013) @ 0x000000007f7eb000
DMAR:Host address width 51
DMAR:DRHD base: 0x000000fe710000 flags: 0x1
DMAR:RMRR base: 0x0000007d890000 end: 0x0000007d90ffff
DMAR:ATSR flags: 0x0

Comment 26 Miroslav Grepl 2010-12-07 10:44:16 UTC
I went through all the comments and have added a fix to selinux-policy-2.4.6-298.el5.

# sesearch -A -C -s svirt_t -c capability -p sys_rawio
Found 1 av rules:
DT allow svirt_t svirt_t : capability { sys_rawio sys_admin }; [ virt_use_sysfs ]

This means the rules take effect when the "virt_use_sysfs" boolean is enabled.
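
With this policy installed, the rule only applies while the boolean is on. The following is a hedged sketch of switching it on with the standard setsebool/getsebool tools. The wrapper function and the RUN variable are illustrative assumptions, and by default the commands are only printed rather than executed.

```shell
#!/bin/sh
# enable_virt_use_sysfs is a hypothetical wrapper.
# With RUN unset, each command is passed to "echo" (dry run).
# With RUN set to the empty string, the commands are really executed (needs root).
enable_virt_use_sysfs() {
    ${RUN-echo} setsebool -P virt_use_sysfs 1   # -P keeps the setting across reboots
    ${RUN-echo} getsebool virt_use_sysfs        # report the resulting state
}
```

Calling enable_virt_use_sysfs as-is prints the two commands; running it as root with RUN set to the empty string (RUN= enable_virt_use_sysfs) applies the change.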

Comment 33 Jaromir Hradilek 2011-01-05 16:22:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
With SELinux running in the enforcing mode, using a pass-through PCI device with sVirt rendered KVM (Kernel-based Virtual Machine) unable to start a virtual machine. With this update, the "virt_use_sysfs" boolean has been updated to resolve this issue, and virtual machines no longer fail to start.

Comment 35 errata-xmlrpc 2011-01-13 21:50:47 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0026.html

Comment 36 zhanghaiyan 2011-05-30 10:06:47 UTC
*** Bug 700320 has been marked as a duplicate of this bug. ***