Bug 644276
Summary: | The kvm VM can't be started when Passthrough pci device with svirt | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | koka xiong <kxiong> | ||||||||
Component: | selinux-policy | Assignee: | Miroslav Grepl <mgrepl> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Milos Malik <mmalik> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | high | ||||||||||
Version: | 5.6 | CC: | ajia, berrange, bsarathy, chrisw, dwalsh, eblake, eparis, jdenemar, llim, mjenner, mmalik, mzhan, sgrubb, virt-maint, xen-maint, yoyzhang | ||||||||
Target Milestone: | rc | Keywords: | Reopened | ||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | selinux-policy-2.4.6-298.el5 | Doc Type: | Bug Fix | ||||||||
Doc Text: |
With SELinux running in the enforcing mode, using a pass-through PCI device with sVirt rendered KVM (Kernel-based Virtual Machine) unable to start a virtual machine. With this update, the "virt_use_sysfs" boolean has been updated to resolve this issue, and virtual machines no longer fail to start.
|
Last Closed: | 2011-01-13 21:50:47 UTC | Type: | --- | ||||||||
Attachments: |
|
Please check for any 'AVC' messages in /var/log/audit/audit.log

The most likely explanation is that the RHEL-5 kernel and/or selinux policy is too old to support fine-grained labelling of files on sysfs. IIRC we only added that capability in Fedora 12/13.

The AVCs indicate it wants:

    allow svirt_t self:capability { sys_rawio sys_admin };

I would figure this is a kernel issue?

Where do you have these logs? The order of the calls and the syscall in question might help me figure out what is triggering these denials. (I also don't know how device assignment works, and that would probably point me to it.)

*** Bug 638859 has been marked as a duplicate of this bug. ***

Retested on Intel with:

    libvirt-python-0.8.2-8.el5
    libvirt-0.8.2-8.el5
    kmod-kvm-83-205.el5
    kvm-83-205.el5
    kvm-qemu-img-83-205.el5

Retested on an Intel machine that supports VT-d. This bug appeared because the AMD test machine doesn't support VT-d, so I am closing this one as not a bug.

I don't think comment 6 is correct; it must have been a configuration error of some sort. I can still reproduce it on my machine. With selinux in permissive mode, PCI passthrough works fine. In enforcing mode, I get the following error from qemu-kvm:

    Failed to assign irq for "01:00.0": Operation not permitted
    Perhaps you are assigning a device that shares an IRQ with another device?
    Failed to initialize assigned device host=01:00.0

A similar error is shown when I start the guest without this PCI device and then try to hotplug it. Messages in audit.log are also similar in both cases.

Created attachment 455993: audit-coldplug.log

Created attachment 455995: audit-hotplug.log
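
When checking audit.log as suggested above, the capability denials can be collected with a few lines of Python. This is only a minimal sketch, not something from the thread: the two sample records are hypothetical (written in the usual AVC record layout, not copied from the attached logs), and `denied_capabilities` is an illustrative helper name.

```python
import re

# Pulls the permission list, source context and target class out of an AVC denial.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*)\}"
    r".*?scontext=(?P<scontext>\S+)"
    r".*?tclass=(?P<tclass>\S+)"
)

def denied_capabilities(lines, domain="svirt_t"):
    """Return the sorted capability permissions denied to processes in `domain`."""
    found = set()
    for line in lines:
        m = AVC_RE.search(line)
        if not m or m.group("tclass") != "capability":
            continue
        # SELinux contexts read user:role:type[:level]; the type is the third field.
        parts = m.group("scontext").split(":")
        if len(parts) < 3 or parts[2] != domain:
            continue
        found.update(m.group("perms").split())
    return sorted(found)

# Hypothetical sample records (assumed layout, NOT taken from the attachments).
SAMPLE = [
    'type=AVC msg=audit(1287467043.100:10): avc:  denied  { sys_rawio } for  '
    'pid=4242 comm="qemu-kvm" capability=17 '
    'scontext=system_u:system_r:svirt_t:s0:c1,c2 '
    'tcontext=system_u:system_r:svirt_t:s0:c1,c2 tclass=capability',
    'type=AVC msg=audit(1287467043.200:11): avc:  denied  { sys_admin } for  '
    'pid=4242 comm="qemu-kvm" capability=21 '
    'scontext=system_u:system_r:svirt_t:s0:c1,c2 '
    'tcontext=system_u:system_r:svirt_t:s0:c1,c2 tclass=capability',
]

if __name__ == "__main__":
    print(denied_capabilities(SAMPLE))
```

On a real host the same function could be fed the lines of /var/log/audit/audit.log (root access is normally required to read it).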
*** Bug 638859 has been marked as a duplicate of this bug. ***

Jiri, can I ask what device you were passing through?

I see that something is checking CAP_SYS_ADMIN on read() calls and CAP_SYS_RAWIO on some ioctl call. I'm starting to look where such checks might be coming from...

I'm reassigning this to 'kernel'. Our best guess right now is that libvirt has been rebased to support this but the appropriate upstream kernel and kvm changes have not been backported. Hopefully the right virt people will be able to help run this down.

(In reply to comment #14)
> Jiri, can I ask what device you were passing through?
>
> I see that something is checking CAP_SYS_ADMIN on read() calls and CAP_SYS_RAWIO
> on some ioctl call. I'm starting to look where such checks might be coming
> from...

CAP_SYS_RAWIO comes from the kvm_assign_irq ioctl; CAP_SYS_ADMIN comes from the core PCI sysfs config space read function.

Jiri, if you set user and group to root and set clear_emulator_capabilities = 0 in /etc/libvirt/qemu.conf, does this start working again (with SELinux in enforcing mode)? That would help narrow this down to the libvirt rebase as opposed to any sVirt policy change.

BTW, the capabilities look fine here:

    $ rpm -q libvirt
    libvirt-0.8.2-6.el5
    $ grep -e [UG]id -e ^Cap /proc/$(pidof libvirtd)/status
    Uid: 0 0 0 0
    Gid: 0 0 0 0
    CapInh: 0000000000000000
    CapPrm: 00000000fffffeff
    CapEff: 00000000fffffeff
    $ sudo virsh start rhel54
    Domain rhel54 started
    $ grep -e [UG]id -e ^Cap /proc/$(pidof qemu-kvm)/status
    Uid: 0 0 0 0
    Gid: 0 0 0 0
    CapInh: 0000000000000000
    CapPrm: 00000000fffffeff
    CapEff: 00000000fffffeff

Talking to Chris on IRC, he tells me that in RHEL5 qemu must have CAP_SYS_ADMIN and CAP_SYS_RAWIO to make PCI passthrough work. I'm reassigning to selinux-policy and suggesting that we make this a boolean (default to off). In RHEL6 we do all of the restricted work in libvirt, so qemu does not need these permissions.
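
To see why the CapPrm/CapEff value quoted above "looks fine", the hex mask can be decoded bit by bit. The sketch below is an illustration added for readers, not part of the original comments; the bit numbers are the ones defined in linux/capability.h, and the helper names `has_cap` and `missing_caps` are made up for this example.

```python
# Capability bit numbers as defined in linux/capability.h.
CAP_SETPCAP = 8
CAP_SYS_RAWIO = 17
CAP_SYS_ADMIN = 21

def has_cap(mask_hex, cap):
    """True if bit `cap` is set in a CapPrm/CapEff hex string from /proc/<pid>/status."""
    return bool((int(mask_hex, 16) >> cap) & 1)

def missing_caps(mask_hex, highest=31):
    """List the bit numbers in 0..highest that are cleared in the mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(highest + 1) if not (mask >> bit) & 1]

# Value reported for both libvirtd and qemu-kvm in the comment above.
CAP_EFF = "00000000fffffeff"

if __name__ == "__main__":
    print("CAP_SYS_RAWIO:", has_cap(CAP_EFF, CAP_SYS_RAWIO))
    print("CAP_SYS_ADMIN:", has_cap(CAP_EFF, CAP_SYS_ADMIN))
    print("cleared bits in the low 32:", missing_caps(CAP_EFF))
```

With this mask both capabilities needed for passthrough are present at the process level, and the only cleared bit in the low 32 is bit 8 (CAP_SETPCAP). That is consistent with the denial coming from SELinux policy rather than from dropped process capabilities.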
Miroslav, add it to virt_use_sysfs.

I don't think we can default to off, since this is a feature that was working (although, I admit, I'm not sure what has changed to cause this regression since libvirt is not dropping privs; did policy change?).

My issues are seen on an ibm-dx360 system using the following device:

    [root@ibm-dx360m2-02 vt-d]# virsh nodedev-list --cap=net
    net_00_1a_64_f1_22_42
    net_00_1a_64_f1_22_43
    net_02_1a_64_f1_22_46
    [root@ibm-dx360m2-02 vt-d]# virsh nodedev-dumpxml net_00_1a_64_f1_22_43
    <device>
      <name>net_00_1a_64_f1_22_43</name>
      <parent>pci_8086_10a7</parent>
      <capability type='net'>
        <interface>eth1</interface>
        <address>00:1a:64:f1:22:43</address>
        <capability type='80203'/>
      </capability>
    </device>
    [root@ibm-dx360m2-02 vt-d]# virsh nodedev-dumpxml pci_8086_10a7
    <device>
      <name>pci_8086_10a7</name>
      <parent>pci_8086_3408</parent>
      <driver>
        <name>igb</name>
      </driver>
      <capability type='pci'>
        <domain>0</domain>
        <bus>11</bus>
        <slot>0</slot>
        <function>1</function>
        <product id='0x10a7'>82575EB Gigabit Network Connection</product>
        <vendor id='0x8086'>Intel Corporation</vendor>
      </capability>
    </device>

I am adding the following to the device section of the virtual machine:

    [root@ibm-dx360m2-02 vt-d]# cat nodedev-device.xml
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0b" slot="0x00" function="0x1"/>
      </source>
    </hostdev>

I confirmed/set up the VT-d capabilities on the system using the following:

- appended intel_iommu=on to the kernel line in /boot/grub/grub.conf

    title Red Hat Enterprise Linux Server-base (2.6.18-232.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-232.el5 ro root=/dev/VolGroup00/LogVol00 intel_iommu=on
        initrd /initrd-2.6.18-232.el5.img

- rebooted the system and checked the following

    [root@ibm-dx360m2-01 qemu]# dmesg | grep IOMM
    Intel-IOMMU: enabled
    IOMMU fe710000: ver 1:0 cap c90780106f0462 ecap f020f6
    IOMMU 0xfe710000: using Queued invalidation
    IOMMU: Setting RMRR:
    IOMMU: Setting identity map for device 0000:00:1a.0 [0x7d890000 - 0x7d910000]
    IOMMU: Setting identity map for device 0000:00:1a.1 [0x7d890000 - 0x7d910000]
    IOMMU: Setting identity map for device 0000:00:1a.7 [0x7d890000 - 0x7d910000]
    IOMMU: Setting identity map for device 0000:00:1d.0 [0x7d890000 - 0x7d910000]
    IOMMU: Setting identity map for device 0000:00:1d.1 [0x7d890000 - 0x7d910000]
    IOMMU: Setting identity map for device 0000:00:1d.2 [0x7d890000 - 0x7d910000]
    IOMMU: Setting identity map for device 0000:00:1d.7 [0x7d890000 - 0x7d910000]
    IOMMU: Prepare 0-16MiB unity mapping for LPC
    IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0x1000000]
    [root@ibm-dx360m2-01 qemu]# dmesg | grep DMAR
    ACPI: DMAR (v001 IBM THURLEY 0x00000001 IBM 0x01000013) @ 0x000000007f7eb000
    DMAR:Host address width 51
    DMAR:DRHD base: 0x000000fe710000 flags: 0x1
    DMAR:RMRR base: 0x0000007d890000 end: 0x0000007d90ffff
    DMAR:ATSR flags: 0x0

I went through all the comments. I have added a fix to selinux-policy-2.4.6-298.el5.

    # sesearch -A -C -s svirt_t -c capability -p sys_rawio
    Found 1 av rules:
    DT allow svirt_t svirt_t : capability { sys_rawio sys_admin }; [ virt_use_sysfs ]

This means the rules are available through the "virt_use_sysfs" boolean.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
With SELinux running in the enforcing mode, using a pass-through PCI device with sVirt rendered KVM (Kernel-based Virtual Machine) unable to start a virtual machine. With this update, the "virt_use_sysfs" boolean has been updated to resolve this issue, and virtual machines no longer fail to start.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0026.html

*** Bug 700320 has been marked as a duplicate of this bug. ***
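
One detail in the thread that is easy to trip over: `virsh nodedev-dumpxml` reports the PCI bus, slot and function in decimal (bus 11 above), while the `<hostdev>` address used in the thread is written in hex (bus 0x0b). The following is a minimal sketch of that conversion using only the Python standard library. `hostdev_from_nodedev` is a hypothetical helper name and is not part of libvirt; the input is the pci_8086_10a7 dump quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET

# nodedev XML as quoted earlier in the thread for pci_8086_10a7.
NODEDEV_XML = """
<device>
  <name>pci_8086_10a7</name>
  <parent>pci_8086_3408</parent>
  <driver><name>igb</name></driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>11</bus>
    <slot>0</slot>
    <function>1</function>
    <product id='0x10a7'>82575EB Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
  </capability>
</device>
"""

def hostdev_from_nodedev(nodedev_xml):
    """Build a <hostdev> element from nodedev-dumpxml output for a PCI device."""
    cap = ET.fromstring(nodedev_xml).find("capability[@type='pci']")
    if cap is None:
        raise ValueError("not a PCI node device")
    # The dump holds decimal numbers; format them as hex for the address.
    domain, bus, slot, function = (
        int(cap.findtext(tag)) for tag in ("domain", "bus", "slot", "function")
    )
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source, "address",
        domain=f"0x{domain:04x}", bus=f"0x{bus:02x}",
        slot=f"0x{slot:02x}", function=f"0x{function:x}",
    )
    return hostdev

if __name__ == "__main__":
    print(ET.tostring(hostdev_from_nodedev(NODEDEV_XML), encoding="unicode"))
```

The resulting address attributes match the nodedev-device.xml file shown in the comment (domain 0x0000, bus 0x0b, slot 0x00, function 0x1).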
Created attachment 454316: the audit.log

Description of problem:
Pass through a PCI device with sVirt, then start the KVM machine; the VM can't start.

Version-Release number of selected component (if applicable):
libvirt-python-0.8.2-7.el5
libvirt-0.8.2-7.el5
kmod-kvm-83-205.el5
kvm-83-205.el5
kvm-qemu-img-83-205.el5

How reproducible:
always

Steps to Reproduce:
1. Make sure selinux is enabled.
    # getenforce
    Enforcing
2. Prepare a VM which is not running.
3. Add the following lines to the domain xml:
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x3f' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
4. Start the VM.

Actual results:
The VM can't be started.

Expected results:

Additional info:
virsh dumpxml bootiso_test
<domain type='kvm'>
  <name>bootiso_test</name>
  <uuid>303257e8-752d-81db-72e3-06cbf3f3a5a7</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='rhel5.5.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/bootiso_test.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <disk type='file' device='floppy'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/fd.img'/>
      <target dev='fda' bus='fdc'/>
      <address type='drive' controller='0' bus='0' unit='0'/>
    </disk>
    <controller type='ide' index='0'/>
    <controller type='fdc' index='0'/>
    <interface type='network'>
      <mac address='54:52:00:55:16:a9'/>
      <source network='default'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
    </interface>
    <serial type='file'>
      <source path='/var/log/vm-serial.log'/>
      <target port='0'/>
    </serial>
    <console type='file'>
      <source path='/var/log/vm-serial.log'/>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
    <sound model='ac97'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x3f' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>

tail -f /var/log/messages
Oct 19 13:44:03 dhcp-93-194 avahi-daemon[4036]: Registering new address record for 10.66.93.194 on eth0.
Oct 19 13:44:03 dhcp-93-194 NET[5505]: /sbin/dhclient-script : updated /etc/resolv.conf
Oct 19 13:44:03 dhcp-93-194 dhclient: bound to 10.66.93.194 -- renewal in 2799 seconds.
Oct 19 13:44:04 dhcp-93-194 avahi-daemon[4036]: New relevant interface eth0.IPv6 for mDNS.
Oct 19 13:44:04 dhcp-93-194 avahi-daemon[4036]: Joining mDNS multicast group on interface eth0.IPv6 with address fe80::7ae7:d1ff:fe7f:20ed.
Oct 19 13:44:04 dhcp-93-194 avahi-daemon[4036]: Registering new address record for fe80::7ae7:d1ff:fe7f:20ed on eth0.
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: reading /etc/resolv.conf
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: using nameserver 172.16.52.28#53
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: using nameserver 10.66.127.10#53
Oct 19 13:44:05 dhcp-93-194 dnsmasq[4891]: using nameserver 10.66.191.13#53
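
When matching a qemu-kvm message such as `Failed to assign irq for "01:00.0"` against a guest definition, it can help to list the PCI devices a domain assigns in the same bus:slot.function notation. The sketch below is an illustrative addition, assuming domain XML shaped like the `virsh dumpxml` output above. `pci_hostdevs` is a hypothetical helper name, and the sample is a trimmed copy of the reported domain that keeps only the hostdev element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the domain XML from the report (only the hostdev is kept).
DOMAIN_XML = """
<domain type='kvm'>
  <name>bootiso_test</name>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x3f' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

def pci_hostdevs(domain_xml):
    """Return the assigned PCI devices as 'bus:slot.function' strings."""
    result = []
    root = ET.fromstring(domain_xml)
    for addr in root.findall("devices/hostdev[@type='pci']/source/address"):
        # Attribute values are hex strings such as '0x3f'.
        bus, slot, func = (int(addr.get(k), 16) for k in ("bus", "slot", "function"))
        result.append(f"{bus:02x}:{slot:02x}.{func:x}")
    return result

if __name__ == "__main__":
    print(pci_hostdevs(DOMAIN_XML))
```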