Summary: SELinux is preventing qemu-kvm "write" access on /dev/dm-5. Detailed Description: SELinux denied access requested by qemu-kvm. It is not expected that this access is required by qemu-kvm and this access may signal an intrusion attempt. It is also possible that the specific version or configuration of the application is causing it to require additional access. Allowing Access: You can generate a local policy module to allow this access - see FAQ (http://docs.fedoraproject.org/selinux-faq-fc5/#id2961385) Please file a bug report. Additional Information: Source Context system_u:system_r:svirt_t:s0:c202,c996 Target Context system_u:object_r:fixed_disk_device_t:s0 Target Objects /dev/dm-5 [ blk_file ] Source qemu-kvm Source Path /usr/bin/qemu-kvm Port <Unknown> Host (removed) Source RPM Packages Target RPM Packages Policy RPM selinux-policy-3.7.11-1.fc13 Selinux Enabled True Policy Type targeted Enforcing Mode Enforcing Plugin Name catchall Host Name (removed) Platform Linux (removed) 2.6.33-1.fc13.i686.PAE #1 SMP Wed Feb 24 19:54:49 UTC 2010 i686 i686 Alert Count 11629 First Seen Tue 09 Mar 2010 10:00:34 AM GMT Last Seen Tue 09 Mar 2010 10:10:28 AM GMT Local ID c5168667-01d2-43d4-a7d0-e64fdebc5f93 Line Numbers Raw Audit Messages node=(removed) type=AVC msg=audit(1268129428.874:128494): avc: denied { write } for pid=28490 comm="qemu-kvm" path="/dev/dm-5" dev=devtmpfs ino=166267 scontext=system_u:system_r:svirt_t:s0:c202,c996 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file Hash String generated from catchall,qemu-kvm,svirt_t,fixed_disk_device_t,blk_file,write audit2allow suggests: #============= svirt_t ============== allow svirt_t fixed_disk_device_t:blk_file write;
This happened part way through an installation of a VM (Debian 5 as it happens) using virt-install. /dev/dm-5 is the VM's disk device. When this happens, installation appears to hang and the VM is unresponsive. I guess it must be getting loads of disk errors. Note that installation got quite far in before this happened.
WTF .. The reason is because I happened to run 'virt-df' at the same time (I didn't think that could possibly affect the installation, which is why I didn't mention it). The device is labelled like this during the first part of the installation: brw-rw----. qemu qemu system_u:object_r:svirt_image_t:s0:c129,c671 /dev/dm-5 After running virt-df, the label changes to: brw-rw----. root disk system_u:object_r:fixed_disk_device_t:s0 /dev/dm-5 I'm not aware of how virt-df could change the label. All it does is to run qemu directly on the disk image (with snapshot=on so it won't modify the disk image, just read from it).
(In reply to comment #2) > brw-rw----. root disk system_u:object_r:fixed_disk_device_t:s0 /dev/dm-5 > > I'm not aware of how virt-df could change the label. All it does is to run > qemu directly on the disk image (with snapshot=on so it won't modify the > disk image, just read from it). my guess is that restorecond did it, this is the context from the policy: /dev/dm-[0-9]+ block device system_u:object_r:fixed_disk_device_t:s0
I used systemtap to locate the offending process: udevd! udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0 udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0 udevd(28299) vfs_setxattr 0x5/166267 0 security.selinux system_u:object_r:fixed_disk_device_t:s0
My SystemTap script, in case anyone wasn't to reproduce it: http://rwmj.wordpress.com/2010/03/09/tip-use-systemtap-to-monitor-selinux-changes-to-files/
restorecond only watches files listed in /etc/selinux/restorecond.conf It does not touch /dev/ So the question is why does udev change the label of an existing device. udev should only be setting the label on device creation.
if a device is closed after being open for a write, udev gets a "change" event, and resets all labels, symlinks, etc.
qemu has some perverse behaviour where it will always open the device for write, even if snapshot mode is requested (which should only require read access). However ... (In reply to comment #7) > if a device is closed after being open for a write, udev gets a "change" event, > and resets all labels, symlinks, etc. This doesn't sound good to me. It seems with sVirt, the intention is: (1) At device creation, udevd labels the device with system_u:object_r:fixed_disk_device. (2) When libvirt starts the domain, it is relabeled with some temporary svirt label. (3) When libvirt stops the domain, libvirt relabels it back to system_u:object_r:fixed_disk_device. After step (1), libvirt is relying on udevd not to touch the labels on the device node ever again. Can we tell udevd not to touch the label on the device unless it is actually being created?
(In reply to comment #8) > Can we tell udevd not to touch the label on the device > unless it is actually being created? Hmm, keep the device open after (2) ...
(In reply to comment #9) > (In reply to comment #8) > > Can we tell udevd not to touch the label on the device > > unless it is actually being created? > > Hmm, keep the device open after (2) ... There are two separate and unrelated processes involved here, so that is not practical. We need to have the udevd rules reflect that labels should not be set unless the device is being created anew.
> if a device is closed after being open for a write, udev gets a "change" event, > and resets all labels, symlinks, etc. This is madness. Udev is inferring that 'closed after write' implies the app in question no longer needs/wants the device. This cannot possibly be a safe assumption in the general case, as this BZ report demonstrates.
Well, this is because block devices can be formatted, etc. They can get new UUIDs, LABELs, partitions, filesystem types, etc. Probably owner, group, permissions and selinux context should not be changed on "change" events.
(In reply to comment #12) > Well, this is because block devices can be formatted, etc. They can get new > UUIDs, LABELs, partitions, filesystem types, etc. > > Probably owner, group, permissions and selinux context should not be changed on > "change" events. We don't touch owner, group, mode if they did not change, we only call lsetfilecon(). It's in udev-node.c. We can remove it, if you think we don't need it.
(In reply to comment #13) > (In reply to comment #12) > > Well, this is because block devices can be formatted, etc. They can get new > > UUIDs, LABELs, partitions, filesystem types, etc. > > > > Probably owner, group, permissions and selinux context should not be changed on > > "change" events. > > We don't touch owner, group, mode if they did not change, we only call > lsetfilecon(). > > It's in udev-node.c. We can remove it, if you think we don't need it. Hmm, yes, we should not do that for "change" events.
> We don't touch owner, group, mode if they did not change, we only call > lsetfilecon(). That's the crux of the problem - they will have changed in the context of virtual machines. The owner/group will have been changed to 'qemu:qemu' and file context to 'svirt_image_t'. The mode is the only thing that isn't changed, we just assume it was always mode 0700 or 0770, but that is probably a mistake to rely on too.
(In reply to comment #15) > > We don't touch owner, group, mode if they did not change, we only call > > lsetfilecon(). > > That's the crux of the problem - they will have changed in the context of > virtual machines. The owner/group will have been changed to 'qemu:qemu' and > file context to 'svirt_image_t'. The mode is the only thing that isn't changed, > we just assume it was always mode 0700 or 0770, but that is probably a mistake > to rely on too. Changing owner, group, mode by external tools is not supported, udev will restore all that to the configured values with the next event, which can happen any time. Such changes needs to happen with udev rules or by applying ACL's.
(In reply to comment #14) > > It's in udev-node.c. We can remove it, if you think we don't need it. > > Hmm, yes, we should not do that for "change" events. We do not distinguish between "add" and "change". So we just remove this line? http://git.kernel.org/?p=linux/hotplug/udev.git;a=blob;f=udev/udev-node.c;h=4314cceb79893d8f966ff8aa056ea8e7d265e766;hb=HEAD#l58
(In reply to comment #17) > (In reply to comment #14) > > > It's in udev-node.c. We can remove it, if you think we don't need it. > > > > Hmm, yes, we should not do that for "change" events. > > We do not distinguish between "add" and "change". So we just remove this line? > > http://git.kernel.org/?p=linux/hotplug/udev.git;a=blob;f=udev/udev-node.c;h=4314cceb79893d8f966ff8aa056ea8e7d265e766;hb=HEAD#l58 Hmm, ok, so restorecon needs to be run after manual creation (which is done already anyway) before udev "add" events.
(In reply to comment #16) > (In reply to comment #15) > > > We don't touch owner, group, mode if they did not change, we only call > > > lsetfilecon(). > > > > That's the crux of the problem - they will have changed in the context of > > virtual machines. The owner/group will have been changed to 'qemu:qemu' and > > file context to 'svirt_image_t'. The mode is the only thing that isn't changed, > > we just assume it was always mode 0700 or 0770, but that is probably a mistake > > to rely on too. > > Changing owner, group, mode by external tools is not supported, udev will > restore all that to the configured values with the next event [...] You mention "owner, group, mode" there. Does that also include SELinux label or not?
> Changing owner, group, mode by external tools is not supported, udev will > restore all that to the configured values with the next event, which can happen > any time. Such changes needs to happen with udev rules or by applying ACL's. Farming this out to udev rules is not practical because it means that at every guest startup attempt, or disk hotplug action, we'd need to write out new udev files, busy-wait while the kernel got around to scheduling udev & udev loading & then processing the new udev rules, before libvirt can continue launching the guest / attaching the disk. This kind of approach was done with Xen & was a complete disaster resulting in very unreliable & slow guest launch when the host was under high loads which delayed udevd processing rules promptly enough. Compare that complexity with the simple chown() call that is currently done. Udev must not randomly change/reset devices which are already present & in active use on the system.
From an Accreditation standard this would also be very bad. If an admin changes the permissions (MAC/DAC) on a device, they had better stay until reboot. For example if they change a device to Trusted and udev changes it back to system low, that would be very bad.
(In reply to comment #19) > > Changing owner, group, mode by external tools is not supported, udev will > > restore all that to the configured values with the next event [...] > > You mention "owner, group, mode" there. Does that also include > SELinux label or not? If the primary ownerships configured by udev rules are not touched, these should not be touched too. At least it can be easily fixed to do it that way.
(In reply to comment #21) > For example if they change a device to Trusted and udev changes it back to > system low, that would be very bad. Until now, the udev rules are the primary source for that, and no external tool is expected to change stuff behind its back. At least temporary udev rules need to be placed on the system to reflect such intended changes.
(In reply to comment #20) > Compare that complexity with the simple chown() call that is currently done. > Udev must not randomly change/reset devices which are already present & in > active use on the system. You could still chmod(), but only after you have installed temporary udev rules, which specify the same permissions, so that udev in case of receiving an update event needs what it should do. Using ACLs, and not changing primary and standardized permissions, if possible, would be generally preferred though.