Bug 667756
| Field | Value |
|---|---|
| Summary | [rhel6] [libvirt] unable to restore vm after hibernate when selinux is on (libvirtError: cannot close file: Bad file descriptor) |
| Product | Red Hat Enterprise Linux 6 |
| Component | libvirt |
| Version | 6.1 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | low |
| Target Milestone | rc |
| Fixed In Version | libvirt-0.8.7-4.el6 |
| Doc Type | Bug Fix |
| Reporter | Haim <hateya> |
| Assignee | Laine Stump <laine> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | abaron, bazulay, berrange, dallan, danken, dnaori, dwalsh, dyuan, eblake, eparis, gren, hateya, iheim, jdenemar, jialiu, jyang, mgoldboi, mgrepl, mzhan, vbian, xen-maint, yeylon, ykaul |
| Last Closed | 2011-05-19 13:25:29 UTC |
| Attachments | libvirt\vdsm\qemu logs (attachment 472099) |
I have duplicated the problem on my own setup and am investigating. This is a bit strange and frustrating. Here's the AVC that shows up:

    type=1400 audit(1295473116.144:40581): avc: denied { read } for pid=3826 comm="qemu-kvm" path="pipe:[107734]" dev=pipefs ino=107734 scontext=system_u:system_r:svirt_t:s0:c374,c1011 tcontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tclass=fifo_file

*but* this failure does not happen if I run libvirtd directly from a shell prompt, or under gdb - in that case the restore completes with no problems. I only get this AVC when libvirtd was run from the init.d script (either as part of system boot, or if I run "/etc/init.d/libvirtd start" from a root shell prompt at a later time).

dwalsh - I'm assuming that the AVC has something to do with the fact that the pipe is a conduit for a file that lives on an NFS volume (note that virt_use_nfs is turned on). Any idea what might make it behave differently when run from the shell / under gdb? Could anything in the environment have an effect? I guess the next step is to attempt to attach gdb to a running libvirtd that was started from the init.d script...

When you run libvirtd directly from the shell it won't transition to virtd_t; it will stay unconfined_t. Only if you run it from the init script will it become confined.

My guess is that this is fallout from using the new '-incoming fd:n', where qemu accesses the file directly and then closes the fd, compared to the old '-incoming exec:cat', where qemu only accessed a pipe, the subsidiary cat accessed the file, and qemu never closed the fd.

Unfortunately, it apparently isn't exec:cat vs. fd: - I modified the source to use the old method, and the error (including the AVC) remains the same. One other notable point: if the NFS server isn't root-squashed, the AVC doesn't happen and the restore is successful.

This could be a leaked file descriptor from libvirt to svirt_t. Dan B, we made some changes to set the label on the socket for MLS mode - do you think this is related?

@laine: I'm struggling to see why using root-squash NFS would cause any difference here. In both cases, libvirtd opens a pipe and passes one end of it to QEMU, so the labelling on that pipe wouldn't have changed. @dwalsh: the MLS socket stuff was labelling the libvirtd end of the QEMU monitor socket connection. This is a UNIX socket, so it wouldn't appear as an AVC on a 'pipe:' object. I'm sure this FD is the pipe we pass to QEMU's -incoming arg when restoring from a file.

Right, and that pipe was created by what? Thus that pipe had a label of what? If libvirt is opening files and handing them to qemu, they must be labeled in a way that qemu can handle. Just as libvirt has to put the right svirt label on the disk image before qemu can use it, it's going to have to put the right label on this pipe fd. Make sense? If you intend to pass fds between libvirt and qemu, libvirt is going to have to label those fds with the right label.

We're not passing a pipe - we're passing the actual fd of the file (or block device) containing the snapshot image. I'm guessing that the issues are happening when the snapshot image file lives on root-squash NFS.

Actually it's more complicated than that.
The logic on the restore codepath is approximately:

    fd = open(savedimage);
    if (error) {
        pipe(p);
        pid = fork();
        if (pid == 0) {
            setuid(qemu);
            setgid(qemu);
            fd = open(savedimage);
            forever() {
                read(fd);
                write(p[1]);
            }
        } else {
            fd = p[0];
        }
    }

The complex error path there runs in NFS root-squash scenarios. So we have either an FD for the file itself, or a pipe FD. The former is labelled already, the latter isn't, which could explain the difference laine sees with root squash. We also sometimes have to layer in a decompression program (gunzip, etc.), which can also result in QEMU getting a pipe instead of an FD. So turning on save compression should also cause this AVC if my diagnosis is correct. We likely need to use fsetxattr(fd) to give the pipe a suitable label.

As I said on IRC, I think we have two options. One is to try fsetfilecon(pipe[1], "svirt_t:MCS"). If this does not work, we need to update policy to allow it, perhaps with a boolean, and then work to fix the kernel so that fsetfilecon(pipe[1], "svirt_t:MCS") does work.

fsetfilecon() *almost* works. SELinux doesn't allow it to be done on fifos. Dan - do we need to file a separate bug to get that policy added, or can you just reference this BZ#? If SELinux is in permissive mode, once I call fsetfilecon() on the pipe I get a couple of extra AVCs, but not the AVC from qemu, and the restore is successful:

    Jan 21 14:11:03 stinkstation kernel: type=1400 audit(1295637063.215:40603): avc: denied { relabelfrom } for pid=13513 comm="libvirtd" name="" dev=pipefs ino=28036915 scontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tclass=fifo_file
    Jan 21 14:11:03 stinkstation kernel: type=1400 audit(1295637063.238:40604): avc: denied { relabelto } for pid=13513 comm="libvirtd" name="" dev=pipefs ino=28036915 scontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tcontext=system_u:system_r:svirt_t:s0:c440,c936 tclass=fifo_file
    Jan 21 14:11:03 stinkstation kernel: type=1400 audit(1295637063.260:40605): avc: denied { associate } for pid=13513 comm="libvirtd" name="" dev=pipefs ino=28036915 scontext=system_u:system_r:svirt_t:s0:c440,c936 tcontext=system_u:object_r:fs_t:s0 tclass=filesystem

I should note that Dan Walsh suggested on IRC that we should be using "svirt_image_t" rather than "svirt_t". If this is correct, should that be for *everything* that uses seclabel.label, or just in certain cases?

No, for this case only, since SELinux seems to be putting this label on a file system. svirt_image_t is for labels on disk; svirt_t is for process labels. In this case you are relabeling a fifo_file, which I guess the kernel stores in some kind of file system. That would remove the second two AVC messages. The first AVC message will require a policy change:

    allow virtd_t self:fifo_file { manage_fifo_file_perms relabelfrom relabelto };

The libvirt-side fix for this has been posted upstream: https://www.redhat.com/archives/libvir-list/2011-January/msg00991.html

I tried a locally patched libvirt with selinux-policy-3.7.19-68.el6 (which contains the required policy change to allow fsetfilecon on a fifo), and restores from root-squash NFS are now successful. As soon as the libvirt patches are committed upstream, I will backport them to RHEL 6.1.

The backported series was sent to rhvirt-patches. For completeness, here are the upstream commit IDs of the patches. Note that the RHEL 6.1 patches are slightly different, as the security driver code was refactored post-0.8.7.
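Since the discussion above hinges on fsetfilecon() being able to relabel a fifo, here is a minimal standalone sketch of that mechanism. It is not the libvirt patch (that is in the commits referenced in the next comment); the context string is just the example label from the AVCs above, and the file name and build line are illustrative assumptions.

```c
/*
 * Minimal sketch (not the libvirt patch): relabel a pipe with an sVirt
 * context before handing it to a confined child such as qemu.
 * Assumes libselinux development headers; build line is an assumption:
 *   gcc -o labelpipe labelpipe.c -lselinux
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <selinux/selinux.h>

int main(void)
{
    int p[2];
    /* Example label copied from the AVC messages above; libvirt generates
     * the per-domain MCS categories dynamically instead. */
    char con[] = "system_u:system_r:svirt_t:s0:c440,c936";

    if (pipe(p) < 0) {
        perror("pipe");
        return 1;
    }

    if (is_selinux_enabled() > 0) {
        /* Both pipe fds refer to the same pipefs inode, so relabelling the
         * write end (as suggested in the comments above) also covers the
         * read end that would be passed to qemu via -incoming. */
        if (fsetfilecon(p[1], con) < 0)
            fprintf(stderr, "fsetfilecon: %s\n", strerror(errno));
        else
            printf("pipe relabelled to %s\n", con);
    }

    close(p[0]);
    close(p[1]);
    return 0;
}
```

Run from a virtd_t context on a policy without the fifo_file relabelfrom/relabelto rule, a call like this should produce denials similar to the AVCs quoted above; with the updated selinux-policy it is expected to succeed.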
Details are in the patches sent to rhvirt-patches.

    commit d89608f994025aef9809bcb224e2d71f35fb85e9
    Author: Laine Stump <laine>
    Date:   Sun Jan 23 16:02:42 2011 -0500

        Add a function to the security driver API that sets the label of an open fd.

    commit 34a19dda1c525e3e94a7b51cd161fafba8f2fbe8
    Author: Laine Stump <laine>
    Date:   Sun Jan 23 16:09:40 2011 -0500

        Set SELinux context label of pipes used for qemu migration

    commit c9c794b52bea18d998e9affa0c166c6bcf475348
    Author: Laine Stump <laine>
    Date:   Mon Jan 24 11:58:15 2011 -0500

        Manually kill gzip if restore fails before starting qemu

(The last one isn't strictly necessary to fix the problem outlined in the bug report, but it is in the same area and worth putting in while we're there.)

What is the setting of dynamic_ownership in /etc/libvirt/qemu.conf? If dynamic_ownership gets set back to 1, you will see a failure like this; it needs to be set to 0. Note also that the original problem was related to the *save image* also being on the NFS share, not just the disk. In the test in Comment 22, you only have the disk on NFS, but have put the save image on local disk.

(See the previous comment for the question - I forgot to set needinfo when I posted it.)

(In reply to comment #23) I have already set dynamic_ownership = 0 and tried again; this bug is now verified as PASSED.

Environment:

    # uname -a
    Linux dhcp-65-85.nay.redhat.com 2.6.32-99.el6.x86_64 #1 SMP Fri Jan 14 10:46:00 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

    libvirt-0.8.7-4.el6.x86_64
    kernel-2.6.32-99.el6.x86_64
    qemu-kvm-0.12.1.2-2.132.el6.x86_64
    selinux-policy-3.7.19-68.el6.noarch

Steps:

1. Enable enforcing mode and the virt_use_nfs boolean:

        # setenforce 1
        # getenforce
        Enforcing
        # setsebool virt_use_nfs on
        # getsebool -a | grep virt_use_nfs
        virt_use_nfs --> on

2. The NFS server is on 10.66.65.85:

        # cat /etc/exports
        /var/lib/libvirt/images *(rw,root_squash)
        # service nfs start
        # iptables -F

3. Mount the export:

        # mount 10.66.65.85:/var/lib/libvirt/images/ /var/lib/libvirt/migrate/
        # ll -d /var/lib/libvirt/images/
        drwxr-xr-x. 2 qemu qemu 4096 Jan 28 06:39 /var/lib/libvirt/images/

4. Check the guest definition:

        # virsh list --all
         Id Name                 State
        ----------------------------------
          - rhel6                shut off

        # virsh dumpxml rhel6
        ...
        <devices>
          <emulator>/usr/libexec/qemu-kvm</emulator>
          <disk type='file' device='disk'>
            <driver name='qemu' type='raw' cache='none'/>
            <source file='/var/lib/libvirt/migrate/rhel6.img'/>
            <target dev='vda' bus='virtio'/>
            <alias name='virtio-disk0'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
          </disk>
        ...

        # ll -d rhel6.img
        -rwxrwxrwx. 1 qemu qemu 5368709120 Feb 11 22:42 rhel6.img

5. Start the guest, save it to the NFS share, and restore it:

        # virsh start rhel6
        Domain rhel6 started

        [root@dhcp-65-85 migrate]# virsh save rhel6 /var/lib/libvirt/migrate/rhel6.save
        Domain rhel6 saved to /var/lib/libvirt/migrate/rhel6.save

        [root@dhcp-65-85 migrate]# virsh restore rhel6.save
        Domain restored from rhel6.save

        # virsh list --all
         Id Name                 State
        ----------------------------------
          4 rhel6                running

I can also reproduce this bug with libvirt-0.8.7-1.el6.x86_64:

    # virsh restore /var/lib/libvirt/migrate/rhel6.save
    error: Failed to restore domain from /var/lib/libvirt/migrate/rhel6.save
    error: cannot close file: Bad file descriptor

Based on the above comment and on the latest test conducted on libvirt, removing needinfo.
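For reference, the /etc/libvirt/qemu.conf setting discussed above would look like the excerpt below. Only the dynamic_ownership value comes from this bug; the comment text is just a reminder of what the option does, and the rest of the file is omitted.

```
# /etc/libvirt/qemu.conf (excerpt)
# dynamic_ownership = 1 lets libvirt chown disk/save files to the qemu user
# itself; for this verification it must stay at 0, per the comment above.
dynamic_ownership = 0
```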
See also bug 691499 for a manifestation of the problem when using compressed save images from a libvirtd run in an unconfined_t context.

Tested with:

    libvirt-0.8.7-18.el6.x86_64
    qemu-kvm-0.12.1.2-2.158.el6.x86_64
    kernel-2.6.32-131.0.1.el6.x86_64
    selinux-policy-3.7.19-73.el6.noarch

I was able to restore the VM after hibernate with SELinux on, so the bug status stays VERIFIED.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html
Created attachment 472099 [details]: libvirt\vdsm\qemu logs

Description of problem:
Restoring a suspended VM fails and the qemu process dies when SELinux is set to enforcing with NFS storage.

In the vdsm log I get the following libvirt error:

    Thread-500::ERROR::2011-01-06 18:08:24,082::vm::632::vds.vmlog.796d95ea-1640-4aea-9f12-0d9ea0440ee3::(_startUnderlyingVm)
    Traceback (most recent call last):
      File "/usr/share/vdsm/vm.py", line 602, in _startUnderlyingVm
        self._run()
      File "/usr/share/vdsm/libvirtvm.py", line 718, in _run
        self._connection.restore(fname)
      File "/usr/share/vdsm/libvirtvm.py", line 1081, in wrapper
        raise e
    libvirtError: cannot close file: Bad file descriptor

VM log:

    2011-01-06 18:08:20.993: starting up
    LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Conroe -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name rhel6-nfs-1 -uuid 796d95ea-1640-4aea-9f12-0d9ea0440ee3 -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/rhel6-nfs-1.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=2011-01-06T16:08:20 -boot c -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/cf4e325a-482b-4e20-8b1d-6b1acd5c7dc4/78cbee4a-f021-47d1-9f90-c6ef34c2935d/images/7c571638-4826-46ee-8a9b-9d4232154ace/f5d32eff-5adc-4787-a47d-3cccb98b8ccb,if=none,id=drive-virtio-disk0,boot=on,format=raw,serial=ee-8a9b-9d4232154ace,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=00:1a:4a:16:87:30,bus=pci.0,addr=0x3 -chardev socket,id=channel0,path=/var/lib/libvirt/qemu/channels/rhel6-nfs-1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=0,chardev=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -vnc 0:0,password -k en-us -vga cirrus -incoming exec:cat
    load of migration failed
    2011-01-06 18:08:24.006: shutting down
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/cpu/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/cpuacct/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/cpuacct/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/cpuacct/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/cpuset/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/cpuset/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/cpuset/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/memory/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/memory/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/memory/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/devices/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/devices/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/devices/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/freezer/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/freezer/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/freezer/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.252: 23377: debug : virCgroupNew:555 : New group /libvirt/qemu/rhel6-nfs-1

Please note that when SELinux is off, the operation succeeds.

Repro steps:
1) make sure to work on NFS storage
2) make sure to start the VM
3) make sure SELinux is set to enforcing
4) make sure to suspend the VM (migrate to file)
5) try to restore