Bug 667756
| Field | Value |
|---|---|
| Summary | [rhel6] [libvirt] unable to restore vm after hibernate when selinux is on (libvirtError: cannot close file: Bad file descriptor) |
| Product | Red Hat Enterprise Linux 6 |
| Component | libvirt |
| Version | 6.1 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | low |
| Target Milestone | rc |
| Fixed In Version | libvirt-0.8.7-4.el6 |
| Doc Type | Bug Fix |
| Reporter | Haim <hateya> |
| Assignee | Laine Stump <laine> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | abaron, bazulay, berrange, dallan, danken, dnaori, dwalsh, dyuan, eblake, eparis, gren, hateya, iheim, jdenemar, jialiu, jyang, mgoldboi, mgrepl, mzhan, vbian, xen-maint, yeylon, ykaul |
| Last Closed | 2011-05-19 13:25:29 UTC |
| Attachments | libvirt\vdsm\qemu logs (attachment 472099) |
I have duplicated the problem on my own setup and am investigating. This is a bit strange and frustrating. Here's the AVC that shows up:

    type=1400 audit(1295473116.144:40581): avc: denied { read } for pid=3826 comm="qemu-kvm" path="pipe:[107734]" dev=pipefs ino=107734 scontext=system_u:system_r:svirt_t:s0:c374,c1011 tcontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tclass=fifo_file

*but* this failure does not happen if I run libvirtd directly from a shell prompt, or under gdb - in that case the restore completes with no problems. I only get this AVC when libvirtd was run from the init.d script (either as part of system boot, or if I run "/etc/init.d/libvirtd start" from a root shell prompt at a later time).

dwalsh - I'm assuming that the AVC has something to do with the fact that the pipe is a conduit for a file that lives on an NFS volume (note that virt_use_nfs is turned on). Any idea what might make it behave differently when run from the shell / under gdb? Could anything in the environment have an effect? I guess the next step is to attempt to attach gdb to a running libvirtd that was started from the init.d script...

When you run libvirtd directly from the shell it won't transition to virtd_t; it will stay unconfined_t. Only if you run it from the init script will it become confined.

My guess is that this is fallout from using the new '-incoming fd:n', where qemu accesses the file directly and then closes the fd, compared to the old '-incoming exec:cat', where qemu only accessed a pipe, the subsidiary cat accessed the file, and qemu never closed the fd.

Unfortunately, it apparently isn't exec:cat vs. fd: - I modified the source to use the old method, and the error (including the AVC) remains the same. One other notable point: if the NFS server isn't root-squashed, the AVC doesn't happen and the restore is successful.

This could be a leaked file descriptor from libvirt to svirt_t. Dan B, we made some changes to set the label on the socket for MLS mode - do you think this is related?

@laine: I'm struggling to see why using root-squash NFS would cause any difference here. In both cases, libvirtd opens a pipe and passes one end of it to QEMU, so the labelling on that pipe wouldn't have changed. @dwalsh: the MLS socket stuff was labelling the libvirtd end of the QEMU monitor socket connection. This is a UNIX socket, so it wouldn't appear as an AVC on a 'pipe:' object. I'm sure this FD is the pipe we pass to QEMU's -incoming arg when restoring from a file.

Right, and that pipe was created by what? Thus that pipe had a label of what? If libvirt is opening files and handing them to qemu, they must be labeled in a way that qemu can handle. Just as libvirt has to put the right svirt label on the disk image before qemu can use it, it's going to have to put the right label on this pipe fd. Make sense? If you intend to pass fds between libvirt and qemu, libvirt is going to have to label those fds with the right label.

We're not passing a pipe - we're passing the actual fd of the file (or block device) containing the snapshot image. I'm guessing that the issues are happening when the snapshot image file lives on root-squash NFS.

Actually it's more complicated than that.
The logic on the restore codepath is approximately:

    fd = open(savedimage);
    if (error) {
        pipe(p);
        pid = fork();
        if (pid == 0) {
            setuid(qemu);
            setgid(qemu);
            fd = open(savedimage);
            forever() {
                read(fd);
                write(p[1]);
            }
        } else {
            fd = p[0];
        }
    }

The complex error path there runs in NFS root-squash scenarios. So we have either an FD for the file itself, or a pipe FD. The former is labelled already, the latter isn't, which could explain the difference laine sees with root squash. We also sometimes have to layer in a decompression program (gunzip, etc.), which can also result in QEMU getting a pipe instead of an FD. So turning on save compression should also cause this AVC if my diagnosis is correct. We likely need to use fsetxattr(fd) to give the pipe a suitable label.

As I said on IRC, I think we have two options. One is to try fsetfilecon(pipe[1], "svirt_t:MCS"). If this does not work, we need to update policy to allow it, perhaps with a boolean, and then work to fix the kernel so that fsetfilecon(pipe[1], "svirt_t:MCS") does work.

fsetfilecon() *almost* works. SELinux doesn't allow it to be done on fifos. Dan - do we need to file a separate bug to get that policy added, or can you just reference this BZ#? If SELinux is in permissive mode, once I call fsetfilecon() on the pipe I get a couple of extra AVCs, but not the AVC from qemu, and the restore is successful:

    Jan 21 14:11:03 stinkstation kernel: type=1400 audit(1295637063.215:40603): avc: denied { relabelfrom } for pid=13513 comm="libvirtd" name="" dev=pipefs ino=28036915 scontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tcontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tclass=fifo_file
    Jan 21 14:11:03 stinkstation kernel: type=1400 audit(1295637063.238:40604): avc: denied { relabelto } for pid=13513 comm="libvirtd" name="" dev=pipefs ino=28036915 scontext=unconfined_u:system_r:virtd_t:s0-s0:c0.c1023 tcontext=system_u:system_r:svirt_t:s0:c440,c936 tclass=fifo_file
    Jan 21 14:11:03 stinkstation kernel: type=1400 audit(1295637063.260:40605): avc: denied { associate } for pid=13513 comm="libvirtd" name="" dev=pipefs ino=28036915 scontext=system_u:system_r:svirt_t:s0:c440,c936 tcontext=system_u:object_r:fs_t:s0 tclass=filesystem

I should note that Dan Walsh suggested on IRC that we should be using "svirt_image_t" rather than "svirt_t". If this is correct, should that be for *everything* that uses seclabel.label, or just in certain cases?

No, for this case only, since SELinux seems to be putting this label on a file system. svirt_image_t is for labels on disk; svirt_t is for process labels. In this case you are relabeling a fifo_file, which I guess the kernel stores in some kind of file system. That would remove the second two AVC messages. The first AVC message will require a policy change:

    allow virtd_t self:fifo_file { manage_fifo_file_perms relabelfrom relabelto };

The libvirt-side fix for this has been posted upstream: https://www.redhat.com/archives/libvir-list/2011-January/msg00991.html

I tried a locally patched libvirt with selinux-policy-3.7.19-68.el6 (which contains the required policy change to allow fsetfilecon on a fifo), and restores from root-squash NFS are now successful. As soon as the libvirt patches are committed upstream, I will backport them to RHEL 6.1.

The backported series was sent to rhvirt-patches. For completeness, here are the upstream commit IDs of the patches. Note that the RHEL 6.1 patches are slightly different, as the security driver code was refactored post-0.8.7.
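Since the discussion above hinges on fsetfilecon() being able to relabel a fifo, here is a minimal standalone sketch of that mechanism. It is not the libvirt patch (that is in the commits referenced in the next comment); the context string is just the example label from the AVCs above, and the file name and build line are illustrative assumptions.

```c
/*
 * Minimal sketch (not the libvirt patch): relabel a pipe with an sVirt
 * context before handing it to a confined child such as qemu.
 * Assumes libselinux development headers; build line is an assumption:
 *   gcc -o labelpipe labelpipe.c -lselinux
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <selinux/selinux.h>

int main(void)
{
    int p[2];
    /* Example label copied from the AVC messages above; libvirt generates
     * the per-domain MCS categories dynamically instead. */
    char con[] = "system_u:system_r:svirt_t:s0:c440,c936";

    if (pipe(p) < 0) {
        perror("pipe");
        return 1;
    }

    if (is_selinux_enabled() > 0) {
        /* Both pipe fds refer to the same pipefs inode, so relabelling the
         * write end (as suggested in the comments above) also covers the
         * read end that would be passed to qemu via -incoming. */
        if (fsetfilecon(p[1], con) < 0)
            fprintf(stderr, "fsetfilecon: %s\n", strerror(errno));
        else
            printf("pipe relabelled to %s\n", con);
    }

    close(p[0]);
    close(p[1]);
    return 0;
}
```

Run from a virtd_t context on a policy without the fifo_file relabelfrom/relabelto rule, a call like this should produce denials similar to the AVCs quoted above; with the updated selinux-policy it is expected to succeed.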
Details are in the patches sent to rhvirt-patches.

    commit d89608f994025aef9809bcb224e2d71f35fb85e9
    Author: Laine Stump <laine>
    Date:   Sun Jan 23 16:02:42 2011 -0500

        Add a function to the security driver API that sets the label of an open fd.

    commit 34a19dda1c525e3e94a7b51cd161fafba8f2fbe8
    Author: Laine Stump <laine>
    Date:   Sun Jan 23 16:09:40 2011 -0500

        Set SELinux context label of pipes used for qemu migration

    commit c9c794b52bea18d998e9affa0c166c6bcf475348
    Author: Laine Stump <laine>
    Date:   Mon Jan 24 11:58:15 2011 -0500

        Manually kill gzip if restore fails before starting qemu

(The last one isn't strictly necessary to fix the problem outlined in the bug report, but it is in the same area and worth putting in while we're there.)

What is the setting of dynamic_ownership in /etc/libvirt/qemu.conf? If dynamic_ownership gets set back to 1, you will see a failure like this; it needs to be set to 0. Note also that the original problem was related to the *save image* also being on the NFS share, not just the disk. In the test in Comment 22, you only have the disk on NFS, but have put the save image on local disk.

(See the previous comment for the question - I forgot to set needinfo when I posted it.)

(In reply to comment #23) I have already set dynamic_ownership = 0 and tried again; this bug is now verified as PASSED.

Environment:

    # uname -a
    Linux dhcp-65-85.nay.redhat.com 2.6.32-99.el6.x86_64 #1 SMP Fri Jan 14 10:46:00 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

    libvirt-0.8.7-4.el6.x86_64
    kernel-2.6.32-99.el6.x86_64
    qemu-kvm-0.12.1.2-2.132.el6.x86_64
    selinux-policy-3.7.19-68.el6.noarch

Steps:

1. Enable enforcing mode and the virt_use_nfs boolean:

        # setenforce 1
        # getenforce
        Enforcing
        # setsebool virt_use_nfs on
        # getsebool -a | grep virt_use_nfs
        virt_use_nfs --> on

2. The NFS server is on 10.66.65.85:

        # cat /etc/exports
        /var/lib/libvirt/images *(rw,root_squash)
        # service nfs start
        # iptables -F

3. Mount the export:

        # mount 10.66.65.85:/var/lib/libvirt/images/ /var/lib/libvirt/migrate/
        # ll -d /var/lib/libvirt/images/
        drwxr-xr-x. 2 qemu qemu 4096 Jan 28 06:39 /var/lib/libvirt/images/

4. Check the guest definition:

        # virsh list --all
         Id Name                 State
        ----------------------------------
          - rhel6                shut off

        # virsh dumpxml rhel6
        ...
        <devices>
          <emulator>/usr/libexec/qemu-kvm</emulator>
          <disk type='file' device='disk'>
            <driver name='qemu' type='raw' cache='none'/>
            <source file='/var/lib/libvirt/migrate/rhel6.img'/>
            <target dev='vda' bus='virtio'/>
            <alias name='virtio-disk0'/>
            <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
          </disk>
        ...

        # ll -d rhel6.img
        -rwxrwxrwx. 1 qemu qemu 5368709120 Feb 11 22:42 rhel6.img

5. Start the guest, save it to the NFS share, and restore it:

        # virsh start rhel6
        Domain rhel6 started

        [root@dhcp-65-85 migrate]# virsh save rhel6 /var/lib/libvirt/migrate/rhel6.save
        Domain rhel6 saved to /var/lib/libvirt/migrate/rhel6.save

        [root@dhcp-65-85 migrate]# virsh restore rhel6.save
        Domain restored from rhel6.save

        # virsh list --all
         Id Name                 State
        ----------------------------------
          4 rhel6                running

I can also reproduce this bug with libvirt-0.8.7-1.el6.x86_64:

    # virsh restore /var/lib/libvirt/migrate/rhel6.save
    error: Failed to restore domain from /var/lib/libvirt/migrate/rhel6.save
    error: cannot close file: Bad file descriptor

Based on the above comment and on the latest test conducted on libvirt, removing needinfo.
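For reference, the /etc/libvirt/qemu.conf setting discussed above would look like the excerpt below. Only the dynamic_ownership value comes from this bug; the comment text is just a reminder of what the option does, and the rest of the file is omitted.

```
# /etc/libvirt/qemu.conf (excerpt)
# dynamic_ownership = 1 lets libvirt chown disk/save files to the qemu user
# itself; for this verification it must stay at 0, per the comment above.
dynamic_ownership = 0
```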
See also bug 691499 for a manifestation of the problem when using compressed save images from a libvirtd run in an unconfined_t context.

Tested with:

    libvirt-0.8.7-18.el6.x86_64
    qemu-kvm-0.12.1.2-2.158.el6.x86_64
    kernel-2.6.32-131.0.1.el6.x86_64
    selinux-policy-3.7.19-73.el6.noarch

I was able to restore the VM after hibernate with SELinux on, so the bug status stays VERIFIED.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html
Created attachment 472099 [details]: libvirt\vdsm\qemu logs

Description of problem:
Restoring a suspended VM fails and the qemu process dies when SELinux is set to enforcing with NFS storage.

In the vdsm log I get the following libvirt error:

    Thread-500::ERROR::2011-01-06 18:08:24,082::vm::632::vds.vmlog.796d95ea-1640-4aea-9f12-0d9ea0440ee3::(_startUnderlyingVm)
    Traceback (most recent call last):
      File "/usr/share/vdsm/vm.py", line 602, in _startUnderlyingVm
        self._run()
      File "/usr/share/vdsm/libvirtvm.py", line 718, in _run
        self._connection.restore(fname)
      File "/usr/share/vdsm/libvirtvm.py", line 1081, in wrapper
        raise e
    libvirtError: cannot close file: Bad file descriptor

VM log:

    2011-01-06 18:08:20.993: starting up
    LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Conroe -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name rhel6-nfs-1 -uuid 796d95ea-1640-4aea-9f12-0d9ea0440ee3 -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/rhel6-nfs-1.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=2011-01-06T16:08:20 -boot c -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/cf4e325a-482b-4e20-8b1d-6b1acd5c7dc4/78cbee4a-f021-47d1-9f90-c6ef34c2935d/images/7c571638-4826-46ee-8a9b-9d4232154ace/f5d32eff-5adc-4787-a47d-3cccb98b8ccb,if=none,id=drive-virtio-disk0,boot=on,format=raw,serial=ee-8a9b-9d4232154ace,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=00:1a:4a:16:87:30,bus=pci.0,addr=0x3 -chardev socket,id=channel0,path=/var/lib/libvirt/qemu/channels/rhel6-nfs-1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=0,chardev=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -vnc 0:0,password -k en-us -vga cirrus -incoming exec:cat
    load of migration failed
    2011-01-06 18:08:24.006: shutting down
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/cpu/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/cpuacct/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/cpuacct/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/cpuacct/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/cpuset/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/cpuset/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/cpuset/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/memory/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/memory/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/memory/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/devices/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/devices/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/devices/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.052: 23377: debug : virCgroupRemove:710 : Removing cgroup /cgroup/freezer/libvirt/qemu/rhel6-nfs-1/ and all child cgroups
    19:12:41.052: 23377: debug : virCgroupRemoveRecursively:665 : Removing cgroup /cgroup/freezer/libvirt/qemu/rhel6-nfs-1/
    19:12:41.052: 23377: error : virCgroupRemoveRecursively:668 : Unable to remove /cgroup/freezer/libvirt/qemu/rhel6-nfs-1/ (16)
    19:12:41.252: 23377: debug : virCgroupNew:555 : New group /libvirt/qemu/rhel6-nfs-1

Please note that when SELinux is off, the operation succeeds.

Repro steps:
1) make sure to work on NFS storage
2) make sure to start the VM
3) make sure SELinux is set to enforcing
4) make sure to suspend the VM (migrate to file)
5) try to restore