Bug 921135

Summary: qemu: could not load kernel ... Permission denied
Product: Red Hat Enterprise Linux 7
Reporter: Dave Allan <dallan>
Component: libvirt
Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.0
CC: berrange, clalancette, cwei, dyuan, itamar, jdenemar, jforbes, laine, libvirt-maint, mprivozn, mzhan, rbalakri, rjones, veillard, virt-maint, yafu, zhanghongming
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libvirt-1.3.1-1.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 871196
Environment:
Last Closed: 2016-11-03 18:05:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1269975
Bug Blocks: 910269, 910270, 922891

Description Dave Allan 2013-03-13 14:40:21 UTC
+++ This bug was initially created as a clone of Bug #871196 +++

Description of problem:

This manifests itself as the libguestfs test suite
occasionally failing like this:

*stdin*:2: libguestfs: error: could not create appliance through libvirt: internal error process exited while connecting to monitor: 2012-10-29 21:27:55.614+0000: 20366: debug : virFileClose:72 : Closed fd 22
2012-10-29 21:27:55.614+0000: 20366: debug : virFileClose:72 : Closed fd 35
2012-10-29 21:27:55.630+0000: 20366: debug : virFileClose:72 : Closed fd 3
2012-10-29 21:27:55.653+0000: 20367: debug : virCommandHook:2143 : Hook is done 0
qemu: could not load kernel '/home/rjones/d/libguestfs/tmp/.guestfs-1000/kernel.20244': Permission denied
 [code=1 domain=10]

There are multiple parallel libguestfs instances running.
The best explanation of this I can come up with is that
libvirtd is unlabelling the <kernel/> element at the same
time that another qemu is trying to open it.

Version-Release number of selected component (if applicable):

libvirt-0.10.2-3.fc18.x86_64

How reproducible:

Very rare.

--- Additional comment from Richard W.M. Jones on 2013-03-13 07:12:22 EDT ---

Still happening with libguestfs upstream and recent libvirt.

libvirt-daemon-1.0.2-3.fc19.x86_64

A test program which demonstrates this and nearly always
fails for me is here:

https://github.com/libguestfs/libguestfs/blob/master/tests/parallel/test-parallel.c

Comment 1 Michal Privoznik 2013-03-13 14:47:31 UTC
I believe this can be solved with my "Keep original file label" patchset:

https://www.redhat.com/archives/libvir-list/2013-March/msg00497.html

Although I am only solving this issue for DAC for now, I am introducing a reference counter at the XATTR level to track how many times libvirt has labelled a file, so that the label is restored only on the last restore request. So if your explanation is right, the kernel image will be:

1. labeled due to domain A startup, refCount = 1,
2. labeled due to domain B startup, refCount = 2,
3. domain A shutdown will just decrease refCount to 1,
4. shutting down domain B will decrease refCount and since it's 0 now, the original file label is restored.
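
For illustration, a minimal standalone C sketch of that idea follows, assuming a hypothetical "user.libvirt.refcount" xattr and plain DAC ownership (chown); this is only a sketch of the concept, not the actual patchset, and the xattr name is made up:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

#define REF_XATTR "user.libvirt.refcount"   /* hypothetical attribute name */

static int get_refcount(const char *path)
{
    char buf[32] = "";
    ssize_t n = getxattr(path, REF_XATTR, buf, sizeof(buf) - 1);
    return n < 0 ? 0 : atoi(buf);
}

static int set_refcount(const char *path, int count)
{
    char buf[32];
    if (count == 0)
        return removexattr(path, REF_XATTR);
    snprintf(buf, sizeof(buf), "%d", count);
    return setxattr(path, REF_XATTR, buf, strlen(buf), 0);
}

/* Domain startup: label the file and bump the counter (steps 1 and 2). */
int label_file(const char *path, uid_t uid, gid_t gid)
{
    int count = get_refcount(path);
    /* On the first use the original owner would be remembered as well,
     * e.g. in another xattr, so it can be restored later. */
    if (chown(path, uid, gid) < 0)
        return -1;
    return set_refcount(path, count + 1);
}

/* Domain shutdown: only the last user restores the original label
 * (steps 3 and 4). */
int restore_file(const char *path, uid_t orig_uid, gid_t orig_gid)
{
    int count = get_refcount(path);
    if (count > 1)
        return set_refcount(path, count - 1);
    if (chown(path, orig_uid, orig_gid) < 0)
        return -1;
    return set_refcount(path, 0);
}

This sketch omits locking; as the later comments note, the real design needs serialization (sanlock/virtlockd) so that concurrent label/restore requests do not race on the counter itself.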

Comment 3 Dave Allan 2013-07-12 20:05:50 UTC
(In reply to Michal Privoznik from comment #1)
> I believe this can be solved with my "Keep original file label" patchset:
> 
> https://www.redhat.com/archives/libvir-list/2013-March/msg00497.html

Any motion on this patchset?

Comment 4 Michal Privoznik 2013-07-15 07:48:44 UTC
Seems like we've got upstream agreement on the design. However, the implementation won't be trivial (it'll involve sanlock/virtlockd, and nothing involving sanlock is trivial, is it? :) ). It's on my TODO list though.

Comment 6 Michal Privoznik 2013-12-11 16:02:28 UTC
Rich,

until I have something useful to send upstream, I think this workaround may be sufficient for you: just copy the kernel image for each domain that is about to run under different permissions.

It would be best if qemu allowed us to pass every file (including kernel and initrd) via FD; then this bug would instantly go away.

Comment 7 Richard W.M. Jones 2013-12-12 08:45:26 UTC
We do have a separate copy of the kernel for every UID already.

The problem is we have a multi-threaded program running multiple
libvirt domains from different threads.  Each thread has the same
UID & PID [obviously] and thus uses the same kernel file, called
$TMPDIR/.guestfs-$UID/kernel.$PID

However libvirt labels the kernel before each domain runs and
unlabels it when the domain exits.  Since multiple domains are
starting and stopping in the same program, there is a race between
one thread labelling the kernel and another (closing domain) thread
unlabelling the kernel.
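
To make the race concrete, here is a small standalone C program (not libguestfs or libvirt code; the path and iteration counts are arbitrary) that models the DAC labelling with plain chmod: one thread keeps granting access to a shared file as a domain start would, another keeps revoking it as a domain shutdown would, while the main thread opens the file the way qemu opens the kernel image and occasionally fails with EACCES. Compile with cc -pthread and run as a non-root user, since root bypasses the permission check.

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static const char *path = "/tmp/shared-kernel-demo";   /* arbitrary demo path */

static void *labeler(void *arg)     /* "domain start": grant access */
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        chmod(path, 0644);
    return NULL;
}

static void *restorer(void *arg)    /* "domain shutdown": revoke access */
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        chmod(path, 0000);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("create"); return 1; }
    close(fd);

    pthread_create(&t1, NULL, labeler, NULL);
    pthread_create(&t2, NULL, restorer, NULL);

    /* The "qemu" side: sooner or later the open() lands in the window
     * where the other "domain" has just restored the label. */
    for (int i = 0; i < 100000; i++) {
        fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("could not load kernel");
            break;
        }
        close(fd);
    }

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    unlink(path);
    return 0;
}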

Comment 8 Michal Privoznik 2013-12-12 14:39:35 UTC
(In reply to Richard W.M. Jones from comment #7)
> We do have a separate copy of the kernel for every UID already.
> 

What about a separate kernel for each domain?

Comment 9 Richard W.M. Jones 2013-12-12 14:47:21 UTC
Sure, it's possible.  We would have to copy the kernel because we can't
just link it (as labels are per-inode), and the kernel is 5 MB and
we launch 12 domains in parallel, so that's like ~60 MB.

Comment 15 Richard W.M. Jones 2016-01-14 09:44:05 UTC
*** Bug 1298124 has been marked as a duplicate of this bug. ***

Comment 16 Richard W.M. Jones 2016-01-14 09:44:09 UTC
*** Bug 1298122 has been marked as a duplicate of this bug. ***

Comment 17 Jiri Denemark 2016-01-15 10:12:38 UTC
This specific case can be fixed even without handling bug 547546. Kernel/initrd files are essentially read-only shareable images and thus should be handled in the same way. Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2016-January/msg00675.html

Comment 19 Jiri Denemark 2016-01-15 10:27:24 UTC
Pushed upstream as v1.3.1-rc2-2-g68acc70:

commit 68acc701bd449481e3206723c25b18fcd3d261b7
Author: Jiri Denemark <jdenemar>
Date:   Fri Jan 15 10:55:58 2016 +0100

    security: Do not restore kernel and initrd labels
    
    Kernel/initrd files are essentially read-only shareable images and thus
    should be handled in the same way. We already use the appropriate label
    for kernel/initrd files when starting a domain, but when a domain gets
    destroyed we would remove the labels which would make other running
    domains using the same files very unhappy.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=921135
    
    Signed-off-by: Jiri Denemark <jdenemar>
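
Conceptually the change is small. The following abstracted sketch (with stand-in types and function names, not libvirt's actual security-driver code or the real diff) shows the shape of it:

#include <stdbool.h>
#include <stddef.h>

/* Stand-in types for illustration only; these are not libvirt's structures. */
typedef struct { bool readonly; bool shared; const char *path; } disk_def;
typedef struct { disk_def *disks; size_t ndisks;
                 const char *kernel; const char *initrd; } domain_def;

static void restore_file_label(const char *path)
{
    (void)path;   /* would chown()/setfilecon() the file back to its original label */
}

void restore_all_labels(domain_def *def)
{
    /* Read-only and shareable disk images are skipped: other running
     * domains may still be using them. */
    for (size_t i = 0; i < def->ndisks; i++) {
        if (def->disks[i].readonly || def->disks[i].shared)
            continue;
        restore_file_label(def->disks[i].path);
    }

    /* Before the fix, the kernel and initrd labels were restored here:
     *
     *     restore_file_label(def->kernel);
     *     restore_file_label(def->initrd);
     *
     * which broke other running domains sharing the same files.  The fix
     * drops these calls, treating kernel/initrd like read-only shareable
     * images whose labels are never restored. */
}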

Comment 20 yafu 2016-01-29 07:05:07 UTC
Hi jdenemar, I tried to reproduce the issue using test-virt-alignment-scan-guests.sh from libguestfs (https://github.com/libguestfs/libguestfs/blob/master/align/test-virt-alignment-scan-guests.sh ; the script starts up lots of parallel libvirt instances), and the libvirtd process hung because of a deadlock caused by labelling the same file in parallel.
Would you please help me check whether this issue is the same as this bug? Thank you very much.

test version:
libvirt-1.2.17-13.el7_2.2.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.5.x86_64

steps to reproduce:
1.Run test-virt-alignment-scan-guests.sh from libguestfs:
# while true ; do ./test-virt-alignment-scan-guests.sh ; done

2.Check the output of 'virsh list' at the same time:
#watch virsh list

3.After about 20 hours, the libvirtd daemon hung and the output of 'virsh list' did not change any more.

4.Use gdb to print the libvirtd process backtrace:
(gdb)bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fec0cafed02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007fec0cafec08 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7febec1d92b0) at pthread_mutex_lock.c:64
#3  0x00007fec0f491ba5 in virMutexLock (m=m@entry=0x7febec1d92b0) at util/virthread.c:89
#4  0x00007fec0f47922e in virObjectLock (anyobj=anyobj@entry=0x7febec1d92a0) at util/virobject.c:323
#5  0x00007fec0f60db1c in virSecurityManagerSetSocketLabel (mgr=0x7febec1d92a0, vm=vm@entry=0x7febd800bb20) at security/security_manager.c:431
#6  0x00007fec0f60ade3 in virSecurityStackSetSocketLabel (mgr=<optimized out>, vm=0x7febd800bb20) at security/security_stack.c:456
#7  0x00007fec0f60db29 in virSecurityManagerSetSocketLabel (mgr=0x7febec1d91f0, vm=0x7febd800bb20) at security/security_manager.c:432
#8  0x00007febf66e896e in qemuProcessHook (data=0x7febfee740f0) at qemu/qemu_process.c:3227
#9  0x00007fec0f43f8fa in virExec (cmd=cmd@entry=0x7febd8007a70) at util/vircommand.c:692
#10 0x00007fec0f442267 in virCommandRunAsync (cmd=cmd@entry=0x7febd8007a70, pid=pid@entry=0x0) at util/vircommand.c:2429
#11 0x00007fec0f442616 in virCommandRun (cmd=cmd@entry=0x7febd8007a70, exitstatus=exitstatus@entry=0x0) at util/vircommand.c:2261
#12 0x00007febf66f053d in qemuProcessStart (conn=conn@entry=0x7febe0008110, driver=driver@entry=0x7febec11d7b0, vm=<optimized out>, asyncJob=asyncJob@entry=0, 
    migrateFrom=migrateFrom@entry=0x0, stdin_fd=stdin_fd@entry=-1, stdin_path=stdin_path@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, 
    flags=flags@entry=5) at qemu/qemu_process.c:4859
#13 0x00007febf673d7df in qemuDomainCreateXML (conn=0x7febe0008110, xml=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:1768
#14 0x00007fec0f520a11 in virDomainCreateXML (conn=0x7febe0008110, 
    xmlDesc=0x7febd8000cb0 "<?xml version=\"1.0\"?>\n<domain type=\"kvm\" xmlns:qemu=\"http://libvirt.org/schemas/domain/qemu/1.0\">\n  <name>guestfs-fxyf5s4hmc0iipn6</name>\n  <memory unit=\"MiB\">500</memory>\n  <currentMemory unit=\"MiB\">"..., flags=2) at libvirt-domain.c:180
#15 0x00007fec1018411a in remoteDispatchDomainCreateXML (server=0x7fec11e1bb60, msg=0x7fec11e362f0, ret=0x7febd800d0b0, args=0x7febd80087f0, rerr=0x7febfee74c30, client=0x7fec11e3e320)
    at remote_dispatch.h:3754
#16 remoteDispatchDomainCreateXMLHelper (server=0x7fec11e1bb60, client=0x7fec11e3e320, msg=0x7fec11e362f0, rerr=0x7febfee74c30, args=0x7febd80087f0, ret=0x7febd800d0b0)
    at remote_dispatch.h:3732
#17 0x00007fec0f59c3c2 in virNetServerProgramDispatchCall (msg=0x7fec11e362f0, client=0x7fec11e3e320, server=0x7fec11e1bb60, prog=0x7fec11e30000) at rpc/virnetserverprogram.c:437
#18 virNetServerProgramDispatch (prog=0x7fec11e30000, server=server@entry=0x7fec11e1bb60, client=0x7fec11e3e320, msg=0x7fec11e362f0) at rpc/virnetserverprogram.c:307
#19 0x00007fec0f59763d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fec11e1bb60) at rpc/virnetserver.c:135
#20 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fec11e1bb60) at rpc/virnetserver.c:156
#21 0x00007fec0f4924f5 in virThreadPoolWorker (opaque=opaque@entry=0x7fec11e10de0) at util/virthreadpool.c:145
#22 0x00007fec0f491a18 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#23 0x00007fec0cafcdc5 in start_thread (arg=0x7febfee75700) at pthread_create.c:308
#24 0x00007fec0c82a1cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Comment 21 Jiri Denemark 2016-01-29 08:52:11 UTC
See bug 1302965

Comment 22 Jiri Denemark 2016-01-29 10:01:21 UTC
Actually, this backtrace is different from 1302965; this one looks like you were lucky and fork() was called when another thread had security manager locked. Please, file a separate bug for that. Don't forget to include debug logs from libvirtd.

Comment 23 yafu 2016-01-29 11:26:32 UTC
(In reply to Jiri Denemark from comment #22)
> Actually, this backtrace is different from 1302965; this one looks like you
> were lucky and fork() was called when another thread had security manager
> locked. Please, file a separate bug for that. Don't forget to include debug
> logs from libvirtd.


Thanks for your quick reply. 
Filed another bug for the issue described in comment #20:
https://bugzilla.redhat.com/show_bug.cgi?id=1303031

Comment 25 yafu 2016-04-13 10:33:21 UTC
Hi, Jiri,
I tried to verify this bug with libvirt-1.3.3-1.el7.x86_64. Would you please help check whether the steps below are enough to verify this bug? Thanks a lot.

Steps:
1.Define a guest with direct kernel boot:
#virsh edit rhel7
...
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <kernel>/images/vmlinuz</kernel>
    <initrd>/images/initrd.img</initrd>
    <cmdline>method=http://download.englab.nay.redhat.com/pub/rhel/rel-eng/RHEL-7.2-20150820.0/compose/Server/x86_64/os/</cmdline>
    <boot dev='hd'/>
  </os>
...

2.Start the guest:
#virsh start rhel7

3.Check the labels of the kernel and initrd file:
#ll -Z /images/vmlinuz
-rw-rw-r--. qemu qemu system_u:object_r:virt_content_t:s0 /images/vmlinuz
#ll -Z /images/initrd.img
-rw-rw-r--. qemu qemu system_u:object_r:virt_content_t:s0 /images/initrd.img

4.Destroy the guest:
 #virsh destroy rhel7

5.Check the labels of the kernel and initrd files again; you can see they are not restored:
#ll -Z /images/vmlinuz
-rw-rw-r--. qemu qemu system_u:object_r:virt_content_t:s0 /images/vmlinuz
#ll -Z /images/initrd.img
-rw-rw-r--. qemu qemu system_u:object_r:virt_content_t:s0 /images/initrd.img

As shown above, the labels of the kernel and initrd files are not restored when the domain is destroyed.

Comment 26 yafu 2016-04-15 03:13:10 UTC
Reproduced this bug with libvirt-1.2.17-13.el7.x86_64.

With the steps from comment 25, when the guest is destroyed the labels of the kernel and initrd files are restored (the buggy behaviour):
#ll -Z /images/vmlinuz 
-rw-rw-r--. root root system_u:object_r:default_t:s0   /images/vmlinuz
#ll -Z /images/initrd.img 
-rw-rw-r--. root root system_u:object_r:default_t:s0   /images/initrd.img

Comment 27 Jiri Denemark 2016-04-29 14:38:14 UTC
Yes, the steps from comment 25 are sufficient for verifying the bug.

Comment 28 yafu 2016-08-31 03:01:32 UTC
Verification passed with libvirt-2.0.0-6.el7.x86_64.

Comment 30 errata-xmlrpc 2016-11-03 18:05:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html