Bug 984409

Summary: Nova file injection failed: link: [kernel]: Operation not permitted
Product: [Fedora] Fedora Reporter: Attila Fazekas <afazekas>
Component: openstack-nova Assignee: Mark McLoughlin <markmc>
Status: CLOSED UPSTREAM QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 19 CC: afazekas, akscram, alexander.sakhnov, apevec, asalkeld, berrange, bfilippov, breu, dprince, Jan.van.Eldik, jonathansteffan, jose.castro.leon, markmc, mbooth, mlvov, mmagr, ndipanov, pbrady, p, rbryant, rjones, rkukura, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-10-16 09:44:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                                 Flags
qemu-system.py                                              none
0001-nova-Force-the-attach-method-to-be-appliance.patch     none
0001-nova-Don-t-change-the-default-attach-method.patch      none

Description Attila Fazekas 2013-07-15 07:58:08 UTC
I ran the following command, which triggered the permission issue:
$ nova boot MyServer --flavor 1 --image cirros-0.3.1-x86_64-uec --file=test.txt=/etc/passwd

I see the following in the nova compute log:
2013-07-14 11:33:02.730 ERROR nova.compute.manager [req-4a4295a6-4576-45a3-9ded-c659ef258b54 admin admin] [instance: c6074884-c22f-4f85-bf3d-beb8e6afb6c9] Error:
Traceback (most recent call last):
  File "/opt/stack/new/nova/nova/compute/manager.py", line 995, in _build_instance
    set_access_ip=set_access_ip)
  File "/opt/stack/new/nova/nova/compute/manager.py", line 1249, in _spawn
    LOG.exception(_('Instance failed to spawn'), instance=instance)
  File "/opt/stack/new/nova/nova/compute/manager.py", line 1245, in _spawn
    block_device_info)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1594, in spawn
    admin_pass=admin_password)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2001, in _create_image
    instance=instance)
  File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1995, in _create_image
    mandatory=('files',))
  File "/opt/stack/new/nova/nova/virt/disk/api.py", line 294, in inject_data
    fs.setup()
  File "/opt/stack/new/nova/nova/virt/disk/vfs/guestfs.py", line 114, in setup
    {'imgfile': self.imgfile, 'e': e})
NovaException: Error mounting /opt/stack/data/nova/instances/c6074884-c22f-4f85-bf3d-beb8e6afb6c9/disk with libguestfs (link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.17636: Operation not permitted)

$ ls -il /var/tmp/.guestfs-1000
total 854256
658310 -rwxr-xr-x. 1 afazekas libvirtd         64 Jul 14 12:24 checksum
658309 -rw-r--r--. 2 root     root        1282048 Jul 14 11:21 initrd
658309 -rw-r--r--. 2 root     root        1282048 Jul 14 11:21 initrd.17636
658308 -rw-r--r--. 1 root     root        5058520 Jul 14 11:21 kernel
658311 -rw-r--r--. 2 afazekas qemu     4294967296 Jul 14 11:21 root
658311 -rw-r--r--. 2 afazekas qemu     4294967296 Jul 14 11:21 root.17636
$ ls -ild /var/tmp/.guestfs-1000
658305 drwxr-xr-x. 2 afazekas libvirtd 4096 Jul 14 12:49 /var/tmp/.guestfs-1000

All OpenStack components are expected to run as a non-root user, and all of them apply only minimal filtering to the commands they run through sudo, but chown is allowed.

The live disk images are owned by the service user (here 'afazekas'; normally 'nova').
(Without the file injection arguments, the libguestfs code path is not reached.)

$ ls -li /opt/stack/data/nova/instances/3c45c00f-06a2-4174-9a6a-5ac4e252e1ff
total 18680
407254 -rw-rw----. 1 afazekas qemu        23058 Jul 14 11:41 console.log
407267 -rw-r--r--. 1 afazekas qemu     10485760 Jul 14 11:42 disk
407261 -rw-rw-r--. 1 afazekas qemu      4955792 Jul 14 11:37 kernel
407269 -rw-rw-r--. 1 afazekas libvirtd     1571 Jul 14 11:37 libvirt.xml
407266 -rw-rw-r--. 1 afazekas qemu      3714968 Jul 14 11:37 ramdisk

The instance is based on the cirros-uec image, which has three parts:
- initramfs image
- kernel
- an empty root filesystem

I guess the permissions could also have been set by libvirtd.
Do you think it is possible that libvirtd was the process that initiated the permission change?

At the moment I do not know why libguestfs tries to create a hard link.
I have write permission on the directory and on the real disk file (root).
A soft link would be possible.

Here, openstack-nova-compute was only trying to add a file before boot.

Why does 'kernel' need to be hard linked?
Does libguestfs ever change the initrd or kernel images?

It looks like the hard link succeeded for the initrd; at least it has a hard-linked pair.
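A quick way to confirm which cache entries are hard-linked pairs, rather than reading the inode column of ls -il by eye, is to group directory entries by inode. This is a minimal Python sketch; hardlink_groups is a hypothetical helper, not part of libguestfs or nova:

```python
import os
import stat
from collections import defaultdict


def hardlink_groups(directory):
    """Return lists of names in `directory` that share one inode.

    Names with the same (device, inode) pair are hard links to one file,
    which is what identical numbers in the first column of `ls -il` mean.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        st = os.lstat(os.path.join(directory, name))
        if stat.S_ISREG(st.st_mode):  # skip directories, symlinks, etc.
            groups[(st.st_dev, st.st_ino)].append(name)
    # Keep only inodes reachable through more than one name.
    return [names for names in groups.values() if len(names) > 1]
```

Run against the /var/tmp/.guestfs-1000 listing above, it would be expected to report the initrd/initrd.17636 and root/root.17636 pairs, while kernel has no second name.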


After the permission issue, libguestfs-test-tool says:
     ************************************************************
     *                    IMPORTANT NOTICE
     *
     * When reporting bugs, include the COMPLETE, UNEDITED
     * output below in your bug report.
     *
     ************************************************************
PATH=/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/afazekas/.local/bin:/home/afazekas/bin
SELinux: Permissive
library version: 1.22.4fedora=19,release=2.fc19,libvirt
guestfs_get_append: (null)
guestfs_get_backend: libvirt
guestfs_get_autosync: 1
guestfs_get_cachedir: /var/tmp
guestfs_get_direct: 0
guestfs_get_memsize: 500
guestfs_get_network: 0
guestfs_get_path: /usr/lib64/guestfs
guestfs_get_pgroup: 0
guestfs_get_program: libguestfs-test-tool
guestfs_get_qemu: /usr/bin/qemu-kvm
guestfs_get_recovery_proc: 1
guestfs_get_selinux: 0
guestfs_get_smp: 1
guestfs_get_tmpdir: /tmp
guestfs_get_trace: 0
guestfs_get_verbose: 1
host_cpu: x86_64
Launching appliance, timeout set to 600 seconds.
libguestfs: launch: backend=libvirt
libguestfs: launch: tmpdir=/tmp/libguestfsnbkYb6
libguestfs: launch: umask=0002
libguestfs: launch: euid=1000
libguestfs: libvirt version = 1000005 (1.0.5)
libguestfs: [00001ms] connect to libvirt
libguestfs: opening libvirt handle: URI = NULL, auth = virConnectAuthPtrDefault, flags = 0
libguestfs: successfully opened libvirt handle: conn = 0x7fe95e94fbc0
libguestfs: [02888ms] get libvirt capabilities
libguestfs: [02900ms] parsing capabilities XML
libguestfs: [02902ms] build appliance
libguestfs: command: run: supermin-helper
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ -f checksum
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ x86_64
supermin helper [00000ms] whitelist = (not specified), host_cpu = x86_64, kernel = (null), initrd = (null), appliance = (null)
supermin helper [00000ms] inputs[0] = /usr/lib64/guestfs/supermin.d
checking modpath /lib/modules/3.9.5-301.fc19.x86_64 is a directory
picked vmlinuz-3.9.5-301.fc19.x86_64 because modpath /lib/modules/3.9.5-301.fc19.x86_64 exists
checking modpath /lib/modules/3.9.9-301.fc19.x86_64 is a directory
picked vmlinuz-3.9.9-301.fc19.x86_64 because modpath /lib/modules/3.9.9-301.fc19.x86_64 exists
supermin helper [00001ms] finished creating kernel
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/base.img
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/daemon.img
supermin helper [00001ms] visiting /usr/lib64/guestfs/supermin.d/hostfiles
supermin helper [00044ms] visiting /usr/lib64/guestfs/supermin.d/init.img
supermin helper [00044ms] visiting /usr/lib64/guestfs/supermin.d/udev-rules.img
supermin helper [00044ms] adding kernel modules
supermin helper [00067ms] finished creating appliance
libguestfs: checksum of existing appliance: f70642933db4d0be1ed3d2995f2028e98b9bd895dbbbf5494e0fb636d8694b52
libguestfs: error: link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.23445: Operation not permitted
libguestfs-test-tool: failed to launch appliance
libguestfs: closing guestfs handle 0x7fe95e94f550 (state 0)
libguestfs: command: run: rm
libguestfs: command: run: \ -rf /tmp/libguestfsnbkYb6


Another thing that is not clear to me:
nova-compute is a single process (thread group), so the thread group leader's PID stays the same for its whole lifetime.
The 'initrd', 'kernel' and 'root' names do not seem to be unique; '/var/tmp/.guestfs-1000' only contains my user ID.
Is the current directory structure expected to be safe for parallel use?
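To make the parallel-safety question concrete, here is a minimal Python model of PID-suffixed hard links. link_for_handle is a hypothetical helper, not libguestfs code, and it assumes (from the listings above, not from the libguestfs source) that the numeric suffix in names like kernel.17636 is the PID of the process that created the link:

```python
import os


def link_for_handle(cachedir, name, pid=None):
    """Hard-link <cachedir>/<name> to <cachedir>/<name>.<pid>; return the new path.

    Hypothetical model of the 'kernel.17636'-style names: the suffix depends
    only on the process ID, not on which handle inside that process asked.
    """
    if pid is None:
        pid = os.getpid()
    src = os.path.join(cachedir, name)
    dst = "%s.%d" % (src, pid)
    os.link(src, dst)  # raises FileExistsError if dst already exists
    return dst
```

Under this model, a second call from the same long-lived process (two guestfs handles inside one nova-compute) fails with FileExistsError. Whether libguestfs really behaves this way depends on how src/appliance.c removes or reuses these links, which this thread does not show.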


The system is running in a VM and uses qemu software emulation inside the VM.
Can I expect any difference on a physical machine or with a nested KVM guest?

Version: libguestfs-1.22.4-2.fc19

Comment 1 Richard W.M. Jones 2013-07-15 08:04:16 UTC
Adding Padraig in case he's got any ideas.

Comment 2 Richard W.M. Jones 2013-07-15 08:10:43 UTC
So the key problem here are the permissions on this directory:

$ ls -il /var/tmp/.guestfs-1000
total 854256
658310 -rwxr-xr-x. 1 afazekas libvirtd         64 Jul 14 12:24 checksum
658309 -rw-r--r--. 2 root     root        1282048 Jul 14 11:21 initrd
658309 -rw-r--r--. 2 root     root        1282048 Jul 14 11:21 initrd.17636
658308 -rw-r--r--. 1 root     root        5058520 Jul 14 11:21 kernel
658311 -rw-r--r--. 2 afazekas qemu     4294967296 Jul 14 11:21 root
658311 -rw-r--r--. 2 afazekas qemu     4294967296 Jul 14 11:21 root.17636
$ ls -ild /var/tmp/.guestfs-1000
658305 drwxr-xr-x. 2 afazekas libvirtd 4096 Jul 14 12:49 /var/tmp/.guestfs-1000

(Note UID 1000 = afazekas)

The error is:

link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.23445: Operation not permitted

This directory stores the libguestfs appliance cache for UID 1000.  Because
libvirt may run qemu as a different user (especially when libvirt runs as
root) it may chown certain files such as kernel & initrd so that qemu is
able to access them.  That's my best explanation for how those files managed
to end up owned by root.
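To spot files in the per-UID cache that have been chowned away from the caller, a small Python sketch can compare each entry's owner with the effective UID. It assumes the <cachedir>/.guestfs-<UID> layout seen in this bug, with cachedir as reported by guestfs_get_cachedir; foreign_owned_cache_files is a hypothetical name:

```python
import os


def foreign_owned_cache_files(cachedir="/var/tmp", euid=None):
    """List (name, uid) for entries in the per-UID appliance cache that
    are not owned by `euid`.

    Assumes the layout seen in this bug: <cachedir>/.guestfs-<UID>.
    """
    if euid is None:
        euid = os.geteuid()
    appliance_dir = os.path.join(cachedir, ".guestfs-%d" % euid)
    foreign = []
    for name in sorted(os.listdir(appliance_dir)):
        st = os.lstat(os.path.join(appliance_dir, name))
        if st.st_uid != euid:
            foreign.append((name, st.st_uid))
    return foreign
```

On the system in the description, it would be expected to flag initrd, initrd.17636 and kernel as owned by UID 0.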

I'm not sure if libvirt restores permissions afterwards (but even if it didn't,
there might still be a problem with parallel handles).

Is there a UID transition happening in the nova process itself?

GID "libvirtd" is a strange one.  I do not have a libvirtd group on my system.

Comment 3 Richard W.M. Jones 2013-07-15 11:18:49 UTC
I think what happened here is that the fix for
bug 913345 (https://github.com/openstack/nova/commit/014499acf5d6d6a557c9415aa49c536817a02a0a)
causes libguestfs to use the libvirt URI qemu:///system.

This is an unusual configuration from the libguestfs point
of view because it means that we're effectively "running
libvirt as root" [to use a shortcut .. this is not an
exact analogy].

What this means is that libvirt will chown the kernel
and initrd to root, and that will cause an error when a
subsequent libguestfs instance tries to make the hard
link[1] to the kernel.

However that can't be the whole story, because making a
hard link to a root-owned file is permissible, at least
on my ordinary ext4 filesystem:

$ ll -a
total 624
drwxrwxr-x.  2 rjones rjones   4096 Jul 15 12:16 .
drwxrwxrwt. 63 root   root    24576 Jul 15 12:16 ..
-r--r--r--.  1 root   root   603423 Jul 15 12:16 foobar
$ ln foobar baz

So perhaps there is something else going on.  SELinux?

Reporter: are there any SELinux AVCs when this happens?

[1] Relevant libguestfs code:

https://github.com/libguestfs/libguestfs/blob/ae78381287771a781f939f26a414fc8cfdc05fd6/src/appliance.c#L68

Comment 4 Richard W.M. Jones 2013-07-15 11:34:51 UTC
Assumption: https://lwn.net/Articles/482544/
didn't go upstream.  It's not in the upstream kernel, nor
in Fedora Rawhide kernel.

Comment 5 Richard W.M. Jones 2013-07-15 12:57:54 UTC
Created attachment 773725 [details]
qemu-system.py

I am finally able to reproduce the problem using the attached
script and the following steps.

(1) Prerequisites: libguestfs >= 1.22, Fedora >= 19
    Install python-libguestfs.

(2) Download attached script, chmod +x qemu-system.py

(3) You may need to enable libvirt management access by your
    current non-root user.  See the instructions here:
    https://bugs.launchpad.net/devstack/+bug/1086784

(4) Run the script *as non-root*.

You may need to repeat step (4) several times before you see
the error.

You may see the following error, which should be ignored:

RuntimeError: could not create appliance through libvirt: internal error process exited while connecting to monitor: qemu-system-x86_64: -chardev socket,id=charserial0,path=/tmp/libguestfslVQXad/console.sock: Failed to connect to socket: Permission denied

That's a different bug (bug 913774).

The correct error you are looking for is:

RuntimeError: link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.7570: Operation not permitted

To diagnose further, run libguestfs-test-tool (again, NON-root),
and you should see the same error with a lot more debug.

There are no SELinux errors, so clearing NEEDINFO flag.

Comment 6 Richard W.M. Jones 2013-07-15 13:02:41 UTC
Attila, please try the following patch (you can probably
just find and patch the disk/vfs/guestfs.py file directly):

diff --git a/nova/virt/disk/vfs/guestfs.py b/nova/virt/disk/vfs/guestfs.py
index 29b3965..f735583 100644
--- a/nova/virt/disk/vfs/guestfs.py
+++ b/nova/virt/disk/vfs/guestfs.py
@@ -98,9 +98,7 @@ class VFSGuestFS(vfs.VFS):
 
         try:
             self.handle.add_drive_opts(self.imgfile, format=self.imgfmt)
-            if self.handle.get_attach_method() == 'libvirt':
-                libvirt_url = 'libvirt:' + libvirt_driver.LibvirtDriver.uri()
-                self.handle.set_attach_method(libvirt_url)
+            self.handle.set_attach_method("appliance")
             self.handle.launch()
 
             self.setup_os()

After applying the patch and before restarting nova, do:

sudo rm -rf /var/tmp/.guestfs-*

just to make sure that no old broken cache directory exists.

Comment 7 Attila Fazekas 2013-07-15 16:49:56 UTC
The above patch lets the machine boot and reach the ACTIVE state.

Comment 8 Richard W.M. Jones 2013-07-15 17:12:29 UTC
Returning to the original bug which we thought we were
fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=913345

There were originally two problems:

(a) In Fedora 18+, we switched to the libvirt backend.
This uses a libvirt URI of NULL which for non-root users
is the same as using qemu:///session.  However libvirt
cannot use qemu:///session if the user's $HOME directory
does not exist.  You could consider this to be a bug in
libvirt (it needs a home directory) or in nova (it doesn't
create a home directory).

(b) libguestfs fails to work if you set the backend to
libvirt:qemu:///system.  You will see an error about 'console.sock'
(bug 913774).  This is a bug in libvirt.
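Point (a) depends on how a NULL libvirt URI is resolved. A simplified Python model, assuming only the LIBVIRT_DEFAULT_URI environment variable and the effective UID matter (libvirt's client configuration file and driver probing are ignored); effective_libvirt_uri is a hypothetical name:

```python
import os


def effective_libvirt_uri(uri, euid, environ=None):
    """Simplified model of which qemu URI libvirt ends up using.

    Assumption: an explicit URI wins, then LIBVIRT_DEFAULT_URI, then the
    privilege of the caller (root -> system, everyone else -> session).
    """
    if uri:
        return uri
    environ = os.environ if environ is None else environ
    default = environ.get("LIBVIRT_DEFAULT_URI")
    if default:
        return default
    return "qemu:///system" if euid == 0 else "qemu:///session"
```

In this model, nova's fix bypassed the resolution entirely by passing qemu:///system explicitly, so even a non-root nova talked to the system libvirtd.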

The fix that was made to nova was to set the backend to
(effectively) libvirt:qemu:///system.  Issue (b) seems
to be disguised (see comment 5).  On reflection I don't
think the fix to nova is correct because it's "running
libvirt as root", adding the temporary libguestfs appliance
to the global libvirt guest space.  (Although it's
"running libvirt as root", qemu itself actually runs as
qemu.qemu and sVirt is still used, so it's not a complete
loss, but it does greatly complicate things to have
nova, libvirtd and qemu all running as 3 separate users.)

I think we should stick to running libguestfs, libvirt and
qemu as the same user as nova, which means going back to
either:

(1) Fix problem (a) and revert the nova patch, or:

(2) Set attach_method to "appliance", as in comment 6.

Note that (2) will disable sVirt, which is not particularly
good, so if we go for (2) as a workaround, we should have a
plan for doing (1).

Comment 9 Richard W.M. Jones 2013-07-15 17:37:03 UTC
I also found out why the hard link behaviour is
different between Fedora 18 & Fedora 19+.  It's because
Fedora has in its infinite wisdom decided to break
hard linking:

http://danwalsh.livejournal.com/64493.html
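Why a root-owned, mode 0644 kernel cannot be hard linked by UID 1000 follows from the rule behind fs.protected_hardlinks. The sketch below is a simplified Python model based on my reading of may_linkat() and safe_hardlink_source() in the kernel's fs/namei.c (a source outside this thread). It treats UID 0 as a stand-in for CAP_FOWNER and ignores ACLs; may_hardlink and protected_hardlinks_enabled are hypothetical names:

```python
import stat


def may_hardlink(st_mode, st_uid, st_gid, caller_uid, caller_gids, protected=True):
    """Simplified model of the fs.protected_hardlinks check on link().

    With protection on, the file's owner (or root) may always link it.
    Anyone else may link it only if it is a regular file that is not
    setuid, not setgid+group-executable, and readable and writable by them.
    """
    if not protected or caller_uid == 0 or caller_uid == st_uid:
        return True
    if not stat.S_ISREG(st_mode):
        return False
    if st_mode & stat.S_ISUID:
        return False
    setgid_exec = stat.S_ISGID | stat.S_IXGRP
    if (st_mode & setgid_exec) == setgid_exec:
        return False
    # Not the owner: use the group bits if the caller is in the file's
    # group, otherwise the "other" bits, as ordinary permission checks do.
    if st_gid in caller_gids:
        needed = stat.S_IRGRP | stat.S_IWGRP
    else:
        needed = stat.S_IROTH | stat.S_IWOTH
    return (st_mode & needed) == needed


def protected_hardlinks_enabled(path="/proc/sys/fs/protected_hardlinks"):
    """Read the current fs.protected_hardlinks sysctl (True unless it is 0)."""
    with open(path) as f:
        return f.read().strip() != "0"
```

With protection on, the cached kernel (-rw-r--r-- root root) fails the read-and-write test for afazekas, which matches the EPERM from link(). With protection off the check is skipped, consistent with comment 11, where sysctl -w fs.protected_hardlinks=0 made injection work.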

Comment 10 Attila Fazekas 2013-07-15 18:22:04 UTC
Yes, the hard link's unexpected behavior was surprising to me as well.

Do you see any chance that the default hard-link behavior will be restored to the expected behavior?

BTW, on F18 I am using OpenStack with the following settings in
/etc/libvirt/qemu.conf:
user = "root"
group = "root"

Comment 11 Attila Fazekas 2013-07-15 20:37:59 UTC
Running
$ sudo sysctl -w fs.protected_hardlinks=0

and changing the qemu user to the same user as the service ('afazekas', normally 'nova') makes file injection work without the nova patch from comment #6.

At first glance, the combination
 qemu.user == nova.user && nova.user != 'root'
works.
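The working combination above can be checked mechanically. A minimal sketch, assuming qemu.conf uses simple key = "value" lines like the ones libvirt ships (every other construct is ignored); qemu_conf_identity and injection_friendly are hypothetical names:

```python
import re

# Matches lines such as: user = "nova"   (optionally followed by a # comment).
# Commented-out settings start with '#', so they never match.
_SETTING = re.compile(r'^\s*(user|group)\s*=\s*"([^"]*)"\s*(?:#.*)?$')


def qemu_conf_identity(text):
    """Return {'user': ..., 'group': ...} from qemu.conf text; None if unset."""
    found = {"user": None, "group": None}
    for line in text.splitlines():
        match = _SETTING.match(line)
        if match:
            found[match.group(1)] = match.group(2)  # later lines win
    return found


def injection_friendly(qemu_user, service_user):
    """The rule above: qemu.user == nova.user and nova.user != 'root'."""
    return qemu_user == service_user and service_user != "root"
```

A caller would read /etc/libvirt/qemu.conf, pass its text to qemu_conf_identity(), and compare the resulting user with the service account.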

----------

The nova user has a home directory if someone wants to configure old-style migration. Basically, one nova service uses scp to copy the VM image to another machine.

---------

Why do those links have to be hard links?
I guess the permission issue would not happen if they were soft links.

(It looks like the 'new' hard-link behavior was influenced by a distro without SELinux.)
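On the hard versus soft link question: one plausible reason, stated here as an assumption rather than as the libguestfs developers' rationale, is that a hard link pins the exact file a running appliance uses even if the cache entry is later replaced, whereas a symlink just follows the name. The Python sketch below (stable_reference_demo is a hypothetical name) shows the difference when a file is replaced by rename:

```python
import os


def stable_reference_demo(directory):
    """Show how hard links and symlinks react when a cached file is replaced.

    Assumption for illustration: the cache is refreshed by writing a new
    file and renaming it over the old name (an atomic replace).
    Returns (content seen via the hard link, content seen via the symlink).
    """
    kernel = os.path.join(directory, "kernel")
    with open(kernel, "w") as f:
        f.write("old")
    os.link(kernel, kernel + ".hard")     # second name for the same inode
    os.symlink(kernel, kernel + ".soft")  # refers to the path, not the inode

    # Replace the cached file the way an atomic cache update might.
    tmp = kernel + ".new"
    with open(tmp, "w") as f:
        f.write("new")
    os.rename(tmp, kernel)

    with open(kernel + ".hard") as f:
        hard = f.read()
    with open(kernel + ".soft") as f:
        soft = f.read()
    return hard, soft
```

It returns ('old', 'new'): the hard link still sees the original contents while the symlink sees the replacement. The price, as this bug shows, is that link() is subject to fs.protected_hardlinks, while creating a symlink is not.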

Comment 12 Richard W.M. Jones 2013-07-16 15:21:46 UTC
(In reply to Attila Fazekas from comment #10)
> Yes, the hard link's unexpected behavior was surprising to me as well.
> 
> Do you see any chance that the default hard-link behavior will be restored to
> the expected behavior?

Given the response on Fedora devel, it seems unlikely.  There's
no good way to "fix" libguestfs to deal with this kernel API
change.

We need to revert the nova fix and make the changes in comment 8.
Therefore I'm reassigning this bug to nova.  I'll come up with
a more complete patch in a minute.

Comment 13 Richard W.M. Jones 2013-07-16 15:33:51 UTC
Created attachment 774377 [details]
0001-nova-Force-the-attach-method-to-be-appliance.patch

Patch version #1.  This is a workaround which forces the
attach method to be appliance.

Comment 14 Richard W.M. Jones 2013-07-16 15:35:04 UTC
Created attachment 774378 [details]
0001-nova-Don-t-change-the-default-attach-method.patch

This is alternative patch #2.  No workarounds, just don't
fiddle with attach-method at all.
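The three behaviors discussed for nova's guestfs.py can be summarized side by side. This is a hypothetical Python helper (choose_attach_method is not a nova function) that returns the string nova would pass to set_attach_method(), or None when nova should not call it at all:

```python
def choose_attach_method(current, libvirt_uri, strategy):
    """Return the attach method nova would set, or None to leave it alone.

    Hypothetical summary of the variants in this bug:
      "uri"       - original nova code: if libguestfs defaults to "libvirt",
                    pin it to "libvirt:" + nova's libvirt URI
      "appliance" - patch #1 (comment 13): always force "appliance"
      "default"   - patch #2 (comment 14): never touch the attach method
    """
    if strategy == "uri":
        if current == "libvirt":
            return "libvirt:" + libvirt_uri
        return None
    if strategy == "appliance":
        return "appliance"
    if strategy == "default":
        return None
    raise ValueError("unknown strategy: %r" % (strategy,))
```

Patch #2 corresponds to the "default" branch: libguestfs keeps its own default (the libvirt backend with a NULL URI on Fedora 18+), which keeps libguestfs, libvirt and qemu running as the nova user, as comment 8 recommends.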

Comment 15 Richard W.M. Jones 2013-08-05 13:32:43 UTC
I've filed a Gerrit review for the second patch here:

https://review.openstack.org/#/c/40222/

We might consider carrying the first patch just in Fedora,
but I think we should not carry any hacks/workarounds
at all, and if there are any bugs in libvirt we should
just fix them instead.

Comment 16 Pádraig Brady 2013-09-02 23:38:27 UTC
Re comment 8 point a)
The nova home directory should be in place?
I just checked on a compute node:

$ ls -lad ~nova
drwxr-xr-x. 10 nova nova 4096 Mar 14 23:45 /var/lib/nova

Comment 17 Richard W.M. Jones 2013-09-03 07:28:32 UTC
(In reply to Pádraig Brady from comment #16)
> Re comment 8 point a)
> The nova home directory should be in place?
> I just checked on a compute node:
> 
> $ ls -lad ~nova
> drwxr-xr-x. 10 nova nova 4096 Mar 14 23:45 /var/lib/nova

Yeah, turns out the home directory (lack of or otherwise) is
not the problem here.

The problem is it's wrong to override the libvirt/backend URI
(see my patch in comment 14).

For some reason which I don't understand the gerrit review
failed tests, although they work fine for me.

Comment 18 Pádraig Brady 2013-09-03 08:36:06 UTC
OK sorry, I thought you were implying that we'd need some packaging change if we were to change the URI as you suggest. I've added dprince to the upstream review as he was involved with the orig change.

Could you click the "restore change" button in the web GUI so that we can get a recheck done? BTW, I also created and linked an upstream bug and would suggest that you update the change to reference it instead, like:
Fixes bug: 1220102

thanks

Comment 19 Richard W.M. Jones 2013-09-03 08:56:20 UTC
Patch v2 (just rebased) uploaded to gerrit:

https://review.openstack.org/#/c/40222/

Comment 20 Richard W.M. Jones 2013-10-16 09:44:42 UTC
As the fix for this was committed, I'm closing the bug.

If you see issues with libguestfs & nova, please open a
new bug describing the problem and including all the
usual debugging output:
http://libguestfs.org/guestfs-faq.1.html#debug