Bug 984409
Summary: | Nova file injection failed: link: [kernel]: Operation not permitted
---|---
Product: | [Fedora] Fedora
Component: | openstack-nova
Version: | 19
Status: | CLOSED UPSTREAM
Last Closed: | 2013-10-16 09:44:42 UTC
Severity: | unspecified
Priority: | unspecified
Hardware: | Unspecified
OS: | Unspecified
Type: | Bug
Doc Type: | Bug Fix
Reporter: | Attila Fazekas <afazekas>
Assignee: | Mark McLoughlin <markmc>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | afazekas, akscram, alexander.sakhnov, apevec, asalkeld, berrange, bfilippov, breu, dprince, Jan.van.Eldik, jonathansteffan, jose.castro.leon, markmc, mbooth, mlvov, mmagr, ndipanov, pbrady, p, rbryant, rjones, rkukura, virt-maint

Description
Attila Fazekas
2013-07-15 07:58:08 UTC
Adding Padraig in case he's got any ideas.

So the key problem here is the permissions on this directory:

    $ ls -il /var/tmp/.guestfs-1000
    total 854256
    658310 -rwxr-xr-x. 1 afazekas libvirtd         64 Jul 14 12:24 checksum
    658309 -rw-r--r--. 2 root     root        1282048 Jul 14 11:21 initrd
    658309 -rw-r--r--. 2 root     root        1282048 Jul 14 11:21 initrd.17636
    658308 -rw-r--r--. 1 root     root        5058520 Jul 14 11:21 kernel
    658311 -rw-r--r--. 2 afazekas qemu     4294967296 Jul 14 11:21 root
    658311 -rw-r--r--. 2 afazekas qemu     4294967296 Jul 14 11:21 root.17636
    $ ls -ild /var/tmp/.guestfs-1000
    658305 drwxr-xr-x. 2 afazekas libvirtd 4096 Jul 14 12:49 /var/tmp/.guestfs-1000

(Note UID 1000 = afazekas)

The error is:

    link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.23445: Operation not permitted

This directory stores the libguestfs appliance cache for UID 1000. Because libvirt may run qemu as a different user (especially when libvirt runs as root) it may chown certain files such as kernel & initrd so that qemu is able to access them. That's my best explanation for how those files managed to end up owned by root. I'm not sure if libvirt restores permissions afterwards (but even if it didn't, there might still be a problem with parallel handles).

Is there a UID transition happening in the nova process itself?

GID "libvirtd" is a strange one. I do not have a libvirtd group on my system.

I think what happened here is that the fix for bug 913345 (https://github.com/openstack/nova/commit/014499acf5d6d6a557c9415aa49c536817a02a0a) causes libguestfs to use the libvirt URI qemu:///system. This is an unusual configuration from the libguestfs point of view because it means that we're effectively "running libvirt as root" [to use a shortcut .. this is not an exact analogy].

What this means is that libvirt will chown the kernel and initrd to root, and that will cause an error when a subsequent libguestfs instance tries to make the hard link[1] to the kernel.
However that can't be the whole story, because making a hard link to a root-owned file is permissible, at least on my ordinary ext4 filesystem:

    $ ll -a
    total 624
    drwxrwxr-x.  2 rjones rjones   4096 Jul 15 12:16 .
    drwxrwxrwt. 63 root   root    24576 Jul 15 12:16 ..
    -r--r--r--.  1 root   root   603423 Jul 15 12:16 foobar
    $ ln foobar baz

So perhaps there is something else going on. SELinux?

Reporter: are there any SELinux AVCs when this happens?

[1] Relevant libguestfs code:
https://github.com/libguestfs/libguestfs/blob/ae78381287771a781f939f26a414fc8cfdc05fd6/src/appliance.c#L68

Assumption: https://lwn.net/Articles/482544/ didn't go upstream.

It's not in the upstream kernel, nor in Fedora Rawhide kernel.

Created attachment 773725 [details]
qemu-system.py

I am finally able to reproduce the problem using the attached script and the following steps.

(1) Prerequisites: libguestfs >= 1.22, Fedora >= 19. Install python-libguestfs.

(2) Download attached script, chmod +x qemu-system.py

(3) You may need to enable libvirt management access by your current non-root user. See the instructions here: https://bugs.launchpad.net/devstack/+bug/1086784

(4) Run the script *as non-root*.

You may need to repeat step (4) several times before you see the error.

You may see the following error, which should be ignored:

    RuntimeError: could not create appliance through libvirt: internal error process exited while connecting to monitor: qemu-system-x86_64: -chardev socket,id=charserial0,path=/tmp/libguestfslVQXad/console.sock: Failed to connect to socket: Permission denied

That's a different bug (bug 913774).

The correct error you are looking for is:

    RuntimeError: link: /var/tmp/.guestfs-1000/kernel /var/tmp/.guestfs-1000/kernel.7570: Operation not permitted

To diagnose further, run libguestfs-test-tool (again, NON-root), and you should see the same error with a lot more debug.

There are no SELinux errors, so clearing NEEDINFO flag.
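
The hard-link experiment above can also be scripted, which makes it easier to repeat against the real cache directory. The following is a minimal sketch and not part of the attached qemu-system.py: the helper names `try_hard_link` and `read_protected_hardlinks` are made up for illustration. It mimics the `file` to `file.<pid>` link that fails in the error message, and it also reads `/proc/sys/fs/protected_hardlinks`, a Linux kernel setting that restricts hard links to files the caller does not own. Whether that setting is involved on a given machine is an assumption to be checked, not something this sketch proves.

```python
import errno
import os
import tempfile

# Kernel knob restricting hard links to files the caller does not own.
# Assumption: present on the kernel being tested; None is returned if not.
SYSCTL_PATH = "/proc/sys/fs/protected_hardlinks"


def read_protected_hardlinks(path=SYSCTL_PATH):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def try_hard_link(source, suffix=None):
    """Try to hard-link `source` to `source.<suffix>`.

    Returns (True, None) on success, or (False, errno_name) on failure.
    The probe link is removed again if it was created.
    """
    if suffix is None:
        suffix = str(os.getpid())
    target = "%s.%s" % (source, suffix)
    try:
        os.link(source, target)
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.unlink(target)
    return True, None


if __name__ == "__main__":
    # Harmless self-test on a file we own; linking to it should succeed.
    with tempfile.TemporaryDirectory() as tmp:
        probe = os.path.join(tmp, "kernel")
        open(probe, "w").close()
        print("link to own file:", try_hard_link(probe))
    print("fs.protected_hardlinks:", read_protected_hardlinks())
```

To test the failing case, call `try_hard_link("/var/tmp/.guestfs-1000/kernel")` as the non-root user that owns the cache directory; a result of `(False, "EPERM")` corresponds to the "Operation not permitted" error in this report.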
Attila, please try the following patch (you can probably just find and patch the disk/vfs/guestfs.py file directly):

    diff --git a/nova/virt/disk/vfs/guestfs.py b/nova/virt/disk/vfs/guestfs.py
    index 29b3965..f735583 100644
    --- a/nova/virt/disk/vfs/guestfs.py
    +++ b/nova/virt/disk/vfs/guestfs.py
    @@ -98,9 +98,7 @@ class VFSGuestFS(vfs.VFS):
             try:
                 self.handle.add_drive_opts(self.imgfile, format=self.imgfmt)
    -            if self.handle.get_attach_method() == 'libvirt':
    -                libvirt_url = 'libvirt:' + libvirt_driver.LibvirtDriver.uri()
    -                self.handle.set_attach_method(libvirt_url)
    +            self.handle.set_attach_method("appliance")
                 self.handle.launch()
                 self.setup_os()

After applying the patch and before restarting nova, do:

    sudo rm -rf /var/tmp/.guestfs-*

just to make sure that no old broken cache directory exists.

The above patch lets the machine boot and reach the ACTIVE state.

Returning to the original bug which we thought we were fixing:
https://bugzilla.redhat.com/show_bug.cgi?id=913345

There were originally two problems:

(a) In Fedora 18+, we switched to the libvirt backend. This uses a libvirt URI of NULL which for non-root users is the same as using qemu:///session. However libvirt cannot use qemu:///session if the user's $HOME directory does not exist. You could consider this to be a bug in libvirt (it needs a home directory) or in nova (it doesn't create a home directory).

(b) libguestfs fails to work if you set the backend to libvirt:qemu:///system. You will see an error about 'console.sock' (bug 913774). This is a bug in libvirt.

The fix that was made to nova was to set the backend to (effectively) libvirt:qemu:///system. Issue (b) seems to be disguised (see comment 5).

On reflection I don't think the fix to nova is correct because it's "running libvirt as root", adding the temporary libguestfs appliance to the global libvirt guest space.
(Although it's "running libvirt as root", qemu itself actually runs as qemu.qemu and sVirt is still used, so it's not a complete loss, but it does greatly complicate things that we have nova, libvirtd and qemu all running as 3 separate users)

I think we should stick to running libguestfs, libvirt and qemu as the same user as nova, which means going back to either:

(1) Fix problem (a) and revert the nova patch, or:

(2) Set attach_method to "appliance", as in comment 6.

Note that (2) will disable sVirt, which is not particularly good, so if we go for (2) as a workaround, we should have a plan for doing (1).

I also found out why the hard link behaviour is different between Fedora 18 & Fedora 19+. It's because Fedora has in its infinite wisdom decided to break hard linking:
http://danwalsh.livejournal.com/64493.html

Yes, the hard link's unexpected behavior was surprising to me as well.

Is there any chance that the default hard-link behavior will be restored to the expected behavior?

BTW on F18 I am using openstack with the following settings in /etc/libvirt/qemu.conf:

    user=root
    group=root

Issuing

    $ sudo sysctl -w fs.protected_hardlinks=0

and changing the qemu user to the same as the service user ('afazekas', normally 'nova') makes file injection work without the nova patch from comment #6.

At first glance, qemu.user == nova.user && nova.user != 'root' is OK.

----------

The nova user has a home directory if someone wants to configure the old-style migration. Basically, one nova service uses scp to copy the VM image to another machine.

---------

Why do those links have to be hard? I guess the permission issue would not happen if they were soft.

(Looks like the 'new' hard-link behavior is influenced by a SELinux-less distro.)

(In reply to Attila Fazekas from comment #10)
> Yes, the hard link's unexpected behavior was surprising to me as well.
>
> Is there any chance that the default hard-link behavior will be restored to
> the expected behavior?
Given the response on Fedora devel, it seems unlikely.

There's no good way to "fix" libguestfs to deal with this kernel API change. We need to revert the nova fix and do the changes in comment 8. Therefore I'm reassigning this bug to nova.

I'll come up with a more complete patch in a minute.

Created attachment 774377 [details]
0001-nova-Force-the-attach-method-to-be-appliance.patch
Patch version #1. This is a workaround which forces the
attach method to be appliance.
Created attachment 774378 [details]
0001-nova-Don-t-change-the-default-attach-method.patch
This is alternative patch #2. No workarounds, just don't
fiddle with attach-method at all.
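
The difference between the two attached patches can be sketched roughly as follows. This is an illustration under stated assumptions, not the content of either attachment: `RecordingHandle` is a hypothetical stand-in for a libguestfs handle that only records calls, and the two `launch_*` functions are made-up names. The method names `add_drive_opts`, `get_attach_method`, `set_attach_method` and `launch` are the ones that appear in the nova diff earlier in this report.

```python
class RecordingHandle:
    """Hypothetical stand-in for a libguestfs handle; it only records calls."""

    def __init__(self, default_method="libvirt"):
        # Assumption: the library default is the libvirt attach method,
        # as described for Fedora 18+ in this report.
        self.method = default_method
        self.calls = []

    def add_drive_opts(self, path, format=None):
        self.calls.append(("add_drive_opts", path, format))

    def get_attach_method(self):
        return self.method

    def set_attach_method(self, method):
        self.method = method
        self.calls.append(("set_attach_method", method))

    def launch(self):
        # Record which attach method was in effect at launch time.
        self.calls.append(("launch", self.method))


def launch_forced_appliance(handle, imgfile, imgfmt):
    """Patch #1 style workaround: always force the 'appliance' method."""
    handle.add_drive_opts(imgfile, format=imgfmt)
    handle.set_attach_method("appliance")
    handle.launch()


def launch_default(handle, imgfile, imgfmt):
    """Patch #2 style: leave the attach method at the library default."""
    handle.add_drive_opts(imgfile, format=imgfmt)
    handle.launch()


if __name__ == "__main__":
    for launcher in (launch_forced_appliance, launch_default):
        handle = RecordingHandle()
        launcher(handle, "disk.img", "qcow2")
        print(launcher.__name__, "->", handle.calls[-1])
```

In both variants nova no longer builds a `libvirt:` URI of its own, so the appliance is no longer launched through qemu:///system.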
I've filed a Gerrit review for the second patch here:
https://review.openstack.org/#/c/40222/

We might consider carrying the first patch just in Fedora, but I think we should not carry any hacks/workarounds at all, and if there are any bugs in libvirt we should just fix them instead.

Re comment 8 point a)
The nova home directory should be in place?
I just checked on a compute node:

    $ ls -lad ~nova
    drwxr-xr-x. 10 nova nova 4096 Mar 14 23:45 /var/lib/nova

(In reply to Pádraig Brady from comment #16)
> Re comment 8 point a)
> The nova home directory should be in place?
> I just checked on a compute node:
>
> $ ls -lad ~nova
> drwxr-xr-x. 10 nova nova 4096 Mar 14 23:45 /var/lib/nova

Yeah, turns out the home directory (lack of or otherwise) is not the problem here. The problem is it's wrong to override the libvirt/backend URI (see my patch in comment 14).

For some reason which I don't understand the gerrit review failed tests, although they work fine for me.

OK sorry, I thought you were implying that we'd need some packaging change if we were to change the URI as you suggest.

I've added dprince to the upstream review as he was involved with the orig change. Could you click the "restore change" button in the web gui and we can get a recheck done. BTW I also created and linked an upstream bug and would suggest that you update the change to reference that instead like:

    Fixes bug: 1220102

thanks

Patch v2 (just rebased) uploaded to gerrit:
https://review.openstack.org/#/c/40222/

As the fix for this was committed, I'm closing the bug.

If you see issues with libguestfs & nova, please open a new bug describing the problem and including all the usual debugging output:
http://libguestfs.org/guestfs-faq.1.html#debug