Created attachment 697455 [details]
vm_start_error screen shot

After upgrading to vdsm 4.9.6, the permissions on NFS images are 440 and the VM cannot start (qemu-kvm error "could not open disk...permission denied").
Changing the image file's permissions to 660 resolves the problem.
Unfortunately, we could not reproduce this in-house.

Setup (sfdc#00784032):
- NFS Data Center;
- upgrade from 2.2 to 3.0 using the migration tool.
  [At this point the customer also tried to upgrade to 3.1, but the rhevm upgrade failed (he had a 2.2 DC) and the rollback was unsuccessful [BZ#909449], among other problems. Fixed now.]
- create a new cluster with 3.0 compatibility mode and attach to it a RHEV-H host based on RHEL 6.3 - vdsm-4.9.6-45.2.el6_3.x86_64, 20130129.0.el6_3
- try starting the VM on the new cluster - it fails with a "permission denied" error (vm_start_error.jpg attached)

[Q#1] Why did the permissions become 440? When? In which log can we trace it?

Additional Info:
A similar problem (it does not seem to be the same problem, but it has the same resolution) (sfdc#00776680):
- Local storage
- RHEV-M upgraded from 3.0 to 3.1
- vdsm upgraded from versions 4.9-xxx to 4.9.6-xx
- Red Hat Enterprise Linux (RHEL) upgraded from 6.2 to 6.3
- the customer had 755 permissions on its images; changing them to 660 resolved the problem here.
We don't know how the customer ended up with 755 permissions initially.
We suspect that, going from vdsm version 4.9 to 4.9.6, the user ID used by vdsm in this case became 'kvm', whereas before it was 'vdsm', and therefore with 755 permissions 'kvm' did not have write access.

[Q#2] Why can't we simply set 660 permissions on a VM's images whenever the VM is started?

KCS created: https://access.redhat.com/knowledge/solutions/295513
(In reply to comment #0)
> Created attachment 697455 [details]
> vm_start_error screen shot
>
> After upgrading to vdsm 4.9.6 NFS images permissions are 440 and vm cannot
> start (qemu-kvm error "could not open disk...permission denied).
> Changing 660 permission to the image file resolves the problem.
> Unfortunately didn't reproduce in home.
>
> Setup (sfdc#00784032):
> - NFS Data Center;
> - upgrade from 2.2 using migration tool to 3.0.
> [at this point customer also tried to upgrade to 3.1, but rhevm upgrade
> failed (he had 2.2 DC) and rolled back unsuccessfully [BZ#909449] and other
> problems. fixed now.]
> - create new cluster with 3.0 compatibility mode and attach there RHEV-H
> based RHEL6.3 - vdsm-4.9.6-45.2.el6_3.x86_64, 20130129.0.el6_3
> - try starting the VM on the new cluster - fail with "permission denied
> error" (vm_start_error.jpg attached)
>
> [Q#1] Why the permissions became 440? When? On which log can we trace it?

Probably started out as 440.
Did you try to see if vdsm in RHEV 2.2 was able to handle 440 permissions?

>
>
> Additional Info:
> Similar problem (does not seem like same problem, but same resolution)
> (sfdc#00776680):
> - Local storage
> - RHEV-M upgraded from 3.0 to 3.1
> - vdsm upgraded from versions 4.9-xxx to 4.9.6-xx
> - Red Hat Enterprise Linux (RHEL) - upgrade from 6.2 to 6.3
> - customer had 755 permissions on its images, changing to 660 resolved the
> problem here.
> We don't know how the customer ended up with 755 permissions initially.
> We suspect that going from vdsm version 4.9 to 4.9.6 the userid used by vdsm
> in this case was now 'kvm',
> whereas before it was 'vdsm', and therefore with 755 permissions, 'kvm' did
> not have write access.
>
> [Q#2] Why we cannot set 660 permissions to the images of a VM regardless,
> when the VM is started?
>
> KCS created: https://access.redhat.com/knowledge/solutions/295513
On 2.2, NFS image file permissions are 740 (.meta is 660).
On a clean 3.0 they are 660 by default.

Starting with RHEL 6.3, qemu is the user that runs the VM process, not vdsm as it was on 2.2.
Although qemu and vdsm both belong to the kvm group, qemu cannot open the image file for writing, because the file grants the group read-only permission.

Possible resolutions:
Option 1: when trying to start a VM, check the image permissions and, if they are not 660, change them to 660. This does not sound right to me from a security perspective, though. (Maybe the DB is wrong and we are trying to access something that we should not.)

Option 2: when a VM's properties are changed so that it runs on a different cluster on 3.0 and the storage is NFS, update the permissions. This means the change has to be made in both the backend and vdsm.

This should be fixed in the current 3.0.z, IMO.
Of course we have a KCS, but a fix can still prevent many opened cases and a lot of customer frustration.
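To illustrate why the 2.2-era mode fails once the VM process runs as qemu, here is a minimal Python sketch (not vdsm code; the function name is hypothetical). It assumes the accessing user reaches the file only through group membership, as qemu does via the kvm group, and checks whether the group bits grant both read and write.

```python
import stat


def group_can_read_write(mode):
    """Return True if the group permission bits include both read and write."""
    needed = stat.S_IRGRP | stat.S_IWGRP
    return (stat.S_IMODE(mode) & needed) == needed


# Modes mentioned in this bug: 740 (2.2 default), 440 and 755 (customer cases),
# and 660 (clean 3.0 default).
for mode in (0o740, 0o440, 0o755, 0o660):
    print("%03o group read+write: %s" % (mode, group_can_read_write(mode)))
```

Of the four modes, only 660 passes this check, which matches the workaround reported in the description.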
(In reply to comment #3)
> On 2.2, nfs files permissions are 740. (.meta is 660).
> On clean 3.0 it is 660 by default.

Since you cannot reproduce it, the problem is probably a bit different.
AFAIU, when you install 2.2 and upgrade according to the manual it works fine, right?
My question was: what happens in 2.2 if you manually change the permissions to 440?
Does 2.2 live with it? (IIRC vdsm ran qemu directly as the vdsm user, so it should work.)

>
> Starting RHEL 6.3 qemu is the user used to start the vm process, and not
> vdsm, as it used to be on 2.2.
> Though, qemu and vdsm both belong to kvm group, qemu cannot access the image
> file which has read only permissions for the group.
>
>
> Possible resolution:
> Option 1: when trying to start a VM, check the images permissions, and if
> not 660, change to. But this does not sound right to me from security
> perspective. (Maybe the DB is wrong and we are trying to access something
> that we should not).
>
> Option 2: when changing vm property to run on a different cluster on 3.0 and
> it is nfs - update permissions. Means, the change should be both on backend
> and vdsm.
>
> This should be fixed to the current 3.0.z, IMO.
> Of course we have a kcs, but still it can prevent many many opened cases and
> customers frustration.
This comment is not in response to any of the above, but is in addition to the original description and comment 3.

Another suggestion here is to display a clearer, more meaningful error message.

In the customer's case that I worked on, where the permissions of a VM's image file were 755, the error reported was:

libvirtError: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/01ff74eb-28e5-41a3-8f5c-78a84295767b/00bb31c2-3bcc-408f-a3bb-e0a3ecdb62e5/images/55bc6afa-e982-4092-b763-2002f30af5dc/e0800002-d5dc-4cd5-9b29-865bf3ffa115,if=none,id=drive-virtio-disk0,format=raw,serial=55bc6afa-e982-4092-b763-2002f30af5dc,cache=none,werror=stop,rerror=stop,aio=threads: could not open disk image /rhev/data-center/01ff74eb-28e5-41a3-8f5c-78a84295767b/00bb31c2-3bcc-408f-a3bb-e0a3ecdb62e5/images/55bc6afa-e982-4092-b763-2002f30af5dc/e0800002-d5dc-4cd5-9b29-865bf3ffa115: Permission denied

Now, I know it says "Permission denied", but this VM had started successfully immediately before the host was upgraded, and the file itself had not been changed for some time, so it wasn't obvious what the problem was. It wasn't until the subtree was visually compared against one from a lab system that it became apparent that the permission requirements had potentially changed in vdsm 4.9.6.

Therefore, my suggestion is to display a message saying something like:

"Current file permissions are 755, expected permissions are 660"

or, if it's not possible to determine the precise file permissions, then even something like:

"Invalid file permissions, expected permissions are 660"

or, if 660 is just one of several valid values, then:

"Invalid file permissions, group requires write access"

You get the idea.

Thanks, GFW.
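The first suggested message could be produced along these lines. This is only a sketch under stated assumptions: the function name is hypothetical, it is not part of vdsm or libvirt, and it treats 660 as the expected mode because that is the value reported in this bug. A mode that already grants every expected bit is considered acceptable.

```python
import stat

# Assumption: 660 is the expected mode, as reported in this bug.
EXPECTED_MODE = 0o660


def permission_hint(current_mode, expected_mode=EXPECTED_MODE):
    """Return a descriptive message if any expected permission bit is missing.

    Returns None when the current mode grants every expected bit.
    """
    current = stat.S_IMODE(current_mode)
    missing = expected_mode & ~current
    if not missing:
        return None
    return ("Current file permissions are %03o, expected permissions are %03o"
            % (current, expected_mode))


print(permission_hint(0o755))
```

A caller could attach such a hint to the existing "Permission denied" error instead of replacing it, so that the original qemu-kvm output is still available for debugging.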
(In reply to comment #6)
> (In reply to comment #3)
> > On 2.2, nfs files permissions are 740. (.meta is 660).
> > On clean 3.0 it is 660 by default.
>
> Since you cannot reproduce it, the problem is probably a bit different.
> Afaiu, when you install 2.2 and upgrade according to the manual it works
> fine, right?
> My question was what happens in 2.2 if you manually change permssions to 440?
> Does 2.2 live with it? (iirc vdsm ran qemu directly as vdsm user so it
> should work).
>

Ayal, I had not tried reproducing directly on 2.2, only with a 2.2 compatibility mode DC on 3.0 (which didn't help). Today I set up 2.2.

So, changing the permissions of the file on 2.2 to 440 -> the VM fails to start with "bad volume specification".
(In reply to comment #8)
> (In reply to comment #6)
> > (In reply to comment #3)
> > > On 2.2, nfs files permissions are 740. (.meta is 660).
> > > On clean 3.0 it is 660 by default.
> >
> > Since you cannot reproduce it, the problem is probably a bit different.
> > Afaiu, when you install 2.2 and upgrade according to the manual it works
> > fine, right?
> > My question was what happens in 2.2 if you manually change permssions to 440?
> > Does 2.2 live with it? (iirc vdsm ran qemu directly as vdsm user so it
> > should work).
> >
> Ayal, I didn't try reproducing from 2.2 directly, but having 2.2
> compatibility mode DC on 3.0 (which didn't help). Till today I setup 2.2.
>
> So, changing the permissions of the file on 2.2 to 440 -> vm fails to start
> with "bad volume specification".

Dan, I seem to recall you made the relevant changes in vdsm for the transition to vdsm:kvm.
Where in the process do we update the permissions?
Just to make it completely clear (running ls -l nfs.image):
2.2: -rwxr----- 1 vdsm kvm
3.0: -rw-rw----. 1 vdsm kvm
3.1: -rw-rw----. 1 vdsm kvm
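For readers mapping the ls -l strings above onto the octal modes used elsewhere in this bug, a small Python sketch follows. The helper name is hypothetical, and it assumes plain r/w/x characters only (no setuid, setgid, or sticky markers).

```python
import stat


def octal_from_listing(listing):
    """Convert the nine rwx characters of an `ls -l` mode string to an int.

    The leading file-type character and any trailing SELinux '.' are ignored.
    Assumes only 'r', 'w', 'x' and '-' appear in the nine permission slots.
    """
    mode = 0
    for ch in listing[1:10]:
        mode = (mode << 1) | (1 if ch != "-" else 0)
    return mode


for version, listing in (("2.2", "-rwxr-----"),
                         ("3.0", "-rw-rw----."),
                         ("3.1", "-rw-rw----.")):
    print("%s: %s = %03o" % (version, listing, octal_from_listing(listing)))

# The standard library provides the reverse direction.
print(stat.filemode(0o100740))
```

This gives 740 for the 2.2 image and 660 for the 3.0 and 3.1 images, consistent with comment 3.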
An update on the other scenario in the case description: I am not sure why Local Storage would be affected by this bug.
I just tested, and local storage image permissions on 3.0 are:
-rw-rw----. 1 vdsm kvm
Also, there was no Local Storage on 2.2.
So it is either a different bug or the customer did something wrong.
(In reply to comment #9)
>
> Dan, I seem to recall you made the relevant changes in vdsm for the
> transition to vdsm:kvm.
> Where in the process do we update the permissions?

Yes, I was involved there, and created the helper function copyUserModeToGroup() in fileVolume.llPrepare(). However, it cannot help if the owner has no write permissions to begin with. I don't understand how mode 0440 could have ever worked with anything.
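The idea behind a helper like copyUserModeToGroup() can be sketched as below. This is an illustrative reconstruction from the function's name and the description in this comment, not the vdsm source; both function names here are hypothetical. It also shows the limitation mentioned above: if the owner has no write bit, copying the owner bits to the group cannot produce one.

```python
import os
import stat


def mode_with_user_bits_copied_to_group(mode):
    """Return the permission bits with the group triplet replaced by the user triplet."""
    perm = stat.S_IMODE(mode)
    user_bits = perm & stat.S_IRWXU
    return (perm & ~stat.S_IRWXG) | (user_bits >> 3)


def copy_user_mode_to_group_sketch(path):
    """Apply the computed mode to a file, only when it differs from the current one."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode_with_user_bits_copied_to_group(current)
    if wanted != current:
        os.chmod(path, wanted)
    return wanted


for mode in (0o740, 0o600, 0o440):
    print("%03o -> %03o" % (mode, mode_with_user_bits_copied_to_group(mode)))
```

Under this sketch, 740 becomes 770 and 600 becomes 660, while 440 stays 440, so a 440 image would still be unwritable for the group.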
(In reply to comment #15)
> (In reply to comment #9)
> >
> > Dan, I seem to recall you made the relevant changes in vdsm for the
> > transition to vdsm:kvm.
> > Where in the process do we update the permissions?
>
> Yes, I was involved there, and created the helper function
> copyUserModeToGroup() in fileVolume.llPrepare(). However, it cannot help if
> the owner has no write permissions to begin with. I don't understand how
> mode 0440 could have ever worked with anything.

To summarize the current status of the bug:
We started the bug with 2 different scenarios and then narrowed it down to one in comment 3: https://bugzilla.redhat.com/show_bug.cgi?id=911417#c3 and comment 10: https://bugzilla.redhat.com/show_bug.cgi?id=911417#c10.

Among the 4 attached cases, only one had 440 permissions. All the rest had 740.

The question is why the function copyUserModeToGroup() didn't work.
When is it supposed to be applied?
And what should the code do in the scenarios where it failed?
Looking at the code, I think that commit 5a0b2c912fb0ea5a305f191e9b558385ef249caa introduced a regression whereupon prepareVolume is no longer called and as a result copyUserModeToGroup is not invoked when running VMs. Fede, please take a look.
Checked on RHEVM - 3.2 - SF14
vdsm-4.10.2-16.0.el6ev.x86_64

Permissions of volume changed back to 660 after running the VM.

-bash-4.1$ chmod 600 32ab6d6d-3611-43eb-bd00-28e5c7b0d910
-bash-4.1$ ls -l
total 1028
-rw-------. 1 vdsm kvm 10737418240 Apr 30 2013 32ab6d6d-3611-43eb-bd00-28e5c7b0d910
-rw-rw----. 1 vdsm kvm 1048576 Apr 30 2013 32ab6d6d-3611-43eb-bd00-28e5c7b0d910.lease
-rw-r--r--. 1 vdsm kvm 268 Apr 30 2013 32ab6d6d-3611-43eb-bd00-28e5c7b0d910.meta
-bash-4.1$ ls -l
total 1028
-rw-rw----. 1 vdsm kvm 10737418240 Apr 30 2013 32ab6d6d-3611-43eb-bd00-28e5c7b0d910
-rw-rw----. 1 vdsm kvm 1048576 Apr 30 2013 32ab6d6d-3611-43eb-bd00-28e5c7b0d910.lease
-rw-r--r--. 1 vdsm kvm 268 Apr 30 2013 32ab6d6d-3611-43eb-bd00-28e5c7b0d910.meta
Simon,

Please tell me whether the permissions I've specified in the Doc Text field (660) are the correct permissions.

Thanks,

Zac
(In reply to Zac Dover from comment #28)
> Simon,
>
> Please tell me whether the permissions I've specified in the Doc Text field
> (660) are the correct permissions.
>
> Thanks,
>
> Zac

Yes
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0886.html