Bug 997350 - permission denied when start a guest in rhevm with libvirt-22 package
Status: CLOSED DUPLICATE of bug 964359
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.5
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Eric Blake
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-08-15 04:47 EDT by EricLee
Modified: 2013-08-26 10:37 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-26 10:37:47 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
libvirtd.log (80.27 KB, text/plain)
2013-08-15 04:48 EDT, EricLee
vdsm.log (116.72 KB, text/plain)
2013-08-15 04:49 EDT, EricLee
/var/log/libvirt/qemu/$guest.log (3.11 KB, text/plain)
2013-08-15 04:49 EDT, EricLee

Description EricLee 2013-08-15 04:47:41 EDT
Description of problem:
Permission denied when starting a guest in rhevm with the libvirt-22 package.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-22.el6.x86_64
kernel-2.6.32-410.el6.x86_64         
qemu-kvm-rhev-0.12.1.2-2.390.el6.x86_64
vdsm-4.10.2-24.1.el6ev.x86_64

How reproducible:
always

Steps to Reproduce:
1. Create a new guest from rhevm.
2. Start it; the following error appears:
2013-Aug-15, 14:53 VM test is down. Exit message: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/f1200157-a994-4e04-bbd2-54a0287a372d/images/90a02bc3-0589-4ecd-9882-60182ed38a87/ac0fb0c3-8d64-435f-85fc-db1ac81deaac,if=none,id=drive-virtio-disk0,format=raw,serial=90a02bc3-0589-4ecd-9882-60182ed38a87,cache=none,werror=stop,rerror=stop,aio=threads: could not open disk image /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/f1200157-a994-4e04-bbd2-54a0287a372d/images/90a02bc3-0589-4ecd-9882-60182ed38a87/ac0fb0c3-8d64-435f-85fc-db1ac81deaac: Permission denied .

Actual results:
The guest fails to start with the error shown in step 2.

Expected results:
The guest should start normally.

Additional Info:
The problem occurs with both spice and vnc as the display.
With libvirt-0.10.2-21.el6 it works fine, so I think this is a regression. I will not set that keyword myself; I hope the bug assignee will help confirm it.
However, this bug blocks our testing, so I will set the blocker? flag.

For more logs, please see the attachments.
Comment 1 EricLee 2013-08-15 04:48:30 EDT
Created attachment 786865 [details]
libvirtd.log
Comment 2 EricLee 2013-08-15 04:49:04 EDT
Created attachment 786866 [details]
vdsm.log
Comment 3 EricLee 2013-08-15 04:49:45 EDT
Created attachment 786867 [details]
/var/log/libvirt/qemu/$guest.log
Comment 4 Daniel Berrange 2013-08-15 04:56:00 EDT
Build -22 included Eric's upstream patches to change the way we initialize supplemental groups, and this seems like the only patch in that build which is likely to cause permission errors. There was one later upstream patch that we don't seem to have picked up, though:

commit 3d0e3c1a297bcecb774902774bcd21f0ac4cfa1c
Author: Guido Günther <agx@sigxcpu.org>
Date:   Mon Aug 5 11:07:27 2013 +0200

    virGetGroupList: always include the primary group


I wonder if the lack of this patch is what's causing the permission denied error.
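
To illustrate what "always include the primary group" can mean, here is a minimal Python sketch. It goes only by the subject line of that commit and is not libvirt's C implementation; the helper name `group_list`, the dictionary layout, and the sample users and gids are all hypothetical.

```python
def group_list(user, primary_gid, group_db):
    """Return the gids for `user`, always including `primary_gid`.

    `group_db` maps group name -> (gid, [member names]), loosely
    mimicking the member lists in /etc/group (hypothetical layout).
    """
    # Groups that explicitly list the user as a member.
    gids = [gid for gid, members in group_db.values() if user in members]
    # The primary group is usually NOT listed in the member field,
    # so add it explicitly if it is missing.
    if primary_gid not in gids:
        gids.insert(0, primary_gid)
    return gids


# Hypothetical sample data, for illustration only.
sample_db = {
    "staff": (50, ["alice"]),
    "audio": (63, ["alice", "bob"]),
}
print(group_list("alice", 1000, sample_db))  # [1000, 50, 63]
print(group_list("carol", 1001, sample_db))  # [1001]
```

The point of the sketch is the second case: a user with no supplementary memberships still ends up with a non-empty list.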
Comment 5 EricLee 2013-08-15 06:53:46 EDT
BTW, only vnc display guest in iscsi storage can be launched successfully.
And tls certification also failed, I think it is caused by fixing bug 975201.
Comment 6 Daniel Berrange 2013-08-15 07:05:38 EDT
(In reply to EricLee from comment #5)
> BTW, only vnc display guest in iscsi storage can be launched successfully.
> And tls certification also failed, I think it is caused by fixing bug 975201.

Please keep to one issue per bug report. File separate bugs for any separate issues you find like these.
Comment 7 EricLee 2013-08-15 07:17:22 EDT
(In reply to Daniel Berrange from comment #6)
> (In reply to EricLee from comment #5)
> > BTW, only vnc display guest in iscsi storage can be launched successfully.
> > And tls certification also failed, I think it is caused by fixing bug 975201.
> 
> Please keep to one issue per bug report. File separate bugs for any separate
> issues you find like these.

We know that a spice display VM in rhevm needs tls certification, so I thought that was the root cause of the permission denied problem. Do you think they are different problems? If so, I will file another separate bug.
Comment 8 EricLee 2013-08-15 23:11:29 EDT
(In reply to Daniel Berrange from comment #6)
> (In reply to EricLee from comment #5)
> > BTW, only vnc display guest in iscsi storage can be launched successfully.
> > And tls certification also failed, I think it is caused by fixing bug 975201.
> 

For this issue, I have added a comment to https://bugzilla.redhat.com/show_bug.cgi?id=975201#c11, and am waiting for Jiri's reply.

> Please keep to one issue per bug report. File separate bugs for any separate
> issues you find like these.
Comment 11 Michal Privoznik 2013-08-20 08:40:49 EDT
Eric,

I've prepared a scratch build for you:

https://brewweb.devel.redhat.com/taskinfo?taskID=6187847

Can you please give it a try to see if it does fix the issue?
Comment 12 EricLee 2013-08-20 09:03:54 EDT
(In reply to Michal Privoznik from comment #11)
> Eric,
> 
> I've prepared a scratch build for you:
> 
> https://brewweb.devel.redhat.com/taskinfo?taskID=6187847
> 
> Can you please give it a try to see if it does fix the issue?

Yes, I will give it a try and report the result tomorrow.
Comment 13 EricLee 2013-08-20 21:47:17 EDT
(In reply to Michal Privoznik from comment #11)
> Eric,
> 
> I've prepared a scratch build for you:
> 
> https://brewweb.devel.redhat.com/taskinfo?taskID=6187847
> 
> Can you please give it a try to see if it does fix the issue?

Oops, it does not work; I get the same error as in the bug description:

2013-Aug-21, 09:40 VM test is down. Exit message: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/b6c56cf1-ee9c-468d-bc9e-e1bdc7a2f4ae/images/16c2191b-0d3f-4de1-84ac-4effa2f77c87/18a8908b-8042-4103-9d82-14bd2fe1f8ed,if=none,id=drive-virtio-disk0,format=raw,serial=16c2191b-0d3f-4de1-84ac-4effa2f77c87,cache=none,werror=stop,rerror=stop,aio=threads: could not open disk image /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/b6c56cf1-ee9c-468d-bc9e-e1bdc7a2f4ae/images/16c2191b-0d3f-4de1-84ac-4effa2f77c87/18a8908b-8042-4103-9d82-14bd2fe1f8ed: Permission denied .
	
my packages:
# rpm -qa libvirt qemu-kvm-rhev vdsm; uname -r
vdsm-4.10.2-23.0.el6ev.x86_64
libvirt-0.10.2-22.el6997350.x86_64
qemu-kvm-rhev-0.12.1.2-2.397.el6.x86_64
2.6.32-412.el6.x86_64
Comment 14 Michal Privoznik 2013-08-21 07:54:17 EDT
Eric,

what's your libvirtd.conf and qemu.conf? Which selinux booleans do you have set?
Comment 15 EricLee 2013-08-21 08:39:47 EDT
(In reply to Michal Privoznik from comment #14)
> Eric,
> 
> what's your libvirtd.conf and qemu.conf? Which selinux booleans do you have
> set?

Hi Michal,

There is no difference between setting the selinux boolean to 1 or 0, and my libvirtd.conf and qemu.conf are configured by the vdsm defaults:

# tail -13 /etc/libvirt/libvirtd.conf 
## beginning of configuration section by vdsm-4.10.2
listen_addr="0.0.0.0"
unix_sock_group="kvm"
unix_sock_rw_perms="0770"
auth_unix_rw="sasl"
host_uuid="edbc2d6f-51e8-4409-9fb7-1eb86c68d4dd"
keepalive_interval=-1
log_outputs="1:file:/var/log/libvirtd.log"
log_filters="3:virobject 3:virfile 2:virnetlink 3:cgroup 3:event 3:json 1:libvirt 1:util 1:qemu"
ca_file="/etc/pki/vdsm/certs/cacert.pem"
cert_file="/etc/pki/vdsm/certs/vdsmcert.pem"
key_file="/etc/pki/vdsm/keys/vdsmkey.pem"
## end of configuration section by vdsm-4.10.2

# tail -5 /etc/libvirt/qemu.conf 
## beginning of configuration section by vdsm-4.10.2
dynamic_ownership=0
spice_tls=1
spice_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-spice"
## end of configuration section by vdsm-4.10.2

As I mentioned in comment #5, an iscsi storage domain behaves differently from nfs:

iscsi + spice gets this error:
2013-Aug-21, 20:17 VM test is down. Exit message: internal error process exited while connecting to monitor: ((null):13727): Spice-Warning **: reds.c:3247:reds_init_ssl: Could not use private key file failed to initialize spice server.

It does not get the permission denied error directly.

nfs + spice (the scenario tested in all of the above) gets this error:
2013-Aug-21, 20:28 VM test1 is down. Exit message: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/aced69cb-2151-4093-a098-ef852ef36541/images/1e7fa974-8a5d-4e58-83e0-792ead290084/ec7ba0d8-60d7-4576-b0f6-8f65d46793d2,if=none,id=drive-virtio-disk0,format=raw,serial=1e7fa974-8a5d-4e58-83e0-792ead290084,cache=none,werror=stop,rerror=stop,aio=threads: could not open disk image /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/aced69cb-2151-4093-a098-ef852ef36541/images/1e7fa974-8a5d-4e58-83e0-792ead290084/ec7ba0d8-60d7-4576-b0f6-8f65d46793d2: Permission denied .

An iscsi + vnc guest can start normally.


Thanks,
EricLee
Comment 16 Michal Privoznik 2013-08-22 06:10:22 EDT
btw, after some debugging:

the disk image is owned by vdsm:kvm

-rw-rw----. 1 vdsm kvm 6442450944 Aug 22 16:46 /rhev/data-center/...

and from the /etc/passwd:

qemu:x:107:107:qemu user:/:/sbin/nologin 
vdsm:x:36:36:Node Virtualization Manager:/var/lib/vdsm:/sbin/nologin

/etc/group:

kvm:x:36:qemu,sanlock
qemu:x:107:vdsm,sanlock
sanlock:x:179:vdsm

So I guess we are still not getting the group list right.
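
Reading the entries above: qemu has primary group 107 and is listed as a member only of kvm (gid 36), which is the group that owns the disk image. A small Python sketch that derives this from the quoted /etc/group lines (the helper name `supplementary_gids` is made up for illustration; real lookups would go through the system's group database):

```python
# The three /etc/group lines quoted in this comment.
ETC_GROUP = """\
kvm:x:36:qemu,sanlock
qemu:x:107:vdsm,sanlock
sanlock:x:179:vdsm
"""


def supplementary_gids(user, group_text):
    """Return the gids of all groups that list `user` as a member."""
    gids = []
    for line in group_text.splitlines():
        # Format: name:password:gid:comma-separated members
        _name, _passwd, gid, members = line.split(":")
        if user in members.split(","):
            gids.append(int(gid))
    return gids


print(supplementary_gids("qemu", ETC_GROUP))  # [36]
print(supplementary_gids("vdsm", ETC_GROUP))  # [107, 179]
```

So a qemu process only reaches the image through its supplementary membership in gid 36, not through its uid or primary gid.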
Comment 17 Michal Privoznik 2013-08-22 06:31:26 EDT
I just confirmed with debugging we are trying to run qemu under 107:107:

virSecurityDACSetProcessLabel (mgr=<value optimized out>, def=0x7f20d0010180) at security/security_dac.c:893
893         VIR_DEBUG("Dropping privileges of DEF to %u:%u, %d supplemental groups",
(gdb) 
896         if (virSetUIDGID(user, group, groups, ngroups) < 0)
(gdb) p user
$1 = 107
(gdb) p group
$2 = 107
(gdb) p groups
$3 = (gid_t *) 0x0
(gdb) p ngroups
$4 = 0


Eric - any thoughts on this?
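
Why an empty supplementary group list matters here can be shown with a simplified model of the Unix permission check. This is only a sketch: it ignores root, ACLs, and SELinux, and `may_open_rw` is a hypothetical helper, not a libvirt or kernel function. The numbers come from this bug: the image is vdsm:kvm (36:36) with mode 0660, and qemu runs as 107:107.

```python
def may_open_rw(uid, gid, groups, file_uid, file_gid, mode):
    """Simplified Unix DAC check for read+write access."""
    if uid == file_uid:
        bits = (mode >> 6) & 0o7      # owner permission bits
    elif gid == file_gid or file_gid in groups:
        bits = (mode >> 3) & 0o7      # group permission bits
    else:
        bits = mode & 0o7             # "other" permission bits
    return bits & 0o6 == 0o6          # need both read and write


# Disk image from comment 16: -rw-rw---- vdsm(36) kvm(36)
IMAGE = dict(file_uid=36, file_gid=36, mode=0o660)

# As seen in gdb: uid 107, gid 107, no supplementary groups.
print(may_open_rw(107, 107, [], **IMAGE))    # False
# With kvm (gid 36) in the supplementary list.
print(may_open_rw(107, 107, [36], **IMAGE))  # True
```

With `ngroups = 0` the process falls through to the "other" bits, which are empty for this image, matching the "Permission denied" from qemu-kvm.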
Comment 18 Eric Blake 2013-08-22 08:35:23 EDT
(In reply to Michal Privoznik from comment #17)
> I just confirmed with debugging we are trying to run qemu under 107:107:
> 
> virSecurityDACSetProcessLabel (mgr=<value optimized out>,
> def=0x7f20d0010180) at security/security_dac.c:893
> 893         VIR_DEBUG("Dropping privileges of DEF to %u:%u, %d supplemental
> groups",
> (gdb) 
> 896         if (virSetUIDGID(user, group, groups, ngroups) < 0)
> (gdb) p user
> $1 = 107
> (gdb) p group
> $2 = 107
> (gdb) p groups
> $3 = (gid_t *) 0x0
> (gdb) p ngroups
> $4 = 0
> 
> 
> Eric - any thoughts on this?

Ouch - this looks like a problem with the patch for bug 964359.  Working on it now...
Comment 20 Eric Blake 2013-08-23 11:35:31 EDT
I have a fix in my local tree, and am building a scratch build now.  Stay tuned...
Comment 29 Michael McConachie 2013-08-23 19:59:35 EDT
oVirt + FC18 user here -- so I don't mean to encroach.  Read before slamming please.  

If I read OP's initial issue correctly, I had very similar issues (even from a fresh install, or an install where I recovered old Datastores into a new install).

I just spent the better part of two days debugging this same issue.  It was driving me INSANE.  The issue was happening with, or without SELinux, NFS shares (labeled correctly btw) or even on local storage.  

The culprit / workaround I found was by editing:

-- /etc/libvirt/qemu.conf:

I had to uncomment two lines and change the values.  After doing so, spice and VNC started working again.  Note: for me this issue started after a recent yum update; I guess it had to do with updates to vdsm and libvirt.

So that it reads as such:

   # The user ID for QEMU processes run by the system instance.
   user = "vdsm"

   # The group ID for QEMU processes run by the system instance.
   group = "kvm"

Again, I'm sorry in advance if this is the wrong thread, but I had this same issue with oVirt as of this week.  I hope it lends some ideas for the community.
Comment 31 Eric Blake 2013-08-26 10:34:30 EDT
(In reply to Michael McConachie from comment #29)
> The culprit / workaround I found was by editing:
> 
> -- /etc/libvirt/qemu.conf:
> 
> I had to uncomment two lines, and change the values.

That means your issue is different than the issue in this bug, and was caused by a misconfiguration.  ALL hosts in your network must have the same user/group settings in /etc/libvirt/qemu.conf, or you are liable to get domains with storage images slammed to read-only thanks to mismatched permissions.  There may still be a bug (more likely in VDSM instead of libvirt) if you ended up in a situation with inconsistent permissions in the first place, but if that's the case, file it as a separate bug rather than trying to piggyback on this unrelated issue.
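
One way to spot the kind of inconsistency described above is to compare the user/group settings across copies of qemu.conf collected from each host. The following Python sketch assumes only the simple `key = "value"` form shown in comment 29; the host names, file contents, and helper names are hypothetical, and a result of `None` just means the setting is commented out or absent in that file.

```python
import re


def qemu_identity(conf_text):
    """Extract the uncommented user/group settings from qemu.conf-style text."""
    found = {}
    for line in conf_text.splitlines():
        # Lines starting with '#' do not match, so commented-out settings are skipped.
        m = re.match(r'\s*(user|group)\s*=\s*"([^"]*)"', line)
        if m:
            found[m.group(1)] = m.group(2)
    return found.get("user"), found.get("group")


def mismatched_hosts(confs):
    """Return hosts whose (user, group) differs from the first host's."""
    items = list(confs.items())
    reference = qemu_identity(items[0][1])
    return [host for host, text in items[1:] if qemu_identity(text) != reference]


# Hypothetical host names and file contents.
confs = {
    "host-a": '#user = "root"\n#group = "root"\n',
    "host-b": 'user = "vdsm"\ngroup = "kvm"\n',
}
print(mismatched_hosts(confs))  # ['host-b']
```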
Comment 33 Eric Blake 2013-08-26 10:37:47 EDT
Closing this as a duplicate of bug 964359, since this was caused by a latent bug in the patch used for that series, and as that series has not passed QE yet.

*** This bug has been marked as a duplicate of bug 964359 ***
