Bug 824155 - Can't enable VM resources
Can't enable VM resources
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: selinux-policy (Show other bugs)
6.2
All Linux
unspecified Severity unspecified
: rc
: ---
Assigned To: Miroslav Grepl
Milos Malik
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-22 16:31 EDT by digimer
Modified: 2013-01-16 07:59 EST (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-19 08:34:21 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Related cluster.conf (2.79 KB, application/octet-stream)
2012-05-22 16:31 EDT, digimer
no flags Details

  None (edit)
Description digimer 2012-05-22 16:31:52 EDT
Created attachment 586170 [details]
Related cluster.conf

Description of problem:

I've got a basic cluster (cluster.conf will be attached) with one VM. When I try to enable this VM, it fails. I've tried increasing rgmanager's log verbosity but nothing else is recorded beyond what is shown below.

When I try to enable a VM, the following is recorded in syslog:
==[ command ]=============
[root@il0049 ~]# clusvcadm -e vm:vm01-id
Local machine trying to enable vm:vm01-id...Failure
==[ node 1 ]==============
May 22 16:29:29 il0049 rgmanager[2525]: Starting disabled service vm:vm01-id
May 22 16:29:29 il0049 rgmanager[2525]: start on vm "vm01-id" returned 1 (generic error)
May 22 16:29:29 il0049 rgmanager[2525]: #68: Failed to start vm:vm01-id; return value: 1
May 22 16:29:29 il0049 rgmanager[2525]: Stopping service vm:vm01-id
May 22 16:29:30 il0049 rgmanager[2525]: Service vm:vm01-id is recovering
May 22 16:29:30 il0049 rgmanager[2525]: #71: Relocating failed service vm:vm01-id
May 22 16:29:31 il0049 rgmanager[2525]: Service vm:vm01-id is stopped
==[ node 2 ]==============
May 22 16:29:30 il0050 rgmanager[2542]: Recovering failed service vm:vm01-id
May 22 16:29:31 il0050 rgmanager[2542]: start on vm "vm01-id" returned 1 (generic error)
May 22 16:29:31 il0050 rgmanager[2542]: #68: Failed to start vm:vm01-id; return value: 1
May 22 16:29:31 il0050 rgmanager[2542]: Stopping service vm:vm01-id
May 22 16:29:31 il0050 rgmanager[2542]: Service vm:vm01-id is recovering
==========================

Version-Release number of selected component (if applicable):

rgmanager-3.0.12.1-5.el6.x86_64
resource-agents-3.9.2-7.el6.x86_64

How reproducible:

Rare; This is the first cluster I've run into this problem on.

Steps to Reproduce:
1. Create a cluster using the cluster.conf attached.
2. Try to start the VM using clusvcadm -e ...
3.
  
Actual results:

Fails on the target, tries to recover on peer, fails and enters "failed" state.

Expected results:

Start the VM, or at least generate a slightly more useful error.

Additional info:

I worry that I am making a dumb mistake and this will act as a permanent recording of my silliness.
Comment 1 digimer 2012-05-22 16:44:48 EDT
In vm.sh, I've added an 'echo' of the 'cmdline' variable which spits out:

=============
cmdline C: [virsh create /shared/definitions//vm01-id.xml]
May 22 16:42:42 il0050 rgmanager[2542]: start on vm "vm01-id" returned 1 (generic error)
May 22 16:42:42 il0050 rgmanager[2542]: #68: Failed to start vm:vm01-id; return value: 1
=============

If I manually run 'virsh create /shared/definitions//vm01-id.xml', the VM starts and the exit code $? is 0.
Comment 2 digimer 2012-05-22 16:54:34 EDT
I've narrowed it down now, but it makes no sense from what I can tell. I've edited /usr/share/cluster/vm.sh thusly:

============
--- /usr/share/cluster/vm.sh	2011-12-07 17:40:15.000000000 -0500
+++ /usr/share/cluster/test.vm.sh	2012-05-22 16:49:43.000000000 -0400
@@ -427,7 +427,9 @@
 
 	ocf_log debug "$cmdline"
 
+	echo "A ?: [$?] - cmdline: [$cmdline]" >> /var/log/messages
 	$cmdline
+	echo "B ?: [$?]" >> /var/log/messages
 	return $?
 }
============

When this runs;

============
A ?: [0] - cmdline: [virsh create /shared/definitions//vm01-id.xml]
B ?: [1]
============

When I run the command directly though, it's just fine:

============
[root@il0050 files]# virsh create /shared/definitions//vm01-id.xml 
Domain vm01-id created from /shared/definitions//vm01-id.xml
============

I'm stumped.
Comment 3 digimer 2012-05-22 16:56:19 EDT
Changing to 'resource-agents' as this seems to be a problem with 'vm.sh' specifically.
Comment 4 Lon Hohberger 2012-05-22 17:00:42 EDT
SElinux denial?
Comment 6 digimer 2012-05-22 20:04:09 EDT
From /var/log/audit/audit.log:
============
type=VIRT_RESOURCE msg=audit(1337731120.514:37): user pid=3356 uid=0 auid=4294967295 ses=4294967295 msg='virt=kvm resrc=cgroup reason=allow vm="vm01-id" uuid=557bf583-dfa8-d6cf-c61c-f3b992dbf320 cgroup="/cgroup/devices/libvirt/qemu/vm01-id/" class=path path=/dev/null rdev=01:03 acl=rw: exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
============

You called that one. Now if you'll excuse me, I need to build an effigy to selinux...
Comment 7 digimer 2012-05-22 20:06:28 EDT
I can't close this, mind doing the honours? NOTABUG, DIGIMISSEDTHEOBVIOUS. :)
Comment 8 Lon Hohberger 2012-05-23 10:43:55 EDT
It's still a bug, but in selinux-policy.
Comment 9 digimer 2012-05-23 10:53:52 EDT
Reopening then.
Comment 10 Miroslav Grepl 2012-05-24 07:22:33 EDT
So it works in permissive mode?
Comment 11 digimer 2012-05-24 12:26:40 EDT
I disabled selinux, but I expect permissive would have worked just fine as it doesn't block. If you wish, I can test this. If so, do you want me to watch for any particular log file or other output?
Comment 12 Miroslav Grepl 2012-05-25 02:40:37 EDT
It would be great. You just need to test it in permissive mode and then execute

# ausearch -m avc -ts recent
Comment 13 digimer 2012-05-25 16:18:06 EDT
=========
[root@il0049 ~]# clustat
Cluster Status for cc-cluster-A @ Fri May 25 16:15:28 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 il0049.optical-online.com                   1 Online, Local, rgmanager
 il0050.optical-online.com                   2 Online, rgmanager

 Service Name                   Owner (Last)                   State         
 ------- ----                   ----- ------                   -----         
 service:storage_il49           il0049.optical-online.com      started       
 service:storage_il50           il0050.optical-online.com      started       
 vm:vm01-id                     (none)                         disabled      
=========
[root@il0049 ~]# sestatus 
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   permissive
Mode from config file:          permissive
Policy version:                 24
Policy from config file:        targeted
=========
[root@il0049 ~]# clusvcadm -F -e vm:vm01-id
Local machine trying to enable vm:vm01-id...Success
vm:vm01-id is now running on il0049.optical-online.com
=========
[root@il0049 ~]# ausearch -m avc -ts recent
----
time->Fri May 25 16:17:17 2012
type=SYSCALL msg=audit(1337977037.785:48): arch=c000003e syscall=2 success=yes exit=11 a0=1aef6c0 a1=0 a2=7fff0bd8e890 a3=7fff0bd8e5d0 items=0 ppid=4564 pid=4652 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="virsh" exe="/usr/bin/virsh" subj=system_u:system_r:xm_t:s0 key=(null)
type=AVC msg=audit(1337977037.785:48): avc:  denied  { open } for  pid=4652 comm="virsh" name="vm01-id.xml" dev=dm-0 ino=131140 scontext=system_u:system_r:xm_t:s0 tcontext=unconfined_u:object_r:file_t:s0 tclass=file
type=AVC msg=audit(1337977037.785:48): avc:  denied  { read } for  pid=4652 comm="virsh" name="vm01-id.xml" dev=dm-0 ino=131140 scontext=system_u:system_r:xm_t:s0 tcontext=unconfined_u:object_r:file_t:s0 tclass=file
=========
[root@il0049 ~]# clustat
Cluster Status for cc-cluster-A @ Fri May 25 16:17:29 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 il0049.optical-online.com                   1 Online, Local, rgmanager
 il0050.optical-online.com                   2 Online, rgmanager

 Service Name                   Owner (Last)                   State         
 ------- ----                   ----- ------                   -----         
 service:storage_il49           il0049.optical-online.com      started       
 service:storage_il50           il0050.optical-online.com      started       
 vm:vm01-id                     il0049.optical-online.com      started
Comment 14 Miroslav Grepl 2012-05-27 15:29:44 EDT
You need to fix labelinf for "vm:vm01-id" file. Then it should work.

file_t indicates that the file has no label.

Execute

# restorecon -R -v PATHO/vm:vm01-id
Comment 15 digimer 2012-05-28 15:03:54 EDT
This did not solve the problem.

=========
[root@il0049 ~]# clustat 
Cluster Status for cc-cluster-A @ Mon May 28 14:37:39 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 il0049.optical-online.com                   1 Online, Local, rgmanager
 il0050.optical-online.com                   2 Online, rgmanager

 Service Name                   Owner (Last)                   State         
 ------- ----                   ----- ------                   -----         
 service:storage_il49           il0049.optical-online.com      started       
 service:storage_il50           il0050.optical-online.com      started       
 vm:vm01-id                     (none)                         disabled      
=========
[root@il0049 ~]# sestatus 
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted
=========
[root@il0049 ~]# restorecon -R -v PATHO/vm:vm01-id
=========
[root@il0049 ~]# clusvcadm -F -e vm:vm01-id
Local machine trying to enable vm:vm01-id...Failure
=========[ Syslog from il0049 ]=========
May 28 14:47:35 il0049 rgmanager[2437]: start on vm "vm01-id" returned 1 (generic error)
May 28 14:47:35 il0049 rgmanager[2437]: #68: Failed to start vm:vm01-id; return value: 1
May 28 14:47:35 il0049 rgmanager[2437]: Stopping service vm:vm01-id
May 28 14:47:35 il0049 rgmanager[2437]: Service vm:vm01-id is recovering
=========[ Syslog from il0050 ]=========
May 28 14:47:35 il0050 rgmanager[2453]: Recovering failed service vm:vm01-id
May 28 14:47:36 il0050 rgmanager[2453]: start on vm "vm01-id" returned 1 (generic error)
May 28 14:47:36 il0050 rgmanager[2453]: #68: Failed to start vm:vm01-id; return value: 1
May 28 14:47:36 il0050 rgmanager[2453]: Stopping service vm:vm01-id
May 28 14:47:36 il0050 rgmanager[2453]: Service vm:vm01-id is recovering
(it sits here with the service in 'recovering' state.
=========
[root@il0049 ~]# ausearch -m avc -ts recent
----
time->Mon May 28 14:42:27 2012
type=SYSCALL msg=audit(1338230547.712:9): arch=c000003e syscall=21 success=no exit=-13 a0=173e208 a1=1 a2=7fff19afe240 a3=100 items=0 ppid=1941 pid=1942 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="find" exe="/bin/find" subj=system_u:system_r:corosync_t:s0 key=(null)
type=AVC msg=audit(1338230547.712:9): avc:  denied  { execute } for  pid=1942 comm="find" name="vm.sh.test" dev=md2 ino=393922 scontext=system_u:system_r:corosync_t:s0 tcontext=unconfined_u:object_r:usr_t:s0 tclass=file
=========
Cluster Status for cc-cluster-A @ Mon May 28 14:51:38 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 il0049.optical-online.com                   1 Online, Local, rgmanager
 il0050.optical-online.com                   2 Online, rgmanager

 Service Name                   Owner (Last)                   State         
 ------- ----                   ----- ------                   -----         
 service:storage_il49           il0049.optical-online.com      started       
 service:storage_il50           il0050.optical-online.com      started       
 vm:vm01-id                     none                           recovering
Comment 16 Miroslav Grepl 2012-05-29 01:43:21 EDT
Ok, this is a different issue. Where is the "vm.sh.test" exactly located?
Comment 18 digimer 2012-09-05 08:52:19 EDT
I believe that I had left it in the same directory as vm.sh. When I removed it, things started working.
Comment 19 Miroslav Grepl 2012-09-11 04:37:03 EDT
actually

# matchpathcon /usr/share/cluster/test.vm.sh
/usr/share/cluster/test.vm.sh	system_u:object_r:bin_t:s0

so I guess you needed just execute restorecon on this script.
Comment 23 Miroslav Grepl 2012-12-19 08:34:21 EST
The file got wrong context because of the wrong file extension. All files in /usr/share/cluster ending with .sh will get the correct context.

Note You need to log in before you can comment on or make changes to this bug.