Bug 236276

Summary: rgmanager fails to start any <vm ...> service
Product: Red Hat Enterprise Linux 5
Component: rgmanager
Version: 5.0
Hardware: All
OS: Linux
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Target Milestone: ---
Target Release: ---
Keywords: Reopened
Reporter: Scott Bachmann <bachmann>
Assignee: Lon Hohberger <lhh>
QA Contact: Cluster QE <mspqa-list>
CC: capel, cluster-maint
Doc Type: Bug Fix
Last Closed: 2008-04-17 03:35:42 UTC
Attachments:
  - Cluster configuration using <vm ...> (flags: none)
  - Output of test run using clusvcadm (flags: none)

Description Scott Bachmann 2007-04-12 20:06:17 UTC
Description of problem:
The resource group manager fails to start any "vm" service.

Version-Release number of selected component (if applicable):
rgmanager-2.0.23 / CVS

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster.conf that has a <vm ...> child of a <service ...>
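   For illustration, a minimal cluster.conf fragment of the kind this step
   describes (names taken from the reporter's configuration quoted in the
   comments below; the full configuration is in attachment 152504):

        <rm>
                <service name="site-a">
                        <vm name="website_a"/>
                </service>
        </rm>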
  
Actual results:
Resource group manager fails to start or stop the vm service.

Expected results:
Resource group manager would properly start, stop and monitor a vm service.

Additional info:
The file /usr/share/cluster/service.sh fails to list "vm" as a possible 
child.  After adding <child type="vm" ...>, the resource group manager was 
able to properly start, stop and monitor the vm service.
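For reference, rgmanager agents declare their permissible children in the
<special tag="rgmanager"> block of their XML metadata, so the fix described
above amounts to adding a line along these lines to service.sh (the start/stop
ordering levels shown are illustrative):

        <child type="vm" start="6" stop="1"/>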

Comment 1 Scott Bachmann 2007-04-12 20:06:17 UTC
Created attachment 152504 [details]
Cluster configuration using <vm ...>

Comment 2 Scott Bachmann 2007-04-12 20:11:27 UTC
*** Bug 236279 has been marked as a duplicate of this bug. ***

Comment 3 Lon Hohberger 2007-04-12 20:44:14 UTC
It works for me; what do your logs look like?  It sounds like cluster.conf
didn't get updated correctly on a particular node or something.

<child> tags are not a requirement; if unspecified, any unlisted resource is
started after all defined child resource types.  So, if you added a file system
to your service, the <vm> instance would be started after the file system.
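
A quick sketch of that ordering rule (names, device, and mountpoint are made
up):

        <service name="test">
                <fs name="data" device="/dev/vg0/data" mountpoint="/data"/>
                <vm name="foo"/>
        </service>

Here fs is a declared child type of service, so it starts first; vm, having
no <child> entry, starts after everything that is declared.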

What does rg_test say for your config?

[root@asuka resources]# /usr/sbin/rg_test test /etc/cluster/cluster.conf
Running in test mode.
Loaded 18 resource rules
=== Resources List ===
Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = test [ primary unique required ]

Resource type: vm [INLINE]
Instances: 1/1
Agent: vm.sh
Attributes:
  name = foo [ primary ]

=== Resource Tree ===
service {
  name = "test";
  vm {
    name = "foo";
  }
}
[root@asuka resources]# /usr/sbin/rg_test test /etc/cluster/cluster.conf start service test
Running in test mode.
Starting test...
# xm command line: foo restart="never"
Error: Unable to open config file: foo
... (rest of xm errors) ...
Failed to start test
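
The "Unable to open config file" error above is expected: xm looks for a
domain definition named after the VM (typically under /etc/xen), and no such
file exists on this test box.  A minimal one might look like the following,
with all values purely illustrative:

        # /etc/xen/foo
        name = "foo"
        memory = 256
        bootloader = "/usr/bin/pygrub"
        disk = [ "phy:/dev/vg0/foo,xvda,w" ]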



As a side note, <vm> instances are not meant to be encapsulated in <service>
blocks (doing so prevents live migration; this will be fixed in the next
errata).  You can put them at the same level as services (with failover
domains / restart policies / etc., if you want).


Comment 4 Lon Hohberger 2007-04-12 20:49:01 UTC
        <rm>
                <failoverdomains>
                        <failoverdomain name="XenSrvA" ordered="1" restricted="1">
                                <failoverdomainnode name="sys-a" priority="1"/>
                                <failoverdomainnode name="sys-b" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <service name="site-a" domain="XenSrvA" autostart="0">
                      <vm name="website_a"/>
                </service>
        </rm>

FWIW, you could have done:

        <rm>
                <failoverdomains>
                        <failoverdomain name="XenSrvA" ordered="1">
                                <failoverdomainnode name="sys-a" priority="1"/>
                                <failoverdomainnode name="sys-b" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <vm name="website_a" domain="XenSrvA" autostart="0"/>
        </rm>

However, your configuration *should* work.

Comment 5 Scott Bachmann 2007-04-12 21:19:01 UTC
The output of rg_test is similar to yours, as expected.

[root@sys-a cluster]# rg_test test /etc/cluster/cluster.conf
Running in test mode.
Loaded 17 resource rules
=== Resources List ===
Resource type: service [INLINE]
Instances: 1/1
Agent: service.sh
Attributes:
  name = site-a [ primary unique required ]
  domain = XenSrvA
  autostart = 1

Resource type: vm [INLINE]
Instances: 1/1
Agent: vm.sh
Attributes:
  name = website_a [ primary ]

=== Resource Tree ===
service {
  name = "site-a";
  domain = "XenSrvA";
  autostart = "1";
  vm {
    name = "website_a";
  }
}

And I'm able to both start and stop the service using rg_test test, as you
showed.  However, I'm unable to control the service from clusvcadm.  Enable
and disable return Success.  I modified vm.sh to echo the calling argument
(start, stop, etc.) to a log file, and the log file only shows meta-data
requests.  No start or stop ever appears.
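
The tracing mentioned above can be a single line near the top of
/usr/share/cluster/vm.sh (the log path is arbitrary):

        # log every operation rgmanager asks the agent to perform
        echo "$(date) vm.sh invoked with: $*" >> /tmp/vm-agent.trace

With a working rgmanager this trace should show start/stop/status requests;
here it only ever shows meta-data.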



Comment 6 Scott Bachmann 2007-04-12 21:20:55 UTC
Created attachment 152507 [details]
Output of test run using clusvcadm

Comment 7 Lon Hohberger 2007-04-12 21:39:38 UTC
That's really strange; my logs show it working (despite not having an actual
VM, rgmanager does try to start/stop it as it should):

Apr 12 16:39:34 asuka clurgmgrd[11869]: <notice> Starting stopped service service:test
Apr 12 16:39:34 asuka clurgmgrd[11869]: <notice> start on vm "foo" returned 1 (generic error)
Apr 12 16:39:34 asuka clurgmgrd[11869]: <warning> #68: Failed to start service:test; return value: 1
Apr 12 16:39:34 asuka clurgmgrd[11869]: <notice> Stopping service service:test
Apr 12 16:39:39 asuka clurgmgrd[11869]: <notice> Service service:test is recovering
Apr 12 16:39:40 asuka clurgmgrd[11869]: <warning> #71: Relocating failed service service:test
Apr 12 16:39:40 asuka clurgmgrd[11869]: <notice> Stopping service service:test
Apr 12 16:39:45 asuka clurgmgrd[11869]: <notice> Service service:test is stopped

I'll keep this around until I can reproduce it.

Does putting <vm> at the top level work for you?

Comment 8 Lon Hohberger 2007-04-12 21:45:12 UTC
If not, run with the configuration that works for you until we figure out why
it's not working for you.

Comment 9 Scott Bachmann 2007-04-13 13:10:49 UTC
Putting <vm> at the top level also fails.  I'll go back and reinstall the 
system from scratch, and follow up on this when I have more information.  
Since I may not be running the version you have, what's the latest suggested 
version, CVS?

Comment 10 Lon Hohberger 2007-04-13 19:32:49 UTC
I was testing on 2.0.23.

Comment 11 Scott Bachmann 2007-04-16 14:15:52 UTC
After reinstalling and testing with 2.0.23, I was able to use <vm>.  I think
I misremembered whether I had actually tested it with 2.0.23 while working
out a few issues with the quorum disk (which I see have already been fixed
in CVS).  The CVS version (RHEL5 branch) still fails, so I'll need to take
another look at this or just wait for the next set of official updates.
Thanks for your time!

Comment 12 Nate Straz 2007-12-13 17:19:04 UTC
Moving all RHCS version 5 bugs to RHEL 5 so we can remove the RHCS v5
product, which never existed.

Comment 13 Lon Hohberger 2009-02-12 19:02:24 UTC
Another data point -- SELinux policy up to and including RHEL 5.3 can prevent
rgmanager from starting VMs, even when rg_test works.

We will be trying to resolve this in RHEL 5.4.
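
One quick way to check whether SELinux is the blocker (standard RHEL 5
tooling; the service name is whatever yours is called):

        getenforce                      # current enforcement mode
        ausearch -m avc -ts recent      # recent AVC denials, if any
        setenforce 0                    # temporarily permissive
        clusvcadm -e <service>          # retry the VM service start

If the start succeeds in permissive mode, the AVC records from ausearch show
which access was denied.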