From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.7) Gecko/20050417 Fedora/1.7.7-1.3.1

Description of problem:
In rgmanager checked out from the RHEL4 branch, a service's children are not stopped in the stop order specified in the service.sh meta-data section. They are actually stopped in the order given by the start property rather than the stop property. This occurs in both clurgmgrd and rg_test.

Relevant parts of cluster config file:
__BEGIN__
<resources>
  <service name="postgresql"/>
  <group name="postgresql" domain="mainfailover"/>
  <ip address="172.19.30.204" monitor_link="yes"/>
  <fs fstype="ext3" name="Postgresql Drive" mountpoint="/var/lib/pgsql" device="/dev/SharedGroup00/PostgresVol00" options="noatime,acl"/>
  <script name="Postgresql Service" file="/etc/init.d/postgresql"/>
</resources>
<service ref="postgresql">
  <group name="postgresql"/>
  <ip ref="172.19.30.204"/>
  <script ref="Postgresql Service"/>
  <fs ref="Postgresql Drive"/>
</service>
__END__

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Configure a cluster service with a service, and a filesystem that is used by the service
2. Start the service with clusvcadm -e <service>
3. Attempt to stop the service with clusvcadm -s <service>

Actual Results:
The service failed to stop and went into the failed state; a manual stop was required. Checking /var/log/messages shows that rgmanager tried to stop the filesystem before stopping the service.

Expected Results:
The service is stopped, then the filesystem is stopped. Cluster resources end up in the stopped state.

Additional info:
This patch fixed it for me, though it may not take every stop situation into account. It does make resource moves work correctly, and clusvcadm -s requests no longer require manual intervention.
Index: restree.c
===================================================================
RCS file: /cvs/cluster/cluster/rgmanager/src/daemons/restree.c,v
retrieving revision 1.10.2.3
diff -u -r1.10.2.3 restree.c
--- restree.c	5 May 2005 20:41:08 -0000	1.10.2.3
+++ restree.c	9 May 2005 19:37:42 -0000
@@ -675,7 +675,11 @@
 		for (x = 0; rule->rr_childtypes && rule->rr_childtypes[x].rc_name; x++) {
-			lev = rule->rr_childtypes[x].rc_startlevel;
+			if(op == RS_STOP)
+				lev = rule->rr_childtypes[x].rc_stoplevel;
+			else
+				lev = rule->rr_childtypes[x].rc_startlevel;
+
 			if (!lev || lev != l)
 				continue;
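For anyone reading along, the effect of the patch can be sketched outside of rgmanager as below. This is only an illustration under assumptions: the field names rc_name, rc_startlevel and rc_stoplevel and the RS_STOP constant come from the diff, while struct child_type, RS_START, level_for(), order_children() and the level numbers mentioned afterwards are hypothetical stand-ins, not the real restree.c code or the real service.sh values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rgmanager types; only the names
 * rc_name, rc_startlevel, rc_stoplevel and RS_STOP appear in the patch. */
enum rs_op { RS_START, RS_STOP };

struct child_type {
	const char *rc_name;
	int rc_startlevel;
	int rc_stoplevel;
};

/* Pick the ordering level that applies to the requested operation.
 * Before the patch, the start level was used for every operation. */
static int level_for(const struct child_type *c, enum rs_op op)
{
	return (op == RS_STOP) ? c->rc_stoplevel : c->rc_startlevel;
}

/* Fill out[] with child type names in the order op would visit them,
 * walking levels 1..max_level as the patched loop does.
 * types[] is terminated by an entry whose rc_name is NULL.
 * Returns the number of names written. */
static int order_children(const struct child_type *types, enum rs_op op,
			  int max_level, const char **out)
{
	int n = 0;

	assert(types != NULL && out != NULL);

	for (int l = 1; l <= max_level; l++) {
		for (int x = 0; types[x].rc_name; x++) {
			int lev = level_for(&types[x], op);

			/* Level 0 means "unordered"; skip it in this pass. */
			if (!lev || lev != l)
				continue;
			out[n++] = types[x].rc_name;
		}
	}
	return n;
}
```

With made-up levels such as fs (start 1, stop 3), ip (start 2, stop 2) and script (start 3, stop 1), a start walks fs, ip, script and a stop walks script, ip, fs, which is the ordering the report expects.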
That patch is correct.
Patches in head and RHEL4 branch
BTW, you'll generally want "clusvcadm -d" rather than "clusvcadm -s". Stopping is temporary; services in the "stopped" state are evaluated by rgmanager after cluster membership changes to see if they should be started. A stopped service will be started again; a disabled service won't. (Just FYI)
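The stopped/disabled distinction described above amounts to a check like the following. This is a rough sketch only: the enum values and the function name are hypothetical and are not rgmanager's actual identifiers.

```c
#include <stdbool.h>

/* Hypothetical state names for illustration only. */
enum svc_state { SVC_STARTED, SVC_STOPPED, SVC_DISABLED };

/* After a cluster membership change, a stopped service is evaluated
 * again and may be started; a disabled service is left alone. */
static bool considered_for_start(enum svc_state s)
{
	return s == SVC_STOPPED;
}
```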
Good to know. I've synced my local copy with the patches you put in CVS, since I missed the second place that it reads the start/stop ordering. Also, while trying to configure my cluster I noticed a few cluster.conf examples in the source tree that are out of date. Would you rather patches go to you, bugzilla or the linux-cluster mailing list?
For the example stuff, linux-cluster and CC me.