Description of problem:
I've been seeing this a lot lately and it's annoying me. :) I'll try and debug this a bit more later but here's what I've been seeing:

[root@taft-01 ~]# cat /proc/cluster/services
Service Name GID LID State Code
[root@taft-01 ~]# service cman stop
Stopping cman: [FAILED]
[root@taft-01 ~]# cat /proc/cluster/nodes
Node Votes Exp Sts Name

From syslog:
May 1 10:30:42 taft-01 kernel: CMAN: we are leaving the cluster.
May 1 10:30:42 taft-01 ccsd[8725]: Cluster manager shutdown. Attemping to reconnect...
May 1 10:30:45 taft-01 cman: failed to stop cman failed

Version-Release number of selected component (if applicable):
[root@taft-01 ~]# rpm -q cman
cman-1.0.4-0

How reproducible:
almost every time
This needs passing to whoever manages the init scripts. All the subsystems need to be shut down, including ccsd, which polls cman to see whether it is active.
First, when did that change? Ccsd must be started before cman can be started, and we've always (since the beginning of RHEL4) stopped cman before stopping ccsd, and it worked just fine. Second, stopping ccsd first still doesn't help:

[root@taft-02 ~]# service ccsd stop
Stopping ccsd: [ OK ]
[root@taft-02 ~]# service cman stop
Stopping cman: [FAILED]

In both cases (before and after ccsd is stopped), leaving by hand shows that the stop did in fact work:

[root@taft-02 ~]# cman_tool leave
cman_tool: Error leaving cluster: Cluster software not started
Created attachment 128512: fix for bz
The cman init script ties the success of 'service cman stop' to successfully completing a 'modprobe -r cman'. The 'modprobe -r' doesn't always succeed because other modules (gfs, lock_dlm) may still be using cman. This patch instead ties the success of 'service cman stop' to 'cman_tool leave', which makes more sense to me. Let me know if this works for you and I'll commit the patch.
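The change in stop logic described above can be sketched roughly as follows. This is a minimal illustration, not the committed patch: the function name stop_cman and the CMAN_TOOL/MODPROBE overrides are hypothetical, added only so the logic can be exercised without a running cluster, and the plain echo stands in for the init-functions helpers a real RHEL4 script would use.

```shell
#!/bin/sh
# Hypothetical sketch of a cman init-script stop step whose result depends on
# leaving the cluster rather than on unloading the kernel module.
# ASSUMPTION: CMAN_TOOL and MODPROBE are overridable only for illustration;
# this is not the actual patched RHEL4 init script.
CMAN_TOOL=${CMAN_TOOL:-cman_tool}
MODPROBE=${MODPROBE:-modprobe}

stop_cman() {
    # Leaving the cluster is what "stop" should mean, so its exit status
    # decides whether the stop succeeded.
    if ! "$CMAN_TOOL" leave; then
        echo "Stopping cman: [FAILED]"
        return 1
    fi
    # Unloading the module is best effort: gfs or lock_dlm may still be
    # using cman, so a failed 'modprobe -r' no longer fails the stop.
    "$MODPROBE" -r cman 2>/dev/null || true
    echo "Stopping cman: [ OK ]"
    return 0
}
```

Treating the module unload as best effort means a leftover reference from gfs or lock_dlm no longer turns a successful cluster leave into [FAILED].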
Having the success tied to cluster membership removal does make more sense. The patch works for me.
Checked in fix to RHEL4, HEAD and STABLE.
fix verified.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2006-0556.html