Red Hat Bugzilla – Bug 249291
delete node task fails to do all items listed in the help document
Last modified: 2009-04-16 18:58:55 EDT
According to the conga help:
Delete Node - when a node is deleted, it is made to leave the cluster, all
cluster services are stopped on the node, its cluster.conf file is deleted, and
a new cluster.conf file is propagated to the remaining nodes in the cluster with
the deleted node removed from the configuration. Note that deleting a node does
not remove the installed cluster packages from the node.
First, after the delete operation completes, the view conga shows you afterward
still contains the deleted node. It is not removed from conga's view until
conga is reloaded by clicking on the cluster tab.
Second, the cluster.conf is not only "propagated to the remaining nodes" but
also to the node that was just deleted.
Third, the other nodes don't appear to believe the system is gone, just that
it's currently not running:
[root@taft-03 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M     84   2007-07-20 15:34:46  taft-02.lab.msp.redhat.com
   2   M     92   2007-07-20 15:53:22  taft-01.lab.msp.redhat.com
   3   X     88                        taft-04.lab.msp.redhat.com
   4   M     80   2007-07-20 15:34:46  taft-03.lab.msp.redhat.com
More delete stuff...
After attempting to re-add the deleted node I see the following from conga:
* Host taft-04.lab.msp.redhat.com is already authenticated
The following errors occurred:
* taft-04.lab.msp.redhat.com reports it is already a member of cluster
Which version of cman do you have? This looks like it might be the same problem
fixed here https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=244867
With the latest, all I see when attempting to delete is the following:
Deletion of node "taft-03.lab.msp.redhat.com" from cluster "TAFT_CLUSTER" failed
And nothing happens at all.
When this happens, it's an indication that at least one cluster service on the
node could not be stopped. The delete procedure tries to stop all cluster
services, then delete the cluster.conf and propagate a new conf to the
remaining cluster members. If the first step fails, luci bails out. When the
node couldn't be deleted, were there any init scripts hung trying to stop? If
not, are you able to stop cluster services manually?
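The bail-out behavior described above can be sketched roughly as follows. This is not luci's actual code; the service names and their stop order are assumptions based on a stock RHEL 5 Cluster Suite install:

```python
import subprocess

# Services that must stop cleanly on the node before anything else happens.
# Names/order are assumptions for a typical RHEL 5 cluster node.
CLUSTER_SERVICES = ["rgmanager", "gfs", "clvmd", "cman"]

def stop_cluster_services(run=subprocess.call):
    """Return True only if every cluster service stops cleanly."""
    for svc in CLUSTER_SERVICES:
        if run(["service", svc, "stop"]) != 0:
            return False  # one hung or failed init script is enough to bail
    return True

def delete_node(node, run=subprocess.call):
    # Step 1: stop cluster services; if this fails, bail out before
    # touching cluster.conf -- matching the error reported above.
    if not stop_cluster_services(run):
        return 'Deletion of node "%s" from cluster failed' % node
    # Step 2 (not sketched): remove cluster.conf on the node and
    # propagate a new conf to the remaining members.
    return "deleted"
```

If any single `service ... stop` hangs or exits non-zero, the whole delete fails with the generic message, which is why checking for hung init scripts on the node is the first thing to look at.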
If you edit /var/lib/luci/Extensions/conga_constants, at the bottom of the
file, there are three constants that control debugging output. If you set
LUCI_DEBUG_MODE to True and LUCI_DEBUG_VERBOSITY to something >= 3, and
configure syslogd to log authpriv.debug, luci will produce a lot of debugging
output that should indicate what went wrong.
Sorry, that should read /var/lib/luci/Extensions/conga_constants.py above.
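For reference, the tail of conga_constants.py with debugging enabled would look something like this (the two constant names come from the comment above; the third debug constant is not named there, so it is not shown):

```python
# Tail of /var/lib/luci/Extensions/conga_constants.py with debugging on.
# Per the comment above: enable debug mode and set verbosity to >= 3.
LUCI_DEBUG_MODE = True
LUCI_DEBUG_VERBOSITY = 3
```

With syslogd also configured to log authpriv.debug (e.g. an `authpriv.debug` selector line in /etc/syslog.conf pointing at a log file), luci's debug output lands in that log.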
Created attachment 161700 [details]
log from luci server
Here is the log that you requested during a delete attempt. Again, all I saw
was, "Deletion of node "link-08.lab.msp.redhat.com" from cluster "LINK_278"
failed"
Thanks for the log. Turns out the bug has nothing specific to do with node
deletion; the real problem is luci is not properly handling clusters whose names
are not all lowercase in a few places. Fix committed to CVS and will be in the
next build.

Fix verified in 0.10.0-5.el5.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.