Red Hat Bugzilla – Bug 249291
delete node task fails to do all items listed in the help document
Last modified: 2009-04-16 18:58:55 EDT
According to the conga help:
Delete Node - when a node is deleted, it is made to leave the cluster, all
cluster services are stopped on the node, its cluster.conf file is deleted, and
a new cluster.conf file is propagated to the remaining nodes in the cluster with
the deleted node removed from the configuration. Note that deleting a node does
not remove the installed cluster packages from the node.
First, after the delete operation completes, the view conga shows you afterward
still contains the deleted node. It is not removed from conga's view until
conga is reloaded by clicking on the cluster tab.
Second, the cluster.conf is not only "propagated to the remaining nodes" but
also to the node that was just deleted.
Third, the other nodes don't appear to believe the system is gone, just that
it's currently not running:
[root@taft-03 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M     84   2007-07-20 15:34:46  taft-02.lab.msp.redhat.com
   2   M     92   2007-07-20 15:53:22  taft-01.lab.msp.redhat.com
   3   X     88                        taft-04.lab.msp.redhat.com
   4   M     80   2007-07-20 15:34:46  taft-03.lab.msp.redhat.com
More delete stuff...
After attempting to re-add the deleted node I see the following from conga:
* Host taft-04.lab.msp.redhat.com is already authenticated
The following errors occurred:
* taft-04.lab.msp.redhat.com reports it is already a member of cluster
Which version of cman do you have? This looks like it might be the same problem
fixed here https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=244867
With the latest, all I see when attempting to delete is the following:
Deletion of node "taft-03.lab.msp.redhat.com" from cluster "TAFT_CLUSTER" failed
And nothing happens at all.
When this happens, it's an indication that at least one cluster service on the
node could not be stopped. The delete procedure tries to stop all cluster
services, then delete the cluster.conf and propagate a new conf to the
remaining cluster members. If the first step fails, luci bails out. When the
node couldn't be deleted, were there any init scripts hung trying to stop? If
not, are you able to stop cluster services manually?
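The bail-out behavior described above can be sketched roughly as follows. This is not luci's actual code; the service names and their stop order are assumptions based on a stock RHEL 5 Cluster Suite install:

```python
import subprocess

# Services that must stop cleanly on the node before anything else happens.
# Names/order are assumptions for a typical RHEL 5 cluster node.
CLUSTER_SERVICES = ["rgmanager", "gfs", "clvmd", "cman"]

def stop_cluster_services(run=subprocess.call):
    """Return True only if every cluster service stops cleanly."""
    for svc in CLUSTER_SERVICES:
        if run(["service", svc, "stop"]) != 0:
            return False  # one hung or failed init script is enough to bail
    return True

def delete_node(node, run=subprocess.call):
    # Step 1: stop cluster services; if this fails, bail out before
    # touching cluster.conf -- matching the error reported above.
    if not stop_cluster_services(run):
        return 'Deletion of node "%s" from cluster failed' % node
    # Step 2 (not sketched): remove cluster.conf on the node and
    # propagate a new conf to the remaining members.
    return "deleted"
```

If any single `service ... stop` hangs or exits non-zero, the whole delete fails with the generic message, which is why checking for hung init scripts on the node is the first thing to look at.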
If you edit /var/lib/luci/Extensions/conga_constants, at the bottom of the
file, there are three constants that control debugging output. If you set
LUCI_DEBUG_MODE to True and LUCI_DEBUG_VERBOSITY to something >= 3, and
configure syslogd to log authpriv.debug, luci will produce a lot of debugging
output that should indicate what went wrong.
Sorry, that should read /var/lib/luci/Extensions/conga_constants.py above.
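For reference, the tail of conga_constants.py with debugging enabled would look something like this (the two constant names come from the comment above; the third debug constant is not named there, so it is not shown):

```python
# Tail of /var/lib/luci/Extensions/conga_constants.py with debugging on.
# Per the comment above: enable debug mode and set verbosity to >= 3.
LUCI_DEBUG_MODE = True
LUCI_DEBUG_VERBOSITY = 3
```

With syslogd also configured to log authpriv.debug (e.g. an `authpriv.debug` selector line in /etc/syslog.conf pointing at a log file), luci's debug output lands in that log.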
Created attachment 161700 [details]
log from luci server
Here is the log that you requested during a delete attempt. Again, all I saw
was, "Deletion of node "link-08.lab.msp.redhat.com" from cluster "LINK_278"
failed"
Thanks for the log. Turns out the bug has nothing specific to do with node
deletion; the real problem is luci is not properly handling clusters whose names
are not all lowercase in a few places. Fix committed to CVS and will be in the
next build.

Fix verified in 0.10.0-5.el5.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.