Bug 339471 - Impossible to remove a dead node from cman
Status: CLOSED DUPLICATE of bug 244867
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Hardware: All
OS: Linux
Priority: low
Severity: high
Assigned To: Ryan O'Hara
GFS Bugs
Depends On:
Reported: 2007-10-19 07:04 EDT by Tim Verhoeven
Modified: 2009-04-16 19:00 EDT
CC: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2007-12-14 11:56:21 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Description Tim Verhoeven 2007-10-19 07:04:57 EDT
Description of problem:
A dead node makes it impossible to make changes to the cluster config

Version-Release number of selected component (if applicable):

How reproducible:
Have not attempted to reproduce. This is a live cluster.

Steps to Reproduce:
1. Removed a node from the cluster using Conga
2. 'ccs_tool lsnode' lists the remaining nodes
3. 'cman_tool nodes' lists the remaining nodes and the removed node
4. 'ccs_tool update ...' fails because it can't contact the removed node
Additional info:
There should be a way to remove a dead node from cman. For example, you have a
hardware failure. It takes time to replace that node. During that time you
cannot make any updates to the cluster configuration.

A workaround was to reboot each remaining node of the cluster individually. This
cleared the removed node from the 'cman_tool nodes' output. Once all nodes were
rebooted, changes to the configuration were possible again.

Should I also create a Bugzilla for the fact that Conga/luci did not do a
'cman_tool leave remove'? Or is that a known bug? (luci version was 0.9.2-6.el5)
Comment 1 David Teigland 2007-12-04 12:53:28 EST
It seems to me that a failed node should not prevent ccs_tool from
updating cluster.conf
Comment 2 Ryan O'Hara 2007-12-11 16:32:56 EST
Could this be an issue with Conga? I've never seen this behavior before. Added
one of the Conga developers to the BZ CC list to perhaps help answer this question.
Comment 3 RHEL Product and Program Management 2007-12-11 18:14:23 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Comment 4 Ryan McCabe 2007-12-12 13:41:35 EST
IIRC there was a bug in ccsd prior to 5.1 (I think the fix went out as a
z-stream update) that caused ccsd to report failure when you attempted to
propagate a new configuration and at least one node was not a member or was
estranged. It also happened if you were using qdisk (it would try to send the
new conf to node 0). See bug #244867 for more info. I just checked the sources
for 2.0.64-1.0.1.el5 and the fix for that bug is not in there. Upgrading to the
latest cman package ought to fix this.
Comment 5 Ryan O'Hara 2007-12-12 14:02:52 EST
I'm unable to recreate this bug using command-line tools. I am unsure what
step #1 (removing a node from a running cluster) actually means. It appears
that removing the node means removing it from the cluster.conf file. This seems
to be the case since it is stated that 'ccs_tool lsnode' lists only the
remaining nodes, and 'ccs_tool lsnode' parses the cluster.conf file directly.
So the node must have been removed from the cluster.conf file. It's unclear
whether the node in question is still a member of the cluster, so I tested
both scenarios.

Running 'ccs_tool update /etc/cluster/cluster.conf' worked for me every time:
both when the node had been removed from cluster.conf but was still in the
cluster, and when it had been removed from cluster.conf and had also left the
running cluster.
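[Editorial note: the distinction above, that 'ccs_tool lsnode' reports what is in the cluster.conf file rather than live cman membership, is why its output can disagree with 'cman_tool nodes'. A minimal Python sketch of that file-based view, with a hypothetical parser and an illustrative config, not the actual ccs_tool source:]

```python
import xml.etree.ElementTree as ET

def lsnode(cluster_conf_xml):
    """Return the node names listed in a cluster.conf document.

    Mirrors the behaviour described above: the list comes from the
    config file only, not from live cman membership.
    """
    root = ET.fromstring(cluster_conf_xml)
    return [node.get("name")
            for node in root.findall("./clusternodes/clusternode")]

# Illustrative cluster.conf after the dead node was removed via Conga;
# the removed node no longer appears here even if cman still lists it.
SAMPLE_CONF = """\
<cluster name="example" config_version="3">
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <clusternode name="node2" nodeid="2"/>
  </clusternodes>
</cluster>
"""

print(lsnode(SAMPLE_CONF))  # → ['node1', 'node2']
```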
Comment 6 Ryan O'Hara 2007-12-12 14:04:12 EST
Ah. Comment #4 seems to explain the problem. I was testing on a RHEL5.1 machine,
which has the fix that Ryan referenced. I think the problem reported here has
been fixed in 5.1.

Comment 7 Ryan O'Hara 2007-12-14 11:56:21 EST
Closing as dup of #244867, which is fixed in 5.1.

*** This bug has been marked as a duplicate of 244867 ***
