Bug 144309 - cman_tool leave remove: not adjusting quorum for continued operation

Status: CLOSED NEXTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: cman
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Christine Caulfield
QA Contact: Cluster QE
Reported: 2005-01-05 15:44 EST by Derek Anderson
Modified: 2009-04-16 15:59 EDT

Doc Type: Bug Fix
Last Closed: 2005-02-11 13:11:11 EST

Description Derek Anderson 2005-01-05 15:44:43 EST
Description of problem:
From the cman_tool manpage under the leave section:
"If this node is to be down for an extended period of time and you
need to keep  the  cluster  running, add the remove option, and the
remaining nodes will recalculate quorum such that activity can continue."
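
For reference, here is a minimal sketch of the arithmetic the manpage describes. It is not the actual cman code; it assumes the usual "more than half of expected_votes" quorum rule and shows why the surviving node should stay quorate only if expected_votes is recalculated when the others leave with "remove":

#include <stdio.h>

/* Assumed quorum rule: strictly more than half of expected_votes. */
static int quorum(int expected_votes)
{
        return expected_votes / 2 + 1;
}

int main(void)
{
        /* Three nodes, one vote each, all members. */
        printf("3 members:       quorum=%d, total_votes=3 -> quorate\n", quorum(3));

        /* Two nodes run 'cman_tool leave remove'.  If expected_votes is
         * recalculated down to 1, the survivor stays quorate...          */
        printf("after 2x remove: quorum=%d, total_votes=1 -> quorate\n", quorum(1));

        /* ...but if expected_votes stays at 3 (the behaviour reported in
         * this bug), quorum stays at 2 and the lone node blocks activity. */
        printf("stuck at 3:      quorum=%d, total_votes=1 -> blocked\n", quorum(3));
        return 0;
}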

The test is to have a three-node cluster up and quorate with no other
services running, then run 'cman_tool leave remove' on two of them;
the expectation is that the remaining node does not block activity.

Nodes are link-10,link-11,link-12.

### First run 'cman_tool leave remove' on link-10.  
link-10 kernel: CMAN: we are leaving the cluster. Removed
link-11 kernel: <no messages>
link-12 kernel: CMAN: Node link-10 is leaving the cluster, Removed

### Now run 'cman_tool leave remove' on link-11.
link-11 kernel: CMAN: we are leaving the cluster. Removed
link-12 kernel: CMAN: Node link-11 is leaving the cluster, Removed
link-12 kernel: CMAN: quorum lost, blocking activity

### Double check status of remaining node, link-12.
[root@link-12 root]# cat /proc/cluster/status
Protocol version: 4.0.1
Config version: 1
Cluster name: MILTON
Cluster ID: 4812
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 3
Total_votes: 1
Quorum: 2  Activity blocked
Active subsystems: 0
Node addresses: 192.168.44.162
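
The "Activity blocked" line follows directly from those numbers: Expected_votes was left at 3, so the quorum stays at 2 while only 1 vote remains. A tiny sketch of that check (hypothetical, assuming the same expected_votes/2 + 1 rule as above, not the cman-kernel source):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical quorate check, assuming quorum = expected_votes / 2 + 1. */
static bool quorate(int total_votes, int expected_votes)
{
        return total_votes >= expected_votes / 2 + 1;
}

int main(void)
{
        /* Values reported by /proc/cluster/status on link-12 above. */
        printf("%s\n", quorate(1, 3) ? "quorate" : "activity blocked");
        /* Had the two removes dropped Expected_votes to 1, quorate(1, 1)
         * would be true and the node would not block. */
        return 0;
}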

Version-Release number of selected component (if applicable):
6.1 RPMS built Wed 15 Dec 2004 01:13:08 PM CST

How reproducible:
Yes.

Steps to Reproduce:
1. Start a 3-node quorate cman cluster.
2. Remove 2 of the nodes with 'cman_tool leave remove'.
3. Check /proc/cluster/status on the remaining node.
Actual results:
Activity blocked on the remaining node due to loss of quorum

Expected results:
Activity not blocked.

Additional info:
Comment 1 Christine Caulfield 2005-01-06 11:41:19 EST
This was fixed in a checkin on the 16th December - and works for me
with current CVS.
Comment 2 Derek Anderson 2005-01-11 14:54:55 EST
I am still seeing this with the RPMs built yesterday, Monday January 10.
Comment 3 Christine Caulfield 2005-01-13 09:14:20 EST
Missed a corner case, sorry

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.47; previous revision: 1.46
done
Comment 4 Derek Anderson 2005-02-10 12:06:33 EST
Still doesn't appear to be working.  These messages are from link-10;
I ran 'cman_tool leave remove' on link-11 and then link-12, and 15
seconds later activity was blocked.

CMAN: removing node link-12 from the cluster : Removed
Feb 10 11:03:16 link-10 kernel: CMAN: Node link-12 is leaving the cluster, Removed
Feb 10 11:03:16 link-10 kernel: CMAN: removing node link-12 from the cluster : Removed
CMAN: removing node link-11 from the cluster : Removed
Feb 10 11:03:41 link-10 kernel: CMAN: Node link-11 is leaving the cluster, Removed
Feb 10 11:03:41 link-10 kernel: CMAN: removing node link-11 from the cluster : Removed
CMAN: quorum lost, blocking activity
Feb 10 11:03:56 link-10 kernel: CMAN: quorum lost, blocking activity

Node  Votes Exp Sts  Name
   1    1    3   M   link-10
   2    1    3   X   link-11
   3    1    3   X   link-12
Protocol version: 5.0.1
Config version: 2
Cluster name: MILTON
Cluster ID: 4812
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 3
Total_votes: 1
Quorum: 2  Activity blocked
Active subsystems: 9
Node addresses: 192.168.44.160

Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[1]

DLM Lock Space:  "clvmd"                             2   3 run       -
[1]

DLM Lock Space:  "data1"                             3   4 run       -
[1]

DLM Lock Space:  "data2"                             5   6 run       -
[1]

GFS Mount Group: "data1"                             4   5 run       -
[1]

GFS Mount Group: "data2"                             6   7 run       -
[1]
Comment 5 Christine Caulfield 2005-02-10 12:28:52 EST
You've got 2 GFS filesystems mounted; it shouldn't even start shutdown
on that node. Did you get a failure message from cman_tool leave?
Comment 6 Derek Anderson 2005-02-10 12:43:35 EST
<puzzled> ? 
 
Yes, I have 2 filesystems mounted on link-10.  I'm not trying to 
shut down cman on link-10, however.  The expectation is that since 
the other two nodes left with "remove" the last node would remain 
quorate and not go into "Activity Blocked" mode. 
Comment 7 Christine Caulfield 2005-02-11 05:20:35 EST
Apologies, I read that message just before I left and thought it
referred to a different bz.

Looks like a bit more patience would have helped me when testing the
previous fix too. It seems that the transition timer was being set for
a single-node transition when it shouldn't have been. So after the node
had settled down nicely and become quorate on its own, the timer kicked
in 15 seconds later and spoiled it all.
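
For readers unfamiliar with the failure mode, the sketch below (names invented, NOT the actual membership.c code) shows the kind of guard this description implies: skip the transition restart timer when the node is the only member left, so the timer cannot fire later and undo the quorum the node just recalculated for itself.

#include <stdio.h>

struct cluster_state {
        int member_count;            /* nodes still in the cluster   */
        int transition_timer_armed;  /* stand-in for a kernel timer  */
};

static void start_transition_timer(struct cluster_state *cs)
{
        cs->transition_timer_armed = 1;  /* would be mod_timer() in-kernel */
}

static void begin_transition(struct cluster_state *cs)
{
        /* Guard resembling the fix: a single-node transition has nobody
         * left to wait for, so arming the timer serves no purpose. */
        if (cs->member_count > 1)
                start_transition_timer(cs);
}

int main(void)
{
        struct cluster_state cs = { .member_count = 1, .transition_timer_armed = 0 };
        begin_transition(&cs);
        printf("timer armed: %d\n", cs.transition_timer_armed);  /* prints 0 */
        return 0;
}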

This checkin should fix it, and it also gets rid of the duplicate
leave message.

Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.60; previous revision: 1.59
done
Checking in membership.c;
/cvs/cluster/cluster/cman-kernel/src/membership.c,v  <--  membership.c
new revision: 1.44.2.9; previous revision: 1.44.2.8
done
Comment 8 Derek Anderson 2005-02-11 13:11:11 EST
Fix verified in cman-kernel-2.6.9-18.0.
