Bug 442008 - expected_votes recalculation doesn't work when a node is removed
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assigned To: Christine Caulfield
QA Contact: GFS Bugs
Reported: 2008-04-11 03:57 EDT by Thorsten Scherf
Modified: 2009-04-16 18:51 EDT (History)
Last Closed: 2008-04-11 08:36:04 EDT
Description Thorsten Scherf 2008-04-11 03:57:56 EDT
Description of problem:
I have a 4-node cluster. all nodes are built as Xen virtual machines.
when all nodes are online, I see this when calling cman_tool status:

[root@c10n1 ~]# cman_tool status
Version: 6.0.1
Config Version: 19
Cluster Name: cluster10
Cluster Id: 53602
Cluster Member: Yes
Cluster Generation: 320
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 177  
Node name: c10n1.example.com
Node ID: 1
Multicast addresses: 239.192.209.52 
Node addresses: 172.16.50.101 
[root@c10n1 ~]# 

now, when I call "cman_tool leave remove" to remove one node from the cluster, I
would expect the number of expected votes to be recalculated from the number of
nodes that must be online to gain quorum for the current setup (i.e. the number
of nodes actually online). this matches my own understanding and the following
post:

http://post-office.corp.redhat.com/archives/cluster-list/2007-July/msg00145.html

but that's not the case. when I remove one of the nodes from the cluster using
"cman_tool force leave remove", or when I cleanly stop cman on one of the nodes,
I get this: 

[root@c10n1 ~]# cman_tool status
Version: 6.0.1
Config Version: 19
Cluster Name: cluster10
Cluster Id: 53602
Cluster Member: Yes
Cluster Generation: 324
Membership state: Cluster-Member
Nodes: 3
Expected votes: 4
Total votes: 3
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 177  
Node name: c10n1.example.com
Node ID: 1
Multicast addresses: 239.192.209.52 
Node addresses: 172.16.50.101 
[root@c10n1 ~]# 

expected votes is still 4. as I understand it, it should be reduced to 2: I now
have 3 nodes online in the cluster and 1 node was cleanly removed from the
cluster, so to gain quorum for _this_ setup I now need 2 votes. that's what I
expected to see on the expected votes line.
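
For reference, the two status outputs above are consistent with a simple-majority rule, quorum = expected_votes / 2 + 1 using integer division: 4 expected votes give the reported quorum of 3. The formula is an assumption inferred from that output, not something stated in this report, and the function name quorum_for is hypothetical. A minimal shell sketch of the arithmetic:

```shell
# Assumed simple-majority rule: quorum = expected_votes / 2 + 1 (integer division).
# quorum_for is a hypothetical helper, not part of cman_tool.
quorum_for() {
    expected_votes=$1
    echo $(( expected_votes / 2 + 1 ))
}

quorum_for 4   # prints 3, matching "Expected votes: 4" / "Quorum: 3" above
quorum_for 3   # prints 2, the quorum if expected votes were reduced to 3
```

Under that assumption, the value 2 would be the quorum resulting from 3 expected votes, not an expected votes value itself.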

  

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5.i386.rpm

How reproducible:
every time.

Steps to Reproduce:
1. cleanly remove a node from the cluster
  
Actual results:
expected votes does not change

Expected results:
expected votes is recalculated

Additional info:
I already tested this with cman-2.0.73-1.el5_1.4, same result.
Comment 1 Christine Caulfield 2008-04-11 04:10:15 EDT
Thorsten Scherf wrote:
> That's weird. I tested it several times with the result I reported. Now I
> tested it again several times and I have a completely different result,
> check my BZ #442008 for this. The number of expected votes is _not_
> recalculated even if I cleanly remove a node from the cluster. really looks
> like something is not working properly in cman code...
> 
> could you try to verify this? could it be related to xen?
 
I spent most of yesterday trying to verify it, and it all works fine on my test
cluster ... which is a Xen one.

What do you mean by "cleanly remove" a node? The only way to reduce expected
votes by removing a node is to use the "cman_tool leave remove" command. The
init scripts do NOT do this TTBOMK (unless that's changed very recently).

If you can reproduce it, can you start the cluster with "cman_tool join -d" and
paste the output of all nodes into the BZ, please?
Comment 2 Thorsten Scherf 2008-04-11 04:31:30 EDT
well, I have to use "cman_tool force leave remove", otherwise I get this:

[root@c10n4 ~]# cman_tool leave remove
cman_tool: Error leaving cluster: Device or resource busy
[root@c10n4 ~]# 

could this be the difference? why do I have to use the force option here?
guessing there are still some processes requiring cman?

according to
http://post-office.corp.redhat.com/archives/cluster-list/2007-July/msg00145.html
calling "service cman stop" should also cleanly remove a node from the cluster,
but I haven't checked the init script so far.
Comment 3 Christine Caulfield 2008-04-11 04:35:13 EDT
'remove' or 'force' ?

They are very different options.
Comment 4 Thorsten Scherf 2008-04-11 04:54:39 EDT
I used this command:

[root@c10n4 ~]# cman_tool force leave remove

as force is a leave option, according to the man page.
Comment 5 Christine Caulfield 2008-04-11 05:01:40 EDT
force overrides remove, so it's correct that quorum is not adjusted with that
command.

You should not need to use force if all the cluster subsystems are shut down
correctly.
Comment 6 Thorsten Scherf 2008-04-11 05:55:06 EDT
Only cman runs on the node where I call "cman_tool leave remove". Nevertheless I
get the "Device or resource busy" message.
 
Comment 7 Christine Caulfield 2008-04-11 08:36:04 EDT
Can we clarify what is happening here, please?

I think the expected_votes bug is not a bug, just a misunderstanding. So on that
basis alone I'm tempted to close this BZ.

If you think that cman is incorrectly preventing you from leaving the cluster
then we need some more information: exactly /what/ processes are running on the
system, the output from cman_tool status, and an lsof output to see if any
processes are still connected to cman.
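
The three items requested above could be gathered on the affected node with something like the following. This is only a sketch: the function name collect_cman_info, the output directory and the file names are hypothetical, and filtering the lsof output with a case-insensitive grep for "cman" is an assumption about how a connected process would show up.

```shell
# Hypothetical helper that saves the requested diagnostics into one directory.
collect_cman_info() {
    outdir=${1:-/tmp/cman-debug}
    mkdir -p "$outdir" || return 1

    # "|| true" lets the collection continue if a tool is missing or finds nothing.
    # All processes running on the node
    ps -ef > "$outdir/ps.txt" 2>&1 || true
    # Cluster state as seen by this node
    cman_tool status > "$outdir/cman_status.txt" 2>&1 || true
    # Open files whose lsof entry mentions cman (assumed filter)
    lsof 2>/dev/null | grep -i cman > "$outdir/lsof_cman.txt" || true

    # Print the directory so the caller knows where the files are
    echo "$outdir"
}
```

Running collect_cman_info with no argument writes the files to /tmp/cman-debug, from where they can be pasted into the BZ.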
