Bug 442008 - expected_votes recalculation doesn't work when a node is removed
Summary: expected_votes recalculation doesn't work when a node is removed
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: cman
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-04-11 07:57 UTC by Thorsten Scherf
Modified: 2009-04-16 22:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-04-11 12:36:04 UTC
Target Upstream Version:
Embargoed:


Description Thorsten Scherf 2008-04-11 07:57:56 UTC
Description of problem:
I have a 4-node cluster. all nodes are built as xen virtual machines.
when all nodes are online, I see this when calling cman_tool status:

[root@c10n1 ~]# cman_tool status
Version: 6.0.1
Config Version: 19
Cluster Name: cluster10
Cluster Id: 53602
Cluster Member: Yes
Cluster Generation: 320
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 177  
Node name: c10n1.example.com
Node ID: 1
Multicast addresses: 239.192.209.52 
Node addresses: 172.16.50.101 
[root@c10n1 ~]# 

now, when I call "cman_tool leave remove" to remove one node from the cluster, I
would expect the number of expected votes to be recalculated as the number of
nodes required to be online to gain quorum for the actual setup (i.e. the number
of nodes actually online). this matches the following post, which is my
understanding as well:

http://post-office.corp.redhat.com/archives/cluster-list/2007-July/msg00145.html

but that's not the case. when I remove one of the nodes from the cluster using
"cman_tool force leave remove", or when I cleanly stop cman on one of the nodes,
I get this: 

[root@c10n1 ~]# cman_tool status
Version: 6.0.1
Config Version: 19
Cluster Name: cluster10
Cluster Id: 53602
Cluster Member: Yes
Cluster Generation: 324
Membership state: Cluster-Member
Nodes: 3
Expected votes: 4
Total votes: 3
Quorum: 3  
Active subsystems: 7
Flags: 
Ports Bound: 0 177  
Node name: c10n1.example.com
Node ID: 1
Multicast addresses: 239.192.209.52 
Node addresses: 172.16.50.101 
[root@c10n1 ~]# 

expected votes is still 4. in my understanding it should be reduced to 2: I now
have 3 nodes online in the cluster and 1 node was cleanly removed from the
cluster, so to gain quorum for _this_ setup I now need 2 votes. that's what I
expected to see in the expected votes line.
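
For reference, the figures in the two cman_tool status outputs above are consistent with simple-majority arithmetic. The sketch below assumes quorum is expected votes divided by two (integer division) plus one; the helper name quorum_for is hypothetical and the formula is an assumption inferred from the output shown, not taken from the cman source.

```shell
#!/bin/sh
# Hypothetical helper: simple-majority quorum for a given number of
# expected votes. Assumption: quorum = expected_votes / 2 + 1, using
# integer division. This is an illustration, not cman's own code.
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# Expected votes: 4 gives the "Quorum: 3" seen in both outputs above.
echo "expected=4 quorum=$(quorum_for 4)"

# If expected votes were lowered to 3, quorum would drop to 2.
echo "expected=3 quorum=$(quorum_for 3)"
```

Under this assumption, the "2" mentioned above corresponds to the quorum value for 3 expected votes, not to the expected votes value itself.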

  

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5.i386.rpm

How reproducible:
every time.

Steps to Reproduce:
1. cleanly remove a node from the cluster
  
Actual results:
expected votes does not change

Expected results:
expected votes is recalculated

Additional info:
I already tested this with cman-2.0.73-1.el5_1.4, same result.

Comment 1 Christine Caulfield 2008-04-11 08:10:15 UTC
Thorsten Scherf wrote:
> That's weird. I tested it several times with the result I reported. Now I
> tested it again several times and I have a completely different result,
> check my BZ #442008 for this. The number of expected votes is _not_
> recalculated even if I cleanly remove a node from the cluster. It really looks
> like something is not working properly in the cman code...
> 
> could you try to verify this? could it be related to xen?
 
I spent most of yesterday trying to verify it, and it all works fine on my test
cluster ... which is a Xen one.

What do you mean by "cleanly remove" a node? The only way to reduce expected
votes by removing a node is to use the "cman_tool leave remove" command. The
init scripts do NOT do this TTBOMK (unless that's changed very recently).

If you can reproduce it, can you start the cluster with "cman_tool join -d" and
paste the output of all nodes into the BZ please?


Comment 2 Thorsten Scherf 2008-04-11 08:31:30 UTC
well, I have to use "cman_tool force leave remove", otherwise I get this:

[root@c10n4 ~]# cman_tool leave remove
cman_tool: Error leaving cluster: Device or resource busy
[root@c10n4 ~]# 

could this be the difference? why do I have to use the force option here?
guessing there are still some processes requiring cman?

according to
http://post-office.corp.redhat.com/archives/cluster-list/2007-July/msg00145.html
calling "service cman stop" should also cleanly remove a node from the cluster,
but I haven't checked the init script so far.


Comment 3 Christine Caulfield 2008-04-11 08:35:13 UTC
'remove' or 'force' ?

They are very different options.

Comment 4 Thorsten Scherf 2008-04-11 08:54:39 UTC
I used this command:

[root@c10n4 ~]# cman_tool force leave remove

as force is a leave option, according to the man page.

Comment 5 Christine Caulfield 2008-04-11 09:01:40 UTC
force overrides remove, so it's correct that quorum is not adjusted with that
command.

You should not need to use force if all the cluster subsystems are shut down
correctly.
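
The interaction between the two options can be illustrated with a small model. This is only a sketch of the behaviour described in this thread, not cman code: the function name expected_after_leave is hypothetical, and it assumes that a successful "leave remove" lowers expected votes by the leaving node's own votes.

```shell
#!/bin/sh
# Hypothetical model of how expected votes change when a node leaves.
# Usage: expected_after_leave <expected_votes> <node_votes> [force] [remove]
# Assumptions: "remove" lowers expected votes by the leaving node's votes;
# "force" overrides "remove", so no adjustment happens when both are given.
expected_after_leave() {
    expected=$1
    node_votes=$2
    shift 2
    remove=0
    force=0
    for opt in "$@"; do
        case "$opt" in
            remove) remove=1 ;;
            force)  force=1 ;;
        esac
    done
    if [ "$remove" -eq 1 ] && [ "$force" -eq 0 ]; then
        echo $(( expected - node_votes ))
    else
        echo "$expected"
    fi
}

# 4-node cluster, one vote per node, as in this report.
echo "leave remove:       $(expected_after_leave 4 1 remove)"
echo "force leave remove: $(expected_after_leave 4 1 force remove)"
```

In this model the forced variant leaves expected votes at 4, which matches the second cman_tool status output in the description.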


Comment 6 Thorsten Scherf 2008-04-11 09:55:06 UTC
Only cman runs on the node where I call "cman_tool leave remove". Nevertheless I
get the "Device or resource busy" message.
 

Comment 7 Christine Caulfield 2008-04-11 12:36:04 UTC
Can we clarify what is happening here please ?

I think the expected_votes bug is not a bug, just a misunderstanding. So on that
basis alone I'm tempted to close this BZ.

If you think that cman is incorrectly preventing you from leaving the cluster
then we need some more information: exactly /what/ processes are running on the
system, the output from cman_tool status, and an lsof output to see whether any
processes are still connected to cman.

