Description of problem:
I have a 4-node cluster. All nodes are built as Xen virtual machines. When all nodes are online, I see this when calling cman_tool status:

[root@c10n1 ~]# cman_tool status
Version: 6.0.1
Config Version: 19
Cluster Name: cluster10
Cluster Id: 53602
Cluster Member: Yes
Cluster Generation: 320
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Quorum: 3
Active subsystems: 7
Flags:
Ports Bound: 0 177
Node name: c10n1.example.com
Node ID: 1
Multicast addresses: 239.192.209.52
Node addresses: 172.16.50.101
[root@c10n1 ~]#

Now, when I call "cman_tool leave remove" to remove one node from the cluster, I would expect the number of expected votes to be recalculated from the number of nodes required to be online to gain quorum for the actual setup (i.e. the number of nodes actually online). This is according to the following post, which matches my understanding:
http://post-office.corp.redhat.com/archives/cluster-list/2007-July/msg00145.html

But that's not the case. When I remove one of the nodes from the cluster using "cman_tool force leave remove", or when I cleanly stop cman on one of the nodes, I get this:

[root@c10n1 ~]# cman_tool status
Version: 6.0.1
Config Version: 19
Cluster Name: cluster10
Cluster Id: 53602
Cluster Member: Yes
Cluster Generation: 324
Membership state: Cluster-Member
Nodes: 3
Expected votes: 4
Total votes: 3
Quorum: 3
Active subsystems: 7
Flags:
Ports Bound: 0 177
Node name: c10n1.example.com
Node ID: 1
Multicast addresses: 239.192.209.52
Node addresses: 172.16.50.101
[root@c10n1 ~]#

Expected votes is still 4. By my understanding it should be reduced to 2: I now have 3 nodes online in the cluster, 1 node was cleanly removed from the cluster, and to gain quorum for _this_ setup I now need 2 votes. That is what I expected to see in the expected votes line.

Version-Release number of selected component (if applicable):
cman-2.0.73-1.el5.i386.rpm

How reproducible:
Every time.
Steps to Reproduce:
1. Cleanly remove a node from the cluster.

Actual results:
Expected votes does not change.

Expected results:
Expected votes is recalculated.

Additional info:
I already tested this with cman-2.0.73-1.el5_1.4, same result.
Thorsten Scherf wrote:
> That's weird. I tested it several times with the result I reported. Now I
> tested it again several times and I have a completely different result,
> check my BZ #442008 for this. The number of expected votes is _not_
> recalculated even if I cleanly remove a node from the cluster. Really looks
> like something is not working properly in the cman code...
>
> Could you try to verify this? Could it be related to Xen?

I spent most of yesterday trying to verify it, and it all works fine on my test cluster ... which is a Xen one.

What do you mean by "cleanly remove" a node? The only way to reduce expected votes by removing a node is to use the "cman_tool leave remove" command. The init scripts do NOT do this TTBOMK (unless that's changed very recently).

If you can reproduce it, can you start the cluster with "cman_tool join -d" and paste the output of all nodes into the BZ, please?
Well, I have to use "cman_tool force leave remove", otherwise I get this:

[root@c10n4 ~]# cman_tool leave remove
cman_tool: Error leaving cluster: Device or resource busy
[root@c10n4 ~]#

Could this be the difference? Why do I have to use the force option here? I'm guessing there are still some processes requiring cman?

According to http://post-office.corp.redhat.com/archives/cluster-list/2007-July/msg00145.html, calling "service cman stop" should also cleanly remove a node from the cluster, but I haven't checked the init script so far.
'remove' or 'force'? They are very different options.
I used this command:

[root@c10n4 ~]# cman_tool force leave remove

as force is a leave option, according to the man page.
force overrides remove, so it's correct that quorum is not adjusted with that command. You should not need to use force if all the cluster subsystems are shut down correctly.
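To make the arithmetic behind this concrete, here is a minimal Python sketch. It assumes a simple-majority rule, quorum = floor(expected_votes / 2) + 1, which is consistent with the status output earlier in this report (expected votes 4, quorum 3), and it assumes one vote per node. The function names are hypothetical and only illustrate the behaviour described in this thread; this is not cman's actual code.

```python
def quorum_for(expected_votes: int) -> int:
    """Simple-majority quorum (assumed rule): more than half of expected votes."""
    return expected_votes // 2 + 1


def expected_after_leave(expected_votes: int, leaving_votes: int, remove: bool) -> int:
    """Expected votes after a node leaves.

    Assumption taken from this thread: only a leave with 'remove' in effect
    lowers expected votes; a plain or forced leave keeps them unchanged.
    """
    return expected_votes - leaving_votes if remove else expected_votes


# Four one-vote nodes, as in the report.
before = 4
print("all nodes online:", before, quorum_for(before))

# Leave without 'remove' taking effect (e.g. overridden by force).
unchanged = expected_after_leave(before, 1, remove=False)
print("leave, no remove:", unchanged, quorum_for(unchanged))

# Leave with 'remove' taking effect.
reduced = expected_after_leave(before, 1, remove=True)
print("leave remove:    ", reduced, quorum_for(reduced))
```

Under these assumptions a successful "leave remove" on one node would give expected votes 3 and quorum 2, while a leave where remove does not apply stays at expected votes 4 and quorum 3, which matches the second status output in the report.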
Only cman runs on the node where I call "cman_tool leave remove". Nevertheless I get the "Device or resource busy" message.
Can we clarify what is happening here, please? I think the expected_votes bug is not a bug, just a misunderstanding, so on that basis alone I'm tempted to close this BZ.

If you think that cman is incorrectly preventing you from leaving the cluster, then we need some more information: just /what/ processes are running on the system, the output from cman_tool status, and an lsof output to see if any processes are still connected to cman.
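When comparing cman_tool status output pasted from several nodes, it can help to pull out just the vote-related fields. The following is a small Python sketch that parses the "Key: value" layout shown in this report. The function name and the sample text are illustrative only, and the quorate check (total votes >= quorum) is an assumption about how the fields relate, not something cman_tool prints.

```python
def parse_status(text: str) -> dict:
    """Parse 'Key: value' lines from pasted cman_tool status output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. shell prompts
            fields[key.strip()] = value.strip()
    return fields


# Excerpt of the second status output from this report.
SAMPLE = """\
[root@c10n1 ~]# cman_tool status
Nodes: 3
Expected votes: 4
Total votes: 3
Quorum: 3
Flags:
"""

status = parse_status(SAMPLE)
summary = {k: int(status[k]) for k in ("Nodes", "Expected votes", "Total votes", "Quorum")}

# Assumed relation: the cluster is quorate when total votes reach the quorum.
quorate = summary["Total votes"] >= summary["Quorum"]
print(summary, "quorate:", quorate)
```

The same function can be run over the output from each node to check that they agree on expected votes and quorum.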