Red Hat Bugzilla – Bug 133254
ccsd cluster.conf version drops 2 version levels after an update followed by recovery
Last modified: 2009-04-16 16:03:59 EDT
Description of problem:
1. get all nodes in the cluster to a particular cluster.conf version
level greater than 2
2. then update one node
3. send SIGHUP (kill -HUP) to ccsd on that node
4. then kill that node
5. verify the cluster.conf version levels on all other nodes; they
should have been bumped.
6. bring killed node back up
7. check its cluster.conf version, should be same as all others
8. start ccsd
9. check its cluster.conf version, should be same as all others
10. attempt cman_tool join
You then get a:
CMAN: Cluster membership rejected
11. check its cluster.conf version; it goes down exactly 2 version
levels
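For the "check its cluster.conf version" steps above, a small helper can read the version straight out of the file. This is only a sketch: it assumes the version is stored in the config_version attribute of the root <cluster> element, and the function name and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET


def config_version(xml_text: str) -> int:
    """Return the config_version attribute of the root <cluster> element.

    Assumption: cluster.conf keeps its version in that attribute.
    """
    root = ET.fromstring(xml_text)
    return int(root.attrib["config_version"])


# Illustrative sample only, not taken from a real cluster.
sample = '<cluster name="alpha" config_version="8"></cluster>'
print(config_version(sample))
```

Running the same check on every node after steps 5, 7, 9 and 11 makes the regression easy to spot.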
Updates with the proper version and component name. Again, just love our tools.
- fix bug 143165, 134604, and 133254 - update related issues
These all seem to be related to the same issue, that is, remote
nodes were erroneously processing an update as though they were
the originator - taking on some tasks that didn't belong to them.
This was causing connect failures, version rollbacks, etc.
With the exact same scenario, the cman_tool join command no longer fails
with an error. HOWEVER, rather than dropping 2 version levels, it
drops all the way down to the first known version level.
So while all others are at, say, v8, the recovered node, which had v8 before
the cman_tool join attempt, drops down to v1 after the cman_tool join.
The problem stemmed from the fact that connect and broadcast request
processing went through different code paths.
When an update happens, a bit is set telling the daemon that the next
request should trigger a read of the config file. While this happened
correctly for a connect, it did not for a broadcast.
When the node comes back up, the daemon is started, and cman_tool join
is initiated, the daemon broadcasts for config files. The other nodes
were responding back with an old version (ignoring the update bit).
Since the other nodes are quorate, ccsd respects their view instead of
the more recent view it has - thus resulting in a version regression.
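The failure described above can be modelled in a few lines. This is a toy simulation, not ccsd code: the Daemon class, its fields, the rejoin function and the version numbers are all hypothetical, and the "fixed" flag simply stands for making the broadcast path honor the update bit the way the connect path already did.

```python
class Daemon:
    """Toy model of a config daemon with an on-disk and an in-memory version."""

    def __init__(self, version):
        self.disk_version = version   # version in the config file
        self.mem_version = version    # version the daemon currently serves
        self.update_pending = False   # the "update bit"

    def update(self, new_version):
        # An update rewrites the file and sets the bit; the re-read is deferred.
        self.disk_version = new_version
        self.update_pending = True

    def _reload_if_pending(self):
        if self.update_pending:
            self.mem_version = self.disk_version
            self.update_pending = False

    def handle_connect(self):
        # Connect path: always honored the update bit.
        self._reload_if_pending()
        return self.mem_version

    def handle_broadcast(self, fixed):
        # Broadcast path: ignored the update bit before the fix.
        if fixed:
            self._reload_if_pending()
        return self.mem_version


def rejoin(local_version, peers, fixed):
    """Version a recovering node ends up with after broadcasting for configs."""
    answers = [p.handle_broadcast(fixed) for p in peers]
    # The peers are quorate, so their view wins even when it is older.
    return answers[0] if answers else local_version


def make_peers():
    peers = [Daemon(6), Daemon(6)]
    for p in peers:
        p.update(8)  # file is at v8, daemon memory still at v6
    return peers


print(rejoin(8, make_peers(), fixed=False))  # 6: version regression
print(rejoin(8, make_peers(), fixed=True))   # 8: peers re-read before answering
```

The point of the sketch is that the two request paths must share the same "re-read if the bit is set" step; otherwise a quorate but stale answer overrides a newer local file.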
Now that this is fixed, the cman_tool join error will come back. (I
have seen it in the logs - I don't know what Dave has done to the
printout on the command line.) This is a result of the fact that cman
has not been made aware of the update (via 'cman_tool version -r
<new>'). So the incoming node will have a version number that is
different (higher) than the rest of the nodes - thus, it is rejected.
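As a rough illustration of why the join is rejected, assume membership only accepts a joiner whose config version equals the version cman currently knows about. The function below is hypothetical and not taken from cman; the version numbers are examples.

```python
def membership_accepts(cluster_version: int, joiner_version: int) -> bool:
    """Hypothetical check: a joiner is accepted only on an exact version match."""
    return joiner_version == cluster_version


cman_version = 7      # cman was never told about the update
joiner_version = 8    # recovered node carries the updated config

print(membership_accepts(cman_version, joiner_version))  # False: join rejected

# Telling cman about the new version (what 'cman_tool version -r <new>' is for)
# is modelled here as simply bumping the number cman holds.
cman_version = 8
print(membership_accepts(cman_version, joiner_version))  # True
```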
ccs_tool will soon (i.e., now) take away the need to run 'cman_tool
version -r <new>'.