Bug 133254 - ccsd cluster.conf version drops 2 version levels after an update followed by recovery
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: ccs
Version: 4
Hardware: i686 Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Jonathan Earl Brassow
QA Contact: GFS Bugs
Depends On:
Blocks:
Reported: 2004-09-22 15:08 EDT by Corey Marthaler
Modified: 2009-04-16 16:03 EDT
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2005-01-10 19:06:16 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Corey Marthaler 2004-09-22 15:08:35 EDT
Description of problem:
1. Get all nodes in the cluster to a particular cluster.conf version
level greater than 2.
2. Update one node.
3. Send a SIGHUP on that node.
4. Kill that node.
5. Verify the cluster.conf version levels on all other nodes; they
should have been bumped.
6. Bring the killed node back up.
7. Check its cluster.conf version; it should be the same as all others.
8. Start ccsd.
9. Check its cluster.conf version; it should be the same as all others.
10. Attempt cman_tool join.

You then get a:
CMAN: Cluster membership rejected

11. Check its cluster.conf version; it drops exactly 2 version
levels.

CRAZY! :)

How reproducible:
Always
Comment 1 Kiersten (Kerri) Anderson 2004-11-04 10:08:13 EST
Updates with the proper version and component name.
Comment 2 Kiersten (Kerri) Anderson 2004-11-04 10:17:01 EST
Updates with the proper version and component name.
Comment 3 Kiersten (Kerri) Anderson 2004-11-04 10:21:08 EST
Updates with the proper version and component name. Again, just love our tools.
Comment 4 Jonathan Earl Brassow 2004-12-17 12:53:56 EST
- fix bug 143165, 134604, and 133254 - update related issues
  These all seem to be related to the same issue, that is, remote
  nodes were erroneously processing an update as though they were
  the originator - taking on some tasks that didn't belong to them.

  This was causing connect failures, version rollbacks, etc.
Comment 5 Corey Marthaler 2004-12-20 15:19:26 EST
With the exact same scenario, the cman_tool join command no longer
fails with an error. HOWEVER, rather than dropping 2 version levels,
it drops all the way down to the first known version level.

So while all the others are at, say, v8, the recovered node, which was
at v8 before the cman_tool join attempt, drops down to v1 after the
cman_tool join.
Comment 6 Jonathan Earl Brassow 2005-01-05 16:54:09 EST
The problem stemmed from the fact that connect and broadcast request
processing went through different code paths.

When an update happens, a bit is set telling the daemon that the next
request should trigger a read of the config file.  While this happened
correctly for a connect, it did not for a broadcast.

When the node comes back up, the daemon is started, and cman_tool join
is initiated; the daemon then broadcasts for config files.  The other
nodes were responding with an old version (ignoring the update bit).
Since the other nodes are quorate, ccsd respects their view instead of
the more recent view it has, thus resulting in a version regression.
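The connect/broadcast asymmetry described above can be pictured with a
small sketch. This is an illustrative model only, not actual ccsd code;
all names here (Daemon, update_pending, etc.) are invented for the
example:

```python
# Minimal model of the update-bit behavior described above.
# Hypothetical sketch; not real ccsd code.

class Daemon:
    def __init__(self, config_version):
        self.config_version = config_version
        self.pending_version = config_version
        self.update_pending = False  # set when cluster.conf is updated

    def notify_update(self, new_version):
        # SIGHUP after an update: remember the on-disk config changed
        self.pending_version = new_version
        self.update_pending = True

    def _maybe_reread_config(self):
        # The next request should trigger a re-read of cluster.conf
        if self.update_pending:
            self.config_version = self.pending_version
            self.update_pending = False

    def handle_connect(self):
        self._maybe_reread_config()  # connect path: always checked the bit
        return self.config_version

    def handle_broadcast(self, fixed=True):
        # Before the fix, the broadcast path skipped the re-read check,
        # so peers answered broadcasts with a stale version number.
        if fixed:
            self._maybe_reread_config()
        return self.config_version

peer = Daemon(config_version=8)
peer.notify_update(9)
print(peer.handle_broadcast(fixed=False))  # stale answer: 8
print(peer.handle_broadcast(fixed=True))   # after the fix: 9
```

A recovering node asking quorate peers for the config would get the
stale answer in the unfixed case, which is the regression seen above.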

Now that this is fixed, the cman_tool join error will come back.  (I
have seen it in the logs; I don't know what Dave has done to the
printout on the command line.)  This is a result of the fact that cman
has not been made aware of the update (via 'cman_tool version -r
<new>').  So the incoming node will have a version number that is
different (higher) than the rest of the nodes; thus, it is rejected.
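The rejection boils down to a version comparison. A minimal sketch,
assuming cman rejects any joiner whose config version differs from the
cluster's (the real membership logic is more involved; the function
name is invented):

```python
# Illustrative model of the membership check described above;
# not real cman code.

def membership_check(cluster_version, joining_version):
    # A joiner with a mismatched config version is refused until
    # 'cman_tool version -r <new>' propagates the new version.
    if joining_version != cluster_version:
        return "CMAN: Cluster membership rejected"
    return "joined"

# Peers still advertise v8; the recovered node carries the updated v9.
print(membership_check(cluster_version=8, joining_version=9))
```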

ccs_tool will soon (i.e., now) take away the need to run 'cman_tool
version -r <new>'.

Comment 7 Corey Marthaler 2005-01-10 19:06:16 EST
Fix verified.
