Red Hat Bugzilla – Bug 473102
Nodes GATHER but don't form a configuration
Last modified: 2016-04-26 10:15:09 EDT
Created attachment 324732
Logs from all 28 nodes and revolver
Description of problem:
While running recovery tests on a large cluster (28 nodes), the membership fell apart: nodes formed their own rings and would not re-form the 28-node cluster. In /var/log/messages I see:
openais: [TOTEM] entering GATHER state from 11.
This message repeats about 20 times, and then a configuration forms with just one node.
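To check how often a node cycles through GATHER before settling, the messages can be counted per log file. The following is a minimal sketch, not part of the original report: the pattern is inferred only from the single log line quoted above, and the function name and sample lines are hypothetical.

```python
import re

# Pattern inferred from the one log line quoted in this report;
# other openais versions may word the message differently.
GATHER_RE = re.compile(r"\[TOTEM\] entering GATHER state from (\d+)")


def count_gather_entries(lines):
    """Return (count, states) for lines that report entering GATHER.

    `states` holds the numeric "from" value of each matching line,
    in the order the lines were seen.
    """
    states = []
    for line in lines:
        match = GATHER_RE.search(line)
        if match:
            states.append(int(match.group(1)))
    return len(states), states


# Hypothetical sample input; in practice pass an open log file instead.
sample = [
    "openais: [TOTEM] entering GATHER state from 11.",
    "openais: [TOTEM] entering GATHER state from 11.",
    "an unrelated syslog line",
]
count, states = count_gather_entries(sample)
print(count, states)
```

Because the function accepts any iterable of strings, an open file handle for one node's /var/log/messages can be passed directly, which makes it easy to compare counts across the 28 nodes.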
I'm using the following parameters in cluster.conf:
<totem token="30000" consensus="29000" join="5000" send_join="80"/>
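For comparing these settings across nodes or test runs, the element can be read programmatically. This is a small sketch under assumptions: it parses only the single `<totem>` element quoted above (not a full cluster.conf), the helper name is hypothetical, and it treats all four values as milliseconds, which is my understanding of the openais timeout settings rather than something stated in this report.

```python
import xml.etree.ElementTree as ET

# The element exactly as quoted in the report.
TOTEM_XML = '<totem token="30000" consensus="29000" join="5000" send_join="80"/>'


def totem_settings(xml_text):
    """Parse a <totem .../> element and return its attributes as ints."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "totem":
        raise ValueError(f"expected a <totem> element, got <{elem.tag}>")
    return {name: int(value) for name, value in elem.attrib.items()}


settings = totem_settings(TOTEM_XML)
for name, ms in settings.items():
    # Assumption: every value is a timeout in milliseconds.
    print(f"{name}: {ms} ms ({ms / 1000:g} s)")
```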
The attached logs are from two revolver scenarios. In scenario 1.3, revolver shot one node less than quorum with "reboot -fin"; recovery completed and the scenario passed. In scenario 1.4, one node more than quorum was shot and I hit the problem described above.
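The node counts behind the two scenarios can be worked out as follows. This is only a sketch of one reading of the description: it assumes one vote per node and a simple-majority quorum, neither of which is stated in the report, and it takes "one node less/more than quorum" to mean the number of nodes shot. The function name is hypothetical.

```python
def majority_quorum(total_votes):
    """Simple-majority quorum: more than half of the votes.

    Assumption: one vote per node and no quorum disk or other
    vote adjustments; the report does not state the actual setup.
    """
    return total_votes // 2 + 1


NODES = 28  # cluster size given in the report
quorum = majority_quorum(NODES)

# One reading of the two scenarios: quorum - 1 and quorum + 1 nodes shot.
for scenario, shot in (("1.3", quorum - 1), ("1.4", quorum + 1)):
    remaining = NODES - shot
    print(f"scenario {scenario}: shot {shot}, remaining {remaining}, quorum {quorum}")
```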
Version-Release number of selected component (if applicable):
I've run into this scenario many times before, but it probably takes a few tries to hit this.
Putting this on the 5.4 radar so we can support large configurations.
Development Management has reviewed and declined this request. You may appeal
this decision by reopening this request.