Bug 473102 - Nodes GATHER but don't form a configuration
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.0
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Steven Dake
QA Contact: Cluster QE
Reported: 2008-11-26 10:20 EST by Nate Straz
Modified: 2016-04-26 10:15 EDT
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2009-06-16 00:18:19 EDT


Attachments
Logs from all 28 nodes and revolver (11.95 MB, application/x-gzip)
2008-11-26 10:20 EST, Nate Straz

Description Nate Straz 2008-11-26 10:20:17 EST
Created attachment 324732
Logs from all 28 nodes and revolver

Description of problem:

While running recovery tests on a large cluster (28 nodes), the membership fell apart: nodes formed their own rings and would not re-form the 28-node cluster.  In /var/log/messages I see:

openais[2719]: [TOTEM] entering GATHER state from 11.

This message repeats about 20 times, then a configuration with just one node is formed.
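
To quantify how often each node re-enters GATHER, the messages can be counted per log file. The following is only a rough sketch: the regular expression is modelled on the single log line quoted above, and the function name count_gather_entries is made up for illustration, not part of openais or revolver.

```python
import re
from collections import Counter

# Pattern modelled on the one line quoted in this report, e.g.:
#   openais[2719]: [TOTEM] entering GATHER state from 11.
# Other openais versions may format the message differently (assumption).
GATHER_RE = re.compile(
    r"openais\[(?P<pid>\d+)\]: \[TOTEM\] entering GATHER state from (?P<origin>\d+)\."
)


def count_gather_entries(lines):
    """Count GATHER transitions, keyed by the numeric 'from' code."""
    counts = Counter()
    for line in lines:
        match = GATHER_RE.search(line)
        if match:
            counts[int(match.group("origin"))] += 1
    return counts


if __name__ == "__main__":
    sample = ["openais[2719]: [TOTEM] entering GATHER state from 11."] * 3
    print(count_gather_entries(sample))
```

Running it over each node's /var/log/messages (for example by passing an open file object as lines) gives a per-node count that can be compared across the 28 nodes.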

I'm using the following parameters in cluster.conf:
  <totem token="30000" consensus="29000" join="5000" send_join="80"/>
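
For reference, a small sketch that reads these totem attributes and compares consensus against token. The 1.2 ratio is an assumption taken from the corosync.conf(5) man page of later corosync releases, which says consensus must be at least 1.2 * token; whether openais 0.80.3 applies the same rule is not established here, so the result is a hint rather than a diagnosis. The helper names parse_totem and consensus_warning are hypothetical.

```python
import xml.etree.ElementTree as ET

# The totem line exactly as quoted in this report.
TOTEM_LINE = '<totem token="30000" consensus="29000" join="5000" send_join="80"/>'


def parse_totem(xml_fragment):
    """Return the totem attributes as a dict of integer millisecond values."""
    elem = ET.fromstring(xml_fragment)
    return {name: int(value) for name, value in elem.attrib.items()}


def consensus_warning(settings, ratio=1.2):
    """Return a warning string if consensus < ratio * token, else None.

    ratio=1.2 is an assumption borrowed from corosync.conf(5); it may not
    apply to the openais version in this report.
    """
    minimum = int(ratio * settings["token"])
    if settings["consensus"] < minimum:
        return "consensus %d ms is below %.1f x token (%d ms)" % (
            settings["consensus"], ratio, minimum)
    return None


if __name__ == "__main__":
    settings = parse_totem(TOTEM_LINE)
    print(settings)
    print(consensus_warning(settings))
```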

The attached logs are from two revolver scenarios.  In scenario 1.3, revolver shot one node fewer than quorum with "reboot -fin"; the scenario completed recovery and passed.  In scenario 1.4, one node more than quorum was shot and I hit the problem described above.

Version-Release number of selected component (if applicable):
cman-2.0.97-1.el5
openais-0.80.3-21.el5


How reproducible:
I've run into this scenario many times before, but it probably takes a few tries to reproduce.

Actual results:


Expected results:


Additional info:
Comment 1 Nate Straz 2008-12-01 13:15:07 EST
Putting this on the 5.4 radar so we can support large configurations.
Comment 4 RHEL Product and Program Management 2009-06-16 00:18:19 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
