Bug 473102
Summary: | Nodes GATHER but don't form a configuration | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Nate Straz <nstraz>
Component: | corosync | Assignee: | Steven Dake <sdake>
Status: | CLOSED WONTFIX | QA Contact: | Cluster QE <mspqa-list>
Severity: | medium | Priority: | medium
Version: | 6.0 | CC: | cluster-maint, edamato
Target Milestone: | rc | Target Release: | ---
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2009-06-16 04:18:19 UTC
Putting this on the 5.4 radar so we can support large configurations.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
Created attachment 324732
Logs from all 28 nodes and revolver

Description of problem:
While running recovery tests on a large cluster (28 nodes), the membership fell apart: nodes formed their own rings and would not re-form the 28-node cluster. In /var/log/messages I see:

openais[2719]: [TOTEM] entering GATHER state from 11.

This message repeats about 20 times, then a configuration with just one node is formed.

I'm using the following parameters in cluster.conf:

<totem token="30000" consensus="29000" join="5000" send_join="80"/>

The attached logs are from two revolver scenarios. In scenario 1.3, one node less than quorum was shot by revolver with "reboot -fin"; recovery completed and the scenario passed. In scenario 1.4, one node more than quorum was shot and I hit the problem described above.

Version-Release number of selected component (if applicable):
cman-2.0.97-1.el5
openais-0.80.3-21.el5

How reproducible:
I've run into this scenario many times before, but it probably takes a few tries to hit this.

Actual results:

Expected results:

Additional info:
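
For anyone triaging the attached logs, the repeated GATHER message can be counted per process with a short script. This is a minimal sketch, not part of the original report. The regular expression is modeled only on the single log line quoted above, and the names `GATHER_RE`, `count_gather_entries`, and `from_code` are hypothetical. The report does not say what the number after "from" means, so the script just records it.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this report; other
# openais versions may word the message differently (assumption).
GATHER_RE = re.compile(
    r"openais\[(?P<pid>\d+)\]: \[TOTEM\] entering GATHER state "
    r"from (?P<from_code>\d+)"
)


def count_gather_entries(lines):
    """Count GATHER transitions per (pid, from_code) pair."""
    counts = Counter()
    for line in lines:
        match = GATHER_RE.search(line)
        if match:
            key = (int(match["pid"]), int(match["from_code"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample input: the quoted message repeated, plus one unrelated line.
    sample = ["openais[2719]: [TOTEM] entering GATHER state from 11."] * 20
    sample.append("kernel: unrelated message")
    for (pid, from_code), n in sorted(count_gather_entries(sample).items()):
        print(f"pid={pid} from={from_code} count={n}")
```

To run it against a real log, pass an open file object (for example, one opened on a node's /var/log/messages) to `count_gather_entries` in place of the sample list. A high count for one process followed by a single-node configuration would match the symptom described in this report.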