Bug 473102 - Nodes GATHER but don't form a configuration
Summary: Nodes GATHER but don't form a configuration
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steven Dake
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-26 15:20 UTC by Nate Straz
Modified: 2016-04-26 14:15 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-16 04:18:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Logs from all 28 nodes and revolver (11.95 MB, application/x-gzip)
2008-11-26 15:20 UTC, Nate Straz

Description Nate Straz 2008-11-26 15:20:17 UTC
Created attachment 324732
Logs from all 28 nodes and revolver

Description of problem:

While running recovery tests on a large cluster (28 nodes) the membership fell apart and nodes formed their own rings and would not re-form the 28 node cluster.  In /var/log/messages I see:

openais[2719]: [TOTEM] entering GATHER state from 11.

This message repeats about 20 times, and then a configuration forms with just one node.
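
To compare nodes, it can help to count how often this message appears in each node's /var/log/messages. The following is only a minimal sketch: the function name and the sample lines are hypothetical, and the regular expression assumes the exact message format quoted above.

```python
import re
from collections import Counter

# Assumes the message format quoted in this report, e.g.:
#   openais[2719]: [TOTEM] entering GATHER state from 11.
GATHER_RE = re.compile(
    r"openais\[\d+\]: \[TOTEM\] entering GATHER state from (\d+)\."
)


def count_gather_entries(lines):
    """Count GATHER messages, keyed by the numeric 'from' code."""
    counts = Counter()
    for line in lines:
        match = GATHER_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample input; in practice, pass an open log file instead.
sample = [
    "openais[2719]: [TOTEM] entering GATHER state from 11.",
    "openais[2719]: [TOTEM] entering GATHER state from 11.",
    "kernel: unrelated line that should be ignored",
]
gather_counts = count_gather_entries(sample)
print(gather_counts)
```

For a real log, something like `count_gather_entries(open("/var/log/messages"))` would give one tally per node to compare across the cluster.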

I'm using the following parameters in cluster.conf:
  <totem token="30000" consensus="29000" join="5000" send_join="80"/>
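
For reference, the attributes on that element can be read programmatically, which is handy when checking that all 28 nodes carry the same values. This is a rough sketch using Python's standard XML parser; the helper name is hypothetical, and it parses only the single element shown above rather than a full cluster.conf.

```python
import xml.etree.ElementTree as ET

# The totem element exactly as quoted in this report.
TOTEM_XML = '<totem token="30000" consensus="29000" join="5000" send_join="80"/>'


def totem_settings(xml_text):
    """Return the totem attributes as a dict of name -> integer value."""
    element = ET.fromstring(xml_text)
    return {name: int(value) for name, value in element.attrib.items()}


settings = totem_settings(TOTEM_XML)
for name, value in settings.items():
    print(f"{name} = {value}")
```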

The attached logs are from two revolver scenarios.  In scenario 1.3, one node less than quorum was shot by revolver with "reboot -fin"; the cluster completed recovery and the scenario passed.  In scenario 1.4, one node more than quorum was shot and I hit the problem described above.

Version-Release number of selected component (if applicable):
cman-2.0.97-1.el5
openais-0.80.3-21.el5


How reproducible:
I've run into this scenario many times before, but it usually takes a few tries to hit it.

Actual results:


Expected results:


Additional info:

Comment 1 Nate Straz 2008-12-01 18:15:07 UTC
Putting this on the 5.4 radar so we can support large configurations.

Comment 4 RHEL Program Management 2009-06-16 04:18:19 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

