
Bug 895549

Summary: corosync rrp multi-homing cluster startup fails with one ring down
Product: Red Hat Enterprise Linux 6
Reporter: Daniel Peess <dpeess>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED CANTFIX
QA Contact: Cluster QE <mspqa-list>
Severity: low
Docs Contact:
Priority: medium
Version: 6.3
CC: mtessun, sdake
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-05-07 12:13:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
/var/log/messages (flags: none)

Description Daniel Peess 2013-01-15 13:58:44 UTC
Created attachment 678782: /var/log/messages

Description of problem:
The tech-preview multi-homing (redundant ring) support in corosync prevents cluster startup if either of the two redundant rings is down.
At runtime everything is fine: either of the two rings may fail, and it is recovered automatically when it returns.
Cluster startup, however, is blocked completely if one of the two rings is down,
regardless of whether the name path or the altname path is the one that is gone.

Version-Release number of selected component (if applicable):
corosync 1.4.1-7.el6_3.1.x86_64

How reproducible:
-) activate redundant ring support with altname and altmulticast.
-) take one of the two rings down.
-) service cman start
-) startup of cman fails at its last step:
   'Waiting for quorum... Timed-out waiting for cluster'.
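For readers unfamiliar with the setup in the first step, the following is a minimal sketch of what a redundant-ring cluster.conf might look like and how the two paths per node can be listed. The cluster name, node names, and multicast addresses are hypothetical and not taken from the reporter's system; the helper functions are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment. All names and addresses are made up
# for illustration; they are NOT from the reporter's configuration.
SAMPLE_CONF = """\
<cluster name="example" config_version="1">
  <cman>
    <multicast addr="239.192.0.1"/>
    <altmulticast addr="239.192.0.2"/>
  </cman>
  <clusternodes>
    <clusternode name="node1-ring0" nodeid="1">
      <altname name="node1-ring1"/>
    </clusternode>
    <clusternode name="node2-ring0" nodeid="2">
      <altname name="node2-ring1"/>
    </clusternode>
  </clusternodes>
</cluster>
"""


def ring_names(conf_text):
    """Return {nodeid: (name, altname)}; altname is None if not configured."""
    root = ET.fromstring(conf_text)
    result = {}
    for node in root.iter("clusternode"):
        alt = node.find("altname")
        result[node.get("nodeid")] = (
            node.get("name"),
            alt.get("name") if alt is not None else None,
        )
    return result


def multicast_addrs(conf_text):
    """Return (multicast addr, altmulticast addr); either may be None."""
    cman = ET.fromstring(conf_text).find("cman")
    if cman is None:
        return (None, None)
    primary = cman.find("multicast")
    alternate = cman.find("altmulticast")
    return (
        primary.get("addr") if primary is not None else None,
        alternate.get("addr") if alternate is not None else None,
    )


if __name__ == "__main__":
    for nodeid, (name, altname) in sorted(ring_names(SAMPLE_CONF).items()):
        print(nodeid, name, altname)
    print(multicast_addrs(SAMPLE_CONF))
```

Each node's name attribute is the first ring's path and its altname element is the second ring's path, which is the distinction the description refers to.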
  
Actual results:
-) cman startup reports an error and shuts down again, without even attempting any startup fencing.

Expected results:
-) cman/corosync startup uses the remaining active ring to start the cluster successfully.

Additional info:
The message log of the failing startup is attached.
Unfortunately, corosync logs nothing about the missing second ring.
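Because the log is silent about the second ring, ring state has to be inspected some other way while corosync is running, for example with `corosync-cfgtool -s`. The sketch below parses that kind of status text and returns the rings marked FAULTY. The sample text is an illustrative assumption about the output layout, with documentation-range addresses; it is not output captured from the reporter's cluster, and `faulty_rings` is a hypothetical helper.

```python
import re

# Illustrative sample only: an ASSUMED layout of `corosync-cfgtool -s` output.
# Addresses come from documentation ranges, not from the affected system.
SAMPLE_STATUS = """\
Printing ring status.
Local node ID 1
RING ID 0
\tid\t= 192.0.2.1
\tstatus\t= ring 0 active with no faults
RING ID 1
\tid\t= 198.51.100.1
\tstatus\t= Marking ringid 1 interface 198.51.100.1 FAULTY
"""


def faulty_rings(status_text):
    """Return the ring IDs whose status line contains FAULTY, in order."""
    faulty = []
    current_ring = None
    for line in status_text.splitlines():
        header = re.match(r"\s*RING ID (\d+)", line)
        if header:
            # Remember which ring the following id/status lines belong to.
            current_ring = int(header.group(1))
            continue
        is_status = line.strip().startswith("status")
        if current_ring is not None and is_status and "FAULTY" in line:
            faulty.append(current_ring)
    return faulty


if __name__ == "__main__":
    print(faulty_rings(SAMPLE_STATUS))
```

In practice the text would come from running the tool on a node; the parser takes a string so it can be exercised without a cluster.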

Comment 3 Jan Friesse 2013-04-16 12:37:59 UTC
Daniel,
what do you mean by "any of the two redundant rings is missing"? That the interface is not up? That it is blocked by a firewall?

I tried to simulate a switch failure (VMs, with the bridge not forwarding packets) and it works.

Comment 4 Jan Friesse 2013-04-16 12:39:14 UTC
Also, you may be hitting a problem similar to rhbz#787789, so maybe it's fixed in 6.4.

Comment 5 Daniel Peess 2013-05-07 12:07:19 UTC
(In reply to comment #3)
> what you mean by "any of the two redundant rings is missing"? Like interface
> is not up? It's blocked on firewall?

Jan,
I ran "ifconfig vnetX down" in the hypervisor on the node's virtual heartbeat interface.
It seems this is similar to rhbz#787789... just a black hole.
Unfortunately I no longer have time to test this personally.

Comment 6 Jan Friesse 2013-05-07 12:13:04 UTC
Ok,
so I will close this bug; please reopen it if you hit it again.