Created attachment 678782: /var/log/messages
Description of problem:
the tech-preview multi-homing (redundant-ring) support for corosync PREVENTS cluster STARTUP if either of the two redundant rings is missing.
during runtime everything is fine: either of the two rings may fail, and it is recovered automatically when it returns.
but cluster startup is completely blocked if one of the two rings is down,
no matter whether the name path or the altname path is gone.
Version-Release number of selected component (if applicable):
corosync 1.4.1-7.el6_3.1.x86_64
How reproducible:
-) activate redundant ring support with altname and altmulticast.
-) service cman start
-) startup of cman fails at its last point:
'Waiting for quorum... Timed-out waiting for cluster'.
Actual results:
-) cman startup reports an error and shuts down again, without even attempting any startup fencing.
Expected results:
-) cman/corosync startup uses the remaining active ring to start up the cluster successfully.
Additional info:
the message log of the failing startup is attached,
corosync unfortunately logs nothing about the missing second ring.
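for reference, a redundant-ring setup of the kind used in the reproduction steps is typically declared in /etc/cluster/cluster.conf roughly as in the sketch below. this is not the configuration from the affected cluster: the cluster name, node names, and multicast addresses are placeholders, and the fencing sections are left out for brevity.

```xml
<cluster name="examplecluster" config_version="1">
  <clusternodes>
    <!-- name= is the ring 0 hostname, altname is the ring 1 hostname (both hypothetical) -->
    <clusternode name="node1-ring0" nodeid="1">
      <altname name="node1-ring1"/>
    </clusternode>
    <clusternode name="node2-ring0" nodeid="2">
      <altname name="node2-ring1"/>
    </clusternode>
  </clusternodes>
  <cman>
    <!-- placeholder multicast addresses: one for ring 0, one for ring 1 -->
    <multicast addr="239.192.0.1"/>
    <altmulticast addr="239.192.0.2"/>
  </cman>
</cluster>
```

with a layout like this, the failure described above would correspond to either the name network or the altname network being unreachable at the time "service cman start" is run.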
Daniel,
what do you mean by "any of the two redundant rings is missing"? Is the interface not up? Is it blocked by a firewall?
I tried to simulate a switch failure (VMs, with the bridge not forwarding packets) and it works.
(In reply to comment #3)
> what do you mean by "any of the two redundant rings is missing"? Is the
> interface not up? Is it blocked by a firewall?
jan,
"ifconfig vnetX down" on the hypervisor, for the node's heartbeat virtual interface.
it seems like this is similar to rhbz#787789... just a black hole.
unfortunately I no longer have time to test this personally.