Description of problem:
The corosync process crashes randomly on one of the cluster members.
Version-Release number of selected component (if applicable):
1.2.7 and 1.2.8
The last time it happened was when I launched corosync-fplay on another member.
Steps to Reproduce:
I've attached the output from 'corosync-fplay' on the node that crashed
Could you attach the core file (/var/lib/corosync/core) and tell us which corosync version you're using (rpm -qi corosync)?
There's no core file in /var/lib/corosync
I'm using version 1.2.8
Created attachment 449147 [details]
Coredump of corosync
I've attached the complete dir generated by abrt of a corosync coredump
Created attachment 449148 [details]
Fplay of the coredump
I've opened a new case with abrt: 636774
You were right!
I've solved the problem by reconfiguring the multicast settings on the switches
to which the four hosts are attached. Two hosts are connected to one switch and
the other two to the other. The switches are from two different manufacturers
and were configured differently. The main fix was to force the link between
the switches to forward traffic for the specific multicast address.
Thank you for your support!
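For anyone who lands here with a similar setup, the check described above (does multicast traffic for the cluster's group actually cross the inter-switch link?) can be sketched roughly as follows. This is only a minimal probe under stated assumptions, not part of corosync: the group address and port below are placeholders (use the mcastaddr from your own corosync.conf and an unused test port), and the function names are made up for illustration.

```python
import ipaddress
import socket
import struct

# Placeholders, NOT taken from this bug: substitute your own multicast group.
# The port is an arbitrary unused test port, deliberately not the live cluster port.
GROUP = "239.192.0.1"
PORT = 15405


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def receive_probe(group: str = GROUP, port: int = PORT, timeout: float = 10.0):
    """Join the group and wait for one datagram; return (data, sender_ip) or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        data, sender = sock.recvfrom(1024)
        return data, sender[0]
    except socket.timeout:
        # Nothing arrived: the group traffic is probably not being forwarded.
        return None
    finally:
        sock.close()


def send_probe(group: str = GROUP, port: int = PORT,
               payload: bytes = b"mcast-probe", ttl: int = 1) -> None:
    """Send one datagram to the group; TTL 1 keeps it on the local segment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

Run receive_probe() on a host attached to one switch, then send_probe() from a host attached to the other. If the receiver times out while hosts on the same switch can see each other's probes, the link between the switches is a likely suspect.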
I'd be deeply indebted to you if you would try the attached patch in your environment (with the defective setup) and see if you continue to see aborts.
Created attachment 477496 [details]
patch which may fix the abort
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '13'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 13's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 13 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
*** Bug 629431 has been marked as a duplicate of this bug. ***
*** Bug 636774 has been marked as a duplicate of this bug. ***
I believe this bug was fixed in Corosync 1.4.x (and 1.3.x zstream). Closing as upstream.
Could you verify the patch in this bug is in upstream? If not, can you try this patch on Bug #854216?
Created attachment 632734 [details]
Proposed patch is now upstream as d4db2ea5353c8eedb64a88ae413c04e0757378c9 (or flatiron 81ff0e8c94589bb7139d89e573a75473cfc5d173)
*** Bug 875922 has been marked as a duplicate of this bug. ***