Bug 875922

Summary: [TOTEM] FAILED TO RECEIVE + corosync crash
Product: [Fedora] Fedora
Reporter: Ari Tilli <ari.tilli>
Component: corosync
Assignee: Jan Friesse <jfriesse>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: unspecified
Version: 17
CC: agk, fdinitto, jfriesse, jpokorny, jruemker, sdake
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clone Of: 854216
Last Closed: 2012-11-19 07:49:23 UTC
Type: Bug

Description Ari Tilli 2012-11-12 20:12:26 UTC
corosync 2.0.2-1.fc17  

A similar bug was filed for RHEL 6. I am reporting it for Fedora 17 as well.

I have 4 Fedora 17 guest VMs running on a Fedora 17 host.

All guest VMs run pacemaker and corosync.

The end result is that corosync crashes constantly in the guests,
roughly every 15-30 minutes.

The logging level is "info", so I suspect that I/O load causes "failed to receive"
to happen regularly, and the crash follows.

I am reporting this from home; the test cluster is at work and not
connected to the network, so I will try to add more details later if needed.
However, IMO the case is similar to #221143, so maybe no further info is needed.

Comment 1 Jan Friesse 2012-11-19 07:49:23 UTC
This is a duplicate of bug 636583, and that bug is now fixed.

But keep in mind that "failed to receive" is ALWAYS caused by a large number of lost multicast packets. Ensure (for example with omping) that your network is correctly configured and that there are no iptables, switch, or router problems. So even though the CRASH itself is fixed, with a huge number of lost multicast packets you will still observe imperfect behavior.
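omping is the tool suggested above for checking multicast delivery between nodes. The sketch below does not reproduce omping; it only illustrates the arithmetic behind a loss figure, assuming each probe carries a sequence number 0..sent-1 and the receiver records which ones arrived. `summarize_probes` and `LossReport` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class LossReport:
    sent: int
    received: int      # unique in-range sequence numbers that arrived
    duplicates: int    # repeated copies of an already-seen sequence number

    @property
    def lost(self) -> int:
        return self.sent - self.received

    @property
    def loss_percent(self) -> float:
        # Guard against division by zero when nothing was sent.
        return 100.0 * self.lost / self.sent if self.sent else 0.0


def summarize_probes(sent: int, received_seqs: Iterable[int]) -> LossReport:
    """Summarize a probe run in which sequence numbers 0..sent-1 were sent.

    received_seqs holds the sequence numbers that actually arrived,
    possibly out of order or duplicated.
    """
    seen: set = set()
    duplicates = 0
    for seq in received_seqs:
        if not 0 <= seq < sent:
            continue  # ignore stray packets that do not belong to this run
        if seq in seen:
            duplicates += 1
        else:
            seen.add(seq)
    return LossReport(sent=sent, received=len(seen), duplicates=duplicates)


# Example: 10 probes sent, probes 3, 6, 7 and 8 never arrived, probe 2 arrived twice.
report = summarize_probes(10, [0, 1, 2, 2, 4, 5, 9])
print(f"{report.lost} lost ({report.loss_percent:.1f}%), {report.duplicates} duplicate(s)")
# prints "4 lost (40.0%), 1 duplicate(s)"
```

Counting unique sequence numbers (rather than raw packets) keeps duplicates from hiding real loss, which is the quantity that matters for totem.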

You can also try UDPU (the UDP unicast transport), which does not rely on multicast.
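A minimal sketch of what switching to UDPU involves, assuming corosync 2.x configuration syntax (`transport: udpu` in the `totem` section plus a `nodelist` listing every member, since unicast traffic must be addressed to each node explicitly). The helper `render_udpu_config` and the example addresses are hypothetical; check corosync.conf(5) for the version actually installed before using the output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    nodeid: int       # must be unique within the cluster
    ring0_addr: str   # address this node uses for totem traffic on ring 0


def render_udpu_config(nodes: List[Node]) -> str:
    """Render a minimal corosync 2.x-style totem/nodelist section for udpu.

    With udpu, totem messages are sent as unicast UDP to every node in
    the nodelist, so all cluster members must be listed.
    """
    ids = [n.nodeid for n in nodes]
    if len(ids) != len(set(ids)):
        raise ValueError("nodeid values must be unique")

    lines = [
        "totem {",
        "    version: 2",
        "    transport: udpu",
        "}",
        "",
        "nodelist {",
    ]
    for node in nodes:
        lines += [
            "    node {",
            f"        ring0_addr: {node.ring0_addr}",
            f"        nodeid: {node.nodeid}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses for four guest VMs on one host.
    guests = [Node(nodeid=i, ring0_addr=f"192.168.122.{10 + i}") for i in range(1, 5)]
    print(render_udpu_config(guests), end="")
```

Generating the nodelist from one list of nodes keeps every guest's configuration identical, which avoids members disagreeing about who is in the ring.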

*** This bug has been marked as a duplicate of bug 636583 ***