Bug 875922

Summary:	[TOTEM] FAILED TO RECEIVE + corosync crash
Product:	[Fedora] Fedora	Reporter:	Ari Tilli <ari.tilli>
Component:	corosync	Assignee:	Jan Friesse <jfriesse>
Status:	CLOSED DUPLICATE	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	17	CC:	agk, fdinitto, jfriesse, jpokorny, jruemker, sdake
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	854216	Environment:
Last Closed:	2012-11-19 07:49:23 UTC	Type:	Bug

Description Ari Tilli 2012-11-12 20:12:26 UTC

corosync 2.0.2-1.fc17  

A similar bug was filed for RHEL 6. I am reporting it for Fedora 17 as well.

I have 4 Fedora 17 guest VMs running on a Fedora 17 host.

All guest VMs have pacemaker and corosync running.

The end result is that corosync crashes repeatedly in the guests,
every 15-30 minutes.

The logging level is "info", so, probably because of the resulting I/O,
"failed to receive" happens regularly and the crash follows.

I am reporting this from home; the test cluster is at work and not
connected to the network, so I will try to add more details later if needed.
However, the case IMO is similar to #221143, so maybe no further info is needed.

Comment 1 Jan Friesse 2012-11-19 07:49:23 UTC

This is a duplicate of bug 636583, and that BZ is now fixed.

But keep in mind that "failed to receive" is ALWAYS caused by a large number of lost multicast packets. Ensure (for example, with omping) that your network is correctly configured and that there are no iptables / switch / router problems. Even though the CRASH itself is fixed, with a huge number of lost multicast packets you will still observe less-than-perfect behavior.

You can also try UDPU (UDP unicast transport), which avoids multicast entirely.
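
For corosync 2.x, switching to UDPU could look roughly like the corosync.conf fragment below. This is only a sketch for a 4-node cluster like the one described: the addresses and node IDs are made-up placeholders, and the rest of the existing configuration (logging, quorum, and so on) is assumed to stay as it is. The same file has to be deployed on every node, and corosync restarted on each.

```
totem {
    version: 2
    # Use UDP unicast instead of multicast
    transport: udpu
}

# With udpu every cluster member must be listed explicitly.
# Addresses below are placeholders, not taken from this report.
nodelist {
    node {
        ring0_addr: 192.168.122.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.122.12
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.122.13
        nodeid: 3
    }
    node {
        ring0_addr: 192.168.122.14
        nodeid: 4
    }
}
```

Unicast does not fix an unreliable network. It only removes the dependency on multicast delivery, so heavy packet loss between the guests will still cause problems.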

*** This bug has been marked as a duplicate of bug 636583 ***