875922 – [TOTEM] FAILED TO RECEIVE + corosync crash

Summary: [TOTEM] FAILED TO RECEIVE + corosync crash

Keywords:
Status:	CLOSED DUPLICATE of bug 636583
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	corosync
Sub Component:
Version:	17
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Jan Friesse
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-11-12 20:12 UTC by Ari Tilli
Modified:	2012-11-19 07:49 UTC
CC List:	6 users
Fixed In Version:
Clone Of:	854216
Environment:
Last Closed:	2012-11-19 07:49:23 UTC
Type:	Bug
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	221143	0	None	None	None	2012-11-12 20:12:26 UTC

Description Ari Tilli 2012-11-12 20:12:26 UTC

corosync 2.0.2-1.fc17  

A similar bug was reported for RHEL 6. I am reporting it for Fedora 17 as well.

I have 4 Fedora 17 guest VMs running on a Fedora 17 host.

All guest VMs have pacemaker and corosync running.

The end result is that corosync crashes all the time in the guests,
every 15-30 minutes.

The logging level is "info", so it is probably due to I/O that "failed to
receive" happens regularly and the crash then occurs.

I am reporting this from home, the test cluster is at work and not
connected to net, so I will try to update more details later if needed.
However, the case is IMO similar to #221143, so maybe no further info is needed.

Comment 1 Jan Friesse 2012-11-19 07:49:23 UTC

This is a duplicate of bug 636583, and that BZ is now fixed.

But keep in mind that "failed to receive" is ALWAYS caused by a large number of lost multicast packets. Ensure (for example via omping) that your network is correctly configured and that there are no iptables / switch / router problems. So even though the CRASH itself is fixed, with a huge number of lost multicast packets you will still observe imperfect behavior.

You can also try UDPU (the UDP unicast transport), which does not rely on multicast.
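
For illustration only, switching to UDPU in corosync 2.x might look roughly like the corosync.conf fragment below. This is a minimal sketch, not a complete configuration: the four node addresses and node IDs are hypothetical placeholders for the four guest VMs and are not taken from this report. The same file would need to be present on every node.

```
totem {
    version: 2
    # Use UDP unicast instead of the default multicast transport
    transport: udpu
}

nodelist {
    # Hypothetical addresses: replace with the real guest VM addresses
    node {
        ring0_addr: 192.168.122.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.122.12
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.122.13
        nodeid: 3
    }
    node {
        ring0_addr: 192.168.122.14
        nodeid: 4
    }
}
```

With UDPU every member has to be listed explicitly, because nodes can no longer discover each other through a multicast group.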

*** This bug has been marked as a duplicate of bug 636583 ***

Note You need to log in before you can comment on or make changes to this bug.