Bug 875922 - [TOTEM] FAILED TO RECEIVE + corosync crash
Status: CLOSED DUPLICATE of bug 636583
Alias: None
Product: Fedora
Classification: Fedora
Component: corosync
Version: 17
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jan Friesse
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2012-11-12 20:12 UTC by Ari Tilli
Modified: 2012-11-19 07:49 UTC (History)

Clone Of: 854216
Last Closed: 2012-11-19 07:49:23 UTC


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 221143 None None None 2012-11-12 20:12:26 UTC

Description Ari Tilli 2012-11-12 20:12:26 UTC
corosync 2.0.2-1.fc17  

A similar bug was reported for RHEL 6. I am reporting it for Fedora 17 as well.

I have 4 Fedora 17 guest VMs running on a Fedora 17 host.

All guest VMs have pacemaker and corosync running.

The end result is that corosync keeps crashing in the guests,
every 15-30 minutes.

The logging level is "info", so, probably because of I/O, "failed to receive"
happens regularly and then the crash occurs.

I am reporting this from home; the test cluster is at work and not
connected to the net, so I will try to add more details later if needed.
However, the case is IMO similar to #221143, so perhaps no further info is needed.

Comment 1 Jan Friesse 2012-11-19 07:49:23 UTC
This is a duplicate of bug 636583, and that BZ is now fixed.

But keep in mind that "failed to receive" is ALWAYS caused by a large number of lost multicast packets. Make sure (for example with omping) that your network is configured correctly and that there are no iptables, switch, or router problems. Even though the CRASH itself is fixed, with a huge number of lost multicast packets you will still see less-than-perfect behavior.

You can also try UDPU (UDP unicast), which does not depend on multicast.
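
For illustration, switching to UDPU might look roughly like the corosync.conf fragment below. This is only a sketch for corosync 2.x: the network address, the four node addresses, and the nodeid values are hypothetical placeholders (the report gives no addressing details), and any other totem options already in the existing configuration would stay as they are.

```
totem {
    version: 2
    # Use UDP unicast instead of multicast
    transport: udpu
    interface {
        ringnumber: 0
        # Hypothetical network of the guest VMs
        bindnetaddr: 192.168.122.0
        mcastport: 5405
    }
}

nodelist {
    # One node entry per cluster member; addresses are placeholders
    node {
        ring0_addr: 192.168.122.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.122.12
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.122.13
        nodeid: 3
    }
    node {
        ring0_addr: 192.168.122.14
        nodeid: 4
    }
}
```

The same file would need to be present on every node, and corosync restarted, before the change takes effect.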

*** This bug has been marked as a duplicate of bug 636583 ***
