Bug 619496 - make corosync more resilient to delayed multicast packets
Summary: make corosync more resilient to delayed multicast packets
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync   
Version: 6.0
Hardware: All
OS: Linux
Target Milestone: rc
: ---
Assignee: Steven Dake
QA Contact: Cluster QE
Keywords: ZStream
Depends On:
Blocks: 619536 638592
Reported: 2010-07-29 16:54 UTC by Steven Dake
Modified: 2016-04-26 15:20 UTC

Fixed In Version: corosync-1.2.3-22.el6
Doc Type: Bug Fix
Doc Text:
OpenAIS has been enabled to work in network environments wherein multicast messages are slightly delayed when compared to token messages.
Story Points: ---
Clone Of:
: 619536
Last Closed: 2011-05-19 14:24:04 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
patch that introduces the tuneable (6.00 KB, patch)
2010-07-29 18:06 UTC, Steven Dake

External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0764 normal SHIPPED_LIVE corosync bug fix update 2011-05-18 18:08:44 UTC

Description Steven Dake 2010-07-29 16:54:11 UTC
Description of problem:
Many network switches use a software component to "emulate multicast": the sender transmits the multicast to the switch, and the switch then forwards it to every member of the IGMP group.  This multicast has extra latency compared to the unicast token (I've measured about 200 usec).  When a processor receives a token, it adds all messages it has not yet received to a retransmit list.  These retransmits consume extra network bandwidth when in fact the regular multicast message is not lost, just delayed.
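
To make the failure mode concrete, here is a minimal sketch of the token-receipt check described above. It is not corosync code: the function name, the use of plain integer sequence numbers, and the `last_delivered` parameter are all assumptions made for illustration.

```python
def missing_on_token(token_seq, received, last_delivered=0):
    """Return the sequence numbers a processor would add to its retransmit list.

    token_seq      -- highest message sequence number carried by the token (assumed)
    received       -- set of sequence numbers this processor has received so far
    last_delivered -- everything up to this number is already accounted for
    """
    return [seq for seq in range(last_delivered + 1, token_seq + 1)
            if seq not in received]


# Messages 1..3 were multicast, but message 3 is still in flight because
# the switch delays multicast relative to the unicast token.
received = {1, 2}
retransmit_list = missing_on_token(token_seq=3, received=received)
print(retransmit_list)  # message 3 is requested even though it is only delayed

# The delayed multicast arrives a moment later; the request was unnecessary.
received.add(3)
print(missing_on_token(token_seq=3, received=received))
```

The first call reports message 3 as missing purely because the token overtook it, which is the unnecessary retransmit this bug is about.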

Version-Release number of selected component (if applicable):

How reproducible:
seems 100% using Cisco infrastructure in RH IT labs

Steps to Reproduce:
1. start two node corosync cluster with totem configured to output debug info
2. run cpgbench
3. see retransmits occur

We can tell multicast is delayed by adding a small delay before transmitting the token.  Another mechanism is to use netem traffic shaping as follows to delay the token:
tc qdisc add dev eth0 root handle 1: prio
tc qdisc add dev eth0 parent 1:3 handle 30: netem delay 1ms
tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst 10.16.144.40/32 flowid 1:3

(note: the destination address in the filter above is the target of the next token).
Actual results:
when multicast is delayed, totem retransmits messages unnecessarily

Expected results:
no messages should be retransmitted unnecessarily

Additional info:

Comment 1 Steven Dake 2010-07-29 16:55:07 UTC
For those that don't see this problem in their switches, it is possible to emulate via netem by changing the ip address above to the multicast address (hence introducing a 1ms multicast transmit delay).

Comment 2 Steven Dake 2010-07-29 18:06:42 UTC
Created attachment 435364
patch that introduces the tuneable
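
The patch itself is not reproduced in this report, so the following is only a sketch of one plausible shape for such a tuneable, under the assumption that it works as a miss counter: a message is requested for retransmission only after it has been found missing on several consecutive token receipts. The class name and the `miss_count_threshold` parameter are hypothetical. corosync.conf(5) documents a totem option named miss_count_const with a similar description, but whether that option corresponds to this patch is an assumption here.

```python
from collections import defaultdict


class RetransmitFilter:
    """Hypothetical miss counter; not taken from the attached patch."""

    def __init__(self, miss_count_threshold=5):
        # Number of token receipts a message may be missing before
        # it is actually requested for retransmission (assumed semantics).
        self.miss_count_threshold = miss_count_threshold
        self.miss_counts = defaultdict(int)

    def on_token(self, token_seq, received, last_delivered=0):
        """Return the sequence numbers to request on this token receipt."""
        requests = []
        for seq in range(last_delivered + 1, token_seq + 1):
            if seq in received:
                # The message showed up (possibly late); forget any misses.
                self.miss_counts.pop(seq, None)
                continue
            self.miss_counts[seq] += 1
            if self.miss_counts[seq] >= self.miss_count_threshold:
                requests.append(seq)
        return requests


# Delayed multicast: message 3 is missing on the first token only.
delayed = RetransmitFilter(miss_count_threshold=2)
first = delayed.on_token(3, {1, 2})
second = delayed.on_token(3, {1, 2, 3})

# Genuinely lost multicast: message 3 is still missing on the second token.
lost = RetransmitFilter(miss_count_threshold=2)
lost.on_token(3, {1, 2})
lost_second = lost.on_token(3, {1, 2})
```

With a threshold of 2, the delayed message is never requested, while the lost one is requested on the second token. The trade-off in a scheme like this is that a larger threshold tolerates more multicast delay but also postpones recovery of messages that really were lost.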

Comment 5 Douglas Silas 2011-01-11 23:11:46 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    New Contents:
OpenAIS has been enabled to work in network environments wherein multicast messages are slightly delayed when compared to token messages.

Comment 8 errata-xmlrpc 2011-05-19 14:24:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

