Red Hat Bugzilla – Bug 145551
Use of bonding driver in mode 5 can cause multicast packet loss
Last modified: 2007-11-30 17:07:06 EST
Description of problem:
Multicast packet loss was found when using multicast heartbeating in
clumanager_1.2.16 from the Red Hat Cluster Suite in the following
"no single point of failure" environment:
2 Dell 2650 nodes, each with 2 GigE NICs connected into
a dual Cisco Catalyst 3750 Series switch environment,
and 1 NIC from each node connected into a SAN.
By default, the Cisco 3750 switch uses IGMP snooping, which causes
multicast packets to be delivered only to hosts that have dynamically
joined a multicast group by sending an IGMP packet. The switch
uses a timeout to allow members to quietly leave the group.
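The dynamic join the switch snoops for is visible from user space: joining a group with IP_ADD_MEMBERSHIP is what causes the kernel to emit the IGMP membership report. A minimal Python sketch (group address and port are arbitrary choices for illustration):

```python
import socket
import struct

def join_multicast_group(group: str, port: int) -> socket.socket:
    """Open a UDP socket and join a multicast group.

    The IP_ADD_MEMBERSHIP setsockopt triggers the kernel to send an
    IGMP membership report -- exactly the packet an IGMP-snooping
    switch watches for when deciding which ports get multicast."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # group address + local interface (INADDR_ANY lets the kernel pick)
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

With bonding in the picture, the report above may leave on a different slave than the one configured to receive, which is the crux of this bug.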
With bonding mode 5, there is only 1 interface configured to receive
packets. If both Ethernet interfaces are up, the load balancing
capability in mode 5 can move the sending of IGMP packets to the 2nd
interface. If the switch ends up timing out the interface configured
to receive packets, multicast packets are then delivered only
to the 2nd interface and are dropped, since that interface is not
set up to receive them.
Disabling IGMP snooping on the Cisco switches works around the
problem as it forces the switch to always send multicast packets
to all interfaces on the switch. This workaround isn't desirable
for performance reasons though, and some switches may not offer
the ability to disable IGMP snooping.
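For reference, disabling IGMP snooping on a Catalyst switch looks roughly like the following (IOS command syntax varies by platform and release; treat this as a sketch, and keep the performance caveat above in mind):

```
Switch# configure terminal
! disable IGMP snooping globally...
Switch(config)# no ip igmp snooping
! ...or only on the VLAN carrying the cluster traffic
Switch(config)# no ip igmp snooping vlan 10
```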
Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux WS release 3 (Taroon Update 2)
Red Hat Enterprise Linux AS release 3.90 (Nahant)
This problem can be recreated by sending multicast packets
between two hosts with enough traffic on the interfaces to
force the bonding driver to load balance between them.
Steps to Reproduce:
1. Connect hosts into a dual switch environment for no single
point of failure, switch must have IGMP snooping enabled
2. Send traffic between hosts to create a load so that
bonding driver load balances between 2 interfaces
3. Send multicast packets between 2 hosts
Actual results:
If the switch has timed out the interface configured by
the bonding driver to receive packets, multicast packets are
delivered only to an interface that cannot receive them, and dropped.
Expected results:
No multicast packet loss
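The reproduction can be scripted: a sender numbers its datagrams and a receiver joined to the same group counts what arrives, so any gap shows up as loss. A user-space Python sketch (group, port, and packet count are arbitrary; on a healthy path, or looped back on a single host, it should report little or no loss):

```python
import socket
import struct

GROUP, PORT = "224.0.51.51", 15151  # arbitrary test group/port

def make_receiver() -> socket.socket:
    """Join the test group so the switch forwards it to us."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    rx.bind(("", PORT))
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                       socket.inet_aton("0.0.0.0"))
    rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    rx.settimeout(2.0)
    return rx

def probe(n: int = 100) -> int:
    """Send n numbered datagrams to the group; return how many arrived."""
    rx = make_receiver()
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # loop sent packets back so a single host can run both ends
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    for i in range(n):
        tx.sendto(struct.pack("!I", i), (GROUP, PORT))
    got = set()
    try:
        while len(got) < n:
            data, _ = rx.recvfrom(4)
            got.add(struct.unpack("!I", data)[0])
    except socket.timeout:
        pass  # anything still missing was lost
    rx.close()
    tx.close()
    return len(got)
```

Run the receiver end on one node and the sender on the other while step 2's background load keeps the bonding driver balancing; a shrinking return value from probe() corresponds to the loss described above.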
Created attachment 109979 [details]
I don't really see any way to "fix" this while maintaining the design
of mode 5 of the bonding module. By definition:
balance-tlb or 5
Adaptive transmit load balancing: channel bonding that does
not require any special switch support. The outgoing
traffic is distributed according to the current load
(computed relative to the speed) on each slave. Incoming
traffic is received by the current slave. If the receiving
slave fails, another slave takes over the MAC address of
the failed receiving slave.
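For completeness, the mode is selected when the module loads; on a RHEL 3 (2.4 kernel) system that is typically a modules.conf entry along these lines (device name and miimon value are illustrative):

```
# /etc/modules.conf -- illustrative
alias bond0 bonding
# mode=5 is balance-tlb; miimon enables link monitoring
options bond0 mode=5 miimon=100
```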
Have you tried mode 6 (balance-alb)? I think this has a better chance
of working for you, although I'll have to do more research to know for
sure that transmits and receives will be balanced to the same link for
a given partner host.
Give it a shot and let me know the results? I don't have a switch w/
the IGMP snooping behaviour at my disposal...
Thanks for the help so far and the quick response!
I checked, and I have the same problem with mode 6.
Multicast packets can be lost if the switch has IGMP snooping
enabled. (With mode 6 in Red Hat EL3, the multicast mode is
still set to "active slave only.")
I realize that there may be no easy "fix" for this problem. I
guess I think of it as a problem with the design of mode 5 and
support of multicast. I'm not sure if you want to start identifying
packets to ensure that IGMP packets are sent down the same interface
that is configured to receive multicast packets. If you want to
say that multicast isn't supported with mode 5, or that switch
modification is possibly required, that's fine. We just wanted
to make you aware of the problem, and wanted to be advised as
to how we should be using mode 5 of the bonding driver with
multicast.
Mode 5 is documented as:
"Adaptive transmit load balancing: channel bonding that does
not require any special switch support."
This is a bit misleading, since many switches do support
IGMP snooping and have it enabled by default. In order to
use multicast with bonding mode 5, we need to modify our switch
configuration.
I have to agree with you that the design of mode 5 (and 6) seems a
little fragile. I'll try to do some more research to see if the
maintainers have considered and/or dealt with this type of situation.
A patch specific to IGMP may even be appropriate.
It seems to me that mode 1 or (w/ appropriate switch support) mode 3
would fulfill your needs. If not, please elaborate?
I would really appreciate any further research you could do to see if
the maintainers know about, and possibly have plans to address this
situation. Thanks so much for your help so far!
We are hoping to be able to support a "no single point of failure"
system using the Ethernet bonding driver with no switch support, and
get the added benefit of increased aggregate bandwidth when possible.
From the documentation, it looked like mode 5 would provide what we
are looking for. Resilience has a higher priority than performance
for us, but performance is important too, so I think we'll continue to
use mode 5 with IGMP snooping disabled at the switch and hope that we
can turn IGMP snooping on again at some point. Falling back to mode 1
is also an option if we find that disabling IGMP snooping is causing
too many performance problems.
I put together a patch which forces IGMP transmits to go out the
primary bonding port. I have built a test kernel w/ that patch and
made it available here:
(That page is new -- feel free to give feedback!)
Please give that a try and let me know the results. Thanks!
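The idea behind the patch can be illustrated in user space: classify each outgoing frame, and pin anything carrying IGMP (IP protocol 2) to the primary slave, so the snooping switch learns group membership on the port that actually receives multicast. A Python sketch (function and port names are made up for illustration; the real patch does the equivalent inside the bonding driver's transmit path):

```python
import struct

ETH_P_IP = 0x0800   # ethertype for IPv4
IPPROTO_IGMP = 2    # IP protocol number for IGMP

def is_igmp_frame(frame: bytes) -> bool:
    """Return True if an Ethernet frame carries an IGMP packet.

    Checks the ethertype at offset 12 and the IP protocol byte at
    offset 9 of the IP header (which starts at offset 14)."""
    if len(frame) < 14 + 20:
        return False
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != ETH_P_IP:
        return False
    return frame[14 + 9] == IPPROTO_IGMP

def choose_slave(frame: bytes, primary: str, balanced: str) -> str:
    """Toy transmit-slave selection: IGMP always goes out the primary
    port; everything else follows the normal tlb balancing decision
    (stubbed here as a single pre-chosen slave)."""
    return primary if is_igmp_frame(frame) else balanced
```

Because the IGMP reports now always leave on the receiving slave, the switch's snooping table keeps pointing multicast at the port that can actually accept it.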
Heidi, any word on the results with this patch?
Patch posted upstream on 3/15...
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.2.EL).
(In reply to comment #12)
> A fix for this problem has just been committed to the RHEL3 U6
> patch pool this evening (in kernel version 2.4.21-32.2.EL).
Is there a standalone patch, apart from the newer kernel revision,
that can fix this issue? We are seeing similar issues where multicast
packets are being dropped by the interfaces while bonded.
Simple network statistics do not show the interfaces dropping traffic,
but the application and messaging daemons report inbound packet loss.
Created attachment 115194 [details]
This is the patch in question...
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.