Description of problem: When using bonding in mode 0 (round-robin) with multicast, the IGMP membership report is sent out via a single interface, registering that port with the switch for a particular multicast group. If that port/NIC/cable fails, no IGMP report is sent out another port in the bond, so packets for the previously-registered multicast group are no longer routed correctly. This is not a problem with mode 1, which works as designed.
Just making some notes for myself... There are 7 bonding modes:
- Mode 0 (round-robin) has no concept of a primary slave.
- Modes 1, 5 and 6 have a primary slave, so IGMP membership reports are resent when the primary goes down.
- Mode 3 (broadcast) has no primary slave either, but it transmits on all interfaces, so IGMP membership reports go everywhere anyway.
- Modes 2 and 4 don't need a primary slave since all interfaces are seen as one logical interface by the connected switch.
This really means that mode 0 is the only problematic mode. I could consider making a change and doing some tracking in the round-robin xmit code. Another option is to treat one of the interfaces in this mode as the primary.
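Roughly what that xmit-side tracking could look like -- a sketch of the idea only, not a real patch: pin IGMP frames to curr_active_slave so the switch always learns the group on one predictable port, and leave all other traffic on the normal rotation. bond_dev_queue_xmit(), bond->curr_active_slave and the locking below are the driver's existing internals as I understand them; bond_rr_pick_next_slave() is a made-up stand-in for the existing rotation logic.

/* Sketch only -- assumes the usual bond_main.c context (struct bonding,
 * struct slave, the bond locks) and a tree where ip_hdr() exists. */
static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev)
{
    struct bonding *bond = netdev_priv(bond_dev);
    struct iphdr *iph = ip_hdr(skb);
    struct slave *slave = NULL;
    int res = 1;

    read_lock(&bond->lock);

    if (skb->protocol == htons(ETH_P_IP) && iph->protocol == IPPROTO_IGMP) {
        /* IGMP joins/reports always leave on curr_active_slave so the
         * switch registers the group on one predictable port. */
        read_lock(&bond->curr_slave_lock);
        slave = bond->curr_active_slave;
        read_unlock(&bond->curr_slave_lock);
    } else {
        /* Everything else keeps the usual round-robin rotation. */
        slave = bond_rr_pick_next_slave(bond);  /* hypothetical helper */
    }

    if (slave)
        res = bond_dev_queue_xmit(bond, skb, slave->dev);

    read_unlock(&bond->lock);

    if (res)
        dev_kfree_skb(skb);  /* no usable slave -- drop the frame */

    return 0;  /* NETDEV_TX_OK */
}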
Note to self: Consider using curr_active_slave or first_slave as the slave that sends any IGMP membership reports on RR bonds. If that slave goes down, resend the membership reports on the new curr_active_slave (or, when starting from first_slave, on the next active slave in the list). Small changes could be made in bond_change_active_slave around where bond_mc_swap is called, but in this case only bond_resend_igmp_join_requests would be called, so the multicast traffic would flow on the first active slave in the bond.
More thoughts -- always resend membership reports on the curr_active_slave when any link goes down. Don't worry about doing it only when the curr_active_slave itself goes down; the extra state tracking probably isn't worth it.
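A minimal sketch of that failover half (again, idea only, not the actual change): whenever bond_change_active_slave() settles on a new active slave for a round-robin bond, just replay the membership reports there. bond_resend_igmp_join_requests() and BOND_MODE_ROUNDROBIN already exist in the driver; the helper name and its exact call site (near the bond_mc_swap() handling mentioned above) are just for illustration.

/* Sketch only.  Intended to be called from bond_change_active_slave()
 * once new_active has been chosen, near the existing multicast handling
 * for the primary-based modes. */
static void bond_rr_resend_igmp(struct bonding *bond, struct slave *new_active)
{
    /* Nothing to re-announce if the bond just lost its last working slave. */
    if (!new_active)
        return;

    /* Modes with a primary already resend reports on failover; balance-rr
     * is the one that needs the extra nudge.  Per the note above, do it on
     * any active-slave change rather than tracking whether the slave that
     * carried the IGMP traffic was the one that went down. */
    if (bond->params.mode == BOND_MODE_ROUNDROBIN)
        bond_resend_igmp_join_requests(bond);
}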
Created attachment 402091 [details] bonding-fixup-failure-with-igmp-and-round-robin.patch Proposed patch. This works in my testing; any feedback from you would be helpful. I will include this in my test kernels and should have some ready by tomorrow, but feel free to build it against any recent rhel5 kernel you would like.
New test kernels available here: http://people.redhat.com/agospoda/#rhel5/ Any feedback you can provide is greatly appreciated.
If you could give me a ball-park estimate for when you would like to test this, I would appreciate it. I'm hoping to push this upstream soon, and extra validation would be nice. Thanks.
patch posted upstream: http://marc.info/?l=linux-netdev&m=126955324726981&w=2
The final patch for RHEL will be a bit different than the one currently included, but for testing purposes what you have should be fine.
I will work with Corey to see if we can test this soon. Thanks!
I'll be working on this today...
(In reply to comment #12) > I'll be working on this today... Sweet. You can find me on irc as 'gospo' if needed.
Here's what lon and I tested on the taft's 8 nic (eth2 - eth10) bonded interface:
1. Pulled eth2; verified that the cluster continued to run correctly.
2. Pulled eth3 and eth4 simultaneously; verified that the cluster continued to run correctly.
3. Plugged all three back in; verified that the cluster continued to run correctly.
We ran this scenario twice because the first time we did step #3, that node got fenced. However, it may have been a fluke, because we were not able to reproduce it the second time.
Andy - I opened bug 577407 for the issue I saw when sequentially taking down all the slave NICs.
*** Bug 577407 has been marked as a duplicate of this bug. ***
The test in the description of bug 577407 should also be used as a test case when verifying this bug.
Fixed in kernel-2.6.18-200.el5. You can download this test kernel from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: When a system was configured using channel bonding in "mode=0" (round-robin balancing) with multicast, IGMP traffic was transmitted via a single interface. If that interface failed (due to a port, NIC or cable failure, for example), IGMP was not transmitted via another port in the group, thus resulting in packets for the previously-registered multicast group not being routed correctly.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html