Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
+++ This bug was initially created as a clone of Bug #610234 +++
Description of problem:
When two systems using bonding devices in adaptive load
balancing (ALB) communicates with each other, an endless
ping-pong of ARP replies starts between these two systems.
What happens? In the ALB mode, bonding driver keeps track
of each client connected in a hash table, so it can do the
receive load balancing (RLB). This hash table is updated
when an ARP reply is received, then it scans for the client
entry, updates its MAC address and flag it to be announced
later. Therefore, two seconds later, the alb monitor runs
and send for each updated client entry two ARP replies
updating this specific client. The same process happens on
the receiving system, causing the endless ping-pong of arp
replies.
See more information including the relevant functions below:
System 1 System 2
bond0 bond0
ping <system2>
ARP request --------->
<--------- ARP reply
+->rlb_arp_recv <---------------------+ <--- loop begins
| rlb_update_entry_from_arp |
| client_info->ntt = 1; |
| bond_info->rx_ntt = 1; |
| |
| <communication succeed> |
| |
| bond_alb_monitor |
| rlb_update_rx_clients |
| rlb_update_client |
| arp_create(ARPOP_REPLY) |
| send ARP reply --------------> V
| send ARP reply -------------->
| rlb_arp_recv
| rlb_update_entry_from_arp
| client_info->ntt = 1;
| bond_info->rx_ntt = 1;
| < snipped, same as in system 1>
+------- <-------------- send ARP reply
<-------------- send ARP reply
Besides the unneeded networking traffic, this loop breaks
a cluster because a backup system can't take over the IP
address. There is always one system sending an ARP reply
poisoning the network.
This patch fixes the problem adding a check for the MAC
address before updating it. Thus, if the MAC address didn't
change, there is no need to update neither to announce it later.
Version-Release number of selected component (if applicable):
5.6
How reproducible:
Always
Steps to Reproduce:
Just ping one system from another with both running bonding in ALB mode and you can see the ARP being sent each 2 seconds between them.
Actual results:
It is impossible for a backup system to take over the IP address when there are other systems sending wrong ARP packets poisoning the network.
Expected results:
Don't send unneeded ARP packets
Additional info:
The patch is available upstream:
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=42d782ac1bef7cbcdf05b857731345c6e8149f90
The patch is backported for RHEL5 and we are waiting for testing results.
Flavio, you did the leg work on this and you earned the internal credit too.
Please post this when you get positive feedback.
Comment 3RHEL Program Management
2010-10-05 01:54:41 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2011-0542.html