Bug 971893

Summary: bonding balance-tlb or balance-alb mode sending tons of null LLC packets.
Product: [Fedora] Fedora
Reporter: Nitin Sharma <nitinics>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 18
CC: dcbw, gansalmon, itamar, jbyers, jklimes, jonathan, kernel-maint, madhu.chinakonda, nhorman, nitinics
Hardware: x86_64
OS: Linux
Fixed In Version: kernel-3.10.13-101.fc18
Doc Type: Bug Fix
Last Closed: 2013-10-01 01:57:14 UTC
Type: Bug
Attachments:
  2nd slave Pcap (flags: none)
  1st slave Pcap (flags: none)
  patch to make learning packet interval configurable (flags: none)
  [PATCH] bonding: Make alb learning packet interval configurable (flags: none)

Description Nitin Sharma 2013-06-07 13:59:52 UTC
Description of problem:
Each slave interface of the bonding interface sends a large number of frames whose source and destination MAC addresses are identical, with a length/type field of 0x60 and an all-zero (NULL) payload.

Version-Release number of selected component (if applicable):
NetworkManager (version 0.9.7.0-12)
igb: Intel(R) Gigabit Ethernet Network Driver - version 4.1.2-k
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

How reproducible:
Always

Steps to Reproduce:
1. Configure bonding as follows, then watch one of the slave interfaces with tcpdump.

modprobe bonding mode=balance-alb miimon=100
ifconfig bond0 192.168.30.3 netmask 255.255.255.0 up
ifenslave bond0 p255p1
ifenslave bond0 p255p2

[root@localhost ~]# tcpdump -nXXvvvv -i p255p1
tcpdump: listening on p255p1, link-type EN10MB (Ethernet), capture size 65535 bytes
17:01:45.058184 00:25:90:c0:bb:36 > 00:25:90:c0:bb:36 Null Information, send seq 0, rcv seq 0, Flags [Command], length 46
        0x0000:  0025 90c0 bb36 0025 90c0 bb36 0060 0000  .%...6.%...6.`..
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............
17:01:45.058198 00:25:90:c0:bb:36 > 00:25:90:c0:bb:36 Null Information, send seq 0, rcv seq 0, Flags [Command], length 46
        0x0000:  0025 90c0 bb36 0025 90c0 bb36 0060 0000  .%...6.%...6.`..
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............
17:01:45.058200 00:25:90:c0:bb:36 > 00:25:90:c0:bb:36 Null Information, send seq 0, rcv seq 0, Flags [Command], length 46
        0x0000:  0025 90c0 bb36 0025 90c0 bb36 0060 0000  .%...6.%...6.`..
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000            ............


Comment 1 Nitin Sharma 2013-06-08 20:41:05 UTC
These look like LLC packets, but they are all NULL LLC packets. I am trying to understand why they would be sent in balance-alb or balance-tlb mode, since the bonding ALB source code makes no reference to LLC. Is this expected?

Thanks
Nitin

Comment 2 Neil Horman 2013-07-19 13:52:28 UTC
Please attach the binary tcpdump output and the output of the command 'ip addr show'.

Comment 3 Nitin Sharma 2013-08-05 13:38:35 UTC
Created attachment 782842
2nd slave Pcap

Comment 4 Nitin Sharma 2013-08-05 13:39:44 UTC
Created attachment 782843
1st slave Pcap

Comment 5 Nitin Sharma 2013-08-05 13:40:57 UTC
[root@localhost ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p255p1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:25:90:c0:bb:d5 brd ff:ff:ff:ff:ff:ff
3: p255p2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 00:25:90:c0:bb:d4 brd ff:ff:ff:ff:ff:ff
4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 00:25:90:c0:bb:d4 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.17/26 brd 10.0.0.63 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::225:90ff:fec0:bbd4/64 scope link
       valid_lft forever preferred_lft forever


[root@localhost ~]# tcpdump -s 0 -i p255p1 -w firstslave.pcap
tcpdump: WARNING: p255p1: no IPv4 address assigned
tcpdump: listening on p255p1, link-type EN10MB (Ethernet), capture size 65535 bytes
^C36 packets captured
37 packets received by filter
0 packets dropped by kernel


[root@localhost ~]# tcpdump -s 0 -i p255p2 -w secondslave.pcap
tcpdump: WARNING: p255p2: no IPv4 address assigned
tcpdump: listening on p255p2, link-type EN10MB (Ethernet), capture size 65535 bytes
^C38 packets captured
38 packets received by filter
0 packets dropped by kernel

Comment 6 Neil Horman 2013-08-05 15:56:32 UTC
What are these ports connected to, and in what bonding mode?  The frames also have the same source and destination MAC addresses, which suggests they are receiving their own frames.  Does this happen if you use each interface individually, i.e. without bonding them?

Comment 7 Nitin Sharma 2013-08-05 16:04:34 UTC
(In reply to Neil Horman from comment #6)
> What are these ports connected to, and in what bonding mode?  The frames
> also have the same source and destination MAC addresses, which suggests they
> are receiving their own frames.  Does this happen if you use each interface
> individually, i.e. without bonding them?

The ports are connected to a switch, and the bond is in balance-alb mode. The frames have the same source and destination MAC address, and they do not show up when the interfaces are not bonded. They also do not appear in active-backup bonding mode. They could be there so that the switch can learn the source MAC of the interface (which can change under the ALB implementation); however, the frequency at which they are sent is not tunable. In our implementation we do not need these NULL packets to learn the source MAC address, because we use a different host-learning approach. So it would be great if the frequency at which these packets are sent could be made tunable in the code.

Thanks
Nitin

Comment 8 Neil Horman 2013-08-15 16:07:12 UTC
You may be right; these do look like learning packets. It appears that the value 96 (hex 0x60) is meant to be their ethertype (which is ETH_P_LOOP), but it is somehow being interpreted as the length field of an 802.3 Ethernet frame.  I'll see if I can recreate this and fix it up.
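
A minimal sketch of the ambiguity described in this comment, using the first frame from the tcpdump output in the description. The 1500/1536 thresholds are the standard 802.3 type/length split and ETH_P_LOOP (0x0060) is the value defined in linux/if_ether.h; the helper names and the "same source and destination MAC" heuristic are assumptions made for illustration, not something taken from the bonding driver.

```python
import struct

ETH_P_LOOP = 0x0060        # "Ethernet Loopback packet" in linux/if_ether.h
MAX_8023_LENGTH = 0x05DC   # 1500: values up to here are read as an 802.3 length
MIN_ETHERTYPE = 0x0600     # 1536: values from here up are read as an EtherType

# First frame from the tcpdump output in the description: 14 header bytes
# followed by 46 zero bytes of payload.
FRAME = bytes.fromhex("002590c0bb36" "002590c0bb36" "0060") + bytes(46)


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_header(frame: bytes):
    """Return (destination MAC, source MAC, 16-bit type/length field)."""
    dst, src, field = struct.unpack("!6s6sH", frame[:14])
    return format_mac(dst), format_mac(src), field


def generic_reading(field: int) -> str:
    """How a generic dissector classifies the 16-bit type/length field."""
    if field <= MAX_8023_LENGTH:
        return "802.3 length"
    if field >= MIN_ETHERTYPE:
        return "EtherType"
    return "undefined"


def looks_like_learning_frame(frame: bytes) -> bool:
    """Heuristic (assumption): same source and destination MAC, type 0x0060."""
    dst, src, field = parse_header(frame)
    return dst == src and field == ETH_P_LOOP


dst, src, field = parse_header(FRAME)
print(dst, src, hex(field), generic_reading(field))
print("claimed length:", field, "actual payload:", len(FRAME) - 14)
print("learning frame candidate:", looks_like_learning_frame(FRAME))
```

Because 0x60 is below 1500, a generic dissector reads it as an 802.3 length and then decodes the all-zero payload as a NULL LLC header. The claimed length of 96 also exceeds the 46 payload bytes actually captured, which is another hint that the field was never meant as a length.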

Comment 9 Neil Horman 2013-08-15 19:11:37 UTC
Wait a second, is there anything else going on here?  That is, are you actually having any other problems with bonding in alb mode?  I ask because the more I look at it, the more clearly this code needs cleaning up and consolidation, but there doesn't appear to be anything wrong with this frame.  I'm starting to think that it's just wireshark that can't read the frame properly.  I'll clean up the code, but unless you're seeing some other problem, I think this is notabug.

Comment 10 Nitin Sharma 2013-08-16 02:17:58 UTC
Correct. It is not a bug but a feature request: to be able to tune the interval below dynamically through /sys/class/net, i.e.

#define BOND_ALB_LP_INTERVAL        1   /* In seconds, periodic send of
                                         * learning packets to the switch
                                         */

The issue I was facing was specifically with an OpenFlow switch implementation. This packet is sent to the controller for learning very often (as it is supposed to be), taking up much of the OpenFlow controller traffic. Ideally, this mechanism would be used only to speed up learning on the switch after failover events. However, the implementation sends it periodically.

Comment 11 Neil Horman 2013-08-16 13:24:59 UTC
Ah!  I'm sorry, I wish you had said that earlier.  OK, yeah, I can look into doing that.  I imagine we can just make it a module parameter, and you can then adjust it on the fly via sysfs.

Comment 12 Neil Horman 2013-08-16 14:51:22 UTC
Created attachment 787298
patch to make learning packet interval configurable

Here you go. This exports the alb learning interval as a module parameter, modifiable in sysfs.  Please give it a try and let me know if it suits your needs.

Comment 13 Nitin Sharma 2013-09-04 15:50:27 UTC
[root@localhost kernel-bond]# modinfo bonding | grep alb
parm:           mode:Mode of operation; 0 for balance-rr, 1 for active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5 for balance-tlb, 6 for balance-alb (charp)
parm:           alb_lp_interval:The interval on which learning packets are sent in ALB mode (int)

I applied the patch; however, the frequency is still a burst of 3 learning packets per second. The configuration was added to ifcfg-bond0 as:
BONDING_OPTS="miimon=50 mode=balance-alb alb_lp_interval=60"

Is there any other way to validate?

Comment 14 Neil Horman 2013-09-04 17:06:07 UTC
You can use /sys/module/bonding/parameters/alb_lp_interval to check that your settings got picked up properly.  It sounds like they may not have been.
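
The check suggested here could be scripted roughly as follows. This is a sketch under assumptions: module parameters are exposed under /sys/module/<module>/parameters/, and alb_lp_interval exists only on a kernel carrying the first patch attached to this bug. The function name and the sysfs_root argument are hypothetical, added so the lookup can be exercised against a scratch directory.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str,
                      sysfs_root: str = "/sys") -> Optional[str]:
    """Return the current value of a module parameter, or None if absent.

    None means the file could not be read, for example because the module
    is not loaded or the running kernel does not have the parameter.
    """
    path = Path(sysfs_root) / "module" / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

Calling read_module_param("bonding", "alb_lp_interval") on the patched kernel should return the value that was actually picked up; a result of None would suggest the option never reached the module.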

Comment 15 Nitin Sharma 2013-09-04 18:23:59 UTC
Sorry, my bonding module was compiled into the kernel, so I couldn't change it at runtime and had to set it on the kernel command line instead. It works as expected.

Thanks for your help

Comment 16 Neil Horman 2013-09-04 19:19:34 UTC
Understood, thanks.  I'm not sure this will get accepted upstream, but I'll propose it and see how it flies.

Comment 17 Neil Horman 2013-09-09 14:57:53 UTC
Created attachment 795648
[PATCH] bonding: Make alb learning packet interval configurable


Running bonding in ALB mode requires that learning packets be sent periodically,
so that the switch knows where to send responding traffic.  However, depending
on switch configuration, there may not be any need to send traffic at the
default rate of 3 packets per second, which represents little more than wasted
data.  Allow the ALB learning packet interval to be made configurable via sysfs.

Signed-off-by: Neil Horman <nhorman>
---
 drivers/net/bonding/bond_alb.c   |  2 +-
 drivers/net/bonding/bond_alb.h   |  8 ++++----
 drivers/net/bonding/bond_main.c  |  1 +
 drivers/net/bonding/bond_sysfs.c | 39 +++++++++++++++++++++++++++++++++++++++
 drivers/net/bonding/bonding.h    |  1 +
 5 files changed, 46 insertions(+), 5 deletions(-)
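
To put the rate quoted in the commit message in perspective, here is a back-of-the-envelope sketch. It assumes that each interval produces one burst of 3 learning frames, as observed earlier in this thread, and that the burst size does not change with the interval; both are assumptions about this particular setup, and the function name is hypothetical.

```python
def learning_frames_per_hour(burst_size: int, interval_seconds: int) -> float:
    """Frames sent per hour when one burst goes out every interval."""
    if interval_seconds < 1:
        raise ValueError("interval must be at least 1 second")
    return burst_size * 3600 / interval_seconds


default_rate = learning_frames_per_hour(burst_size=3, interval_seconds=1)
tuned_rate = learning_frames_per_hour(burst_size=3, interval_seconds=60)
print(default_rate, tuned_rate, default_rate / tuned_rate)
```

Under those assumptions, raising the interval from 1 to 60 seconds cuts the learning traffic seen by the switch (or OpenFlow controller) by a factor of 60.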

Comment 18 Neil Horman 2013-09-09 15:02:10 UTC
I'm sorry, could I ask you to test out this alternate patch I've written?  I like it better than my first pass, as it allows per-device configuration via sysfs.

Thanks!

Comment 19 Nitin Sharma 2013-09-10 13:27:30 UTC
Not a problem. I applied it and validated.

echo 60 > /sys/class/net/bond0/bonding/lp_interval

Thanks
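
The per-device setting validated above can also be read and written from a script. The sketch below assumes the lp_interval file added by the second patch, at the path used in the echo command; the function names, the sysfs_root argument, and the guard against values below 1 are this sketch's own choices rather than anything taken from the patch.

```python
from pathlib import Path


def lp_interval_path(bond: str, sysfs_root: str = "/sys") -> Path:
    """Build the per-bond lp_interval path used in the echo command above."""
    return Path(sysfs_root) / "class" / "net" / bond / "bonding" / "lp_interval"


def get_lp_interval(bond: str, sysfs_root: str = "/sys") -> int:
    """Read the learning packet interval (seconds) for one bond device."""
    return int(lp_interval_path(bond, sysfs_root).read_text().strip())


def set_lp_interval(bond: str, seconds: int, sysfs_root: str = "/sys") -> None:
    """Write a new learning packet interval; needs root on a real system."""
    if seconds < 1:
        # Guard chosen for this sketch; the kernel does its own validation.
        raise ValueError("interval must be at least 1 second")
    lp_interval_path(bond, sysfs_root).write_text(f"{seconds}\n")
```

For example, set_lp_interval("bond0", 60) followed by get_lp_interval("bond0") mirrors the echo test above and confirms that the value was accepted.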

Comment 20 Neil Horman 2013-09-10 14:17:08 UTC
Thanks, posted for review:
http://marc.info/?l=linux-netdev&m=137882251119752&w=2

Comment 21 Neil Horman 2013-09-23 14:53:54 UTC
Fixed in the next F18 kernel release.

Comment 22 Josh Boyer 2013-09-23 15:16:07 UTC
(In reply to Neil Horman from comment #21)
> Fixed in the next F18 kernel release.

This should be applicable to F19 and F20 as well, right?  (3.11.y based)

Comment 23 Neil Horman 2013-09-23 15:22:46 UTC
It is, but I figured that F19 is still planning updates to 3.12 and would get this fix automatically, wouldn't it?

Comment 24 Josh Boyer 2013-09-23 15:24:38 UTC
(In reply to Neil Horman from comment #23)
> It is, but I figured that F19 is still planning updates to 3.12 and would
> get this fix automatically, wouldn't it?

Yes, but not for quite a while.  3.12 is only at -rc1 now.  I'll cherry-pick your fix.  It's easy enough to drop the patch when we do wind up rebasing, and there's no reason to not carry it.

Comment 25 Fedora Update System 2013-09-27 21:42:34 UTC
kernel-3.11.2-201.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/kernel-3.11.2-201.fc19

Comment 26 Fedora Update System 2013-09-28 20:45:12 UTC
kernel-3.10.13-101.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/kernel-3.10.13-101.fc18

Comment 27 Fedora Update System 2013-09-29 01:21:50 UTC
Package kernel-3.11.2-201.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.2-201.fc19'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-17865/kernel-3.11.2-201.fc19
then log in and leave karma (feedback).

Comment 28 Fedora Update System 2013-09-30 16:28:07 UTC
kernel-3.11.2-301.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.2-301.fc20

Comment 29 Fedora Update System 2013-10-01 01:57:14 UTC
kernel-3.11.2-201.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 30 Fedora Update System 2013-10-02 06:36:07 UTC
kernel-3.11.2-301.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 31 Fedora Update System 2013-10-03 01:11:29 UTC
kernel-3.10.13-101.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.