Bug 699083 - The be2net driver in linux kernel 2.6.38.3-17 is not working as it should with tagged VLAN ifaces
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 15
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-04-22 20:00 EDT by Iliyan Stoyanov
Modified: 2012-07-11 13:51 EDT
CC List: 8 users

Doc Type: Bug Fix
Last Closed: 2012-07-11 13:51:13 EDT

Attachments: None

Description Iliyan Stoyanov 2011-04-22 20:00:25 EDT
Description of problem:
It seems that the be2net driver is not working properly with VLAN interfaces.

Version-Release number of selected component (if applicable):
Tested with the following kernel versions:
2.6.35.11-83.fc14.x86_64 - working
2.6.37-2.fc14.x86_64 - not working
2.6.38-0.rc8.git4.1.fc14.x86_64 - not working
2.6.38.3-15.rc1.fc14.x86_64 - not working
2.6.38.3-17.fc14 - not working

How reproducible:
Always on 3 different HP PL 465 G7 server blades.

Steps to Reproduce:
1. Take the FC15 kernel from koji
2. rpmbuild --rebuild the kernel
3. Install it
4. Reboot
5. Try using tagged VLAN interfaces with bonding and bridging
  
Actual results:
VLAN traffic not working

Expected results:
VLAN traffic working

Additional info:
I know this is not a proper bug report, as it doesn't exactly fit F15, F14, or rawhide, but I would like to report it because this seems to be either a bug or a regression in the be2net driver. I also realize that this might be a problem unrelated to Fedora and rooted in the upstream kernel, but either way I decided to report it here.

The situation is as follows: I have Emulex OneConnect 10Gb cards in a Fedora 14 machine that I use as a libvirt/KVM host. I set them up like this:

eth0+eth1=bond0 (active-backup with miimon=100) eth0 always active
eth0.2+eth1.2=bond1 (active-backup with miimon=100) eth0.2 always active
eth0.192+eth1.192=bond2 (active-backup with miimon=100) eth0.192 always active

br0=bond0+bridged interfaces from KVM guests (no stp) and I assign an IP to br0
br1=bond1+bridged interfaces from KVM guests (no stp) and I assign an IP to br1
bond2 gets an IP from the 192.168.xxx.xxx/24 network that I use for libvirt communication and migration of VMs between hosts.
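
To make the layout above easier to reproduce on other hardware, here is a minimal dry-run sketch that only prints the iproute2 commands which would build it; nothing is executed. It assumes a modern iproute2 with "ip link add ... type bond/vlan/bridge" support (newer than the tools used in this report), the function names are hypothetical, and IP addresses are left out because they are site-specific.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints iproute2 commands for the bond/VLAN/bridge layout
# described above. Nothing is executed; review the output before applying it.
# Assumes a modern iproute2; all function names here are hypothetical.

# emit_bond BOND PRIMARY BACKUP: active-backup bond, link polled every 100 ms,
# with PRIMARY preferred whenever its link is up.
emit_bond() {
  local bond=$1 primary=$2 backup=$3 slave
  echo "ip link add $bond type bond mode active-backup miimon 100"
  for slave in "$primary" "$backup"; do
    echo "ip link set $slave down"          # older bonding code wants slaves down
    echo "ip link set $slave master $bond"  # enslaving brings the slave back up
  done
  echo "ip link set $bond type bond primary $primary"
}

# emit_vlan PARENT VID: tagged sub-interface named PARENT.VID
emit_vlan() {
  echo "ip link add link $1 name $1.$2 type vlan id $2"
}

# emit_bridge BRIDGE PORT: bridges created this way have STP off by default
emit_bridge() {
  echo "ip link add $1 type bridge"
  echo "ip link set $2 master $1"
}

print_topology() {
  local vid dev
  emit_bond bond0 eth0 eth1
  for vid in 2 192; do
    emit_vlan eth0 "$vid"
    emit_vlan eth1 "$vid"
  done
  emit_bond bond1 eth0.2 eth1.2
  emit_bond bond2 eth0.192 eth1.192
  emit_bridge br0 bond0
  emit_bridge br1 bond1
  for dev in bond0 bond1 bond2 br0 br1; do
    echo "ip link set $dev up"
  done
  # Addresses for br0, br1 and bond2 are site-specific and omitted here.
}

print_topology
```

Printing instead of executing keeps the sketch safe to run anywhere and lets the command list be diffed against what the ifcfg scripts actually do on a given host.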

This setup works like a charm with the Fedora 14 kernel (2.6.35.11-83.fc14.x86_64). However, since I started rebuilding the rawhide/F15 kernels on this machine, no traffic gets through the tagged interfaces at all. From what I have found, the problem might be the following:

With kernel 2.6.35.11-83.fc14.x86_64:
[02:18]root@hp5:~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: off
tx-vlan-offload: off
ntuple-filters: off
receive-hashing: off

[02:47]root@hp5:~ # ethtool -i eth0
driver: be2net
version: 2.102.147u
firmware-version: 2.102.517.7
bus-info: 0000:04:00.0

With any of the other kernels I listed:

[02:33]root@hp6:~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

[02:42]root@hp6:~ # ethtool -i eth0
driver: be2net
version: 2.103.175u
firmware-version: 2.102.517.7
bus-info: 0000:04:00.0

The problem seems to be that rx-vlan-offload and tx-vlan-offload are now enabled; however, ethtool -K eth0 rxvlan off results in:

[02:46]root@hp6:~ # ethtool -K eth0 rxvlan off
Cannot set device flag settings: Operation not supported
and so does this one:
[02:46]root@hp6:~ # ethtool -K eth0 txvlan off
Cannot set device flag settings: Operation not supported
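
Comparing the two ethtool -k listings above by eye is error-prone, so here is a small sketch (the function name is hypothetical) that extracts only the VLAN offload state from ethtool -k output. It handles the 2011 format shown here and, as an assumption about newer ethtool releases, the " [fixed]" suffix they print for settings the driver does not allow to be changed, which is roughly the situation behind the "Operation not supported" errors above.

```shell
#!/usr/bin/env bash
# vlan_offload_flags (hypothetical helper): reads "ethtool -k IFACE" output on
# stdin and prints the rx/tx VLAN offload settings on one line, e.g.
# "rx-vlan-offload=on tx-vlan-offload=on". A " [fixed]" suffix (printed by
# newer ethtool for settings the driver will not change) becomes "(fixed)".
vlan_offload_flags() {
  awk -F': *' '
    $1 ~ /^[ \t]*(rx|tx)-vlan-offload$/ {
      name = $1
      sub(/^[ \t]+/, "", name)                 # drop indentation, if any
      state = $2
      fixed = sub(/ *\[fixed\]/, "", state)    # 1 if the suffix was present
      out = out (out == "" ? "" : " ") name "=" state (fixed ? "(fixed)" : "")
    }
    END { print out }'
}

# Usage: run on each kernel/host and compare the one-line results
#   ethtool -k eth0 | vlan_offload_flags
```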

Other info: adding a VLAN on top of bond0 does not work either (no traffic goes through it), and no traffic goes through VLAN-tagged eth0 interfaces directly, so the problem is not in the bonding or bridging drivers.

There seems to be an even newer Emulex driver, version 2.104.225.7, but I cannot compile it on Fedora because it targets the RHEL6/SL6/CentOS6 kernel (2.6.32), and rpmbuild --rebuild fails with:

[02:55]root@hp6:~ # rpmbuild --rebuild hp-be2net-2.104.225.7-1.src.rpm 
Installing hp-be2net-2.104.225.7-1.src.rpm
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.JdS7Y0
+ umask 022
+ cd /root/rpmbuild/BUILD
+ LANG=C
+ export LANG
+ unset DISPLAY
+ cd /root/rpmbuild/BUILD
+ rm -rf hp-be2net-2.104.225.7
+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/hp-be2net-2.104.225.7.tar.gz
+ /bin/tar -xvvf -
drwxr-xr-x root/root         0 2011-01-31 18:41 hp-be2net-2.104.225.7/
-rw-r--r-- root/root       210 2011-01-31 18:41 hp-be2net-2.104.225.7/Makefile
-rw-r--r-- root/root     12948 2011-01-31 18:41 hp-be2net-2.104.225.7/be.h
-rw-r--r-- root/root     14794 2011-01-31 18:41 hp-be2net-2.104.225.7/be_compat.h
-rw-r--r-- root/root      1294 2011-01-31 18:41 hp-be2net-2.104.225.7/version.h
-rw-r--r-- root/root     12771 2011-01-31 18:41 hp-be2net-2.104.225.7/be_hw.h
-rw-r--r-- root/root     19939 2011-01-31 18:41 hp-be2net-2.104.225.7/be_ethtool.c
-rw-r--r-- root/root      2616 2011-01-31 18:41 hp-be2net-2.104.225.7/be_misc.c
-rw-r--r-- root/root     30627 2011-01-31 18:41 hp-be2net-2.104.225.7/be_cmds.h
-rw-r--r-- root/root     49096 2011-01-31 18:41 hp-be2net-2.104.225.7/be_cmds.c
-rw-r--r-- root/root     14868 2011-01-31 18:41 hp-be2net-2.104.225.7/be_compat.c
-rw-r--r-- root/root     87577 2011-01-31 18:41 hp-be2net-2.104.225.7/be_main.c
-rw-r--r-- root/root     15180 2011-01-31 18:41 hp-be2net-2.104.225.7/be_proc.c
-rw-r--r-- root/root     18693 2011-01-31 18:41 hp-be2net-2.104.225.7/COPYING
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd hp-be2net-2.104.225.7
+ /bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ set -- COPYING Makefile be.h be_cmds.c be_cmds.h be_compat.c be_compat.h be_ethtool.c be_hw.h be_main.c be_misc.c be_proc.c version.h
+ mkdir source
+ mv COPYING Makefile be.h be_cmds.c be_cmds.h be_compat.c be_compat.h be_ethtool.c be_hw.h be_main.c be_misc.c be_proc.c version.h source/
+ mkdir obj
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.VQ5y0G
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd hp-be2net-2.104.225.7
+ LANG=C
+ export LANG
+ unset DISPLAY
+ export 'EXTRA_CFLAGS=-DVERSION=\"2.104.225.7\"'
+ EXTRA_CFLAGS='-DVERSION=\"2.104.225.7\"'
+ for flavor in default
+ rm -rf obj/default
+ cp -r source obj/default
+ export SRC=/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default
+ SRC=/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default
++ '[' default = default ']'
+ make -C /usr/src/kernels/2.6.38.3-17.fc14-x86_64 modules M=/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default CONFIG_BE2NET=m
make: Entering directory `/usr/src/kernels/2.6.38.3-17.fc14.x86_64'
  CC [M]  /root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.o
In file included from /root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be.h:38:0,
                 from /root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.c:18:
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_compat.h:519:28: error: redefinition of 'skb_headlen'
include/linux/skbuff.h:1095:28: note: previous definition of 'skb_headlen' was here
In file included from /root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.c:18:0:
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be.h:307:14: error: 'VLAN_GROUP_ARRAY_LEN' undeclared here (not in a function)
In file included from /root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.c:19:0:
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_cmds.h:1070:11: warning: 'struct dev_mc_list' declared inside parameter list
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_cmds.h:1070:11: warning: its scope is only this definition or declaration, which is probably not what you want
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.c: In function 'be_set_multicast_list':
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.c:709:44: error: 'struct net_device' has no member named 'mc_count'
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.c:715:58: error: 'struct net_device' has no member named 'mc_list'
/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.c:716:9: error: 'struct net_device' has no member named 'mc_count'
make[1]: *** [/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default/be_main.o] Error 1
make: *** [_module_/root/rpmbuild/BUILD/hp-be2net-2.104.225.7/obj/default] Error 2
make: Leaving directory `/usr/src/kernels/2.6.38.3-17.fc14.x86_64'
error: Bad exit status from /var/tmp/rpm-tmp.VQ5y0G (%build)

I'm not knowledgeable enough to modify the driver code so that it compiles against 2.6.37/2.6.38 kernels. It doesn't compile against the original FC14 2.6.35 kernel either; from what I've gathered, the networking APIs changed somewhere after 2.6.32.
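
The compile errors above (no mc_count or mc_list member in struct net_device) suggest the 2.6.32-targeted driver expects the old multicast-list fields, which later kernels replaced with helpers such as netdev_mc_count() and netdev_for_each_mc_addr(). A rough heuristic check of which variant a kernel-devel tree provides might look like this; the function name and the grep patterns are assumptions, not part of the driver's build system.

```shell
#!/usr/bin/env bash
# mc_api_of HEADER (hypothetical helper): guess which multicast-list API a
# kernel's include/linux/netdevice.h offers. Prints one of:
#   legacy  - struct net_device still has an "int mc_count;" member
#   helpers - no such member, but netdev_for_each_mc_addr() is defined
#   unknown - neither pattern was found
mc_api_of() {
  local hdr=$1
  if grep -Eq '^[[:space:]]*int[[:space:]]+mc_count;' "$hdr"; then
    echo legacy
  elif grep -q 'netdev_for_each_mc_addr' "$hdr"; then
    echo helpers
  else
    echo unknown
  fi
}

# Usage against an installed kernel-devel tree, e.g. the one from the log above:
#   mc_api_of /usr/src/kernels/2.6.38.3-17.fc14.x86_64/include/linux/netdevice.h
```

A "helpers" result would be consistent with the be_main.c errors above; making the driver build would then mean porting its multicast code to those helpers, which this check does not attempt.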

I found online that other people had similar problems with VLAN offloading, albeit on VMware ESX (which, as far as I know, is Red Hat based). After Emulex provided an updated driver that allowed turning off VLAN offloading, everything worked for them, which is one more reason I'm inclined to think that VLAN offloading is the cause.
Comment 2 Iliyan Stoyanov 2011-04-23 09:30:24 EDT
Just recompiled 2.6.38.3-17 with the proposed patch. It still doesn't work: no traffic moves through the tagged interfaces. I will try the 2.6.39 rc kernel from rawhide and the -18 kernel with this patch.
Comment 3 Iliyan Stoyanov 2011-04-23 10:12:29 EDT
Here is the status with 2.6.39-0.rc4.git2.0.fc14.x86_64

[16:57]root@hp6:~ # ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off
[17:03]root@hp6:~ # ethtool  -i eth0
driver: be2net
version: 4.0.100u
firmware-version: 2.102.517.7
bus-info: 0000:04:00.0

The dmesg output is full of errors about eth0 and eth1 that look like this:
[  220.391566] be2net 0000:04:00.0: Error in cmd completion - opcode 121, compl 2, extd 30
[  221.217569] be2net 0000:04:00.1: Error in cmd completion - opcode 121, compl 2, extd 30
[  221.403566] be2net 0000:04:00.0: Error in cmd completion - opcode 121, compl 2, extd 30
[  222.231481] be2net 0000:04:00.1: Error in cmd completion - opcode 121, compl 2, extd 30


Again, I don't know enough about programming to work out what this opcode means from the driver source. I think the upstream kernel developers should be notified, but I'm not sure they would want to review this if the kernel is not a plain vanilla one.
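
When dmesg is flooded like this, it can help to count the messages per PCI function and error code before reporting upstream. A minimal sketch (hypothetical function name) that relies only on the message format quoted above:

```shell
#!/usr/bin/env bash
# summarize_cmd_errors (hypothetical helper): reads dmesg output on stdin and
# counts be2net "Error in cmd completion" messages per PCI function and
# opcode/compl/extd triple, one summary line per distinct combination.
summarize_cmd_errors() {
  sed -n 's/.*be2net \([0-9a-f:.]*\): Error in cmd completion - opcode \([0-9]*\), compl \([0-9]*\), extd \([0-9]*\).*/\1 opcode=\2 compl=\3 extd=\4/p' |
    sort | uniq -c |
    awk '{ print $2, $3, $4, $5, "count=" $1 }'
}

# Usage:
#   dmesg | summarize_cmd_errors
```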
Comment 4 Iliyan Stoyanov 2011-05-15 18:42:30 EDT
Just upgraded to FC15 with kernel 2.6.38.6-27 from koji. The situation is as follows.

ifcfg-em1 \__ ifcfg-bond0, with VLAN-tagged ifcfg-bond0.2 and ifcfg-bond0.192
ifcfg-em2 /   on top of the bond interface.

Everything works as expected; ping goes through the tagged interfaces.

If I run service network restart (which systemd redirects to systemctl), ping stops working completely afterwards. The interfaces are up, but no tagged traffic goes through them. I have to reboot the machine (if I try just a kexec, the be2net driver reports: UE: Detected!! UE: MPU bit set mbox poll timed out).

Also, after a clean restart, if I additionally set up bridging on top (ifcfg-bond0 -> ifcfg-br0 and ifcfg-bond0.2 -> ifcfg-br1), ping does not go through the tagged interfaces. If, however, I run brctl delbr on br0 and br1, ping goes through until I run service network restart again.

I also upgraded the firmware of the Emulex cards to 2.104.281.0, which did not help much.

I also compiled a number of kernels (based on the ones from koji) with patches for the benet driver from Linus' tree, but this did not help at all.

As I'm using Fedora as a KVM host (with mostly SL6 guests), in its current state Fedora 15 (at least the kernel part) is letting me down as a stable KVM host solution :(

Side note: I tried linux kernel 2.6.39-rc7-next-20110513, with the same result. It seems this is an upstream problem, but could someone please verify whether this fails only with my hardware or is also reproducible on different hardware?
Comment 5 Chuck Ebbert 2011-05-19 21:30:09 EDT
(In reply to comment #4)
> Side note I tried with linux kernel 2.6.39-rc7-next-20110513, same result. It
> seems this is an upstream problem, but please could someone verify that this is
> not working only with my hardware, or is this reproducible on different
> hardware also?

I'm guessing this is an upstream problem. It might be best if you tried to report it there (to the netdev list, not linux-kernel).
Comment 6 Stepan Cenek 2011-06-07 09:31:27 EDT
(In reply to comment #3)
> and the dmesg is full of errors about eth0 and eth1 that look like this:
> [  220.391566] be2net 0000:04:00.0: Error in cmd completion - opcode 121, compl 2, extd 30
> [  221.217569] be2net 0000:04:00.1: Error in cmd completion - opcode 121, compl 2, extd 30
> [...]
> Again I'm not that knowledgeable in programming to find out what this opcode
> means from the source of the driver.

The "Error in cmd completion - opcode 121" message is related to obtaining die_on_temperature from the NIC and is fixed in NIC firmware 2.104.281.0.
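
To check whether a host already runs that firmware, the firmware-version field from ethtool -i can be compared against 2.104.281.0. A minimal sketch, assuming GNU coreutils (for sort -V); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# firmware_version (hypothetical helper): print the "firmware-version" field
# from "ethtool -i IFACE" output read on stdin.
firmware_version() {
  awk -F': *' '$1 == "firmware-version" { print $2 }'
}

# fw_at_least CURRENT MINIMUM (hypothetical helper): succeed if CURRENT >= MINIMUM,
# comparing the dotted version numbers field by field with GNU "sort -V".
fw_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   fw=$(ethtool -i eth0 | firmware_version)
#   fw_at_least "$fw" 2.104.281.0 || echo "eth0 firmware $fw predates 2.104.281.0"
```

With the firmware shown earlier in this report (2.102.517.7), the check fails, which matches the opcode 121 errors seen before the upgrade.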
Comment 7 Josh Boyer 2012-06-04 14:54:02 EDT
Is this still happening with the 2.6.43/3.3 kernel updates in F15/F16?  Did anyone take this issue to the netdev list if so?
Comment 8 Marcelo Ricardo Leitner 2012-06-29 14:37:44 EDT
This seems to be related to BZ #804800 (ixgbe). Be warned: I only recognized the network layout; I didn't check anything else here.

Context: this may be related to the VLAN receive path having been restructured/unified. The bridge/VLAN interaction here is probably about promiscuous (PROMISC) mode: in promiscuous mode, ixgbe was disabling VLAN filtering, and because of that restructuring the software 802.1Q filtering was bypassed as well.
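
Since a Linux bridge normally puts its ports into promiscuous mode, one way to check the bridge/VLAN interaction described above on an affected host is to look for the IFF_PROMISC bit (0x100) in each interface's flags. A minimal sketch; the function names are hypothetical, and it assumes the flags word is read from sysfs (/sys/class/net/IFACE/flags):

```shell
#!/usr/bin/env bash
# is_promisc FLAGS (hypothetical helper): succeed if the interface flags word,
# as found in /sys/class/net/IFACE/flags (e.g. 0x1103), has IFF_PROMISC (0x100) set.
is_promisc() {
  (( $1 & 0x100 ))
}

# list_promisc (hypothetical helper): print the name of every interface that is
# currently in promiscuous mode, according to sysfs.
list_promisc() {
  local f
  for f in /sys/class/net/*/flags; do
    [ -r "$f" ] || continue
    if is_promisc "$(cat "$f")"; then
      basename "$(dirname "$f")"
    fi
  done
  return 0
}

# Usage on an affected host, before and after "brctl delbr":
#   list_promisc
```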

As in comment #5, this is an upstream problem.
Comment 9 Josh Boyer 2012-07-11 13:51:13 EDT
Fedora 15 has reached its end of life as of June 26, 2012.  As a result, we will not be fixing any remaining bugs found in Fedora 15.

In the event that you have upgraded to a newer release and the bug you reported is still present, please reopen the bug and set the version field to the newest release you have encountered the issue with.  Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered.

Thank you for taking the time to file a report.  We hope newer versions of Fedora suit your needs.
