Bug 1002982 - issue capturing some 802.1Q packets with kernel 3.10.9-200
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: i386 Linux
Priority: unspecified
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-08-30 08:11 EDT by Yann BONNAMY
Modified: 2013-09-20 09:11 EDT
Doc Type: Bug Fix
Last Closed: 2013-09-20 09:11:27 EDT
Type: Bug

Attachments
capture with the 3.9.5 kernel (2.94 KB, application/octet-stream)
2013-09-04 03:06 EDT, Yann BONNAMY
802.1Q pcap for replaying with netsniff-ng (260 bytes, application/octet-stream)
2013-09-05 03:46 EDT, Yann BONNAMY

Description Yann BONNAMY 2013-08-30 08:11:38 EDT
Description of problem:

We have deployed a Wireshark monitoring server running the latest F19. There are multiple mirroring sources, and the capture interfaces are bonded into a single "MONI" interface. Furthermore, the switches performing the mirroring show the 802.1Q tags in one direction but not in the other, so when we need capture filters we usually launch Wireshark like this:

export FILTER="net 10.1.1.1/28"
wireshark -i MONI -k -f "$FILTER or ( vlan and $FILTER )"

It quickly became apparent that some packets with an 802.1Q VLAN tag were no longer captured: the SCTP packets we are interested in were visible in one direction only.

When capturing without filters, we can see some other 802.1Q packets, but not the packets we are interested in.
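
The "vlan" clause in the filter above is needed because an 802.1Q tag inserts four bytes between the source MAC address and the EtherType, so a filter compiled for untagged frames looks for the IP header at the wrong offset (in pcap filter syntax, the "vlan" keyword shifts the decoding offsets for the rest of the expression). The following is only a minimal sketch of that offset shift; the helper names be16 and l3_offset are hypothetical and this is not libpcap code. It handles a single tag only (no QinQ).

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define VLAN_TAG_LEN  4       /* TPID (2) + TCI (2) */
#define TPID_8021Q    0x8100  /* EtherType value that marks an 802.1Q tag */

/* Read a big-endian 16-bit field from the frame (hypothetical helper). */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the offset of the layer-3 header, or 0 if the frame is too short.
 * On success, *ethertype receives the EtherType of the payload.
 */
size_t l3_offset(const uint8_t *frame, size_t len, uint16_t *ethertype)
{
    uint16_t type;
    size_t off = ETH_HDR_LEN;

    if (len < ETH_HDR_LEN)
        return 0;

    type = be16(frame + 12);
    if (type == TPID_8021Q) {
        if (len < ETH_HDR_LEN + VLAN_TAG_LEN)
            return 0;
        type = be16(frame + 16);  /* the real EtherType follows the tag */
        off += VLAN_TAG_LEN;
    }
    *ethertype = type;
    return off;
}
```

For an untagged IPv4 frame the sketch reports offset 14; for the same frame carrying one 802.1Q tag it reports offset 18, which is why the filter has to be written once for each case.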

We use kernel-PAE-3.10.9-200.fc19.i686. Everything works correctly again when I run the following and reboot:
rpm -i kernel-PAE-3.9.5-301.fc19.i686.rpm --oldpackage
(kernel-3.9.5-301.fc19.i686.rpm is also OK)

I presume libpcap-1.4.0-1.fc19.i686 and kernel-PAE-3.10.9-200.fc19.i686 are incompatible in some way, but I don't know where to start analysing this issue more deeply.

Thanks for your help,

Regards,

Yann.

Version-Release number of selected component (if applicable):

# rpm -q kernel-PAE
kernel-PAE-3.10.9-200.fc19.i686
kernel-PAE-3.9.5-301.fc19.i686
# rpm -q libpcap
libpcap-1.4.0-1.fc19.i686


How reproducible: I don't know how to reproduce it outside our lab


Steps to Reproduce:
1. Boot kernel-PAE-3.10.9-200.fc19.i686, then launch wireshark -i MONI -k -f "$FILTER or ( vlan and $FILTER )" as usual

Actual results:
-> some particular SCTP 802.1Q packets are not captured

Expected results:
-> all packets are captured, as they are when we boot kernel-PAE-3.9.5-301.fc19.i686

Additional info:
Comment 1 Yann BONNAMY 2013-09-02 11:05:29 EDT
As a workaround, it is possible to create every possible VLAN on the machine:
for i in {0..4094}; do vconfig add MONI $i ; done

After that, however, the ifconfig command is rather slow to display the configuration.

It looks as if the interface is not in full promiscuous mode: the kernel only allows a tagged packet to be captured if the corresponding VLAN has been created.
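
The behaviour guessed at here matches what a hardware VLAN filter does: tagged frames are dropped by the NIC unless their VLAN ID has been registered, regardless of what the capture tool asks for. The sketch below is a deliberately simplified model of that idea, not igb register code; struct nic_model and the nic_* functions are hypothetical names, and the single boolean stands in for the effect of the RCTL.VFE bit that appears in the patches later in this report.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VLAN_N_VID 4096  /* size of the 12-bit VLAN ID space */

/* Hypothetical model of a NIC with a hardware VLAN filter table. */
struct nic_model {
    bool     vlan_filter_enabled;          /* stands in for RCTL.VFE */
    uint32_t vlan_table[VLAN_N_VID / 32];  /* one bit per VLAN ID */
};

void nic_init(struct nic_model *nic, bool vlan_filter_enabled)
{
    memset(nic, 0, sizeof *nic);
    nic->vlan_filter_enabled = vlan_filter_enabled;
}

/* Models the table update that creating a VLAN on the device triggers. */
void nic_add_vlan(struct nic_model *nic, uint16_t vid)
{
    vid &= VLAN_N_VID - 1;
    nic->vlan_table[vid / 32] |= UINT32_C(1) << (vid % 32);
}

/* Would the frame reach the host, and therefore the capture? */
bool nic_accepts(const struct nic_model *nic, bool tagged, uint16_t vid)
{
    if (!tagged || !nic->vlan_filter_enabled)
        return true;  /* untagged frames and unfiltered NICs pass everything */
    vid &= VLAN_N_VID - 1;
    return ((nic->vlan_table[vid / 32] >> (vid % 32)) & 1u) != 0;
}
```

In this model, with the filter enabled, a frame tagged with VLAN 100 is dropped until nic_add_vlan(&nic, 100) is called, which is the effect the "create all VLANs" loop relies on. With the filter disabled, every tagged frame passes.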
Comment 2 Michele Baldessari 2013-09-03 18:50:48 EDT
Hi Yann,

could you please share a pcap of one of the problematic packets that are captured with 3.9.5 but not with 3.10.9?

Do they have the VLAN priority bits set?

thanks,
Michele
Comment 3 Yann BONNAMY 2013-09-04 03:06:20 EDT
Created attachment 793507
capture with the 3.9.5 kernel

In this capture, taken with the 3.9.5 kernel, the 802.1Q packets do not have the VLAN priority bits set.
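
For readers checking the same thing in their own captures: the priority bits are the top three bits of the 16-bit Tag Control Information (TCI) field that follows the 0x8100 TPID. A small sketch of the decoding, with a hypothetical struct vlan_tci and decode_tci helper, might look like this:

```c
#include <stdint.h>

/* Fields of the 802.1Q Tag Control Information word (hypothetical struct). */
struct vlan_tci {
    uint8_t  pcp;  /* priority code point, 3 bits */
    uint8_t  dei;  /* drop eligible indicator, 1 bit */
    uint16_t vid;  /* VLAN identifier, 12 bits */
};

/* Split a TCI value (already converted to host byte order) into its fields. */
struct vlan_tci decode_tci(uint16_t tci)
{
    struct vlan_tci t;

    t.pcp = (uint8_t)(tci >> 13);
    t.dei = (uint8_t)((tci >> 12) & 0x1);
    t.vid = (uint16_t)(tci & 0x0FFF);
    return t;
}
```

A TCI of 0x0064 decodes to priority 0 and VLAN 100, which corresponds to "prio bits not set"; 0xA064 would be the same VLAN with priority 5.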
Comment 4 Yann BONNAMY 2013-09-04 07:41:25 EDT
Hi, here are a few new clues:

Last OK kernel version: 3.9.11
First NOK kernel version: 3.10.1

The MONI interface is a bond of 6 interfaces on a TIGW1U server: 2 using the e1000 driver and 4 using the igb driver.
Packets on e1000 -> OK, 802.1Q packets are captured
Packets on igb -> NOK, 802.1Q packets are captured only if the VLAN is created

I tried to compile a 3.10.1 kernel with drivers/net/ethernet/intel/igb taken from 3.9.11, but failed.

The bonding subsystem is probably not involved, as capturing without any bonding gives the same results.

Regards,

Yann.
Comment 5 Yann BONNAMY 2013-09-04 07:45:14 EDT
Erratum: it is OK with the e1000e driver (not e1000).
Comment 6 Yann BONNAMY 2013-09-04 12:01:00 EDT
Comparing 3.9.11 and 3.10.1, and guided by the code comments, I tried removing some code related to "VT mode". This restored the working behaviour (802.1Q packets are captured with the igb driver):

--- linux-3.10.1/drivers/net/ethernet/intel/igb/igb_main.c
+++ linux-3.10.1.yBO/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3738,8 +3738,6 @@ static void igb_set_rx_mode(struct net_d
        if (netdev->flags & IFF_PROMISC) {
                u32 mrqc = rd32(E1000_MRQC);
                /* retain VLAN HW filtering if in VT mode */
-               if (mrqc & E1000_MRQC_ENABLE_VMDQ)
-                       rctl |= E1000_RCTL_VFE;
                rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
                vmolr |= (E1000_VMOLR_ROPE | E1000_VMOLR_MPME);
        } else {

Sadly, I don't understand why this worked or what the side effects could be ...
Comment 7 Michele Baldessari 2013-09-04 16:10:07 EDT
Hi Yann,

thanks for the additional info. So the commit that brought in this change in behaviour is the following:
commit 6f3dc319ec5c101e1e927e55d593ad6637648fe5
Author: Greg Rose <gregory.v.rose@intel.com>
Date:   Tue Mar 26 06:19:41 2013 +0000

    igb: Retain HW VLAN filtering while in promiscuous + VT mode
    
    When using the new bridge FDB interface to allow SR-IOV virtual function
    network devices to communicate with SW bridged network devices the
    physical function is placed into promiscuous mode and hardware VLAN
    filtering is disabled.  This defeats the ability to use VLAN tagging
    to isolate user networks.  When the device is in promiscuous mode and
    VT mode simultaneously ensure that VLAN hardware filtering remains
    enabled.
    
    Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
    Tested-by: Sibai Li <sibai.li@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Ideally we can come up with a small, contained reproducer (maybe injecting some traffic via http://netsniff-ng.org/ and collecting it via tshark) to demonstrate the issue. If that's too complex we can just try to ping e1000-devel. I'll see if I can give it a shot in the next few days.

cheers,
Michele
Comment 8 Yann BONNAMY 2013-09-05 03:42:48 EDT
Unfortunately, replaying packets locally with netsniff-ng does not reproduce the issue. However, I can update the "How reproducible:" section using a second PC.


How reproducible:

On a server (for example a TIGW1U) with network interfaces using the igb driver (Intel Corporation Gigabit VT Quad Port Server Adapter) and running kernel 3.10.1, launch a capture, for example:

tcpdump -i p6p6 "vlan and host 10.32.112.169"

Connect interface p6p6 to a PC with an Ethernet cable. On the PC, replay a .pcap containing some 802.1Q packets:
netsniff-ng --in /tmp/802.1Q.pcap --out enp0s25

Expected result: 802.1Q packets are captured
Actual result: 802.1Q packets are not captured

Possible workaround 1 -> fall back to the 3.9.11 kernel
Possible workaround 2 -> capture on an interface that does not use the igb driver (e1000e is OK)
Possible workaround 3 -> create every possible VLAN ( for i in {0..4094}; do vconfig add p6p6 $i ; done )
Possible workaround 4 -> recompile the kernel with the patch below applied:

--- linux-3.10.1/drivers/net/ethernet/intel/igb/igb_main.c
+++ linux-3.10.1.new/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3738,8 +3738,6 @@ static void igb_set_rx_mode(struct net_d
        if (netdev->flags & IFF_PROMISC) {
                u32 mrqc = rd32(E1000_MRQC);
                /* retain VLAN HW filtering if in VT mode */
-               if (mrqc & E1000_MRQC_ENABLE_VMDQ)
-                       rctl |= E1000_RCTL_VFE;
                rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
                vmolr |= (E1000_VMOLR_ROPE | E1000_VMOLR_MPME);
        } else {
Comment 9 Yann BONNAMY 2013-09-05 03:46:45 EDT
Created attachment 794046
802.1Q pcap for replaying with netsniff-ng

802.1Q pcap for replaying with netsniff-ng (used in "How reproducible:" in comment #8)
Comment 10 Michele Baldessari 2013-09-05 16:38:46 EDT
Thanks Yann, that's perfect. I'll raise it with e1000-devel in the next few days.

Will keep you posted.
Comment 11 Michele Baldessari 2013-09-05 17:03:07 EDT
Hi Yann,

never mind. No need to harass e1000. This has already been fixed upstream:
commit 7e44892c1b6bb499cb2f6d5c0f4afcc077a26074
Author: Emil Tantilov <emil.s.tantilov@intel.com>
Date:   Fri Jul 26 05:46:36 2013 -0700

    igb: fix vlan filtering in promisc mode when not in VT mode
    
    This patch fixes a VT mode check to make sure VLAN filters are disabled when
    in promisc mode and VT is not enabled.
    
    The problem with the previous check was that:
    E1000_MRQC_ENABLE_VMDQ is defined as 0x00000003
    
    but when not in VT mode:
    mrqc |= E1000_MRQC_ENABLE_RSS_4Q (0x00000002)
    
    So the above check will trigger regardless if VT mode is being used or not.
    
    Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
    Tested-by: Aaron Brown <aaron.f.brown@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 6a0c1b6..c1d72c0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3739,9 +3739,8 @@ static void igb_set_rx_mode(struct net_device *netdev)
        rctl &= ~(E1000_RCTL_UPE | E1000_RCTL_MPE | E1000_RCTL_VFE);
 
        if (netdev->flags & IFF_PROMISC) {
-               u32 mrqc = rd32(E1000_MRQC);
                /* retain VLAN HW filtering if in VT mode */
-               if (mrqc & E1000_MRQC_ENABLE_VMDQ)
+               if (adapter->vfs_allocated_count)
                        rctl |= E1000_RCTL_VFE;
                rctl |= (E1000_RCTL_UPE | E1000_RCTL_MPE);
                vmolr |= (E1000_VMOLR_ROPE | E1000_VMOLR_MPME);
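
The arithmetic behind the commit message can be checked in isolation. The sketch below uses only the two constants quoted above; the function names vt_mode_buggy and vt_mode_fixed are hypothetical and exist only to contrast the two checks. Because 0x2 & 0x3 is 0x2, which is non-zero, the old bitwise test is true whenever 4-queue RSS is configured, even with no virtual functions in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the commit message above. */
#define E1000_MRQC_ENABLE_RSS_4Q 0x00000002u
#define E1000_MRQC_ENABLE_VMDQ   0x00000003u

/* Check used before the fix: any bit shared with the mask makes it true. */
bool vt_mode_buggy(uint32_t mrqc)
{
    return (mrqc & E1000_MRQC_ENABLE_VMDQ) != 0;
}

/* Check after the fix: VT mode is inferred from the number of allocated VFs. */
bool vt_mode_fixed(unsigned int vfs_allocated_count)
{
    return vfs_allocated_count != 0;
}
```

With mrqc set to E1000_MRQC_ENABLE_RSS_4Q and no VFs allocated, vt_mode_buggy() returns true (so VLAN hardware filtering stays on in promiscuous mode and unregistered VLANs are dropped), while vt_mode_fixed(0) returns false.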

I don't think it's really material for stable (aka 3.10.x), so either you move to 3.11/3.9 or one of the Fedora kernel maintainers includes this fix in an update.

cheers,
Michele
Comment 12 Josh Boyer 2013-09-05 18:03:21 EDT
We can look at grabbing that soon.
Comment 13 Yann BONNAMY 2013-09-06 02:28:03 EDT
We moved to 3.9.11 and the issue is solved. Thanks for your help.
Comment 14 Josh Boyer 2013-09-16 09:08:50 EDT
Fedora 19 has been rebased to 3.11.1 in git.  An update should make it out with the patch mentioned in comment #11 soon.
Comment 15 Yann BONNAMY 2013-09-20 04:50:45 EDT
Hi, all is OK running 3.11.1-200.fc19.i686.PAE, so this can be closed. Thanks a lot.
Comment 16 Josh Boyer 2013-09-20 09:11:27 EDT
Thank you.
