Bug 733224

Summary: vlan not accessible through a bridge configuration
Product: [Fedora] Fedora
Component: kernel
Version: 15
Hardware: x86_64
OS: Linux
Status: CLOSED NEXTRELEASE
Severity: urgent
Priority: unspecified
Reporter: ejbg
Assignee: Neil Horman <nhorman>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: dalefarm, gansalmon, iarlyy, itamar, jonathan, kernel-maint, madhu.chinakonda, nhorman, notting, Per.t.Sjoholm, phresus, plautrba, ppisar, rrakus
Doc Type: Bug Fix
Last Closed: 2011-11-16 15:44:57 UTC

Description ejbg 2011-08-25 07:48:52 UTC
Description of problem:

When a bridge is set up to go through a VLAN, the VLAN segment is no longer accessible.

Version-Release number of selected component (if applicable):

kernel : 2.6.40.3-0.fc15.x86_64
vconfig : vconfig-1.9-9.fc15.x86_64
bridge : bridge-utils-1.2-10.fc15.x86_64

How reproducible: Always

Steps to Reproduce:
1. Do not install NetworkManager* and disable iptables

2. disable IPv6
in /etc/modprobe.d/my.conf
options ipv6 disable=0 disable_ipv6=1 autoconf=0

3. set up networking

in /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
Type=Ethernet
BRIDGE=zbrz

in /etc/sysconfig/network-scripts/ifcfg-zbrz
DEVICE=zbrz
TYPE=Bridge
STP=off
DELAY=0
DEFROUTE=yes
NOZEROCONF=yes
IPV6INIT=no
IPV6_AUTOCONF=no
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.100
NETMASK=255.255.255.0
GATEWAY=192.168.1.254

in /etc/sysconfig/network-scripts/ifcfg-eth0.10
VLAN=yes
DEVICE=eth0.10
ONBOOT=yes
Type=Ethernet
BRIDGE=zbr10
IPV6INIT=no
IPV6_AUTOCONF=no
NOZEROCONF=yes

in /etc/sysconfig/network-scripts/ifcfg-zbr10
DEVICE=zbr10
TYPE=Bridge
STP=off
DELAY=0
ONBOOT=yes
NOZEROCONF=yes
IPV6INIT=no
IPV6_AUTOCONF=no
BOOTPROTO=static
IPADDR=192.168.10.100
NETMASK=255.255.255.0

4. start networking
service network (re)start OR through systemctl command

5. test networks
ping 192.168.1.xx on ZBRZ/ETH0 segment : it works
ping 192.168.10.yy on ZBR10/ETH0.10 segment : no packets reach the target

Actual results:
When pinging on ZBRZ/ETH0 segment, packets are correctly transmitted.

When pinging on ZBR10/ETH0.10 segment, no packets reach the target.

Expected results:

Both segments, through ZBRZ/ETH0 and ZBR10/ETH0.10 should be reachable.

Additional info:

This exact same configuration was working perfectly on Fedora 13 and Fedora 14 on the same machine

Comment 1 Petr Pisar 2011-08-25 08:40:20 UTC
vconfig has been obsolete for a long time and has been superseded by the `ip' command from the `iproute' package. AFAIK the /etc/sysconfig/network-scripts/* files are parsed by scripts provided by the `initscripts' package, and a simple grep in initscripts-9.30-2.fc15.x86_64 did not find any calls to vconfig. So this is not an issue for the vconfig package.

This is very probably a kernel issue, as vconfig and ip just configure the interfaces and rules that the kernel applies to frames/packets.

Comment 2 ejbg 2011-09-01 13:43:51 UTC
Hi Bill,

Have you already found out if it comes from an initscripts problem rather than a kernel problem ?

Right now, creating a KVM virtual machine on a particular VLAN is not possible: the bridge configuration does not work at all on Fedora 15, whether it sits directly on the VLAN interface or on a bonded VLAN interface, so one has to go back to Fedora 14 to test such a config.

Thanks,
Eric.

Comment 3 Ryan Barry 2011-09-27 19:33:01 UTC
Was there any resolution to this?  I'm having exactly the same problem on a Fedora 15 machine, and no permutation of the configuration files has resolved it.

Comment 4 ejbg 2011-10-08 17:26:40 UTC
Hi Ryan,

I don't think this bug has been resolved yet, I never saw anything about it.

I believe it is not a common usage or there is maybe a way of doing this that I don't know of yet.

I dropped Fedora 15 and I came back to Fedora 14 where it works fine. 

I will try Fedora 16 to see if anything happened at that time.

Regards,
Eric.

Comment 5 dalefarm 2011-10-17 03:12:49 UTC
I've been having the same problem with a Fedora15 install recently.

I'm pretty sure it has nothing to do with initscripts.

My configuration was working fine in F14. It also works fine in an F15 install using kernel <= 2.6.38.8-35.

If I upgrade to a >= 2.6.40 kernel (aka 3.x), it no longer works.

I've tried various different configurations - bridge-on-vlan, vlan-on-bridge - no joy

(see for example http://unix.stackexchange.com/questions/18576/why-does-adding-a-non-vlaned-interface-to-a-bridge-break-the-vlaned-interfaces)


Using another machine, I've sniffed the traffic from my F15 box. I see that the egress VLAN traffic from that F15 box is not tagged with a VLAN ID, with the result that I cannot reach machines on my VLAN from the F15 box.

However if I sniff traffic on the F15 box itself, I do see VLAN ID in header.

So I'm suspecting a driver / kernel problem here.

My motherboard is using an NVIDIA MCP55 chipset, with forcedeth driver.

'lspci -nn' shows :
 00:08.0 Bridge [0680]: nVidia Corporation MCP55 Ethernet [10de:0373] (rev a2)


I also note that there has been some VLAN-related rework in the forcedeth driver (kernel too?) of late, that appears to have been causing various issues for some eg :

 https://lkml.org/lkml/2011/8/5/115


Eric, Ryan, what network adaptor / driver are you using? Which kernel?

Comment 6 Neil Horman 2011-10-17 18:20:44 UTC
dalefarm, when you sniff egress traffic, you're likely not going to see prepended vlan tags, as they are kept out of band in the vlan_tci field of the skb, to be prepended to the frame by the hardware on egress.  Your best bet may be to run a stap script to probe egress skbs to see if $skb->vlan_tci is set appropriately on entry to the driver's ndo_start_xmit method.  If it is, then it is likely the driver is not properly telling the hardware to attach a vlan header to the frame.

Comment 7 dalefarm 2011-10-17 21:44:08 UTC
>>when you sniff egress traffic, you're likely not going to see prepended vlan tags

I'm sniffing the traffic from my F15 box, using a separate physical machine entirely.

In the 'working' case I see the VLAN tags in the egress traffic.

In the 'non-working' case I don't see the VLAN tags in the egress traffic.


The fact that I could see them (in either case) when I sniffed directly on the F15 box doesn't concern me - it just lends more credence to my suspicion that this is a kernel/driver issue.

Comment 8 Neil Horman 2011-10-17 23:48:18 UTC
I tend to agree, especially if you can run the above stap script that I outlined.  If vlan_tci is set properly in the driver's ndo_start_xmit routine, but no vlan tags appear on the wire for that port, we can be fairly certain the driver isn't telling the hardware to add a vlan tag properly.

Comment 9 dalefarm 2011-10-18 03:02:37 UTC
Thanks Neil.

Unfortunately that MCP55-chipset based box is in a semi-lockdown state for the next few weeks, so I'm unlikely to be able to do any debugging there for a little while.

However, I was able to try this same networking configuration on a box with Realtek hardware (RTL8111/8168B - PCI ID [10ec:8168] ), and this works fine.  (kernel here is 2.6.40.4-5.fc15.x86_64).

Comment 10 Neil Horman 2011-10-18 10:39:55 UTC
Ok, while you wait to get back on the system in question I'll see if I can find a box here to recreate this with

Comment 11 ejbg 2011-11-05 15:12:45 UTC
dalefarm,

I tried it on 2 different machines and both use the latest Fedora 15 x86_64 kernel (currently 2.6.40.6-0.fc15.x86_64 ).

The first machine uses network adapter :
09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5752 Gigabit Ethernet PCI Express (rev 02)

The second machine uses network adapter :
02:00.0 Ethernet controller: Broadcom Corporation NetLink BCM57785 Gigabit Ethernet PCIe (rev 10)

Regards,
Eric.

Comment 12 Neil Horman 2011-11-07 11:47:35 UTC
ejbg, why did you do that?  Did you get the same failure that he describes on the forcedeth hardware?  Did you run the stap script I suggested?

Comment 13 dalefarm 2011-11-13 18:21:13 UTC
Neil,

I've been able to get a couple of hours with that problem system.

I hacked together a quick stap script 'nv.stp' to inspect calls to the forcedeth module 'nv_start_xmit' and 'nv_start_xmit_optimized' functions :


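  #!/usr/bin/stap

  # Log every entry into the plain transmit path.
  probe module("forcedeth").function("nv_start_xmit") {
     printf("nv_start_xmit\n");
  }

  # Log every entry into the optimized transmit path, together with the
  # device feature words and the out-of-band VLAN tag carried in the skb.
  probe module("forcedeth").function("nv_start_xmit_optimized") {
     printf("nv_start_xmit_optimized   dev %p (features: %x  hw %x  wanted %x    vlan %x) skb %p  vlan_tci %x\n",
        $dev,
        $dev->features,
        $dev->hw_features,
        $dev->wanted_features,
        $dev->vlan_features,
        $skb, $skb->vlan_tci);
  }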





Running this on kernel 2.6.40.6-0.fc15.x86_64 yields the following output :

[root@localhost stap]# ./nv.stp
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff880174204700  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff88004841df00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff88004841d400  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff880172b5ff00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff88015f228b00  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff880173149c00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff88017202bb00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff880173669b00  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600149a3  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff8801752cb700  vlan_tci 0



There is a mix of non-vlan and vlan (id = 11) traffic on this system, so the vlan_tci values of '0' and '100b' are expected.  However, as previously noted, an external system sees all packets as untagged.
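
For reference, the vlan_tci values printed by the stap script can be unpacked with a few bit masks. This is a minimal Python sketch, not part of the original debugging session. The field layout (3 priority bits, 1 CFI bit, 12 VLAN ID bits) follows the 802.1Q tag format; reading bit 0x1000 as an in-kernel "tag present" marker is an assumption about kernels of this era, and the function name is made up for illustration.

```python
# Decode the 16-bit TCI value printed by the stap script.
# Layout per 802.1Q: priority (3 bits) | CFI (1 bit) | VLAN ID (12 bits).
# Assumption: inside kernels of this era the CFI bit (0x1000) doubles as
# a "tag present" marker in skb->vlan_tci.
VLAN_VID_MASK = 0x0FFF
VLAN_TAG_PRESENT = 0x1000
VLAN_PRIO_SHIFT = 13


def decode_vlan_tci(tci):
    """Split a TCI integer into VLAN ID, tag-present flag and priority."""
    return {
        "vid": tci & VLAN_VID_MASK,
        "tag_present": bool(tci & VLAN_TAG_PRESENT),
        "priority": tci >> VLAN_PRIO_SHIFT,
    }


# The two values seen in the stap output above.
for raw in ("0", "100b"):
    print(raw, decode_vlan_tci(int(raw, 16)))
```

Under these assumptions, '100b' corresponds to VLAN ID 11 with the tag-present bit set, and '0' to an untagged skb.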


I attempted to disable hw acceleration, but cannot disable hw vlan via ethtool :


[root@localhost stap]# ethtool -K eth0 tx off
[root@localhost stap]# ethtool -K eth0 rx off
[root@localhost stap]# ethtool -K eth0 gro off
[root@localhost stap]# ethtool -K eth0 tso off
[root@localhost stap]# ethtool -K eth0 txvlan off
Cannot set device flag settings: Operation not supported
[root@localhost stap]# ethtool -K eth0 rxvlan off
Cannot set device flag settings: Operation not supported

[root@localhost stap]# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off




With these ethtool settings, the same 'nv' stap script yielded :

[root@localhost stap]# ./nv.stp
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600001a0  hw 60014803  wanted 40000801  vlan 4020) skb 0xffff88017297aa00  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600001a0  hw 60014803  wanted 40000801  vlan 4020) skb 0xffff88002a561000  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600001a0  hw 60014803  wanted 40000801  vlan 4020) skb 0xffff880177193d00  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600001a0  hw 60014803  wanted 40000801  vlan 4020) skb 0xffff88002a561000  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600001a0  hw 60014803  wanted 40000801  vlan 4020) skb 0xffff88002a561b00  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880172e04000 (features: 600001a0  hw 60014803  wanted 40000801  vlan 4020) skb 0xffff88002a561300  vlan_tci 0





I then reverted to a working configuration (kernel 2.6.38.8-35.fc15.x86_64) on the same system.

(Had to tweak my stap script a little, since many $dev->features members are not present, or not traceable in this setup; Only dev->vlan_features was available)

Here's the output of the stap run :

nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff880093b88600  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff8800938f1700  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff880171da8400  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff880171da8f00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff88016f9d0e00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff8800a18e8000  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff880096539d00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff8801708ee000 ( vlan 4020) skb 0xffff880096539700  vlan_tci 0



Surprisingly, even with a (verified) mix of tagged and non-tagged egress traffic, the vlan_tci field here is always '0', implying no tagging performed/requested of the forcedeth driver.




So at a guess :

1. kernel 2.6.38.8-35 itself is adding vlan tag, prior to sending skb to driver for transmit

2. kernel 2.6.40.6-0 is attempting to make use of driver to add vlan tags

3. Driver has a bug that results in no vlan tag added


Further, I'd suggest that driver may have had this bug for quite a while, but change in kernel behavior has now exposed it. 
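
To illustrate what "the kernel itself adding the vlan tag" in guess 1 means on the wire: a software-tagged frame carries a 4-byte 802.1Q header (TPID 0x8100 plus the TCI) between the source MAC and the original EtherType. The sketch below is a Python illustration of that frame layout only; it is not kernel code, and the function name and MAC addresses are hypothetical.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID that marks an 802.1Q-tagged frame


def insert_vlan_tag(frame, vid, priority=0):
    """Return a copy of an untagged Ethernet frame with an 802.1Q tag inserted.

    The tag goes after the 6-byte destination and 6-byte source MAC,
    in front of the original EtherType.
    """
    if not 0 <= vid <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    tci = (priority << 13) | vid
    tag = struct.pack("!HH", ETH_P_8021Q, tci)
    return frame[:12] + tag + frame[12:]


# Hypothetical untagged IPv4 frame: broadcast dst, made-up src, EtherType 0x0800.
untagged = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"payload"
tagged = insert_vlan_tag(untagged, vid=11)
print(tagged[12:18].hex())
```

With hardware acceleration the same TCI travels out of band in skb->vlan_tci instead, and the NIC is expected to insert these four bytes itself.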


Plausible?

-dalefarm

Comment 14 Neil Horman 2011-11-14 16:23:54 UTC
I'd say you're spot on.  Looks like commit 0891b0e08937aaec2c4734acb94c5ff8042313bb borked you.   It looks like that change cleared the VLAN_TX flag for the enabled features list, and set it on the user modifiable features list, but the driver uses the features flag to default enable VLAN acceleration on TX and RX, so you're left in this odd state where acceleration isn't enabled in the hardware by default, but the driver is telling the stack that it is.  And it appears you can't change the feature (as you've noted) because forcedeth uses the new set_features interface that ethtool in userspace doesn't yet support.  I think the thing to do here is revert part of this commit, specifically the part that disabled vlan acceleration, and see if that gets you working again.  I'll have a patch shortly.

Comment 15 Neil Horman 2011-11-14 16:45:17 UTC
actually, scratch that, it appears that the features flag is set properly a little farther down, so I still need to figure out why it is that prior to the 2.6.40 kernel we didn't do vlan accel.

Comment 16 dalefarm 2011-11-14 17:02:17 UTC
There's discussion of a very similar problem on LKML from February this year :


https://lkml.org/lkml/2011/2/21/386

Comment 17 Neil Horman 2011-11-14 18:25:33 UTC
Hm, so the more I look at this the more confusing it seems.  By all rights before and after that patch, we should have dev->features set in such a way as to indicate that the device supports TX vlan acceleration.  In fact in 2.6.30 all the way through 3.1, this should be the case, yet, in 2.6.38 we seem to have no vlan_tci information reaching the driver, which seems quite wrong.  Can you do me a favor?  Can you please attach /var/log/messages on your system here, from a boot of the 2.6.38 kernel and the 2.6.40 kernel?  I'd like to look at the banner information for forcedeth to confirm that I'm not missing something in the older driver that clears that flag.  Thank you.

Comment 18 Neil Horman 2011-11-14 18:33:28 UTC
dalefarm, yes, I've seen that conversation, and it definitely looks suspicious, but from what I see forcedeth already uses the new vlan model.  I might be missing something though.

Comment 19 dalefarm 2011-11-14 23:19:26 UTC
Snippets of /var/log/messages :

(I've redacted the MAC addresses; if you need full log then I'll be happy to send via PM)


2.6.38.8-35.fc15.x86_64:
 
 forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
 forcedeth 0000:00:08.0: PCI INT A -> Link[APCH] -> GSI 22 (level, low) -> IRQ 22
 forcedeth 0000:00:08.0: setting latency timer to 64
 forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 1, addr 00:18:f3:<xx:yy:zz>
 forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3



2.6.40.6-0.fc15.x86_64 :

 forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
 forcedeth 0000:00:08.0: PCI INT A -> Link[APCH] -> GSI 22 (level, low) -> IRQ 22
 forcedeth 0000:00:08.0: setting latency timer to 64
 forcedeth 0000:00:08.0: ifname eth0, PHY OUI 0x5043 @ 1, addr 00:18:f3:<xx:yy:zz>
 forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3

Comment 20 dalefarm 2011-11-15 00:32:16 UTC
Being selfish for a moment, I'm less worried about why kernel 2.6.38 does not attempt hw accel vlan tagging, than I am about why the forcedeth driver doesn't seem to work properly when kernel 2.6.40 does attempt hw accel tagging.

Digging around in forcedeth.c, I'm surprised to see that the DEV_HAS_VLAN flag (in the pci_tbl struct) is advertized only for the MCP55-chipset variants.

As a workaround, I disabled (ie removed) 'DEV_HAS_VLAN' flag from the MCP55 entry; Lo and Behold now my vlans are working with 2.6.40.


/var/log/messages shows :
   forcedeth 0000:00:08.0: highdma csum pwrctl mgmt gbit lnktim msi desc-v3

ethtool -k eth0 :
   Offload parameters for eth0:
   rx-checksumming: on
   tx-checksumming: on
   scatter-gather: on
   tcp-segmentation-offload: on
   udp-fragmentation-offload: off
   generic-segmentation-offload: on
   generic-receive-offload: on
   large-receive-offload: off
   rx-vlan-offload: off
   tx-vlan-offload: off
   ntuple-filters: off
   receive-hashing: off


and finally my 'nv.stp' script is showing no attempt at hw tagging :

nv_start_xmit_optimized   dev 0xffff88017385c000 (features: 60014823  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff88008bea5100  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff88017385c000 (features: 60014823  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff88008bea5d00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff88017385c000 (features: 60014823  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff880141f86100  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff88017385c000 (features: 60014823  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff880141f86d00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff88017385c000 (features: 60014823  hw 60014803  wanted 60014803  vlan 4020) skb 0xffff880141f86500  vlan_tci 0

Comment 21 dalefarm 2011-11-15 14:21:31 UTC
Regarding commit 0891b0e08937aaec2c4734acb94c5ff8042313bb - 'forcedeth: fix vlans', as far as I can tell this doesn't come into play until kernel 3.1.

The only tangible forcedeth commit I see from 2.6.38 -> 2.6.40 (ie 3.0) that affects the various 'features' flags is  569e146396cb3b378d2957b94671bf30cd777c67 - 'forcedeth: convert to hw_features'.


Interestingly though, the 'fix vlans' commit does add notable changes to the nv_probe() function relating to vlan feature flags :


 	if (id->driver_data & DEV_HAS_VLAN) {
 		np->vlanctl_bits = NVREG_VLANCONTROL_ENABLE;
-		dev->features |= NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX;
+		dev->hw_features |= NETIF_F_HW_VLAN_RX | NETIF_F_HW_VLAN_TX;
 	}
 
+	dev->features |= dev->hw_features;



So this change sets the NETIF_F_HW_VLAN_RX and NETIF_F_HW_VLAN_TX flags in dev->hw_features, and dev->features then picks them up from dev->hw_features.


If time allows, I may have a go at doing :

a. Regress forcedeth to the same rev as seen in kernel 2.6.38, to see if hw vlan stops being used

b. Bring in some/all of the forcedeth commits, post kernel 2.6.40, to see if hw vlan functionality springs to life.

Comment 22 Neil Horman 2011-11-15 14:46:16 UTC
I agree, the message logs confirm that, indicating that VLAN_TX was set in both versions of the file.

pulling some/all of the nv commits into f15 may be useful here, but I'm not seeing much in the way of vlan fixes since the above commit.  I'm starting to wonder if the bridge isn't doing something odd to the frames.  I'm going to set up a reproducer and take a look shortly.

Comment 23 dalefarm 2011-11-16 02:38:25 UTC
Based on a comment in the LKML thread I linked previously [https://lkml.org/lkml/2011/2/25/411], I added a VLAN (with an unused id) on eth0 :

 > vconfig add eth0 6

Now my other vlan at br0.11 (id = 11) is working, using unmodified 2.6.40 kernel and forcedeth driver.

stap script shows that vlan_tci field is 100b, indicating hw vlan tx active. 

To verify, I removed eth0.6 :

 > vconfig rem eth0.6


... and br0.11 stopped working.
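
Since Comment 1 notes that vconfig has been superseded by the `ip' command, the same add/remove experiment can also be expressed with iproute2. The Python sketch below only builds and prints the command lines by default (dry run); the helper names are hypothetical, and the naming scheme parent.vid is assumed to match what vconfig produced here (eth0.6).

```python
import subprocess


def vlan_add_cmd(parent, vid):
    """Build the iproute2 command that creates VLAN `vid` on `parent`."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    name = "%s.%d" % (parent, vid)
    return ["ip", "link", "add", "link", parent, "name", name,
            "type", "vlan", "id", str(vid)]


def vlan_del_cmd(parent, vid):
    """Build the iproute2 command that removes the VLAN device again."""
    return ["ip", "link", "delete", "dev", "%s.%d" % (parent, vid)]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False (needs root)."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Equivalent of 'vconfig add eth0 6' followed by 'vconfig rem eth0.6'.
run(vlan_add_cmd("eth0", 6))
run(vlan_del_cmd("eth0", 6))
```

Passing dry_run=False would actually change the network configuration, so that path is best tried on a test box.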


A follow-up on that same thread reads 'The right solution is convert the driver
[Intel e1000e] over to the new vlan model'.

Comment 24 dalefarm 2011-11-16 03:20:34 UTC
I brought in the forcedeth.c from kernel 3.1, which adds the following commits above-and-beyond what's in 2.6.40/3.0 :

3326c784c9f492e988617d93f647ae0cfd4c8d09 - forcedeth: do vlan cleanup
0891b0e08937aaec2c4734acb94c5ff8042313bb - forcedeth: fix vlans


This looks to now all work nicely with kernel 2.6.40; no feature-flag hacks, no fake-vlan config.


dmesg :
  forcedeth 0000:00:08.0: highdma csum vlan pwrctl mgmt gbit lnktim msi desc-v3


ethtool -k eth0 :
  Offload parameters for eth0:
  rx-checksumming: on
  tx-checksumming: on
  scatter-gather: on
  tcp-segmentation-offload: on
  udp-fragmentation-offload: off
  generic-segmentation-offload: on
  generic-receive-offload: on
  large-receive-offload: off
  rx-vlan-offload: on
  tx-vlan-offload: on
  ntuple-filters: off
  receive-hashing: off



my 'nv.stp' stap-script :

nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880149959200  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880167ac5000  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880167ac5d00  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880149aca400  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880149acae00  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880172f17600  vlan_tci 100b
nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880172987f00  vlan_tci 0
nv_start_xmit_optimized   dev 0xffff880173b78000 (features: 600149a3  hw 60014983  wanted 60014983  vlan 4020) skb 0xffff880149aca400  vlan_tci 0
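
The feature words printed by the stap script in Comments 13, 20 and 24 can be compared by masking out the two VLAN offload bits. This is a small Python sketch; the bit values 0x80 (NETIF_F_HW_VLAN_TX) and 0x100 (NETIF_F_HW_VLAN_RX) are what I recall from include/linux/netdevice.h in kernels of this era, so treat them as assumptions and check the header of the kernel in question.

```python
# Assumed bit values, recalled from include/linux/netdevice.h (2.6.3x/3.0).
NETIF_F_HW_VLAN_TX = 0x080
NETIF_F_HW_VLAN_RX = 0x100


def vlan_offload_bits(word):
    """Return (tx, rx) booleans for one feature word given as an int."""
    return bool(word & NETIF_F_HW_VLAN_TX), bool(word & NETIF_F_HW_VLAN_RX)


# (label, features, hw_features) copied from the stap output in this report.
samples = [
    ("2.6.40 stock driver (Comment 13)", 0x600149A3, 0x60014803),
    ("2.6.40, DEV_HAS_VLAN removed (Comment 20)", 0x60014823, 0x60014803),
    ("2.6.40 + forcedeth.c from 3.1 (Comment 24)", 0x600149A3, 0x60014983),
]

for label, features, hw_features in samples:
    print(label)
    print("  features    tx/rx vlan:", vlan_offload_bits(features))
    print("  hw_features tx/rx vlan:", vlan_offload_bits(hw_features))
```

If those bit values are right, the stock 2.6.40 driver advertises VLAN offload in features but not in hw_features, while the 3.1 driver sets the bits in both, which matches the nv_probe() change quoted in Comment 21.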

Comment 25 dalefarm 2011-11-16 03:32:53 UTC
Probably also worth bringing in 

9331db4f00cfee8a79d2147ac83723ef436b9759 - forcedeth: call vlan_mode only if hw supports vlans

(Obviously this doesn't affect me directly, but claims to address problems seen by many others when using non vlan-capable hardware)

Comment 26 Neil Horman 2011-11-16 12:06:44 UTC
Ok, copy that, thanks, I'll pull those changes back.

Comment 27 dalefarm 2011-11-16 14:39:43 UTC
@ ejbg :

If indeed you are using Broadcom hardware (which my reading of your Comment #11 suggests that you are), then these issues with the NVIDIA/forcedeth driver are unlikely to be the cause of your problem.

Looking back at your network config, it appears that you have both a vlan (eth0.10) and a bridge (zbrz) on the same physical interface (eth0).

From the various ifcfg files you posted, schematically your setup looks like :


         /-- zbrz [ @ 192.168.1.100]
        /
eth0 --{
        \
         \-- eth0.10 -- zbr10 [ @ 192.168.10.100]


This type of configuration has been seen to be problematic, in that all inbound traffic first gets sent to the bridge, and is never seen by the vlan. The symptoms you're seeing are consistent with this situation. 

(see for example http://thread.gmane.org/gmane.linux.network/149864)
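
A quick way to spot this pattern in a set of ifcfg files is to look for a VLAN device whose parent interface is itself a bridge port. The following Python sketch does that on in-memory file contents; the helper names are hypothetical, and deriving the parent from the part of DEVICE before the last dot is an assumption about the naming convention used here.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines; keys are upper-cased so 'Type' and 'TYPE' match."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key.strip().upper()] = value.strip().strip('"')
    return cfg


def vlan_on_bridge_port(configs):
    """Return (vlan_dev, parent, bridge) for VLANs whose parent is a bridge port."""
    by_dev = {c["DEVICE"]: c for c in configs if "DEVICE" in c}
    hits = []
    for dev, cfg in by_dev.items():
        if cfg.get("VLAN", "").lower() != "yes" or "." not in dev:
            continue
        parent = dev.rsplit(".", 1)[0]
        bridge = by_dev.get(parent, {}).get("BRIDGE")
        if bridge:
            hits.append((dev, parent, bridge))
    return hits


# Abbreviated versions of the files from the original description.
original = [parse_ifcfg(t) for t in (
    "DEVICE=eth0\nONBOOT=yes\nType=Ethernet\nBRIDGE=zbrz",
    "DEVICE=zbrz\nTYPE=Bridge",
    "VLAN=yes\nDEVICE=eth0.10\nBRIDGE=zbr10",
    "DEVICE=zbr10\nTYPE=Bridge",
)]
print(vlan_on_bridge_port(original))
```

For the files in the description this flags eth0.10, because eth0 is also enslaved to zbrz.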
 

Can you try re-arranging your setup, so that the vlan hangs off the bridge ?


ie -

eth0 -- zbrz [ @ 192.168.1.100] 
            \
             \-- zbrz.10 -- zbr10 [ @ 192.168.10.100]



With regards to your posted ifcfg files -  remove 'ifcfg-eth0.10', and create a new 'ifcfg-zbrz.10' to look like :

  VLAN=yes
  DEVICE=zbrz.10
  ONBOOT=yes
  Type=Ethernet
  BRIDGE=zbr10
  IPV6INIT=no
  IPV6_AUTOCONF=no
  NOZEROCONF=yes


See if that helps.

-dalefarm

Comment 28 Neil Horman 2011-11-16 15:40:10 UTC
Since you've done the legwork on forcedeth, I'm going to use this bug to pull those changes in. I'll open a separate bug to see if we need to pull back similar changes for the Broadcom driver.

Comment 29 Neil Horman 2011-11-16 15:44:57 UTC
Actually, I don't even have anything to apply.  Looks like the head of the git tree for f15 has moved to 2.6.41, which already includes these changes.  You can get a recent build here:
https://koji.fedoraproject.org/koji/buildinfo?buildID=273961

Or just wait for it to be pushed, which should be soon

Comment 30 ejbg 2011-11-16 23:31:18 UTC
@ dalefarm / C27 :

Following your advice, I changed the configuration as you stated :

eth0 -- zbrz [ @ 192.168.1.100] 
            \
             \-- zbrz.10 -- zbr10 [ @ 192.168.10.100]

Once the network has restarted, both segments 192.168.1.* and 192.168.10.* are now totally accessible :-).

It works now perfectly fine.

Many Thanks,

Eric.

Comment 31 dalefarm 2011-11-17 15:17:46 UTC
Neil,

I'm happy with my patched forcedeth driver for now, and look forward to release of 2.6.41.

Thanks for all your help.

-dalefarm