Description of problem:
I have a bond0 interface in active/backup mode, with VLAN interfaces on top of bond0. When traffic goes through a VLAN interface, the RX bytes/RX packets counters do not correspond to the real amount of traffic. The TX bytes/packets counters are OK: they match the expected amount of traffic and the switch statistics. The RX counters do not. On the bond0 interface itself (without a VLAN suffix) the RX counter is OK, and on the slave interface (eth1) too. I have 2 similar servers with a similar configuration (bond/VLAN) that show this issue. They are configured as BGP routers and I need real TX/RX byte counts on all interfaces, including the VLAN interfaces.

I downloaded a big file through bond0.103 on a freshly restarted system. These are the packet/byte counters from /proc/net/dev (shortened; the other values are 0):

Inter-     |  Receive             |  Transmit
 face      |bytes       packets   |bytes      packets
eth1:       439743478   291777     8161976    121404
eth2:       32978       552        0          0
bond0:      439776456   292329     8161976    121404
bond0.102:  21826       507        368        4
bond0.103:  1344        32         8161240    121396

The RX bytes/packets on eth1 and bond0 reflect the real traffic, but on bond0.103 the RX counter is far too low, while the TX counter is OK.

ifconfig shows the same counters (for example: RX bytes:30086223879 (28.0 GiB) TX bytes:13860575698738 (12.6 TiB); real RX traffic is about 20% of TX, which is the expected ratio and is confirmed by the switch statistics).

Configuration:

ifcfg-bond0:
DEVICE=bond0
ONBOOT=yes
MTU=1500

ifcfg-eth1:
DEVICE=eth1
HWADDR=00:15:17:da:f8:65
ONBOOT=yes
SLAVE=yes
MASTER=bond0
MTU=1500

ifcfg-eth2:
DEVICE=eth2
HWADDR=00:15:17:bd:66:f0
ONBOOT=yes
SLAVE=yes
MASTER=bond0
MTU=1500

modprobe.conf:
alias bond0 bonding
options bond0 miimon=100 mode=1 primary=eth1

ifcfg-bond0.103:
DEVICE=bond0.103
VLAN=yes
ONBOOT=yes
IPADDR=109.205.xxx.xxx
NETMASK=255.255.255.252

Version-Release number of selected component (if applicable):
kernel 2.6.18-164.15.1.el5
iputils-20020927-46.el5
vconfig-1.9-2.1

How reproducible:
Set up bonding with VLANs, download/upload files and check the counters.

Steps to Reproduce:
1.
2.
3.

Actual results:
Wrong RX bytes/packets counters

Expected results:
Real RX bytes/packets counters

Additional info:
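The comparison described above (VLAN RX counters against the RX counter of the parent device) can be automated. The following is a minimal sketch, not part of the original report: the function names `parse_net_dev` and `vlan_rx_share` are hypothetical, the sample text pads the shortened table with the zero columns the reporter omitted, and VLAN children are assumed to be named `<parent>.<id>` (so `vlanXX`-style names would not be matched). A very low share is only a hint, since untagged traffic is also counted on the parent.

```python
# Sample in full /proc/net/dev layout (8 RX columns, 8 TX columns).
# Counter values are taken from the report; omitted columns are 0.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth1: 439743478 291777 0 0 0 0 0 0 8161976 121404 0 0 0 0 0 0
  eth2: 32978 552 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 bond0: 439776456 292329 0 0 0 0 0 0 8161976 121404 0 0 0 0 0 0
bond0.102: 21826 507 0 0 0 0 0 0 368 4 0 0 0 0 0 0
bond0.103: 1344 32 0 0 0 0 0 0 8161240 121396 0 0 0 0 0 0
"""


def parse_net_dev(text):
    """Return {interface: counters} for /proc/net/dev-style text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # not a full counter line
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return stats


def vlan_rx_share(stats, parent):
    """Fraction of the parent's RX packets accounted for by its VLAN children."""
    parent_rx = stats[parent]["rx_packets"]
    if parent_rx == 0:
        return None  # nothing received, no meaningful ratio
    child_rx = sum(
        counters["rx_packets"]
        for name, counters in stats.items()
        if name.startswith(parent + ".")
    )
    return child_rx / parent_rx


if __name__ == "__main__":
    stats = parse_net_dev(SAMPLE)
    print("VLAN share of bond0 RX packets:", vlan_rx_share(stats, "bond0"))
```

On a live system the same functions could be fed the contents of /proc/net/dev instead of `SAMPLE`.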
I found that the problem occurs when I use a NIC with the igb driver. With an older NIC using the e1000e driver there is no problem. I also tried the latest driver provided by Intel (igb-2.1.9) and the result was worse (I got 0.0 bytes received). So it looks like an igb driver problem. Here is my lspci output (NICs with igb drivers):

01:00.0 0200: 8086:10a7 (rev 02)
01:00.1 0200: 8086:10a7 (rev 02)
04:00.0 0200: 8086:10c9 (rev 01)
04:00.1 0200: 8086:10c9 (rev 01)
07:00.0 0200: 8086:10c9 (rev 01)
07:00.1 0200: 8086:10c9 (rev 01)

OR

01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
01:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
I found that the problem is not limited to bond VLAN interfaces; it also occurs with a plain eth0.100 (for example).
So, the adapter stats are off on receive with vlans enabled. There might be a couple layers of problems, one being GRO, and the other being the stripping of vlan tags in hardware. Do you want the adapter stats to match the switch stats? I'm trying to figure out your expectations.
GRO is turned off because I need to route IPv6 traffic and GRO is not compatible with path MTU discovery, but on the testing server GRO was turned on, with the same result. Here is a comparison of the switch port (bottom) and NIC (top) stats: http://gal.danrimal.net/main.php?g2_itemId=1671 TX traffic is almost equal, but RX traffic is missing from the NIC stats entirely.
Dan, do you still see the problem with the latest RHEL5 kernel?
Yes, currently I have 2.6.18-194.11.4.el5 and the problem is still there. I tried RHEL6 too; here is the result: https://bugzilla.redhat.com/show_bug.cgi?id=680353 I also tried Fedora 14, and Fedora is OK, without this bug.
Dan, 2.6.18-194.y.z.el5 is a 5.5 kernel, I believe Stefan was asking about the latest 5.6 kernel.
Sorry, I don't have 5.6 on my boxes, because I'm considering an update to RHEL6.
You could just drop the latest 5.6 kernel on top of 5.5 to test, if you're so inclined.
Please try kernel-2.6.18-252.el5.sassmann_bug579858_01 from http://people.redhat.com/sassmann/kernel/#rhel5 Let me know if that fixes your problem.
Created attachment 488157 [details] 0001-net-fix-vlan-rx-stats.patch
Hello, your kernel fixes problem, stats are correct now. Thank you!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
(In reply to comment #12)
> Hello,
>
> your kernel fixes problem, stats are correct now.
>
> Thank you!

Server 1. el5.5. Driver - e1000. The problem remained.

[root@caps2 ~]# uname -r -v
2.6.18-252.el5.sassmann_bug579858_01 #1 SMP Mon Mar 28 04:01:52 EDT 2011
[root@caps2 ~]# uname -m
i686
[root.ru ~]# ifconfig eth0.0473
eth0.0473 Link encap:Ethernet HWaddr 00:11:09:B7:93:62
          inet addr:1.1.1.1 Bcast:1.1.1.3 Mask:255.255.255.252
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:933660 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:3039110 (2.8 MiB)
[root@caps2 ~]# ethtool -i eth0
driver: e1000
version: 8.0.25-NAPI
firmware-version: N/A
bus-info: 0000:01:01.0
[root@caps2 ~]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
[root@caps2 ~]# dmesg | grep eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
e1000: eth0: e1000_set_tso: TSO is Enabled
[root@caps2 ~]# lspci | grep net
01:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet Controller
03:0a.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller

Server 2. el5.5. Driver e1000e:

[root@gw ~]# uname -m
x86_64
[root@gw ~]# ethtool -i eth5
driver: e1000e
version: 1.3.10a-NAPI
firmware-version: 5.10-2
bus-info: 0000:0c:00.1
[root@gw ~]# ifconfig eth5.1000
eth5.1000 Link encap:Ethernet HWaddr 00:15:17:ED:B7:8E
          inet addr:10.111.5.254 Bcast:10.111.5.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:67840 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:14419 (14.0 KiB)
[root@gw ~]# ethtool -k eth5
Offload parameters for eth5:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
[root@gw ~]# dmesg | grep eth5
e1000e 0000:0c:00.1: eth5: (PCI Express:2.5GB/s:Width x4) 00:15:17:ed:b7:8e
e1000e 0000:0c:00.1: eth5: Intel(R) PRO/1000 Network Connection
e1000e 0000:0c:00.1: eth5: MAC: 1, PHY: 4, PBA No: D64202-005
e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
e1000e: eth5 NIC Link is Down
e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[root@gw ~]# lspci | grep "0c:00.1"
0c:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
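Since GRO came up earlier in this thread as a possible factor, it can help to compare the offload settings of affected hosts mechanically rather than by eye. The sketch below is an illustration only: `parse_offloads` is a hypothetical helper, and it assumes the `name: on|off` line layout shown in the `ethtool -k` output above (other ethtool versions may print additional text after the state, which the helper ignores).

```python
# ethtool -k output for eth5, copied from the comment above.
SAMPLE = """\
Offload parameters for eth5:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
"""


def parse_offloads(text):
    """Return {offload name: True/False} from `ethtool -k` style text."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        words = value.split()
        # Keep only "name: on" / "name: off" lines; the banner and the
        # "Cannot get ..." messages have no on/off state and are skipped.
        if sep and words and words[0] in ("on", "off"):
            settings[key.strip()] = words[0] == "on"
    return settings


if __name__ == "__main__":
    offloads = parse_offloads(SAMPLE)
    print("GRO enabled:", offloads["generic-receive-offload"])
    print("LRO enabled:", offloads["large-receive-offload"])
```

Running the same helper on the eth0 output from Server 1 would give the same settings, which is consistent with both hosts showing RX counters of 0 on their VLAN interfaces.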
We've seen similar problems (but with the ixgbe driver), the RX counter on the vlanXX interface never increments. The kernel in comment #10 fixes the problem.
Patch(es) available in kernel-2.6.18-256.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.
(In reply to comment #20)
> We've seen similar problems (but with the ixgbe driver), the RX counter on the
> vlanXX interface never increments. The kernel in comment #10 fixes the problem.

Thanks for letting us know it works for you on ixgbe-based devices too!
(In reply to comment #21)
> Patch(es) available in kernel-2.6.18-256.el5
> You can download this test kernel (or newer) from
> http://people.redhat.com/jwilson/el5
> Detailed testing feedback is always welcomed.

I can confirm that this kernel (just like 2.6.18-252.el5.sassmann_bug579858_01) fixes our problem (RX counters for interfaces vlanXX now increment correctly on ixgbe).
I can also confirm that 2.6.18-264.el5 from ~jwilson has fixed our RX counter on VLAN interfaces as well. Looking forward to the official kernel release.
*** Bug 645327 has been marked as a duplicate of this bug. ***
Reproduced on kernel 2.6.18-238.el5

[root@hp-dl580g7-01 ~]# cat /proc/net/dev
Inter-  |  Receive         |  Transmit
 face   |bytes    packets  |bytes    packets
lo:      2459098  1326      2459098  1326
eth4:    40280    429       49421    452
eth4.3:  0        0         43455    422

Verified on 2.6.18-267.el5

[root@hp-dl580g7-01 ~]# cat /proc/net/dev
Inter-  |  Receive         |  Transmit
 face   |bytes    packets  |bytes    packets
eth4:    36204    375       45333    419
eth4.3:  30954    375       41912    403
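As a quick sanity check on the verified numbers, the RX packet counts of eth4 and eth4.3 now match, and the remaining difference in RX bytes works out to a constant per-packet amount. The sketch below only redoes that arithmetic; `per_packet_byte_gap` is a hypothetical helper name. The resulting 14 bytes per packet equals the length of an untagged Ethernet header, but the report does not state that this is the reason, so treat that reading as an assumption.

```python
def per_packet_byte_gap(parent_bytes, vlan_bytes, packets):
    """Average number of RX bytes per packet counted on the parent but not on the VLAN."""
    return (parent_bytes - vlan_bytes) / packets


# RX counters from the "Verified on 2.6.18-267.el5" output above.
ETH4_RX_BYTES, ETH4_RX_PACKETS = 36204, 375
VLAN_RX_BYTES, VLAN_RX_PACKETS = 30954, 375

if __name__ == "__main__":
    # The comparison is only meaningful when both devices saw the same packets.
    assert ETH4_RX_PACKETS == VLAN_RX_PACKETS
    gap = per_packet_byte_gap(ETH4_RX_BYTES, VLAN_RX_BYTES, VLAN_RX_PACKETS)
    print("bytes per packet counted only on eth4:", gap)
```

The same check does not apply to the TX side of this sample, because eth4 and eth4.3 report different TX packet counts there.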
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html