Bug 579858 - Wrong RX bytes/packet count on vlan interface with igb driver
Summary: Wrong RX bytes/packet count on vlan interface with igb driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Stefan Assmann
QA Contact: Liang Zheng
URL:
Whiteboard:
Duplicates: 645327
Depends On:
Blocks:
Reported: 2010-04-06 18:41 UTC by Dan Rimal
Modified: 2013-07-29 00:56 UTC
CC List: 22 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 10:29:07 UTC
Target Upstream Version:
Embargoed:


Attachments
0001-net-fix-vlan-rx-stats.patch (966 bytes, patch)
2011-03-28 14:30 UTC, Stefan Assmann
no flags


Links
Red Hat Product Errata RHSA-2011:1065 (priority: normal, status: SHIPPED_LIVE, last updated 2011-07-21 09:21:37 UTC): Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update

Description Dan Rimal 2010-04-06 18:41:33 UTC
Description of problem:

I have a bond0 interface in active/backup mode, with VLAN interfaces on top of bond0. The problem is that when traffic goes through a VLAN interface, the RX bytes/RX packets counters don't correspond to the real amount of traffic. The TX bytes/packets counters are OK: they match the expected amount of traffic and the switch statistics. The RX counters do not. On bond0 itself (without .vlan) the RX counters are OK, and on the slave interface (eth1) too.

I have 2 similar servers with a similar configuration (bond/VLAN) that both show this issue. They are configured as BGP routers, and I need real TX/RX byte counts on all interfaces, including the VLAN interfaces.

I tried downloading a big file through bond0.103 on a freshly restarted system. Here are the packet/byte counters from /proc/net/dev (shortened; the other values are 0):

Inter-    |   Receive            |  Transmit
 face     |bytes       packets   |bytes     packets
     eth1: 439743478   291777     8161976   121404
     eth2: 32978       552        0         0
    bond0: 439776456   292329     8161976   121404
bond0.102: 21826       507        368       4
bond0.103: 1344        32         8161240   121396

The receive bytes/packets on eth1 and bond0 reflect the real traffic, but on bond0.103 the RX counter is significantly lower than the TX counter, which is OK. ifconfig reports the same counters (for example: RX bytes:30086223879 (28.0 GiB) TX bytes:13860575698738 (12.6 TiB)), yet the real RX traffic is about 20% of TX; that is the expected traffic, and the switch statistics confirm this value.
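The numbers above can be cross-checked mechanically: in active-backup mode bond0's RX counters equal the sum of its slaves (439743478 + 32978 = 439776456 bytes), while the two VLAN interfaces together account for only a tiny fraction of that. A minimal Python sketch, assuming the standard 16-column /proc/net/dev layout (8 receive + 8 transmit fields); `parse_proc_net_dev` and `row` are hypothetical helpers, and the zero-filled columns mirror the reporter's note that the omitted values are 0.

```python
# Sketch: parse /proc/net/dev and compare VLAN RX counters with bond0/slaves.
# Assumes the standard layout: per interface, 8 receive fields then 8 transmit fields.

def parse_proc_net_dev(text):
    """Return {iface: {"rx_bytes", "rx_packets", "tx_bytes", "tx_packets"}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no ':'
        name, _, data = line.partition(":")
        fields = [int(value) for value in data.split()]
        stats[name.strip()] = {
            "rx_bytes": fields[0],
            "rx_packets": fields[1],
            "tx_bytes": fields[8],
            "tx_packets": fields[9],
        }
    return stats


def row(iface, rx_bytes, rx_packets, tx_bytes, tx_packets):
    # Hypothetical helper: rebuild a full 16-column row from the trimmed
    # table in this report, zero-filling the columns that were left out.
    cols = [rx_bytes, rx_packets] + [0] * 6 + [tx_bytes, tx_packets] + [0] * 6
    return f"{iface:>9}: " + " ".join(str(c) for c in cols)


PROC_SAMPLE = "\n".join([
    "Inter-|   Receive    |  Transmit",
    " face |bytes packets |bytes packets",
    row("eth1", 439743478, 291777, 8161976, 121404),
    row("eth2", 32978, 552, 0, 0),
    row("bond0", 439776456, 292329, 8161976, 121404),
    row("bond0.102", 21826, 507, 368, 4),
    row("bond0.103", 1344, 32, 8161240, 121396),
])

stats = parse_proc_net_dev(PROC_SAMPLE)
slave_rx = stats["eth1"]["rx_bytes"] + stats["eth2"]["rx_bytes"]
vlan_rx = stats["bond0.102"]["rx_bytes"] + stats["bond0.103"]["rx_bytes"]
print(slave_rx == stats["bond0"]["rx_bytes"])  # True
print(vlan_rx, stats["bond0"]["rx_bytes"])      # 23170 439776456
```

On a live system the same parser can be fed the contents of /proc/net/dev directly; comparing snapshots taken before and after a large transfer gives per-interface deltas.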

configuration:
ifcfg-bond0:
DEVICE=bond0
ONBOOT=yes
MTU=1500

ifcfg-eth1:
DEVICE=eth1
HWADDR=00:15:17:da:f8:65
ONBOOT=yes
SLAVE=yes
MASTER=bond0
MTU=1500

ifcfg-eth2:
DEVICE=eth2
HWADDR=00:15:17:bd:66:f0
ONBOOT=yes
SLAVE=yes
MASTER=bond0
MTU=1500

modprobe.conf:
alias bond0 bonding
options bond0 miimon=100 mode=1 primary=eth1

ifcfg-bond0.103:
DEVICE=bond0.103
VLAN=yes
ONBOOT=yes
IPADDR=109.205.xxx.xxx
NETMASK=255.255.255.252

Version-Release number of selected component (if applicable):

kernel 2.6.18-164.15.1.el5
iputils-20020927-46.el5
vconfig-1.9-2.1

How reproducible:

Set up bonding with vlans and download/upload files and check counters.

Steps to Reproduce:
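1. Configure bond0 (mode=1, active-backup) over igb NICs, with a VLAN interface on top of it (e.g. bond0.103), as above.
2. Download or upload a large file through the VLAN interface.
3. Compare the RX counters of the VLAN interface in /proc/net/dev (or ifconfig) with those of bond0, the active slave, and the switch port.
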
Actual results:
Wrong RX bytes/packet counter

Expected results:
Real RX bytes/packet counter

Additional info:

Comment 1 Dan Rimal 2010-04-08 14:52:45 UTC
I found that the problem occurs when I use a NIC with the igb driver. With an older NIC using the e1000e driver there is no problem. I also tried the latest driver provided by Intel (igb-2.1.9), and the result was worse (I got 0.0 bytes received).

So it looks like an igb driver problem.


Here is my lspci output (NICs using the igb driver):
01:00.0 0200: 8086:10a7 (rev 02)
01:00.1 0200: 8086:10a7 (rev 02)
04:00.0 0200: 8086:10c9 (rev 01)
04:00.1 0200: 8086:10c9 (rev 01)
07:00.0 0200: 8086:10c9 (rev 01)
07:00.1 0200: 8086:10c9 (rev 01)

or, with device names:

01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
01:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

Comment 2 Dan Rimal 2010-04-12 15:38:14 UTC
I found that the problem is not limited to bonded VLAN interfaces; it also occurs with a plain VLAN interface such as eth0.100.

Comment 3 Jesse Brandeburg 2010-04-29 16:10:15 UTC
So, the adapter stats are off on receive with VLANs enabled. There might be a couple of layers of problems, one being GRO, and the other being the stripping of VLAN tags in hardware.

Do you want the adapter stats to match the switch stats?  I'm trying to figure out your expectations.
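Comment 3 names hardware VLAN tag stripping as one possible layer of the problem. The toy model below is purely illustrative and hypothetical: it is not the RHEL5 receive path or the attached 0001-net-fix-vlan-rx-stats.patch, whose contents are not reproduced here. It only shows how the reported symptom (parent RX correct, VLAN RX stuck near zero) arises if frames delivered through an accelerated VLAN path are counted on the parent device but not on the VLAN device. `Counters`, `Frame`, and `receive` are invented names.

```python
# Toy model (hypothetical, not kernel code): the parent device always counts a
# received frame; the accelerated delivery to the VLAN device may or may not
# update the VLAN device's own counters.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Counters:
    rx_packets: int = 0
    rx_bytes: int = 0


@dataclass
class Frame:
    length: int
    vlan_id: Optional[int]  # tag already stripped by the NIC, kept as metadata


def receive(frame: Frame, parent: Counters, vlans: Dict[int, Counters],
            account_on_vlan: bool) -> None:
    # The physical (or bond) device counts every frame.
    parent.rx_packets += 1
    parent.rx_bytes += frame.length
    vdev = vlans.get(frame.vlan_id) if frame.vlan_id is not None else None
    if vdev is None:
        return  # untagged, or no VLAN device configured for this ID
    # The "buggy" variant hands the frame to the VLAN device without
    # touching that device's counters.
    if account_on_vlan:
        vdev.rx_packets += 1
        vdev.rx_bytes += frame.length


frames = [Frame(length=1500, vlan_id=103) for _ in range(1000)]
for fixed in (False, True):
    parent, vlans = Counters(), {103: Counters()}
    for f in frames:
        receive(f, parent, vlans, account_on_vlan=fixed)
    print(fixed, parent.rx_packets, vlans[103].rx_packets)
# False 1000 0
# True 1000 1000
```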

Comment 4 Dan Rimal 2010-04-29 16:51:04 UTC
GRO is turned off, because I need to route IPv6 traffic and GRO is not compatible with path MTU discovery; but on the testing server GRO was turned on, with the same result.
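Since the GRO state matters when comparing machines, it can be read from the `ethtool -k <iface>` text output (the format shown in the transcripts in comment 17). A minimal sketch; `parse_ethtool_k` is a hypothetical helper that assumes one "feature: on|off" entry per line and skips lines such as "Cannot get device flags: Operation not supported". Because only the first word of the value is read, a trailing "[fixed]" printed by newer ethtool versions is tolerated.

```python
# Hypothetical helper: turn `ethtool -k <iface>` text output into a dict of
# feature name -> enabled (True/False).
def parse_ethtool_k(text):
    features = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        words = value.split()
        state = words[0] if words else ""
        if sep and state in ("on", "off"):
            features[key.strip()] = state == "on"
    return features


ETHTOOL_SAMPLE = """Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off"""

features = parse_ethtool_k(ETHTOOL_SAMPLE)
print(features["generic-receive-offload"])  # True
print(features["large-receive-offload"])    # False
```

The text can come from running `ethtool -k eth1` through `subprocess.run(..., capture_output=True, text=True)` on each host.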

Here is a comparison of the switch port stats (bottom) and the NIC stats (top):
http://gal.danrimal.net/main.php?g2_itemId=1671

TX traffic is almost equal, but RX is missing entirely on the NIC.

Comment 5 Stefan Assmann 2011-03-03 12:22:39 UTC
Dan, do you still see the problem with the latest RHEL5 kernel?

Comment 6 Dan Rimal 2011-03-03 13:36:53 UTC
Yes, I currently have 2.6.18-194.11.4.el5 and the problem is still there.

I tried RHEL6 too; here is the result: https://bugzilla.redhat.com/show_bug.cgi?id=680353

I tried Fedora 14 too, and Fedora is OK, without this bug.

Comment 7 Jarod Wilson 2011-03-07 19:54:30 UTC
Dan, 2.6.18-194.y.z.el5 is a 5.5 kernel, I believe Stefan was asking about the latest 5.6 kernel.

Comment 8 Dan Rimal 2011-03-08 08:06:17 UTC
Sorry, I don't have 5.6 on my boxes, because I'm considering an update to RHEL6.

Comment 9 Jarod Wilson 2011-03-08 21:17:48 UTC
You could just drop the latest 5.6 kernel on top of 5.5 to test, if you're so inclined.

Comment 10 Stefan Assmann 2011-03-28 13:37:51 UTC
Please try kernel-2.6.18-252.el5.sassmann_bug579858_01 from
http://people.redhat.com/sassmann/kernel/#rhel5

Let me know if that fixes your problem.

Comment 11 Stefan Assmann 2011-03-28 14:30:17 UTC
Created attachment 488157
0001-net-fix-vlan-rx-stats.patch

Comment 12 Dan Rimal 2011-03-29 08:06:11 UTC
Hello, 

your kernel fixes the problem; the stats are correct now.

Thank you!

Comment 14 RHEL Program Management 2011-03-29 14:39:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 17 RomanDmitriev 2011-03-31 07:00:22 UTC
(In reply to comment #12)
> Hello, 
> 
> your kernel fixes problem, stats are correct now.
> 
> Thank you!

Server 1: EL5.5, e1000 driver. The problem remains.

[root@caps2 ~]# uname -r -v
2.6.18-252.el5.sassmann_bug579858_01 #1 SMP Mon Mar 28 04:01:52 EDT 2011

[root@caps2 ~]# uname -m
i686

[root.ru ~]# ifconfig eth0.0473
eth0.0473 Link encap:Ethernet  HWaddr 00:11:09:B7:93:62
          inet addr:1.1.1.1  Bcast:1.1.1.3  Mask:255.255.255.252
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:933660 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:3039110 (2.8 MiB)

[root@caps2 ~]# ethtool -i eth0
driver: e1000
version: 8.0.25-NAPI
firmware-version: N/A
bus-info: 0000:01:01.0
[root@caps2 ~]# ethtool -k eth0
Offload parameters for eth0:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off

[root@caps2 ~]# dmesg | grep eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
e1000: eth0: e1000_set_tso: TSO is Enabled

[root@caps2 ~]# lspci | grep net
01:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet Controller
03:0a.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller


Server 2: EL5.5, e1000e driver:

[root@gw ~]# uname -m
x86_64

[root@gw ~]# ethtool -i eth5
driver: e1000e
version: 1.3.10a-NAPI
firmware-version: 5.10-2
bus-info: 0000:0c:00.1

[root@gw ~]# ifconfig eth5.1000
eth5.1000 Link encap:Ethernet  HWaddr 00:15:17:ED:B7:8E
          inet addr:10.111.5.254  Bcast:10.111.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:67840 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:14419 (14.0 KiB)

[root@gw ~]# ethtool -k eth5
Offload parameters for eth5:
Cannot get device udp large send offload settings: Operation not supported
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off

[root@gw ~]# dmesg | grep eth5
e1000e 0000:0c:00.1: eth5: (PCI Express:2.5GB/s:Width x4) 00:15:17:ed:b7:8e
e1000e 0000:0c:00.1: eth5: Intel(R) PRO/1000 Network Connection
e1000e 0000:0c:00.1: eth5: MAC: 1, PHY: 4, PBA No: D64202-005
e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
e1000e: eth5 NIC Link is Down
e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

[root@gw ~]# lspci | grep "0c:00.1"
0c:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (Copper) (rev 06)
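Both transcripts above show the same signature: TX packets keep growing while RX packets on the VLAN interface stay at 0. A minimal sketch that flags this from net-tools `ifconfig` output; `ifconfig_counters` and `rx_looks_stuck` are hypothetical helpers, and the regex assumes the older "RX packets:N ... RX bytes:N" layout shown here (newer ifconfig builds print "RX packets 0  bytes 0" without colons and will not match).

```python
import re

# Hypothetical helper: extract RX/TX counters from the older net-tools
# `ifconfig <iface>` layout ("RX packets:N", "TX bytes:N", ...).
COUNTER_RE = re.compile(r"\b([RT]X) (packets|bytes):(\d+)")


def ifconfig_counters(text):
    return {f"{d.lower()}_{kind}": int(n)
            for d, kind, n in COUNTER_RE.findall(text)}


def rx_looks_stuck(counters):
    # Heuristic: the interface sends traffic but has never counted a receive.
    return counters["tx_packets"] > 0 and counters["rx_packets"] == 0


IFCONFIG_SAMPLE = """eth5.1000 Link encap:Ethernet  HWaddr 00:15:17:ED:B7:8E
          inet addr:10.111.5.254  Bcast:10.111.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:67840 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:14419 (14.0 KiB)"""

counters = ifconfig_counters(IFCONFIG_SAMPLE)
print(counters["tx_packets"], counters["rx_packets"])  # 67840 0
print(rx_looks_stuck(counters))                        # True
```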

Comment 20 Peter K 2011-04-08 13:30:56 UTC
We've seen similar problems (but with the ixgbe driver), the RX counter on the vlanXX interface never increments. The kernel in comment #10 fixes the problem.

Comment 21 Jarod Wilson 2011-04-08 16:25:43 UTC
Patch(es) available in kernel-2.6.18-256.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 23 Andy Gospodarek 2011-04-08 16:51:25 UTC
(In reply to comment #20)
> We've seen similar problems (but with the ixgbe driver), the RX counter on the
> vlanXX interface never increments. The kernel in comment #10 fixes the problem.

Thanks for letting us know it works for you on ixgbe-based devices too!

Comment 24 Peter K 2011-04-11 09:27:20 UTC
(In reply to comment #21)
> Patch(es) available in kernel-2.6.18-256.el5
> You can download this test kernel (or newer) from
> http://people.redhat.com/jwilson/el5
> Detailed testing feedback is always welcomed.

I can confirm that this kernel (just like 2.6.18-252.el5.sassmann_bug579858_01) fixes our problem (RX counters for interfaces vlanXX now increment correctly on ixgbe).

Comment 25 Jason 2011-06-02 14:18:02 UTC
I can also confirm that 2.6.18-264.el5 from ~jwilson has fixed our RX counter on vlan interfaces as well.  Looking forward to official kernel release.

Comment 26 Andy Gospodarek 2011-06-08 18:18:43 UTC
*** Bug 645327 has been marked as a duplicate of this bug. ***

Comment 27 Liang Zheng 2011-06-14 08:34:05 UTC
Reproduced on kernel 2.6.18-238.el5
[root@hp-dl580g7-01 ~]# cat /proc/net/dev 
Inter-|   Receive       |  Transmit
 face |bytes    packets |  bytes    packets 
    lo: 2459098    1326   2459098    1326    
  eth4:   40280     429   49421     452    
eth4.3:       0       0   43455     422  

Verified on 2.6.18-267.el5
[root@hp-dl580g7-01 ~]# cat /proc/net/dev
Inter-|   Receive        |  Transmit
 face |bytes    packets  |bytes    packets 
  eth4:   36204     375      45333     419   
eth4.3:   30954     375      41912     403
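The before/after output above can be turned into a simple pass/fail check. A minimal sketch; `trimmed_rows` and `vlan_rx_ok` are hypothetical helpers that read the trimmed 4-column tables as pasted (RX bytes, RX packets, TX bytes, TX packets), and the rule that VLAN RX packets must be non-zero but not exceed the parent's is a heuristic assumption that only holds when VLAN traffic was actually generated during the test.

```python
# Hypothetical regression check on trimmed /proc/net/dev tables
# (columns: RX bytes, RX packets, TX bytes, TX packets).
def trimmed_rows(text):
    rows = {}
    for line in text.splitlines():
        name, sep, data = line.partition(":")
        if sep:
            rx_b, rx_p, tx_b, tx_p = (int(x) for x in data.split()[:4])
            rows[name.strip()] = {"rx_bytes": rx_b, "rx_packets": rx_p,
                                  "tx_bytes": tx_b, "tx_packets": tx_p}
    return rows


def vlan_rx_ok(rows, parent, vlan):
    # Heuristic: VLAN traffic is a subset of what the parent receives, so the
    # VLAN RX packet count should be non-zero but not above the parent's.
    v, p = rows[vlan]["rx_packets"], rows[parent]["rx_packets"]
    return 0 < v <= p


BEFORE = """Inter-|   Receive       |  Transmit
 face |bytes    packets |  bytes    packets
    lo: 2459098    1326   2459098    1326
  eth4:   40280     429   49421     452
eth4.3:       0       0   43455     422"""

AFTER = """  eth4:   36204     375      45333     419
eth4.3:   30954     375      41912     403"""

print(vlan_rx_ok(trimmed_rows(BEFORE), "eth4", "eth4.3"))  # False
print(vlan_rx_ok(trimmed_rows(AFTER), "eth4", "eth4.3"))   # True
```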

Comment 28 errata-xmlrpc 2011-07-21 10:29:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

