Bug 680998 - ixgbe 10GbE - poor performance - 8086:10fb Intel 82599EB 10Gbit SFI/SFP+
Summary: ixgbe 10GbE - poor performance - 8086:10fb Intel 82599EB 10Gbit SFI/SFP+
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Andy Gospodarek
QA Contact: Adam Okuliar
URL:
Whiteboard:
Depends On:
Blocks: 726799
 
Reported: 2011-02-28 18:12 UTC by Douglas Schilling Landgraf
Modified: 2018-11-27 21:45 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-10-12 15:56:04 UTC
Target Upstream Version:
Embargoed:



Description Douglas Schilling Landgraf 2011-02-28 18:12:41 UTC
Description of problem:

The ixgbe driver is showing poor performance on RHEL 5.6 (and also on RHEV-H 5.6).

Device: 8086:10fb Intel 82599EB 10Gbit SFI/SFP+

#modinfo /lib/modules/2.6.18-229.el5/kernel/drivers/net/ixgbe/ixgbe.ko 
filename:       /lib/modules/2.6.18-229.el5/kernel/drivers/net/ixgbe/ixgbe.ko
version:        2.0.84-k2
license:        GPL
description:    Intel(R) 10 Gigabit PCI Express Network Driver
author:         Intel Corporation, <linux.nics>
srcversion:     4366E815901EA5D1C81C099

Chris Wright has this board in hand; here is his comment:
> OK, disabling hw RSC with 'ethtool -C eth2 rx-usecs 0' (thanks
> Herbert!) is bringing this back for me (something like ~1800 Mb/s).
> This is roughly what booting with max_vfs=1 should have done, so I'm not
> sure why that didn't work.

Note that disabling coalescing with ethtool results in better, though still poor, performance, which is expected since we are disabling coalescing. The "max_vfs=1" parameter disables RSC as a side effect and does not carry the performance hit that disabling interrupt coalescing on the NIC does. In internal testing, "max_vfs=1" results in ~2.5x better performance than the ethtool approach.
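
For reference, the two workarounds discussed above come down to something like the following (eth2 is the interface from Chris's test; the modprobe.conf line is the approach attempted later in this report and assumes the option is actually picked up at load time):

# ethtool -C eth2 rx-usecs 0                              # disables coalescing (and RSC with it), at a throughput cost
# echo "options ixgbe max_vfs=1" >> /etc/modprobe.conf    # disables RSC as a side effect without the coalescing hit; needs a driver reload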

Version-Release number of selected component (if applicable):

RHEL 5.6
RHEV-H 5.6

Additional info:

The following was also tried to improve performance, with no change:

1) Disabling TCP segmentation offload (TSO) and generic receive offload (GRO) on the NICs:
# ethtool -K ethX tso off
# ethtool -K ethX gro off

We also added the following line to /etc/modprobe.conf:
options virtio_net gso=0
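
To confirm the offload changes actually took, it is worth double-checking the current settings as well (ethX being whichever interface is under test):

# ethtool -k ethX        # lowercase -k lists the current offload settings; tso/gro should show "off"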

2) 
We took the switch out of the path and tested with a direct 10G cable. The customer saw the expected speeds (about 40 MB/s) when transferring to the hypervisor itself, but transfers to the guest averaged about 1 MB/s. The guest is using the paravirt driver, and we got the same speeds when we tested with e1000. One interesting point: 'ethtool -k ethX' does not work; it reports an unrecognized device error.


3) In the RHEL environment, we tried to unload the driver so it could be reloaded with max_vfs=1, but rmmod did not actually remove it:

# rmmod ixgbe

# lsmod |grep ixgbe
ixgbe                 204868  0 
8021q                  57937  2 ixgbe,cxgb3
dca                    41605  1 ixgbe

# rmmod ixgbe

# lsmod |grep ixgbe
ixgbe                 204868  0 
8021q                  57937  2 ixgbe,cxgb3
dca                    41605  1 ixgbe
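
For comparison, the sequence that would normally be expected to work looks roughly like this (eth4/eth5 are the interface names from the probe messages in item 4; anything else using the interfaces, such as bridges or bonds, would need to be brought down first):

# ifdown eth4; ifdown eth5        # stop anything using the interfaces first
# modprobe -r ixgbe               # or rmmod, once nothing references the module
# modprobe -v ixgbe max_vfs=1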

4) In the RHEV-H environment, the customer tried to unload and reload the module:
service vdsmd stop
service network stop
rmmod ixgbe
modprobe -v ixgbe

Unfortunately, I cannot copy and paste out of the console window, so here is /var/log/messages, which shows essentially the same output and the same errors.

Feb 18 18:46:55 rhev03 kernel: ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 2.0.84-k2
Feb 18 18:46:55 rhev03 kernel: ixgbe: Copyright (c) 1999-2010 Intel Corporation.
Feb 18 18:46:55 rhev03 kernel: PCI: Enabling device 0000:22:00.0 (0000 -> 0002)
Feb 18 18:46:55 rhev03 kernel: ACPI: PCI Interrupt 0000:22:00.0[A] -> Link [LN60] -> GSI 60 (level, high) -> IRQ 210
Feb 18 18:46:55 rhev03 kernel: ixgbe 0000:22:00.0: not enough MMIO resources for SR-IOV
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.0: ixgbe_probe_vf: Failed to enable PCI sriov: -12
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 48, Tx Queue count = 48
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.0: ixgbe_probe: (PCI Express:5.0Gb/s:Width x8) 00:1b:21:76:84:dc
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.0: ixgbe_probe: MAC: 2, PHY: 8, SFP+: 7, PBA No: e81283-002
Feb 18 18:46:55 rhev03 kernel: ixgbe: eth4: ixgbe_probe: Intel(R) 10 Gigabit Network Connection
Feb 18 18:46:55 rhev03 kernel: PCI: Enabling device 0000:22:00.1 (0000 -> 0002)
Feb 18 18:46:55 rhev03 kernel: ACPI: PCI Interrupt 0000:22:00.1[B] -> Link [LN61] -> GSI 61 (level, high) -> IRQ 228
Feb 18 18:46:55 rhev03 kernel: ixgbe 0000:22:00.1: not enough MMIO resources for SR-IOV
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.1: ixgbe_probe_vf: Failed to enable PCI sriov: -12
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.1: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 48, Tx Queue count = 48
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.1: ixgbe_probe: (PCI Express:5.0Gb/s:Width x8) 00:1b:21:76:84:dd
Feb 18 18:46:55 rhev03 kernel: ixgbe: 0000:22:00.1: ixgbe_probe: MAC: 2, PHY: 8, SFP+: 7, PBA No: e81283-002
Feb 18 18:46:55 rhev03 kernel: ixgbe: eth5: ixgbe_probe: Intel(R) 10 Gigabit Network Connection
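
As a side note, when SR-IOV does enable successfully the VFs show up as additional PCI functions, so a quick sanity check on the host (assuming a stock lspci) is:

# lspci | grep -i "virtual function"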

5) The customer set rx-usecs to 0 and got better, though still not optimal, performance:
ethtool -C eth5 rx-usecs 0
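
The before/after interrupt-coalescing settings can be inspected with the lowercase variant of the same option:

# ethtool -c eth5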

6) Created a RHEV-H image with the option hardcoded:
options ixgbe max_vfs=1

The parameter was not picked up by the module. We confirmed this with the client using the modified image: they still see poor performance, and the interface in sysfs has no "virtfn0" device, which it would have if the parameter were taking effect. The suspicion is that we are still failing to set the driver parameter even in the modified image, most likely because the module is loaded from the ISO initrd without pulling in any parameters, regardless of where we specify them.
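
For reference, the sysfs check described above, plus one way to see whether modprobe can see the option at all (eth5 being one of the ixgbe interfaces from the probe log in item 4):

# modprobe -c | grep ixgbe                        # is the options line visible to modprobe at all?
# ls /sys/class/net/eth5/device/ | grep virtfn    # virtfn0 appears only if max_vfs took effect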

Comment 1 Andy Gospodarek 2011-03-01 17:51:43 UTC
Yay, this finally made it into bugzilla.

This upstream fix should resolve this:

commit a124339ad28389093ed15eca990d39c51c5736cc
Author: Don Skidmore <donald.c.skidmore>
Date:   Tue Jan 18 22:53:47 2011 +0000

    ixgbe: fix for 82599 erratum on Header Splitting

Comment 7 RHEL Program Management 2011-03-04 21:40:35 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Adam Okuliar 2011-03-07 17:14:03 UTC
Hi all

We have an ixgbe card for network performance testing here in Brno, so maybe I can provide some support with this issue. I can try to reproduce the problem and verify the fix. Is it possible to get a built kernel image with this patch integrated?

Thanks,
Adam

Comment 16 Adam Okuliar 2011-04-21 13:58:51 UTC
Hi Andy!

The ixgbe cards are connected via an optical patch cable and completely isolated from the RHTS network, so you can use VLANs. Yesterday I did a quick check and saw about a 30-40% performance regression when using VLANs, but even with VLANs I still had about 4000 Mbit/s (500 MB/s) of throughput. I did not test iSCSI.

Adam

Comment 18 Andy Gospodarek 2011-04-21 14:14:58 UTC
(In reply to comment #16)
> Hi Andy!
> 
> The ixgbe cards are connected via an optical patch cable and completely
> isolated from the RHTS network, so you can use VLANs. Yesterday I did a quick
> check and saw about a 30-40% performance regression when using VLANs, but even
> with VLANs I still had about 4000 Mbit/s (500 MB/s) of throughput. I did not
> test iSCSI.
> 
> Adam

30-40% is higher than I would expect.

Comment 20 Andy Gospodarek 2011-04-26 21:21:07 UTC
(In reply to comment #16)
> Hi Andy!
> 
> The ixgbe cards are connected via an optical patch cable and completely
> isolated from the RHTS network, so you can use VLANs. Yesterday I did a quick
> check and saw about a 30-40% performance regression when using VLANs, but even
> with VLANs I still had about 4000 Mbit/s (500 MB/s) of throughput. I did not
> test iSCSI.
> 
> Adam

Adam, what sort of performance did you see over the ixgbe interfaces on the latest RHEL5.7 kernel?

Here's what I'm seeing when running on the host with the 82599 and what should be the latest ixgbe driver:

# ethtool -i eth5
driver: ixgbe
version: 3.2.9-k2
firmware-version: 0.9-2
bus-info: 0000:01:00.0

[root@piketon ~]# netperf -t TCP_STREAM -H 192.168.0.1 -D 1,1 && netperf -t TCP_MAERTS -H 192.168.0.1 -D 1,1
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.1 (192.168.0.1) port 0 AF_INET : histogram : interval : demo
Interim result: 7055.97 10^6bits/s over 1.00 seconds
Interim result: 7909.40 10^6bits/s over 1.00 seconds
Interim result: 6204.97 10^6bits/s over 1.27 seconds
Interim result: 6693.02 10^6bits/s over 1.00 seconds
Interim result: 6716.78 10^6bits/s over 1.00 seconds
Interim result: 6750.01 10^6bits/s over 1.00 seconds
Interim result: 6757.38 10^6bits/s over 1.00 seconds
Interim result: 6722.87 10^6bits/s over 1.01 seconds
Interim result: 6743.34 10^6bits/s over 1.00 seconds
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.01    6812.00   
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.1 (192.168.0.1) port 0 AF_INET : histogram : interval : demo
Interim result: 111.18 10^6bits/s over 1.37 seconds
Interim result: 3097.17 10^6bits/s over 1.00 seconds
Interim result: 5202.98 10^6bits/s over 1.00 seconds
Interim result: 6807.60 10^6bits/s over 1.00 seconds
Interim result: 8191.83 10^6bits/s over 1.01 seconds
Interim result: 7708.25 10^6bits/s over 1.06 seconds
Interim result: 7697.84 10^6bits/s over 1.00 seconds
Interim result: 7716.56 10^6bits/s over 1.00 seconds
Interim result: 7695.20 10^6bits/s over 1.00 seconds
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    5909.56   

If I add VLANs, I get this:


# netperf -t TCP_STREAM -H 192.168.100.1 -D 1,1 && netperf -t TCP_MAERTS -H 192.168.100.1 -D 1,1 
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.1 (192.168.100.1) port 0 AF_INET : histogram : interval : demo
Interim result: 1318.32 10^6bits/s over 1.00 seconds
Interim result: 1393.85 10^6bits/s over 1.01 seconds
Interim result: 1329.68 10^6bits/s over 1.05 seconds
Interim result: 1331.20 10^6bits/s over 1.00 seconds
Interim result: 1340.21 10^6bits/s over 1.01 seconds
Interim result: 1334.91 10^6bits/s over 1.00 seconds
Interim result: 1362.08 10^6bits/s over 1.00 seconds
Interim result: 1372.61 10^6bits/s over 1.00 seconds
Interim result: 1346.56 10^6bits/s over 1.02 seconds
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.01    1347.61   
TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.1 (192.168.100.1) port 0 AF_INET : histogram : interval : demo
Interim result: 1275.59 10^6bits/s over 1.00 seconds
Interim result: 1577.87 10^6bits/s over 1.00 seconds
Interim result: 1576.66 10^6bits/s over 1.00 seconds
Interim result: 1573.29 10^6bits/s over 1.00 seconds
Interim result: 1559.08 10^6bits/s over 1.01 seconds
Interim result: 1552.16 10^6bits/s over 1.00 seconds
Interim result: 1509.11 10^6bits/s over 1.03 seconds
Interim result: 1557.70 10^6bits/s over 1.00 seconds
Interim result: 1485.68 10^6bits/s over 1.05 seconds
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    1502.63   

So I see a *significant* performance hit when using vlans.
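
For reference, the VLAN interfaces in this kind of test are typically set up along the following lines on RHEL 5 (eth5.100 matches the routing table shown further down; the local address here is only a placeholder):

# vconfig add eth5 100
# ifconfig eth5.100 192.168.100.2 netmask 255.255.255.0 up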

What version of the Intel sourceforge driver was the customer using?

Comment 22 Adam Okuliar 2011-04-27 12:16:06 UTC
With latest 5.7 (2.6.18-257.el5) results are

netperf -t TCP_STREAM -H 172.16.29.20   && netperf -t TCP_STREAM -H 172.16.29.20 
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.29.20 (172.16.29.20) port 0 AF_INET : histogram : interval
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    9362.85   
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.29.20 (172.16.29.20) port 0 AF_INET : histogram : interval
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    9404.94  

When using VLANs results are:

netperf -t TCP_STREAM -H 172.16.30.20   && netperf -t TCP_STREAM -H 172.16.30.20 
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.30.20 (172.16.30.20) port 0 AF_INET : histogram : interval
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    9348.54   
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.30.20 (172.16.30.20) port 0 AF_INET : histogram : interval
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00    9349.02   

So no performance hit at all.

Comment 24 Adam Okuliar 2011-04-27 13:49:34 UTC
Mike, please attach their test scripts ASAP. I will try to reproduce their setup here.

Big thanks,
Adam

Comment 27 Andy Gospodarek 2011-04-27 19:31:34 UTC
(In reply to comment #22)
> With latest 5.7 (2.6.18-257.el5) results are
> 
> netperf -t TCP_STREAM -H 172.16.29.20   && netperf -t TCP_STREAM -H
> 172.16.29.20 
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.29.20
> (172.16.29.20) port 0 AF_INET : histogram : interval
> Recv   Send    Send                          
> Socket Socket  Message  Elapsed              
> Size   Size    Size     Time     Throughput  
> bytes  bytes   bytes    secs.    10^6bits/sec  
> 
>  87380  16384  16384    10.00    9362.85   
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.29.20
> (172.16.29.20) port 0 AF_INET : histogram : interval
> Recv   Send    Send                          
> Socket Socket  Message  Elapsed              
> Size   Size    Size     Time     Throughput  
> bytes  bytes   bytes    secs.    10^6bits/sec  
> 
>  87380  16384  16384    10.00    9404.94  
> 
> When using VLANs results are:
> 
> netperf -t TCP_STREAM -H 172.16.30.20   && netperf -t TCP_STREAM -H
> 172.16.30.20 
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.30.20
> (172.16.30.20) port 0 AF_INET : histogram : interval
> Recv   Send    Send                          
> Socket Socket  Message  Elapsed              
> Size   Size    Size     Time     Throughput  
> bytes  bytes   bytes    secs.    10^6bits/sec  
> 
>  87380  16384  16384    10.00    9348.54   
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.30.20
> (172.16.30.20) port 0 AF_INET : histogram : interval
> Recv   Send    Send                          
> Socket Socket  Message  Elapsed              
> Size   Size    Size     Time     Throughput  
> bytes  bytes   bytes    secs.    10^6bits/sec  
> 
>  87380  16384  16384    10.00    9349.02   
> 
> So no performance hit at all.

I'm not seeing anything like this, and that is surprising. Some of my throughput issues could be down to the receiving device (an older cxgb3 card), but UDP throughput with and without VLANs is about the same.

# netperf -H 192.168.0.1 -D 1,1 -t UDP_STREAM 
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.1 (192.168.0.1) port 0 AF_INET : histogram : interval :
 demo
Interim result: 2371.68 10^6bits/s over 1.00 seconds
Interim result: 2517.35 10^6bits/s over 1.00 seconds
Interim result: 2558.03 10^6bits/s over 1.00 seconds
Interim result: 2462.83 10^6bits/s over 1.04 seconds
Interim result: 2511.83 10^6bits/s over 1.00 seconds
Interim result: 2572.55 10^6bits/s over 1.00 seconds
Interim result: 2586.17 10^6bits/s over 1.00 seconds
Interim result: 2577.88 10^6bits/s over 1.00 seconds
Interim result: 2572.83 10^6bits/s over 1.00 seconds
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

129024   65507   10.00       48274      0    2529.70
126976           10.00       23644           1239.01

# netperf -H 192.168.100.1 -D 1,1 -t UDP_STREAM 
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.1 (192.168.100.1) port 0 AF_INET : histogram : interv
al : demo
Interim result: 2393.89 10^6bits/s over 1.00 seconds
Interim result: 2284.14 10^6bits/s over 1.05 seconds
Interim result: 2260.02 10^6bits/s over 1.01 seconds
Interim result: 2271.22 10^6bits/s over 1.00 seconds
Interim result: 2246.25 10^6bits/s over 1.01 seconds
Interim result: 2275.03 10^6bits/s over 1.00 seconds
Interim result: 2261.65 10^6bits/s over 1.01 seconds
Interim result: 2285.70 10^6bits/s over 1.00 seconds
Interim result: 2414.30 10^6bits/s over 1.00 seconds
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

129024   65507   10.00       44210      0    2316.48
126976           10.00       21810           1142.78

Can you send me your routing table?  Mine looks like this:

# route -n 
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 eth5.100
10.0.2.0        0.0.0.0         255.255.255.0   U     0      0        0 eth4
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth5
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth4
0.0.0.0         10.0.2.254      0.0.0.0         UG    0      0        0 eth4

Comment 29 Andy Gospodarek 2011-04-27 20:40:58 UTC
FYI, I see virtually no difference in performance when running 2.6.18-194.el5, so we probably need to look at something other than the network driver specifically as the cause.

Comment 30 Andy Gospodarek 2011-04-27 21:13:51 UTC
(In reply to comment #29)
> FYI, I see virtually no difference in performance when running
> 2.6.18-194.el5, so we probably need to look at something other than the
> network driver specifically as the cause.

Well, I probably shouldn't say that since there does seem to be a performance difference when using the sourceforge ixgbe driver, but the performance is still not what it was when using the driver from RHEL5.5.

Comment 36 Adam Okuliar 2011-04-29 08:45:16 UTC
Just a note about how I configured the iSCSI test bed. On the software target, add the following lines to /etc/tgt/targets.conf:

<target hp-dl385g7-01.lab.eng.brq.redhat.com:rh_test>
    backing-store /target
    lun 10
</target>

backing-store is the path to the file or block device that is exported as the iSCSI LUN.
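
After editing targets.conf, the target daemon needs to pick up the change before the LUN is exported; with the scsi-target-utils package that is typically just:

# service tgtd restart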

On the initiator machine, run:
iscsiadm --mode node --targetname hp-dl385g7-01.lab.eng.brq.redhat.com:rh_test -p 172.16.30.20 -l
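
Note that the login above assumes the node record already exists; if it does not, it is normally created first with a sendtargets discovery against the same portal:

# iscsiadm --mode discovery --type sendtargets --portal 172.16.30.20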

In the login command, -p is the IP address of the target portal. You will see a new block device appear in dmesg:

SCSI device sda: 81920000 512-byte hdwr sectors (41943 MB)
sda: Write Protect is off
sda: Mode Sense: 49 00 00 08

Now you can format sda with a filesystem and mount it.
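
For completeness, that last step is an ordinary format and mount of the new device (sda is the device name from the dmesg output above; ext3 and the mount point are arbitrary choices, and the device name should be double-checked before formatting):

# mkfs.ext3 /dev/sda
# mkdir -p /mnt/iscsi && mount /dev/sda /mnt/iscsi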

Adam

Comment 50 Andy Gospodarek 2011-10-12 15:56:04 UTC
OK, closing this as NOTABUG.  It sounds like excessive XOFF frames were coming from the host receiving our traffic, so there is not much we can do.  Please reopen if needed.

