Bug 1334673 - Severe Infiniband performance regression in recent kernels (>= v4.3) [NEEDINFO]
Summary: Severe Infiniband performance regression in recent kernels (>= v4.3)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-05-10 09:59 UTC by Ian Chapman
Modified: 2017-04-28 17:15 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-28 17:15:10 UTC
Type: Bug
Embargoed:
jforbes: needinfo?



Description Ian Chapman 2016-05-10 09:59:46 UTC
Description of problem:

In recent kernel versions (>= v4.3), InfiniBand performance, or more accurately IPoIB performance, has been significantly degraded.


The tests were run between two directly connected PCs with identical dual-port InfiniBand cards. Only one port is cabled between the two PCs, and there are no other InfiniBand devices on the subnet. Tests were performed with iperf using the following parameters:

Server: iperf -l 128k -s
Client: iperf -l 128k -c <server>

The hardware:

02:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] (rev a0)



firstpc = Fedora 22
secondpc = Fedora 23



Test 1
------
Direction: firstpc --> secondpc
firstpc: 4.0.4-301.fc22.x86_64 : 7.26 Gbits/sec transmit
secondpc: 4.2.3-300.fc23.x86_64 : 7.26 Gbits/sec receive

Direction: secondpc --> firstpc
firstpc: 4.0.4-301.fc22.x86_64 : 11.5 Gbits/sec receive
secondpc: 4.2.3-300.fc23.x86_64 : 11.5 Gbits/sec transmit


Test 2
------
Direction: firstpc --> secondpc
firstpc: 4.0.4-301.fc22.x86_64 : 7.54 Gbits/sec transmit
secondpc: 4.4.8-300.fc23.x86_64 : 7.54 Gbits/sec receive

Direction: secondpc --> firstpc
firstpc: 4.0.4-301.fc22.x86_64 : 865 Kbits/sec receive
secondpc: 4.4.8-300.fc23.x86_64 : 977 Kbits/sec transmit


Test 3
------
Direction: firstpc --> secondpc
firstpc: 4.4.8-200.fc22.x86_64 : 826 Kbits/sec transmit
secondpc: 4.4.8-300.fc23.x86_64 : 738 Kbits/sec receive

Direction: secondpc --> firstpc
firstpc: 4.4.8-200.fc22.x86_64 : 940 Kbits/sec receive
secondpc: 4.4.8-300.fc23.x86_64 : 887 Kbits/sec transmit
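To put the size of the drop in numbers, the iperf rates above can be converted to bits per second and compared pairwise. This is a minimal sketch that assumes iperf's K/M/G bit-rate prefixes are decimal (1 Kbit = 1000 bits). The helpers `to_bps` and `slowdown` are hypothetical names for illustration only. Each comparison pairs two tests in which only one machine's kernel differs.

```python
# Quantify the IPoIB slowdown from the iperf figures reported in Tests 1-3.
# Assumption: iperf's K/M/G bit-rate prefixes are decimal (powers of 1000).

UNITS = {"Kbits/sec": 1e3, "Mbits/sec": 1e6, "Gbits/sec": 1e9}


def to_bps(rate: str) -> float:
    """Convert an iperf rate string such as '11.5 Gbits/sec' to bits/s."""
    value, unit = rate.split()
    return float(value) * UNITS[unit]


def slowdown(before: str, after: str) -> float:
    """Return how many times lower the `after` rate is than `before`."""
    return to_bps(before) / to_bps(after)


if __name__ == "__main__":
    # secondpc --> firstpc, rate received on firstpc (4.0.4 in both tests):
    # Test 1 (secondpc on 4.2.3) vs Test 2 (secondpc on 4.4.8).
    print(f"{slowdown('11.5 Gbits/sec', '865 Kbits/sec'):.0f}x")
    # firstpc --> secondpc, rate transmitted by firstpc (secondpc on 4.4.8 in both):
    # Test 2 (firstpc on 4.0.4) vs Test 3 (firstpc on 4.4.8).
    print(f"{slowdown('7.54 Gbits/sec', '826 Kbits/sec'):.0f}x")
```

With these inputs the script prints `13295x` and `9128x`. In both pairs, the machine whose kernel changed to 4.4.8 is the sending side.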


firstpc
=======
4: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc fq_codel state UP mode DEFAULT group default qlen 256
(Datagram Mode)

secondpc
========
5: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc fq_codel state UP mode DEFAULT group default qlen 256
(Datagram Mode)

Comment 1 Ian Chapman 2016-05-20 10:50:52 UTC
Looks related to the following two bugs


https://bugzilla.kernel.org/show_bug.cgi?id=111921

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1536837

Comment 2 Doug Ledford 2016-05-20 13:52:21 UTC
(In reply to Ian Chapman from comment #1)
> Looks related to the following two bugs
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=111921

It's possible these two are related, but not certain.

Comment 3 Ian Chapman 2016-08-09 10:07:54 UTC
Just an FYI. Kernel 4.6.5-200.fc23.x86_64 is still affected.

Comment 4 Laura Abbott 2016-09-23 19:43:57 UTC
*********** MASS BUG UPDATE **************
 
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.
 
Fedora 23 has now been rebased to 4.7.4-100.fc23.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.
 
If you experience different issues, please open a new bug report for those.

Comment 5 Ian Chapman 2016-10-07 09:50:52 UTC
Yes, the issue still persists.

Comment 6 Doug Ledford 2016-10-11 17:32:51 UTC
Upstream submission of proposed fix:

http://marc.info/?l=linux-rdma&m=147620680520525&w=2

Comment 7 Ian Chapman 2016-11-11 11:13:39 UTC
The patch has definitely improved things, although performance is still lower than before the regression. Throughput seems to vary between 300 Mbit/s and 4 Gbit/s.

Comment 8 Fedora End Of Life 2016-11-25 08:59:29 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 reaches end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Ian Chapman 2016-11-25 10:22:05 UTC
Note: with the patch applied to kernel-4.7.10-100, the system locks up when resuming from suspend, at the same moment the InfiniBand network is being initialised.

Comment 10 Ian Chapman 2016-12-26 11:03:30 UTC
Is it possible to carry the IPoIB fix in Fedora kernels? It seems that soon it won't be possible to use IPoIB on a supported version of Fedora.

Comment 11 Justin M. Forbes 2017-04-11 14:54:45 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 24 has now been rebased to 4.10.9-100.fc24.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.

Comment 12 Justin M. Forbes 2017-04-28 17:15:10 UTC
*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

