Bug 757040 - Network RPS miscellaneous bugs, RPS unusable
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Neil Horman
QA Contact: Weibing Zhang
URL:
Whiteboard:
Keywords: Reopened
Duplicates: 770739
Depends On:
Blocks:
 
Reported: 2011-11-25 10:11 UTC by Alex/AT
Modified: 2018-11-29 21:38 UTC (History)
Clone Of:
Last Closed: 2012-06-20 08:07:54 UTC


Attachments
patch to revert previous bad change (1.23 KB, patch)
2012-01-25 12:10 UTC, Neil Horman


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2012:0862 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update 2012-06-20 12:55:00 UTC

Description Alex/AT 2011-11-25 10:11:02 UTC
Description of problem:
RPS (Receive Packet Steering) is unusable in kernels up to 2.6.32-131.21.1 due to critical bugs.

Version-Release number of selected component (if applicable):
6.0, 6.1

How reproducible, steps to reproduce:
1. Enable RPS. Traffic may stall due to the IPI check bug.
2. Enable RPS. Most of the non-RFS traffic will be punted to CPU0 (or the first CPU in the RPS map) due to the hash-to-CPU conversion bug.
  
Expected results:
RPS working.

Additional info:
net/core/dev.c has two RPS backport bugs that prevent normal RPS operation. The code changes needed to fix them are listed below.

1. IPI scheduling check bug.

if (rps_ipi_queued(sd))
    ____napi_schedule(sd, &sd->backlog);

must be changed to

if (!rps_ipi_queued(sd))
    ____napi_schedule(sd, &sd->backlog);

The normal behavior is to schedule the skb for NAPI backlog processing if no IPIs are queued anymore, and otherwise to queue the traffic for scheduling at the next IPI. The RH6.x code only schedules traffic if some queued IPIs are present, and that is wrong: if there are no IPIs for some time, no traffic will be scheduled to NAPI, and the network stack will stall. The "!" is present in the original RPS patch submitted to the linux-networking mailing list, so this is probably a typo made during the backport.

2. Hash to CPU conversion bug. 

There are two possible solutions. Either:

make rxhash in skb 32-bits (u32), and change:

skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd) >> 16

to

skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd)

OR

keep rxhash 32 bit, but make the following change:

tcpu = map->cpus[((u64) skb->rxhash * map->len) >> 32];

to

tcpu = map->cpus[((u32) skb->rxhash * map->len) >> 16];

In the original RPS patch code, rxhash is 32 bits. It is converted to a CPU number by multiplying the 32-bit rxhash value (00000000-FFFFFFFF) by the CPU map length and shifting off the lower 32 bits, effectively giving a value from 0 to (map->len - 1).

In the RH6.x code, rxhash is 16 bits. Given that map->len never exceeds FFFF (having 65536+ CPUs is probably a bit overboard), multiplying rxhash by map->len and shifting off the lower 32 bits always yields zero, so all traffic is punted to the first CPU in the map list.

Comment 1 Alex/AT 2011-11-25 10:12:33 UTC
"keep rxhash 32 bit" must read as "keep rxhash 16 bit"

Comment 3 Akemi Yagi 2011-12-13 16:47:00 UTC
@Alex/AT,

I cannot locate the code:

if (rps_ipi_queued(sd))
    ____napi_schedule(sd, &sd->backlog);

in net/core/dev.c (kernel-2.6.32-131.21.1.el6) or any other files in this directory. Can you help?

Comment 4 Alex/AT 2011-12-13 18:54:28 UTC
Oops, sorry about that. Bug #1 is not applicable to the RHEL code. Our local bug report somehow got mixed up with an upstream bug report. I'm currently de-escalating this.

Bug #2 (hashing / CPU punting bug) is intact in the RHEL 6.1 kernel code:

line 2213 of dev.c states: skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd) >> 16;

and then, line 2259 uses rxhash as follows: tcpu = map->cpus[((u64) skb->rxhash * map->len) >> 32];

A 16-bit rxhash in that operation yields a wrong map->cpus index (always zero for maps with fewer than 65536 CPUs).

Comment 5 Alex/AT 2011-12-13 18:56:31 UTC
Also, we have tested the fix that keeps the 16-bit rxhash and uses the >> 16 shift (this does not break the ABI). It works, but needs some more testing to prove that the CPU distribution is even.

Comment 6 Neil Horman 2012-01-23 13:28:09 UTC
Yeah, the tcpu selection is wrong. It's the result of the RFS backport overwriting the RHEL-specific fixes that went in to make the RPS backport operational. Line 2259 just needs to be reverted to a 16-bit shift rather than a 32-bit shift.

Comment 8 Neil Horman 2012-01-24 12:00:54 UTC
There was a buildroot problem in the build cluster; it should be fixed now. New build:
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3969070

Comment 9 Neil Horman 2012-01-24 20:06:21 UTC
http://people.redhat.com/nhorman/rpms/bz757040.tbz2

Fixed kernel and firmware for your testing.

Comment 10 RHEL Product and Program Management 2012-01-24 20:49:08 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 11 Alex/AT 2012-01-25 03:20:39 UTC
Thanks.

Could you please provide an SRPM for testing? I cannot install a generic kernel on the test router; the router build requires some custom patches to be applied.

Comment 12 Neil Horman 2012-01-25 12:10:46 UTC
Created attachment 557437
patch to revert previous bad change

If you need to build something custom, you can just apply the patch on top of what you already have.

Comment 13 Alex/AT 2012-02-09 16:15:17 UTC
Tested and found OK.

Comment 14 Aristeu Rozanski 2012-02-10 23:00:31 UTC
Patch(es) available on kernel-2.6.32-230.el6

Comment 17 Jiri Benc 2012-02-24 18:04:24 UTC
*** Bug 770739 has been marked as a duplicate of this bug. ***

Comment 18 Weibing Zhang 2012-05-25 07:54:04 UTC
Verified on kernel-2.6.32-274.el6.
[root@hp-dl580g7-02 ~]# cat /sys/class/net/eth0/queues/rx-0/rps_cpus 
0000,00000000,00000000
[root@hp-dl580g7-02 ~]# echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
[root@hp-dl580g7-02 ~]# cat /sys/class/net/eth0/queues/rx-0/rps_cpus 
0000,00000000,0000000f
[root@hp-dl580g7-02 ~]# 

And RPS works as expected.
Set Verified.

Comment 19 Weibing Zhang 2012-05-25 07:58:48 UTC
Verified on kernel-2.6.32-274.el6.
[root@hp-dl580g7-02 ~]# cat /sys/class/net/eth0/queues/rx-0/rps_cpus 
0000,00000000,00000000
[root@hp-dl580g7-02 ~]# echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
[root@hp-dl580g7-02 ~]# cat /sys/class/net/eth0/queues/rx-0/rps_cpus 
0000,00000000,0000000f
[root@hp-dl580g7-02 ~]# 

And RPS works as expected.
Set Verified.

Comment 21 errata-xmlrpc 2012-06-20 08:07:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html

