Description of problem:
RPS (Receive Packet Steering) is unusable in kernels up to 2.6.32-131.21.1 due to critical bugs.

Version-Release number of selected component (if applicable):
6.0, 6.1

How reproducible, steps to reproduce:
1. Enable RPS. Traffic may stall due to the IPI check bug.
2. Enable RPS. Most of the non-RFS traffic will be punted to CPU0 (or the first CPU in the RPS map) due to the hash-to-CPU conversion bug.

Expected results:
RPS working.

Additional info:
dev.c in net/core has two RPS backport bugs that prevent normal RPS operation. The code changes needed to avoid them are listed below.

1. IPI scheduling check bug.

    if (rps_ipi_queued(sd))
        ____napi_schedule(sd, &sd->backlog);

must be changed to

    if (!rps_ipi_queued(sd))
        ____napi_schedule(sd, &sd->backlog);

The normal behavior is to schedule the skb for NAPI backlog processing if no IPIs are queued anymore, and otherwise to queue the traffic for scheduling at the next IPI. The RH6.x code only schedules traffic if some queued IPIs are present, and that is wrong: if there are no IPIs for some time, no traffic will be scheduled to NAPI, and the network stack will stall. The "!" is present in the original RPS patch submitted to the linux-networking mailing list, so this is probably a typo introduced during the backport.

2. Hash-to-CPU conversion bug.

There are two possible solutions. Either make rxhash in skb 32 bits (u32), and change

    skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd) >> 16

to

    skb->rxhash = jhash_3words(addr1, addr2, ports.v32, hashrnd)

OR keep rxhash 32 bit, but make the following change:

    tcpu = map->cpus[((u64) skb->rxhash * map->len) >> 32];

to

    tcpu = map->cpus[((u32) skb->rxhash * map->len) >> 16];

In the original RPS patch code, rxhash is 32 bits. It is converted to a CPU number by multiplying the 32-bit rxhash value (00000000-FFFFFFFF) by the CPU map length and shifting off the lower 32 bits, effectively giving a value from 0 to (map->len - 1).

In the RH6.x code, rxhash is 16 bits. Given that map->len never exceeds FFFF (having 65536+ CPUs is probably a bit overboard), multiplying rxhash by map->len and shifting off the lower 32 bits always yields zero, so all traffic is punted to the first CPU in the map list.
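The arithmetic described above can be checked outside the kernel. The following is a minimal userspace sketch, not the kernel code: the function names are hypothetical, and map_len is assumed to fit in 16 bits, as the description supposes for map->len.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these names
 * exist in net/core/dev.c. map_len stands in for map->len. */

/* Original RPS patch: 32-bit hash, product shifted right by 32.
 * The result is always in 0 .. map_len - 1. */
static inline uint32_t rps_index_hash32_shift32(uint32_t rxhash, uint16_t map_len)
{
    return (uint32_t)(((uint64_t)rxhash * map_len) >> 32);
}

/* Combination reported in this bug: 16-bit hash, still shifted by 32.
 * The product is below 2^32, so the result is always 0. */
static inline uint32_t rps_index_hash16_shift32(uint16_t rxhash, uint16_t map_len)
{
    return (uint32_t)(((uint64_t)rxhash * map_len) >> 32);
}

/* Proposed change: 16-bit hash, product shifted right by 16.
 * 0xFFFF * 0xFFFF still fits in 32 bits, so no overflow occurs. */
static inline uint32_t rps_index_hash16_shift16(uint16_t rxhash, uint16_t map_len)
{
    return ((uint32_t)rxhash * map_len) >> 16;
}
```

With a four-entry map, the largest 16-bit hash (0xFFFF) selects index 0 under the 32-bit shift and index 3 under the 16-bit shift.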
"keep rxhash 32 bit" must read as "keep rxhash 16 bit"
@Alex/AT, I cannot locate the code: if (rps_ipi_queued(sd)) ____napi_schedule(sd, &sd->backlog); in net/core/dev.c (kernel-2.6.32-131.21.1.el6) or any other files in this directory. Can you help?
Oops, sorry about that. Bug #1 is not applicable to RHEL code. The local bug report somehow got mixed up with an upstream bug report. I'm currently de-escalating this back.

Bug #2 (hashing / CPU punting bug) is intact in the RHEL 6.1 kernel code. Line 2213 of dev.c states:

    skb->rxhash = jhash_3words(addr1, addr2, ports, hashrnd) >> 16;

and then line 2259 uses rxhash as follows:

    tcpu = map->cpus[((u64) skb->rxhash * map->len) >> 32];

A 16-bit rxhash in that operation results in a wrong map->cpus index (zero for <65536 CPUs in the map).
Also, we have checked the "16-bit rxhash" with >> 16 version of the fix (the one that does not break the ABI). It works, but needs some testing to prove that the CPU distribution is even.
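One way to approach the evenness question is to enumerate every possible 16-bit hash value and count how many land on each map slot. The sketch below does that in userspace; rps_slot_spread and MAX_CPUS are hypothetical names for this illustration. It only shows that the index arithmetic is even over uniformly distributed hash values and says nothing about how real traffic hashes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_CPUS 64 /* hypothetical upper bound chosen for this sketch */

/* Feed all 65536 possible 16-bit hash values through the ">> 16" index
 * computation and count the hits per map slot. Returns the difference
 * between the most and least loaded slots. */
static uint32_t rps_slot_spread(uint16_t map_len, uint32_t counts[MAX_CPUS])
{
    uint32_t h, min, max;
    uint16_t i;

    assert(map_len >= 1 && map_len <= MAX_CPUS);
    memset(counts, 0, MAX_CPUS * sizeof(counts[0]));

    for (h = 0; h <= 0xFFFF; h++)
        counts[(h * map_len) >> 16]++;

    min = max = counts[0];
    for (i = 1; i < map_len; i++) {
        if (counts[i] < min)
            min = counts[i];
        if (counts[i] > max)
            max = counts[i];
    }
    return max - min;
}
```

For a four-entry map every slot receives 16384 of the 65536 values. For a three-entry map the slots receive 21846, 21845 and 21845 values, so the spread is 1.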
Yeah, the tcpu selection is wrong. It's the result of the RFS backport overwriting the RHEL-specific fixes that went in to make the RPS backport operational. Line 2259 just needs to be reverted to a 16-bit shift rather than a 32-bit shift.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3965841
buildroot problem in the build cluster, should be fixed now, new build http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3969070
http://people.redhat.com/nhorman/rpms/bz757040.tbz2 Fixed kernel and firmware for your testing.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Thanks. Could an SRPM be provided for testing? I cannot install a generic kernel on the test router; the router build requires some custom patches to be applied.
Created attachment 557437 [details]
patch to revert previous bad change

If you need to build something custom, you can just apply the patch on top of what you already have.
Tested and found OK.
Patch(es) available on kernel-2.6.32-230.el6
*** Bug 770739 has been marked as a duplicate of this bug. ***
Verified on kernel-2.6.32-274.el6.

[root@hp-dl580g7-02 ~]# cat /sys/class/net/eth0/queues/rx-0/rps_cpus
0000,00000000,00000000
[root@hp-dl580g7-02 ~]# echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
[root@hp-dl580g7-02 ~]# cat /sys/class/net/eth0/queues/rx-0/rps_cpus
0000,00000000,0000000f
[root@hp-dl580g7-02 ~]#

And RPS works as expected. Set Verified.
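The rps_cpus value shown above is a comma-separated hexadecimal CPU bitmask. As a small illustration of reading it, the sketch below counts how many CPUs a mask string enables; rps_mask_weight is a hypothetical helper, not a kernel or libc function.

```c
#include <ctype.h>

/* Count the CPUs enabled in a comma-separated hex mask string such as
 * "0000,00000000,0000000f". Commas and a trailing newline are skipped.
 * Returns -1 if any other non-hex character is found. */
static int rps_mask_weight(const char *mask)
{
    int bits = 0;

    for (; *mask != '\0'; mask++) {
        unsigned char c = (unsigned char)*mask;
        unsigned int v;

        if (c == ',' || c == '\n')
            continue;
        if (!isxdigit(c))
            return -1;

        /* Convert one hex digit to its value, then count its set bits. */
        v = isdigit(c) ? (unsigned int)(c - '0')
                       : (unsigned int)(tolower(c) - 'a') + 10u;
        for (; v != 0; v >>= 1)
            bits += (int)(v & 1u);
    }
    return bits;
}
```

Applied to the two values in the session above, the mask before the change enables 0 CPUs and the mask after writing f enables 4.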
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0862.html