Bug 1377488
| Summary: | aarch64: out of bounds array access on NUMA systems | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | David Daney <ddaney> |
| Component: | kernel-aarch64 | Assignee: | Kernel Drivers <hwkernel-mgr> |
| kernel-aarch64 sub component: | Platform Enablement | QA Contact: | Jeff Bastian <jbastian> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | urgent | CC: | ctatman, jcm, jfeeney, lmiksik, mlangsdo, rrichter |
| Version: | 7.3 | ||
| Target Milestone: | rc | ||
| Target Release: | 7.3 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-aarch64-4.5.0-12.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2016-11-03 22:53:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1250216 | ||
|
Description
David Daney
2016-09-19 21:17:06 UTC
Brew with potential fix now here:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11777838

(In reply to David Daney from comment #2)
> Brew with potential fix now here:
>
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11777838

Now obsolete, don't use this fix. New version of potential fix is here:
https://lkml.org/lkml/2016/9/20/532

These are the symptoms of the problem on kernel-4.5.0-9.el7.aarch64:
[root@localhost ~]# ps -e | grep 'kworker/u'
6 ? 00:00:00 kworker/u192:0
608 ? 00:00:00 kworker/u193:5
611 ? 00:00:00 kworker/u193:6
614 ? 00:00:00 kworker/u193:7
1012 ? 00:00:00 kworker/u192:2
3327 ? 00:00:00 kworker/u192:1
We can see two unbound workqueues with several worker threads per queue.
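The grouping described above can be reproduced mechanically. The following is a minimal sketch, assuming the thread names follow the `kworker/u<pool>:<worker>` pattern seen in the output; the helper name `group_unbound_kworkers` and the embedded sample text are illustrative, not part of any tool used in this report.

```python
import re
from collections import defaultdict

# Sample lines copied from the ps output in this report.
PS_OUTPUT = """\
6 ? 00:00:00 kworker/u192:0
608 ? 00:00:00 kworker/u193:5
611 ? 00:00:00 kworker/u193:6
614 ? 00:00:00 kworker/u193:7
1012 ? 00:00:00 kworker/u192:2
3327 ? 00:00:00 kworker/u192:1
"""

# PID at the start of the line, then "kworker/u<pool>:<worker>" later on.
KWORKER_RE = re.compile(r"^\s*(\d+)\s.*kworker/u(\d+):(\d+)")


def group_unbound_kworkers(text):
    """Map each number following 'u' to the sorted PIDs of its workers."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = KWORKER_RE.match(line)
        if match:
            pid, pool, _worker = match.groups()
            groups[int(pool)].append(int(pid))
    return {pool: sorted(pids) for pool, pids in groups.items()}


groups = group_unbound_kworkers(PS_OUTPUT)
for pool, pids in sorted(groups.items()):
    print("u%d: %d workers, PIDs %s" % (pool, len(pids), pids))
```

On the broken kernel this yields only two groups (u192 and u193), which is the symptom being reported.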
[root@localhost ~]# taskset -p 6
pid 6's current affinity mask: ffffffffffffffffffffffff
[root@localhost ~]# taskset -p 608
pid 608's current affinity mask: ffffffffffffffffffffffff
Note that both workqueues have affinity to all 96 CPUs.
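To read an affinity mask without counting hex digits by hand, it can be expanded into CPU ranges. This is a small sketch, assuming the usual convention that bit N of the mask corresponds to CPU N; the function names are hypothetical.

```python
def mask_to_cpus(hex_mask):
    """Return the sorted list of CPU numbers whose bit is set in hex_mask."""
    value = int(hex_mask, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def to_ranges(cpus):
    """Collapse a sorted list such as [0, 1, 2, 5] into [(0, 2), (5, 5)]."""
    ranges = []
    for cpu in cpus:
        if ranges and cpu == ranges[-1][1] + 1:
            # Extend the current run of consecutive CPUs.
            ranges[-1] = (ranges[-1][0], cpu)
        else:
            # Start a new run.
            ranges.append((cpu, cpu))
    return ranges


# 24 hex digits of 'f', the mask reported by taskset for both workers.
ALL_CPUS_MASK = "f" * 24

cpus = mask_to_cpus(ALL_CPUS_MASK)
print("%d CPUs, ranges: %s" % (len(cpus), to_ranges(cpus)))
```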
The output should look like this:
[root@localhost ~]# ps -e | grep 'kworker/u'
6 ? 00:00:00 kworker/u192:0
7 ? 00:00:00 kworker/u193:0
253 ? 00:00:00 kworker/u194:0
.
.
.
[root@localhost ~]# taskset -p 6
pid 6's current affinity mask: ffffffffffffffffffffffff
[root@localhost ~]# taskset -p 7
pid 7's current affinity mask: ffffffffffff
[root@localhost ~]# taskset -p 253
pid 253's current affinity mask: ffffffffffff000000000000
The first unbound workqueue has affinity to all 96 CPUs.
The second unbound workqueue has affinity to the node-0 CPUs (48 of them).
The third unbound workqueue has affinity to the node-1 CPUs (the other 48).
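The three expected masks follow directly from the node layout. The sketch below assumes two nodes of 48 contiguously numbered CPUs each, as described in this report; on another machine the layout would have to be read from the system (for example from `/sys/devices/system/node/node*/cpumap`) rather than hard-coded. The constant and function names are illustrative.

```python
# Assumptions taken from this report: 2 NUMA nodes, 48 CPUs per node,
# CPUs numbered contiguously within each node.
CPUS_PER_NODE = 48
NUM_NODES = 2


def node_mask(node, cpus_per_node=CPUS_PER_NODE):
    """Affinity mask with one bit set for every CPU of the given node."""
    return ((1 << cpus_per_node) - 1) << (node * cpus_per_node)


def all_cpus_mask(num_nodes=NUM_NODES):
    """Affinity mask covering the CPUs of every node."""
    mask = 0
    for node in range(num_nodes):
        mask |= node_mask(node)
    return mask


# Print the masks in the same bare hex form that taskset uses.
print("all CPUs: %x" % all_cpus_mask())
for node in range(NUM_NODES):
    print("node %d:   %x" % (node, node_mask(node)))
```

Comparing these values with the `taskset -p` output for each kworker shows whether the per-node unbound pools were created.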
(In reply to David Daney from comment #4)
> New version of potential fix is here:
> https://lkml.org/lkml/2016/9/20/532

brew build of this patch is here:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11785057

Verified on cavium-thunderx2-02.khw.lab.eng.bos.redhat.com following the test steps in comment 5.

::::::::::::
:: Before ::
::::::::::::
[root@cavium-thunderx2-02 ~]# uname -r
4.5.0-10.el7.aarch64
[root@cavium-thunderx2-02 ~]# pgrep -laf kworker/u
6 kworker/u192:0
3941 kworker/u193:1
3954 kworker/u192:1
4001 kworker/u193:0
4021 kworker/u192:2
4071 kworker/u193:2
[root@cavium-thunderx2-02 ~]# for p in $(pgrep -laf kworker/u | awk '{print $1}') ; do
    taskset -p $p
done
pid 6's current affinity mask: ffffffffffffffffffffffff
pid 3941's current affinity mask: ffffffffffffffffffffffff
pid 3954's current affinity mask: ffffffffffffffffffffffff
pid 4001's current affinity mask: ffffffffffffffffffffffff
pid 4021's current affinity mask: ffffffffffffffffffffffff
pid 4071's current affinity mask: ffffffffffffffffffffffff
[root@cavium-thunderx2-02 ~]# ./hex2bin.py -d 96 ffffffffffffffffffffffff
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

:::::::::::
:: After ::
:::::::::::
[root@cavium-thunderx2-02 ~]# uname -r
4.5.0-13.el7.aarch64
[root@cavium-thunderx2-02 ~]# pgrep -laf kworker/u
6 kworker/u192:0
7 kworker/u193:0
8 kworker/u194:0
532 kworker/u194:1
597 kworker/u194:2
600 kworker/u194:3
603 kworker/u194:4
606 kworker/u194:5
609 kworker/u194:6
612 kworker/u194:7
615 kworker/u194:8
619 kworker/u194:9
623 kworker/u192:1
681 kworker/u193:1
892 kworker/u193:2
1085 kworker/u193:3
1128 kworker/u193:4
[root@cavium-thunderx2-02 ~]# for p in $(pgrep -laf kworker/u | awk '{print $1}') ; do
    taskset -p $p
done
pid 6's current affinity mask: ffffffffffffffffffffffff
pid 7's current affinity mask: ffffffffffff
pid 8's current affinity mask: ffffffffffff000000000000
pid 532's current affinity mask: ffffffffffff000000000000
pid 597's current affinity mask: ffffffffffff000000000000
pid 600's current affinity mask: ffffffffffff000000000000
pid 603's current affinity mask: ffffffffffff000000000000
pid 606's current affinity mask: ffffffffffff000000000000
pid 609's current affinity mask: ffffffffffff000000000000
pid 612's current affinity mask: ffffffffffff000000000000
pid 615's current affinity mask: ffffffffffff000000000000
pid 619's current affinity mask: ffffffffffff000000000000
pid 623's current affinity mask: ffffffffffffffffffffffff
pid 681's current affinity mask: ffffffffffff
pid 892's current affinity mask: ffffffffffff
pid 1085's current affinity mask: ffffffffffff
pid 1128's current affinity mask: ffffffffffff
[root@cavium-thunderx2-02 ~]# ./hex2bin.py -d 96 ffffffffffffffffffffffff
111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
[root@cavium-thunderx2-02 ~]# ./hex2bin.py -d 96 ffffffffffff
000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111111111
[root@cavium-thunderx2-02 ~]# ./hex2bin.py -d 96 ffffffffffff000000000000
111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000

::::::::::::::::::::::::
:: hex2bin.py utility ::
::::::::::::::::::::::::
#!/usr/bin/python
import sys
import argparse

parser = argparse.ArgumentParser(description='Convert hex to binary.')
parser.add_argument('N', metavar='N',
                    help='number in hexadecimal to convert')
parser.add_argument('-d', '--digits', type=int,
                    help='digits to print (default is 32)')
args = parser.parse_args()

if args.digits is None:
    args.digits = 32

print bin(int(args.N, base=16))[2:].zfill(args.digits)

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2145.html |