Bug 1080421

Summary: [abrt] numactl: numa_distance(): numactl killed by SIGSEGV
Product: [Fedora] Fedora
Reporter: Jeff Bastian <jbastian>
Component: numactl
Assignee: Petr Holasek <pholasek>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 20
CC: bgray, dhoward, pholasek
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Unspecified
URL: https://retrace.fedoraproject.org/faf/reports/bthash/210ca12dd2e036135a26e4596e0fad19a1f0d6cd
Whiteboard: abrt_hash:0680cd6b745b0fd5f2e244b325c129987408db59
Fixed In Version: numactl-2.0.9-2.fc20
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-08-15 02:41:34 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
File: backtrace (flags: none)
File: cgroup (flags: none)
File: core_backtrace (flags: none)
File: dso_list (flags: none)
File: environ (flags: none)
File: exploitable (flags: none)
File: limits (flags: none)
File: maps (flags: none)
File: open_fds (flags: none)
File: proc_pid_status (flags: none)
File: var_log_messages (flags: none)
patch to check for NUMA (flags: none)

Description Jeff Bastian 2014-03-25 12:12:28 UTC
Description of problem:
I recompiled the kernel without NUMA support (while trying to debug another NUMA issue) and ran 'numactl --hardware' and it crashed with a seg fault.

The problem is that /sys/devices/system/node does not exist on non-NUMA systems, but numactl ignores the ENOENT error it receives when trying to access this path.

The bug lies in the read_distance_table() function, which starts at distance.c:52, just above numa_distance().

In particular, this loop:

 61     for (nd = 0;; nd++) { 
 62         char fn[100];
 63         FILE *dfh;
 64         sprintf(fn, "/sys/devices/system/node/node%d/distance", nd);
 65         dfh = fopen(fn, "r");
 66         if (!dfh) {
 67             if (errno == ENOENT)
 68                 err = 0;
 69             if (!err && nd<maxnode)
 70                 continue;
 71             else
 72                 break;
 73         }


There are no /sys/devices/system/node/node*/distance files on this system, so fopen (line 65) repeatedly fails and eventually it breaks out of the loop (line 72).

Then it continues with:

 89     free(line);
 90     if (err)  {
 91         numa_warn(W_distance,
 92               "Cannot parse distance information in sysfs: %s",
 93               strerror(errno));
 94         free(table);
 95         return err;
 96     }


The problem is that err is 0 because line 68 above treats ENOENT as a non-error, so the function skips this warning and continues with:

106     distance_table = table;
107     return 0;
108 }

And table is still set to 0x0 (the initial value), so distance_table ends up NULL as well:

(gdb) p distance_table
$4 = (int *) 0x0

Which then leads to the segfault and crash here in distance.c:117
    return distance_table[a * distance_numnodes + b];
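
The failure path described above can be modelled in a small standalone sketch. This is not the numactl source: scan_distance_files() and distance_lookup() are hypothetical names, base_dir and maxnode are parameters added for illustration, the initial value of err is assumed to be -1, and parsing of the distance files is omitted. It only shows that the quoted loop can finish with err == 0 without reading a single file, and how a NULL check before indexing would avoid the crash.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Simplified model of the quoted loop (hypothetical helper).
 * Returns the final value of err and stores in *files_seen how many
 * per-node distance files could actually be opened.
 */
static int scan_distance_files(const char *base_dir, int maxnode,
                               int *files_seen)
{
    int err = -1;   /* assumption: starts out as "error" */
    int nd;

    *files_seen = 0;
    for (nd = 0;; nd++) {
        char fn[512];
        FILE *dfh;

        snprintf(fn, sizeof fn, "%s/node%d/distance", base_dir, nd);
        dfh = fopen(fn, "r");
        if (!dfh) {
            if (errno == ENOENT)
                err = 0;        /* a missing file clears the error */
            if (!err && nd < maxnode)
                continue;
            else
                break;
        }
        (*files_seen)++;        /* real code would parse the file here */
        fclose(dfh);
    }
    return err;
}

/*
 * Guarded lookup (hypothetical helper): returns 0 when no table was
 * built or an index is out of range, instead of dereferencing NULL.
 */
static int distance_lookup(const int *table, int numnodes, int a, int b)
{
    if (!table || a < 0 || b < 0 || a >= numnodes || b >= numnodes)
        return 0;
    return table[a * numnodes + b];
}
```

Calling scan_distance_files() on a directory that does not exist returns 0 with *files_seen == 0, which is the state that lets a NULL table be published as if it were valid. Returning 0 from the guarded lookup is a design choice for this sketch; a caller could just as well report an error.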


Why is ENOENT treated like a non-error?

 67             if (errno == ENOENT)
 68                 err = 0;

Version-Release number of selected component:
numactl-2.0.9-1.fc20

Additional info:
reporter:       libreport-2.2.0
backtrace_rating: 4
cmdline:        numactl --hardware
crash_function: numa_distance
executable:     /usr/bin/numactl
kernel:         3.13.7-200.nonuma.fc20.x86_64
runlevel:       N 5
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (3 frames)
 #0 numa_distance at distance.c:117
 #1 print_distances at numactl.c:201
 #2 hardware at numactl.c:294

Comment 1 Jeff Bastian 2014-03-25 12:12:32 UTC
Created attachment 878426 [details]
File: backtrace

Comment 2 Jeff Bastian 2014-03-25 12:12:34 UTC
Created attachment 878427 [details]
File: cgroup

Comment 3 Jeff Bastian 2014-03-25 12:12:35 UTC
Created attachment 878428 [details]
File: core_backtrace

Comment 4 Jeff Bastian 2014-03-25 12:12:37 UTC
Created attachment 878429 [details]
File: dso_list

Comment 5 Jeff Bastian 2014-03-25 12:12:38 UTC
Created attachment 878430 [details]
File: environ

Comment 6 Jeff Bastian 2014-03-25 12:12:40 UTC
Created attachment 878431 [details]
File: exploitable

Comment 7 Jeff Bastian 2014-03-25 12:12:41 UTC
Created attachment 878432 [details]
File: limits

Comment 8 Jeff Bastian 2014-03-25 12:12:48 UTC
Created attachment 878433 [details]
File: maps

Comment 9 Jeff Bastian 2014-03-25 12:12:50 UTC
Created attachment 878434 [details]
File: open_fds

Comment 10 Jeff Bastian 2014-03-25 12:12:52 UTC
Created attachment 878435 [details]
File: proc_pid_status

Comment 11 Jeff Bastian 2014-03-25 12:12:53 UTC
Created attachment 878436 [details]
File: var_log_messages

Comment 12 Jeff Bastian 2014-03-25 12:20:45 UTC
A scratch build of the non-NUMA kernel is here (until koji cleans it up):
http://koji.fedoraproject.org/koji/taskinfo?taskID=6670131

Comment 15 Jeff Bastian 2014-07-15 17:05:21 UTC
Created attachment 918201 [details]
patch to check for NUMA

A patch from upstream to check for NUMA and avoid a segfault on non-NUMA systems
http://blog.gmane.org/gmane.linux.kernel.numa/month=20131123
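
The attached patch is not reproduced in this report, so the following is only a sketch of one way such a check could look, not the upstream change. It tests whether the sysfs node directory exists before any distance lookup is attempted. node_dir_present() and check_numa_or_report() are hypothetical helpers, and the directory is passed in as a parameter so the sketch can be exercised on any system. libnuma also provides numa_available(), which returns -1 when the NUMA API is not usable; a real fix may rely on that instead.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if path exists and is a directory, 0 otherwise. */
static int node_dir_present(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical early check: prints a message and returns -1 when the
 * node directory is missing, so the caller can exit before touching
 * the distance table. Returns 0 when the directory is present.
 */
static int check_numa_or_report(const char *sysfs_node_dir, FILE *out)
{
    if (!node_dir_present(sysfs_node_dir)) {
        fprintf(out, "No NUMA available on this system\n");
        return -1;
    }
    return 0;
}
```

A caller would pass "/sys/devices/system/node" and stop on a negative return value, for example `if (check_numa_or_report("/sys/devices/system/node", stderr) < 0) exit(1);`.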

Comment 16 Fedora Update System 2014-07-31 14:15:00 UTC
numactl-2.0.9-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/numactl-2.0.9-2.fc20

Comment 17 Fedora Update System 2014-08-01 06:06:29 UTC
Package numactl-2.0.9-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing numactl-2.0.9-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-9089/numactl-2.0.9-2.fc20
then log in and leave karma (feedback).

Comment 18 Jeff Bastian 2014-08-12 14:06:33 UTC
I tested again with a non-NUMA custom kernel and verified that the update no longer segfaults, but instead exits cleanly with a nice error message:

[root@localhost ~]# uname -r
3.15.8-200.nonuma.fc20.x86_64

[root@localhost ~]# rpm -q numactl
numactl-2.0.9-1.fc20.x86_64

[root@localhost ~]# numactl --hardware
available: 0 nodes ()
Segmentation fault

[root@localhost ~]# yum update numactl*
...

[root@localhost ~]# rpm -q numactl
numactl-2.0.9-2.fc20.x86_64

[root@localhost ~]# numactl --hardware
No NUMA available on this system

Comment 19 Fedora Update System 2014-08-15 02:41:34 UTC
numactl-2.0.9-2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.