Bug 1080421 - [abrt] numactl: numa_distance(): numactl killed by SIGSEGV
Summary: [abrt] numactl: numa_distance(): numactl killed by SIGSEGV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: numactl
Version: 20
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Petr Holasek
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:0680cd6b745b0fd5f2e244b325c...
Depends On:
Blocks:
Reported: 2014-03-25 12:12 UTC by Jeff Bastian
Modified: 2016-10-04 04:11 UTC (History)

Fixed In Version: numactl-2.0.9-2.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-15 02:41:34 UTC
Type: ---
Embargoed:


Attachments
File: backtrace (13.63 KB, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: cgroup (172 bytes, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: core_backtrace (1.19 KB, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: dso_list (407 bytes, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: environ (2.04 KB, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: exploitable (82 bytes, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: limits (1.29 KB, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: maps (2.24 KB, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: open_fds (105 bytes, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: proc_pid_status (761 bytes, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
File: var_log_messages (788 bytes, text/plain)
2014-03-25 12:12 UTC, Jeff Bastian
patch to check for NUMA (753 bytes, patch)
2014-07-15 17:05 UTC, Jeff Bastian

Description Jeff Bastian 2014-03-25 12:12:28 UTC
Description of problem:
I recompiled the kernel without NUMA support (while trying to debug another NUMA issue), ran 'numactl --hardware', and it crashed with a segmentation fault.

The problem is that /sys/devices/system/node does not exist on non-NUMA systems, but numactl ignores the ENOENT error it receives when trying to access this path.

The bug lies in the read_distance_table() function, which is defined just above numa_distance() at distance.c:52.

In particular, this loop:

 61     for (nd = 0;; nd++) { 
 62         char fn[100];
 63         FILE *dfh;
 64         sprintf(fn, "/sys/devices/system/node/node%d/distance", nd);
 65         dfh = fopen(fn, "r");
 66         if (!dfh) {
 67             if (errno == ENOENT)
 68                 err = 0;
 69             if (!err && nd<maxnode)
 70                 continue;
 71             else
 72                 break;
 73         }


There are no /sys/devices/system/node/node*/distance files on this system, so fopen (line 65) fails on every iteration until nd reaches maxnode, at which point the loop breaks (line 72).

Then it continues with:

 89     free(line);
 90     if (err)  {
 91         numa_warn(W_distance,
 92               "Cannot parse distance information in sysfs: %s",
 93               strerror(errno));
 94         free(table);
 95         return err;
 96     }


The problem is that err is 0 because line 68 above treats ENOENT as a non-error, so the function skips this warning and continues with:

106     distance_table = table;
107     return 0;
108 }

And table is still NULL (its initial value), so distance_table ends up NULL as well:

(gdb) p distance_table
$4 = (int *) 0x0

This NULL pointer is then dereferenced, which causes the segfault at distance.c:117:
    return distance_table[a * distance_numnodes + b];


Why is ENOENT treated like a non-error?

 67             if (errno == ENOENT)
 68                 err = 0;
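
For illustration only, here is a minimal sketch of how the loader could keep tolerating ENOENT for individual nodes (holes in the node numbering) while still reporting failure when no distance file was found at all. This is not the upstream fix and not the actual libnuma code: the names load_distance_table and distance_lookup, and the base and maxnode parameters, are hypothetical and exist only so the logic can be exercised against an arbitrary directory.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the file-scope state in distance.c. */
static int *distance_table;
static int distance_numnodes;

/*
 * Sketch of the table loader. `base` and `maxnode` are parameters only so
 * the function can be pointed at a test directory; they are assumptions of
 * this example. Returns 0 if a table was built, -1 otherwise.
 */
static int load_distance_table(const char *base, int maxnode)
{
    int *table = NULL;

    if (maxnode <= 0)
        return -1;

    for (int nd = 0; nd < maxnode; nd++) {
        char fn[256];
        char line[1024];
        FILE *dfh;

        snprintf(fn, sizeof fn, "%s/node%d/distance", base, nd);
        dfh = fopen(fn, "r");
        if (!dfh) {
            if (errno == ENOENT)
                continue;       /* a hole in the node numbering */
            free(table);
            return -1;          /* any other error is fatal */
        }
        if (!fgets(line, sizeof line, dfh)) {
            fclose(dfh);
            free(table);
            return -1;
        }
        fclose(dfh);

        if (!table) {
            table = calloc((size_t)maxnode * maxnode, sizeof *table);
            if (!table)
                return -1;
        }

        /* Parse up to maxnode whitespace-separated integers. */
        char *p = line;
        for (int col = 0; col < maxnode; col++) {
            char *end;
            long v = strtol(p, &end, 10);
            if (end == p)
                break;
            table[nd * maxnode + col] = (int)v;
            p = end;
        }
    }

    if (!table)
        return -1;              /* ENOENT for every node: no NUMA data */

    free(distance_table);
    distance_table = table;
    distance_numnodes = maxnode;
    return 0;
}

/* Lookup that returns 0 instead of dereferencing a NULL table. */
static int distance_lookup(int a, int b)
{
    if (!distance_table)
        return 0;
    if (a < 0 || b < 0 || a >= distance_numnodes || b >= distance_numnodes)
        return 0;
    return distance_table[a * distance_numnodes + b];
}
```

With this shape, a system where every node directory is missing makes the loader return -1, and the lookup returns 0 rather than indexing through a NULL pointer. Returning 0 for an unknown distance is a design choice of the sketch, not something confirmed by this report.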

Version-Release number of selected component:
numactl-2.0.9-1.fc20

Additional info:
reporter:       libreport-2.2.0
backtrace_rating: 4
cmdline:        numactl --hardware
crash_function: numa_distance
executable:     /usr/bin/numactl
kernel:         3.13.7-200.nonuma.fc20.x86_64
runlevel:       N 5
type:           CCpp
uid:            0

Truncated backtrace:
Thread no. 1 (3 frames)
 #0 numa_distance at distance.c:117
 #1 print_distances at numactl.c:201
 #2 hardware at numactl.c:294

Comment 1 Jeff Bastian 2014-03-25 12:12:32 UTC
Created attachment 878426 [details]
File: backtrace

Comment 2 Jeff Bastian 2014-03-25 12:12:34 UTC
Created attachment 878427 [details]
File: cgroup

Comment 3 Jeff Bastian 2014-03-25 12:12:35 UTC
Created attachment 878428 [details]
File: core_backtrace

Comment 4 Jeff Bastian 2014-03-25 12:12:37 UTC
Created attachment 878429 [details]
File: dso_list

Comment 5 Jeff Bastian 2014-03-25 12:12:38 UTC
Created attachment 878430 [details]
File: environ

Comment 6 Jeff Bastian 2014-03-25 12:12:40 UTC
Created attachment 878431 [details]
File: exploitable

Comment 7 Jeff Bastian 2014-03-25 12:12:41 UTC
Created attachment 878432 [details]
File: limits

Comment 8 Jeff Bastian 2014-03-25 12:12:48 UTC
Created attachment 878433 [details]
File: maps

Comment 9 Jeff Bastian 2014-03-25 12:12:50 UTC
Created attachment 878434 [details]
File: open_fds

Comment 10 Jeff Bastian 2014-03-25 12:12:52 UTC
Created attachment 878435 [details]
File: proc_pid_status

Comment 11 Jeff Bastian 2014-03-25 12:12:53 UTC
Created attachment 878436 [details]
File: var_log_messages

Comment 12 Jeff Bastian 2014-03-25 12:20:45 UTC
A scratch build of the non-NUMA kernel is here (until koji cleans it up):
http://koji.fedoraproject.org/koji/taskinfo?taskID=6670131

Comment 15 Jeff Bastian 2014-07-15 17:05:21 UTC
Created attachment 918201 [details]
patch to check for NUMA

A patch from upstream to check for NUMA and avoid a segfault on non-NUMA systems
http://blog.gmane.org/gmane.linux.kernel.numa/month=20131123
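
The attached patch is not reproduced here. As a rough sketch of the general idea of checking for NUMA before touching the distance table, the example below tests whether the sysfs node directory exists and bails out with a message otherwise. The function names, the node_dir parameter, and the exit status of 1 are assumptions made for this example. libnuma also documents numa_available(), which returns -1 when the NUMA API is unavailable, and a real check would more likely use that; the sketch avoids it only to stay self-contained.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if `path` exists and is a directory, 0 otherwise. */
static int numa_sysfs_present(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical front end for a "--hardware" style command.
 * `node_dir` would normally be "/sys/devices/system/node"; it is a
 * parameter here so the check can be tried against any path.
 * Returns the exit status the caller should use (1 is an assumption).
 */
static int hardware_sketch(const char *node_dir, FILE *out)
{
    if (!numa_sysfs_present(node_dir)) {
        fprintf(out, "No NUMA available on this system\n");
        return 1;
    }
    /* ... enumerate nodes and print the distance table here ... */
    return 0;
}
```

A caller could use it as `return hardware_sketch("/sys/devices/system/node", stdout);` so that the distance lookup is never reached on a kernel built without NUMA support.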

Comment 16 Fedora Update System 2014-07-31 14:15:00 UTC
numactl-2.0.9-2.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/numactl-2.0.9-2.fc20

Comment 17 Fedora Update System 2014-08-01 06:06:29 UTC
Package numactl-2.0.9-2.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing numactl-2.0.9-2.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-9089/numactl-2.0.9-2.fc20
then log in and leave karma (feedback).

Comment 18 Jeff Bastian 2014-08-12 14:06:33 UTC
I tested again with a non-NUMA custom kernel and verified that the update no longer segfaults, but instead exits cleanly with a nice error message:

[root@localhost ~]# uname -r
3.15.8-200.nonuma.fc20.x86_64

[root@localhost ~]# rpm -q numactl
numactl-2.0.9-1.fc20.x86_64

[root@localhost ~]# numactl --hardware
available: 0 nodes ()
Segmentation fault

[root@localhost ~]# yum update numactl*
...

[root@localhost ~]# rpm -q numactl
numactl-2.0.9-2.fc20.x86_64

[root@localhost ~]# numactl --hardware
No NUMA available on this system

Comment 19 Fedora Update System 2014-08-15 02:41:34 UTC
numactl-2.0.9-2.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

