Bug 1314459 - lstopo, openmpi and hwloc-info segfault in hwloc_obj_cmp() on VM
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: hwloc
Version: 7.2
Hardware: x86_64 Linux
Priority: unspecified Severity: high
Target Milestone: rc
Target Release: 7.3
Assigned To: Don Zickus
QA Contact: Mike Gahagan
Keywords: Patch
Depends On:
Blocks: 1274397
Reported: 2016-03-03 11:36 EST by Orion Poplawski
Modified: 2016-11-04 04:12 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 04:12:10 EDT
Type: Bug


Attachments
dmidecode (4.20 KB, text/plain)
2016-03-03 11:36 EST, Orion Poplawski
/proc/cpuinfo (2.65 KB, text/plain)
2016-03-03 11:38 EST, Orion Poplawski
Minimal patch to fix hwloc 1.7 segfault (564 bytes, patch)
2016-08-16 07:23 EDT, Dimitry Andric

Description Orion Poplawski 2016-03-03 11:36:58 EST
Created attachment 1132862 [details]
dmidecode

Description of problem:


Running lstopo, or code compiled against openmpi 1.10.1, on a VM results in a segmentation fault.


Version-Release number of selected component (if applicable):
hwloc-1.7-5.el7.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Run lstopo on the affected VM.

Core was generated by `lstopo'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fdd44e71076 in __strcmp_sse42 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fdd44e71076 in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00007fdd4661bccb in hwloc_obj_cmp (obj1=obj1@entry=0x20ae940, 
    obj2=obj2@entry=0x20ae7d0) at topology.c:569
#2  0x00007fdd4661be87 in hwloc___insert_object_by_cpuset (
    report_error=0x7fdd4661aef0 <hwloc_report_os_error>, obj=0x20ae940, 
    cur=<optimized out>, topology=0x20a9e40) at topology.c:669
#3  hwloc__insert_object_by_cpuset (topology=topology@entry=0x20a9e40, 
    obj=obj@entry=0x20ae940, 
    report_error=report_error@entry=0x7fdd4661aef0 <hwloc_report_os_error>)
    at topology.c:843
#4  0x00007fdd4661c30c in hwloc_insert_object_by_cpuset (
    topology=topology@entry=0x20a9e40, obj=obj@entry=0x20ae940)
    at topology.c:855
#5  0x00007fdd4663596e in summarize (topology=0x20a9e40, infos=0x20ae250, 
    nbprocs=4, fulldiscovery=0) at topology-x86.c:559
#6  0x00007fdd46636828 in hwloc_look_x86 (topology=topology@entry=0x20a9e40, 
    nbprocs=nbprocs@entry=4, fulldiscovery=fulldiscovery@entry=0)
    at topology-x86.c:835
#7  0x00007fdd466368a3 in hwloc_x86_discover (backend=<optimized out>)
    at topology-x86.c:876
#8  0x00007fdd4661e8bb in hwloc_discover (topology=0x20a9e40)
    at topology.c:2157
#9  hwloc_topology_load (topology=topology@entry=0x20a9e40) at topology.c:2648
#10 0x000000000040334a in main (argc=<optimized out>, argv=<optimized out>)
    at lstopo.c:559
(gdb) up
#1  0x00007fdd4661bccb in hwloc_obj_cmp (obj1=obj1@entry=0x20ae940, 
    obj2=obj2@entry=0x20ae7d0) at topology.c:569
569	          int res = strcmp(obj1->name, obj2->name);
(gdb) print obj1->name
$1 = 0x0
(gdb) print obj2->name
$2 = 0x0
Comment 1 Orion Poplawski 2016-03-03 11:38 EST
Created attachment 1132864 [details]
/proc/cpuinfo
Comment 2 Orion Poplawski 2016-03-03 11:47:15 EST
I can't reproduce this on a "stock" VM, so it is probably triggered by this particular VM having a different CPU configuration:

>   <cpu mode='custom' match='exact'>
>     <model fallback='allow'>Nehalem</model>
>     <vendor>Intel</vendor>
>     <feature policy='require' name='tm2'/>
>     <feature policy='require' name='est'/>
>     <feature policy='require' name='monitor'/>
>     <feature policy='require' name='ds'/>
>     <feature policy='require' name='ss'/>
>     <feature policy='require' name='vme'/>
>     <feature policy='require' name='dtes64'/>
>     <feature policy='require' name='rdtscp'/>
>     <feature policy='require' name='ht'/>
>     <feature policy='require' name='dca'/>
>     <feature policy='require' name='pbe'/>
>     <feature policy='require' name='tm'/>
>     <feature policy='require' name='pdcm'/>
>     <feature policy='require' name='vmx'/>
>     <feature policy='require' name='ds_cpl'/>
>     <feature policy='require' name='xtpr'/>
>     <feature policy='require' name='acpi'/>
>     <feature policy='require' name='invtsc'/>
>   </cpu>
Comment 4 Orion Poplawski 2016-03-03 12:04:35 EST
Downgrading to 1.7-3.el7 fixes the issue.
Comment 5 Divya 2016-03-08 05:35:42 EST
hwloc-info crashes when run on a VM. It looks like the name field of both hwloc_obj structures is NULL, which causes the crash in strcmp():


Core was generated by `hwloc-info'.
Program terminated with signal 11, Segmentation fault.
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
164		movdqu	(%rdi), %xmm1
(gdb) bt
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
#1  0x00007f6d46e03ccb in hwloc_obj_cmp (obj1=obj1@entry=0x2266ce0, obj2=obj2@entry=0x2266b40) at topology.c:569
#2  0x00007f6d46e03e87 in hwloc___insert_object_by_cpuset (report_error=0x7f6d46e02ef0 <hwloc_report_os_error>, obj=0x2266ce0, 
    cur=<optimized out>, topology=0x2263a40) at topology.c:669
#3  hwloc__insert_object_by_cpuset (topology=topology@entry=0x2263a40, obj=obj@entry=0x2266ce0, 
    report_error=report_error@entry=0x7f6d46e02ef0 <hwloc_report_os_error>) at topology.c:843
#4  0x00007f6d46e0430c in hwloc_insert_object_by_cpuset (topology=topology@entry=0x2263a40, obj=obj@entry=0x2266ce0)
    at topology.c:855
#5  0x00007f6d46e1d96e in summarize (topology=topology@entry=0x2263a40, infos=infos@entry=0x22667e0, nbprocs=nbprocs@entry=2, 
    fulldiscovery=fulldiscovery@entry=0) at topology-x86.c:559
#6  0x00007f6d46e1e818 in hwloc_look_x86 (topology=topology@entry=0x2263a40, nbprocs=nbprocs@entry=2, 
    fulldiscovery=fulldiscovery@entry=0) at topology-x86.c:835
#7  0x00007f6d46e1e893 in hwloc_x86_discover (backend=<optimized out>) at topology-x86.c:876
#8  0x00007f6d46e068ab in hwloc_discover (topology=0x2263a40) at topology.c:2157
#9  hwloc_topology_load (topology=topology@entry=0x2263a40) at topology.c:2648
#10 0x0000000000401859 in main (argc=<optimized out>, argv=<optimized out>) at hwloc-info.c:384
(gdb) f 0 
#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
164		movdqu	(%rdi), %xmm1
(gdb) info registers rdi
rdi            0x0	0

(gdb) f 1
#1  0x00007f6d46e03ccb in hwloc_obj_cmp (obj1=obj1@entry=0x2266ce0, obj2=obj2@entry=0x2266b40) at topology.c:569
569	          int res = strcmp(obj1->name, obj2->name);
(gdb) p obj1->name
$2 = 0x0
(gdb) p obj2->name
$3 = 0x0

(gdb) p *obj1
$4 = {type = HWLOC_OBJ_MISC, os_index = 0, name = 0x0, memory = {total_memory = 0, local_memory = 0, page_types_len = 0, 
    page_types = 0x0}, attr = 0x2266de0, depth = 0, logical_index = 0, os_level = 1, next_cousin = 0x0, prev_cousin = 0x0, 
  parent = 0x0, sibling_rank = 0, next_sibling = 0x0, prev_sibling = 0x0, arity = 0, children = 0x0, first_child = 0x0, 
  last_child = 0x0, userdata = 0x0, cpuset = 0x2266c70, complete_cpuset = 0x0, online_cpuset = 0x0, allowed_cpuset = 0x0, 
  nodeset = 0x0, complete_nodeset = 0x0, allowed_nodeset = 0x0, distances = 0x0, distances_count = 0, infos = 0x0, 
  infos_count = 0, symmetric_subtree = 0}
(gdb) p *obj2
$5 = {type = HWLOC_OBJ_MISC, os_index = 0, name = 0x0, memory = {total_memory = 0, local_memory = 0, page_types_len = 0, 
    page_types = 0x0}, attr = 0x2266c40, depth = 0, logical_index = 0, os_level = 2, next_cousin = 0x0, prev_cousin = 0x0, 
  parent = 0x0, sibling_rank = 0, next_sibling = 0x0, prev_sibling = 0x0, arity = 0, children = 0x0, first_child = 0x22649d0, 
  last_child = 0x0, userdata = 0x0, cpuset = 0x2266ad0, complete_cpuset = 0x0, online_cpuset = 0x0, allowed_cpuset = 0x0, 
  nodeset = 0x0, complete_nodeset = 0x0, allowed_nodeset = 0x0, distances = 0x0, distances_count = 0, infos = 0x0, 
  infos_count = 0, symmetric_subtree = 0}
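
For illustration only, here is a minimal sketch of the kind of guard that avoids this class of crash: strcmp() must not be called with a NULL argument, so the comparison checks both pointers first. This is not the actual hwloc fix or the attached patch; struct demo_obj, safe_name_cmp() and demo_obj_name_cmp() are hypothetical names, and the ordering chosen for NULL names is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two hwloc_obj fields relevant here.
 * The real struct hwloc_obj has many more members (see the gdb dump). */
struct demo_obj {
    int type;
    const char *name; /* may be NULL, as seen for both Misc objects */
};

/* NULL-tolerant string comparison.
 * Assumed convention: two NULL names compare equal, and a NULL name
 * sorts before any non-NULL name. */
static int safe_name_cmp(const char *a, const char *b)
{
    if (a == NULL && b == NULL)
        return 0;
    if (a == NULL)
        return -1;
    if (b == NULL)
        return 1;
    return strcmp(a, b);
}

/* Compare two objects by name without dereferencing a NULL name. */
static int demo_obj_name_cmp(const struct demo_obj *o1,
                             const struct demo_obj *o2)
{
    return safe_name_cmp(o1->name, o2->name);
}
```

With two objects whose name is NULL, as in the dumps above, demo_obj_name_cmp() returns 0 instead of faulting inside strcmp().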
Comment 8 Don Zickus 2016-05-26 12:36:52 EDT
Hi,

Please test the following rpms to see if your issue has been resolved.  This is a rebase, made because of the large number of requests for other features.  If this rebase causes issues with your usage, please let us know so we can further evaluate how we want to distribute features requested by other customers.  The API should be backwards compatible if you have applications linking to the current hwloc-libs.

http://people.redhat.com/dzickus/rhel7/.hwloc_8d5e1809e13/

Cheers,
Don
Comment 9 Orion Poplawski 2016-06-27 11:16:37 EDT
Sorry for the delay.  Looks good to me.
Comment 11 Dimitry Andric 2016-08-16 07:23 EDT
Created attachment 1191220 [details]
Minimal patch to fix hwloc 1.7 segfault

FWIW, I have been using this minimized patch to fix the segfaults in hwloc.  It was easier for me to deploy, and it has minimal impact, as far as I could determine.
Comment 12 Mike Gahagan 2016-09-14 14:11:46 EDT
Confirmed that hwloc-1.11.2-1.el7 fixes this issue.
Comment 14 errata-xmlrpc 2016-11-04 04:12:10 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2535.html
