Bug 1516277

Summary: Crash in os::Linux::rebuild_cpu_to_node_map ()
Product: Red Hat Enterprise Linux 7 Reporter: Deepu K S <dkochuka>
Component: java-1.8.0-openjdk Assignee: Andrew John Hughes <ahughes>
Status: CLOSED ERRATA QA Contact: Lukáš Zachar <lzachar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.4 CC: ahughes, dbhole, dkochuka, dkutalek, jdoyle, jvanek, rajkumar.prabakaran
Target Milestone: rc Keywords: Patch
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: java-1.8.0-openjdk-1.8.0.152-1.b16.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 15:52:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1508017    
Bug Blocks:    

Description Deepu K S 2017-11-22 11:57:28 UTC
Description of problem:
Running the JVM with NUMA support enabled causes it to crash in os::Linux::rebuild_cpu_to_node_map().

This was reported when starting cassandra.service with the -XX:+UseNUMA JVM option.
# cat /etc/systemd/system/cassandra.service.d/90-ExecStart_NUMA.conf 
[Service]
Environment=CASSANDRA_NUMA_NODE=1
Environment=CASSANDRA_CPUS=1,3,5
ExecStart=
ExecStart=/usr/bin/numactl --cpunodebind=${CASSANDRA_NUMA_NODE} --physcpubind=${CASSANDRA_CPUS} -- /usr/sbin/cassandra -p /run/cassandra/cassandra.pid

It can be reproduced simply with:
# /usr/bin/numactl --cpunodebind=1 --physcpubind=1,3,5 java -XX:+UseNUMA -version

Backtrace:
Core was generated by `java -cp /etc/cassandra/conf:/usr/share/cassandra/lib/airline-0.6.jar:/usr/shar'.
Program terminated with signal 6, Aborted.
#0  0x00007f8dfcde41f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f8dfcde41f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f8dfcde58e8 in __GI_abort () at abort.c:90
#2  0x00007f8dfce23f47 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f8dfcf30608 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f8dfce2b619 in malloc_printerr (ar_ptr=0x7f8dfd16b760 <main_arena>, ptr=<optimized out>, str=0x7f8dfcf30710 "double free or corruption (out)", action=3) at malloc.c:5023
#4  _int_free (av=0x7f8dfd16b760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3845
#5  0x00007f8dfc698913 in FreeHeap (memflags=mtInternal, p=<optimized out>)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/hotspot/src/share/vm/memory/allocation.inline.hpp:93
#6  os::Linux::rebuild_cpu_to_node_map () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:2975
#7  0x00007f8dfc698e71 in os::Linux::libnuma_init () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:2891
#8  0x00007f8dfc69c495 in os::init_2 () at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:5038
#9  0x00007f8dfc843d04 in Threads::create_vm (args=<optimized out>, canTryAgain=canTryAgain@entry=0x7f8dfdbbddc0)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3399
#10 0x00007f8dfc47d4f5 in JNI_CreateJavaVM (vm=0x7f8dfdbbdea0, penv=0x7f8dfdbbdeb0, args=<optimized out>)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/hotspot/src/share/vm/prims/jni.cpp:5252
#11 0x00007f8dfd379057 in InitializeJVM (ifn=<synthetic pointer>, penv=0x7f8dfdbbdeb0, pvm=0x7f8dfdbbdea0)
    at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/jdk/src/share/bin/java.c:1215
#12 JavaMain (_args=<optimized out>) at /usr/src/debug/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/openjdk/jdk/src/share/bin/java.c:376
#13 0x00007f8dfd7a2e25 in start_thread (arg=0x7f8dfdbbe700) at pthread_create.c:308
#14 0x00007f8dfcea734d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 7.4
java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64
numactl-2.0.9-6.el7_2.x86_64
cassandra30-3.0.6-1.noarch

How reproducible:
Always

Steps to Reproduce:
1. Run the JVM with -XX:+UseNUMA on a NUMA-enabled system.
2. # numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10
node 0 size: 65439 MB
node 0 free: 62689 MB
node 1 cpus: 1 3 5 7 9 11
node 1 size: 65536 MB
node 1 free: 63358 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10 

3. # /usr/bin/numactl --cpunodebind=1 --physcpubind=1,3,5 java -XX:+UseNUMA -version
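
As a cross-check of the topology used in the steps above, the following is a minimal sketch that parses `numactl --hardware` text into a CPU-to-node map and confirms that CPUs 1, 3 and 5 all sit on node 1, matching the --cpunodebind=1 --physcpubind=1,3,5 binding. The helper names (cpu_to_node_map, cpus_on_node) are hypothetical and exist only for illustration; the sample text is copied from step 2. It does not model what the JVM does internally.

```python
import re

# Sample copied from the `numactl --hardware` output in step 2.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10
node 0 size: 65439 MB
node 0 free: 62689 MB
node 1 cpus: 1 3 5 7 9 11
node 1 size: 65536 MB
node 1 free: 63358 MB
"""


def cpu_to_node_map(numactl_output):
    """Return {cpu_id: node_id} parsed from `numactl --hardware` text."""
    mapping = {}
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)", line)
        if match:
            node = int(match.group(1))
            for cpu in match.group(2).split():
                mapping[int(cpu)] = node
    return mapping


def cpus_on_node(numactl_output, node, cpus):
    """Return True if every CPU in `cpus` belongs to NUMA node `node`."""
    mapping = cpu_to_node_map(numactl_output)
    return all(mapping.get(cpu) == node for cpu in cpus)


if __name__ == "__main__":
    # Binding from step 3: node 1, physical CPUs 1, 3 and 5.
    print(cpus_on_node(SAMPLE, 1, [1, 3, 5]))
```

On a live system the same functions could be fed the captured output of `numactl --hardware` instead of the embedded sample.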

Actual results:
# /usr/bin/numactl --cpunodebind=1 --physcpubind=1,3,5 java -XX:+UseNUMA -version
*** Error in `java': double free or corruption (out): 0x00007fd450004070 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c619)[0x7fd455c48619]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/lib/amd64/server/libjvm.so(+0x8b0913)[0x7fd4554b5913]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/lib/amd64/server/libjvm.so(+0x8b0e71)[0x7fd4554b5e71]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/lib/amd64/server/libjvm.so(+0x8b4495)[0x7fd4554b9495]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/lib/amd64/server/libjvm.so(+0xa5bd04)[0x7fd455660d04]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/lib/amd64/server/libjvm.so(JNI_CreateJavaVM+0x65)[0x7fd45529a4f5]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/bin/../lib/amd64/jli/libjli.so(+0x3057)[0x7fd456196057]
/lib64/libpthread.so.0(+0x7e25)[0x7fd4565bfe25]
/lib64/libc.so.6(clone+0x6d)[0x7fd455cc434d]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fd:00 206714619                          /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/bin/java
00600000-00601000 r--p 00000000 fd:00 206714619                          /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/bin/java
00601000-00602000 rw-p 00001000 fd:00 206714619                          /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64/jre/bin/java
016d2000-016f3000 rw-p 00000000 00:00 0                                  [heap]
7fd44f9b5000-7fd44f9bf000 r-xp 00000000 fd:00 203377770                  /usr/lib64/libnuma.so.1
(...)


Expected results:
# /usr/bin/numactl --cpunodebind=1 --physcpubind=1,3,5 java -XX:+UseNUMA -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

Additional info:
Upstream OpenJDK bug: https://bugs.openjdk.java.net/browse/JDK-8165153

Testing confirmed that the patch below fixes the issue.
http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/65847ffbff14

Comment 2 Andrew John Hughes 2017-11-27 05:20:04 UTC
We'll be picking this fix up as part of bug 1508017.

Comment 6 Rajkumar Prabakaran 2018-02-06 05:55:31 UTC
Hi,

Is the package java-1.8.0-openjdk-1.8.0.152-1.b16.el7 available for use now, or is it still in the testing phase?

Comment 7 Lukáš Zachar 2018-02-06 10:00:01 UTC
Hi, you can use java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4,
which was released as part of the January CPU (Critical Patch Update) in
https://access.redhat.com/errata/RHSA-2018:0095
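
A rough way to tell whether a given package string is at or beyond the fixed version is to compare the OpenJDK 8 update number against 152, taken from the "Fixed In Version" field. This is a simplified sketch, not full RPM version ordering: it looks only at the update number, and it assumes that any build with update 152 or later carries the fix. The function names (openjdk8_update, has_fix) are hypothetical.

```python
import re

# Update number from "Fixed In Version: java-1.8.0-openjdk-1.8.0.152-1.b16.el7".
FIXED_UPDATE = 152


def openjdk8_update(package):
    """Extract the update number (e.g. 151) from a java-1.8.0-openjdk package string."""
    match = re.search(r"openjdk-1\.8\.0\.(\d+)-", package)
    if match is None:
        raise ValueError("not a java-1.8.0-openjdk package string: %r" % package)
    return int(match.group(1))


def has_fix(package):
    """Return True if the update number is at least FIXED_UPDATE (assumption: later updates keep the fix)."""
    return openjdk8_update(package) >= FIXED_UPDATE


if __name__ == "__main__":
    for pkg in (
        "java-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64",  # version from the report
        "java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4",         # version suggested above
    ):
        print(pkg, has_fix(pkg))
```

The installed package string can be obtained with `rpm -q java-1.8.0-openjdk` and passed to has_fix.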

Comment 10 errata-xmlrpc 2018-04-10 15:52:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0872