Bug 1411116

Summary: java-1.8.0-openjdk: SIGSEGV (0xb)
Product: [Fedora] Fedora
Reporter: gil cattaneo <puntogil>
Component: java-1.8.0-openjdk
Assignee: Andrew John Hughes <ahughes>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: ahughes, arik, cesarb, dbhole, enrico.tagliavini, jerboaa, jvanek, mikko.tiihonen, msrb, omajid, sgehwolf, thelan
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-01-24 22:43:35 UTC
Type: Bug
Attachments: Java crash error file containing register values

Description gil cattaneo 2017-01-08 14:04:49 UTC
Description of problem:
During the Koschei rebuild of some packages [1] I get:

Running TestSuite
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fec343aa101, pid=29049, tid=0x00007fec0ef02700
#
# JRE version: OpenJDK Runtime Environment (8.0_111-b16) (build 1.8.0_111-b16)
# Java VM: OpenJDK 64-Bit Server VM (25.111-b16 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x157101]  __memmove_avx_unaligned_erms+0x211
#
# Core dump written. Default location: /builddir/build/BUILD/async-http-client-async-http-client-1.9.40/core or core.29049

Running org.jsslutils.sslcontext.test.PKIXNoExplicitCrlTest
Server listening at: https://localhost:31051/
Accepted connection.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00003fff79122184, pid=13647, tid=0x00003fff5d34f190
#
# JRE version: OpenJDK Runtime Environment (8.0_111-b16) (build 1.8.0_111-b16)
# Java VM: OpenJDK 64-Bit Server VM (25.111-b16 mixed mode linux-ppc64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0xb2184]  __memcpy_power7+0xa4
#
# Core dump written. Default location: /builddir/build/BUILD/jsslutils-1.0.7/jsslutils/core or core.13647
Version-Release number of selected component (if applicable):


[1] https://apps.fedoraproject.org/koschei/package/async-http-client1?collection=f26
https://apps.fedoraproject.org/koschei/package/jsslutils?collection=f26

Comment 1 Mikko Tiihonen 2017-01-09 12:32:24 UTC
I also get this on rawhide after upgrading to openjdk 1.8.0.111-3.b16, which has this in its RPM changelog:
"java SSL/TLS implementation: should follow the policies of system-wide crypto policy"

After that update, opening an SSL connection may randomly crash the JVM.

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x157101]  __memmove_avx_unaligned_erms+0x211
C  [libsunec.so+0x1220]
C  [libsunec.so+0x138a]  Java_sun_security_ec_ECKeyPairGenerator_generateECKeyPair+0x12a
j  sun.security.ec.ECKeyPairGenerator.generateECKeyPair(I[B[B)[Ljava/lang/Object;+0
j  sun.security.ec.ECKeyPairGenerator.generateKeyPair()Ljava/security/KeyPair;+56
j  java.security.KeyPairGenerator$Delegate.generateKeyPair()Ljava/security/KeyPair;+23
j  sun.security.ssl.ECDHCrypt.<init>(Ljava/security/spec/ECParameterSpec;Ljava/security/SecureRandom;)V+17
j  sun.security.ssl.ClientHandshaker.serverKeyExchange(Lsun/security/ssl/HandshakeMessage$ECDH_ServerKeyExchange;)V+44
j  sun.security.ssl.ClientHandshaker.processMessage(BI)V+582
j  sun.security.ssl.Handshaker.processLoop()V+96
j  sun.security.ssl.Handshaker.process_record(Lsun/security/ssl/InputRecord;Z)V+24
j  sun.security.ssl.SSLSocketImpl.readRecord(Lsun/security/ssl/InputRecord;Z)V+357
j  sun.security.ssl.SSLSocketImpl.performInitialHandshake()V+84
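
The trace above ends in ECKeyPairGenerator.generateKeyPair(), reached from the TLS handshake. A minimal sketch that exercises EC key pair generation directly, without Maven or a network connection, might look like the following. The class name EcKeyGenRepro, the 256-bit key size, and the loop count are assumptions for illustration only: the thread does not say which curve the server negotiated, and this sketch has not been confirmed to trigger the crash on an affected build.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;

public class EcKeyGenRepro {
    // Generates one EC key pair using whichever provider the JVM selects for "EC".
    // Assumption: on the affected OpenJDK 8 builds this reaches the native
    // libsunec code shown in the trace.
    static KeyPair generate(int keySizeBits) throws NoSuchAlgorithmException {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(keySizeBits);
        return kpg.generateKeyPair();
    }

    public static void main(String[] args) throws Exception {
        // The crash was reported as random, so try several times.
        for (int i = 0; i < 20; i++) {
            KeyPair kp = generate(256);
            System.out.println(i + ": " + kp.getPublic().getAlgorithm()
                    + " public key, " + kp.getPublic().getEncoded().length
                    + " encoded bytes");
        }
    }
}
```

If a loop like this runs to completion, the key generation path is at least not failing for the chosen key size on that build.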

Comment 2 Mikko Tiihonen 2017-01-09 12:34:03 UTC
Created attachment 1238744 [details]
Java crash error file containing register values

Comment 3 Deepak Bhole 2017-01-10 19:28:54 UTC
Hi, is this consistently reproducible?

Comment 4 Mikko Tiihonen 2017-01-11 09:41:13 UTC
Yes, easily reproducible.

I just tried to run "mvn -U versions:display-dependency-updates" on my project. 20/20 times it crashed with the same SIGSEGV error.

Luckily, once I get my IDE running after a few tries, it no longer tries to open network connections and stays up.

Comment 5 Deepak Bhole 2017-01-11 20:52:19 UTC
(In reply to Mikko Tiihonen from comment #4)

Thanks. Are you able to put the relevant bits of your project online somewhere so that we can reproduce this?

Comment 6 Mikko Tiihonen 2017-01-12 13:51:51 UTC
This morning I got the update to 1.8.0.111-5.b16.fc26 and the problem vanished.

I just tried to downgrade back to 1.8.0.111-3.b16.fc26 (downloaded from koji), which I was using when I reported the problem, and the crashes started occurring again.

So it seems that the problem is fixed in the newer version.

A simple way to reproduce this is to run:

rm -rf ~/.m2/repository/ && mvn archetype:generate -DgroupId=com.mycompany.app -DartifactId=my-app -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

With the -3 build it crashes within a few seconds. With -5 it finishes successfully.

Comment 7 Severin Gehwolf 2017-01-12 15:08:25 UTC
(In reply to Mikko Tiihonen from comment #6)

Did you happen to have multiple arches installed? The only relevant change between -3 and -5, it seems, was to introduce arch-specific requires for its own subpackages (of java-1.8.0-openjdk).

Comment 8 Mikko Tiihonen 2017-01-12 15:15:02 UTC
Nope, only the x86_64 version of openjdk is installed. I also wondered about the terse changelog, which does not mention any relevant changes. I wonder what the -4 version changed.

Can you reproduce it on the older version?

Maybe it is some strange alignment or initialization order issue that just might magically appear again in new builds if it is not fixed.

Comment 9 Mikko Tiihonen 2017-01-12 16:03:02 UTC
As an additional data point, I also tested the -4 version from koji. It also crashes.

Comment 10 Cesar Eduardo Barros 2017-01-24 17:11:35 UTC
I'm also seeing these crashes, in both Maven and IDEA, but in java-1.8.0-openjdk-1.8.0.111-5.b16.fc24.x86_64, so it's still happening in the -5 version. Mine are in __memcpy_avx_unaligned:

Stack: [0x00007f682c5fc000,0x00007f682c6fd000],  sp=0x00007f682c6fa638,  free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x14a734]  __memcpy_avx_unaligned+0x2c4
C  [libsunec.so+0x1240]
C  [libsunec.so+0x13aa]  Java_sun_security_ec_ECKeyPairGenerator_generateECKeyPair+0x12a
j  sun.security.ec.ECKeyPairGenerator.generateECKeyPair(I[B[B)[Ljava/lang/Object;+0
j  sun.security.ec.ECKeyPairGenerator.generateKeyPair()Ljava/security/KeyPair;+56
j  java.security.KeyPairGenerator$Delegate.generateKeyPair()Ljava/security/KeyPair;+23
j  sun.security.ssl.ECDHCrypt.<init>(Ljava/security/spec/ECParameterSpec;Ljava/security/SecureRandom;)V+17
j  sun.security.ssl.ClientHandshaker.serverKeyExchange(Lsun/security/ssl/HandshakeMessage$ECDH_ServerKeyExchange;)V+44
[...]

Of note are the registers. Mine are:

RAX=0x00000000bad35fc0 is pointing into object: 0x00000000bad35fb0
[B 
 - klass: {type array byte}
 - length: 311656120
RBX=0x00007f68481f0800 is a thread
RCX=0x0000000012937eb8 is an unknown value
RDX=0x0000000012937eb8 is an unknown value
RSP=0x00007f682c6fa638 is pointing into the stack for thread: 0x00007f68481f0800
RBP=0x00007f682c6fa6b0 is pointing into the stack for thread: 0x00007f68481f0800
RSI=0x0000000000000000 is an unknown value
RDI=0x00000000bad35fc0 is pointing into object: 0x00000000bad35fb0
[B 
 - klass: {type array byte}
 - length: 311656120
R8 =0x0000000000000000 is an unknown value
R9 =0x0000000004000001 is an unknown value
R10=0x0000000000000001 is an unknown value
R11=0x0000000000000283 is an unknown value
R12=0x0000000012937eb8 is an unknown value
R13=0x0000000000000000 is an unknown value
R14=0x00007f6810029ad0 is an unknown value
R15=0x00007f68481f0a58 is an unknown value

Notice that RDI is pointing to a Java-allocated array of around 300 megabytes, and RDX (which should be the memcpy size parameter) is exactly the size of that array. And RSI is a null pointer.

The same can be found in attachment #1238744 [details]: RDI points to a Java-allocated array of around 1.7 gigabytes, RDX is exactly that length, and RSI is a null pointer.

The faulting instruction, c5 fe 6f 26, is "vmovdqu (%rsi),%ymm4", so it's a null pointer dereference.

The questions are: *why* is memcpy receiving a null pointer as its source parameter, and *why* is Java allocating a huge array as destination for the memcpy from the null pointer?
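
The register reading in this comment can be checked with a little arithmetic. On x86-64 Linux, memcpy(dest, src, n) receives its arguments in RDI, RSI and RDX respectively, so the sketch below converts the RDX value from the dump to decimal and compares it with the reported array length. The class and method names (RegisterCheck, parseRegister) are hypothetical; the hex values and the length are copied from the dump above.

```java
public class RegisterCheck {
    // Parses a register dump value such as "0x0000000012937eb8".
    static long parseRegister(String hex) {
        return Long.parseUnsignedLong(hex.substring(2), 16);
    }

    public static void main(String[] args) {
        long rdx = parseRegister("0x0000000012937eb8"); // RDX: memcpy size argument
        long rsi = parseRegister("0x0000000000000000"); // RSI: memcpy source pointer
        long arrayLength = 311656120L;                  // length of the byte array at RDI

        System.out.println("RDX as decimal: " + rdx);
        System.out.println("RDX equals array length: " + (rdx == arrayLength));
        System.out.println("Size in MiB: " + rdx / (1024.0 * 1024.0));
        System.out.println("Source pointer is null: " + (rsi == 0));
    }
}
```

The same check can be repeated with the register values from attachment 1238744.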

Comment 11 Severin Gehwolf 2017-01-24 17:15:32 UTC
(In reply to Cesar Eduardo Barros from comment #10)

This suggests you are hitting bug 1415137. You can verify if that is the case by downgrading nss* packages. If the problem goes away for you, you are hitting bug 1415137, not this one.

Comment 12 Severin Gehwolf 2017-01-24 17:17:04 UTC
Actually, it's probably the same bug :) We should close one as a duplicate.

Comment 13 Cesar Eduardo Barros 2017-01-24 17:53:03 UTC
All right, I think I found something interesting.

The most probable source for the failure we're seeing is at Java_sun_security_ec_ECKeyPairGenerator_generateECKeyPair *after* the EC_NewKey. The suspicious lines for me are the pair of calls to getEncodedBytes (defined just above it) which do a memcpy from a structure returned by EC_NewKey into a newly allocated Java array.

Now look at https://hg.mozilla.org/projects/nss/rev/047ab976840a which introduces Curve25519; it changes ec_NewKey to use ecParams->pointSize instead of a formula based on ecParams->fieldID.size, when allocating precisely one of the structures I found suspicious. The ecParams comes from the Java code, and where in the Java code is ecParams->pointSize set? Nowhere!

If you look at EC_DecodeParams on the Java side, you see that it has code to set ecParams->fieldID.size, but no code to set ecParams->pointSize.

Therefore, my conclusion is that the fault is on the Java side, which is trying to fill the ecParams struct manually instead of calling an NSS function to fill it, and missing a (newly introduced) field.

The pointSize field then contains garbage. When the garbage value is large enough, NSS fails the allocation, which would explain both the null pointer and the huge (garbage) length. When the value happens to be small enough, the allocation succeeds, no null pointer is returned, and there is no crash (but it can result in the OutOfMemory exceptions I've been seeing in IntelliJ).
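
The failure mode hypothesised above can be illustrated with a toy model. This is not the libsunec or NSS code: the class PointSizeModel, its Item type, and the allocItem and copyOut helpers are all hypothetical, and the allocation limit is an arbitrary number. The sketch only shows how a copy step that trusts a length field, without checking whether the allocation behind it succeeded, fails when that length is garbage.

```java
public class PointSizeModel {
    // Stand-in for a native buffer descriptor (data pointer plus length).
    static final class Item {
        byte[] data; // null models a failed native allocation
        int len;
    }

    // Hypothetical allocator: records the requested length, but leaves data
    // null when the request exceeds the limit.
    static Item allocItem(int requestedLen, int limit) {
        Item item = new Item();
        item.len = requestedLen;
        item.data = (requestedLen <= limit) ? new byte[requestedLen] : null;
        return item;
    }

    // Models a copy step that trusts len and never checks data for null.
    static byte[] copyOut(Item item) {
        byte[] out = new byte[item.len];
        System.arraycopy(item.data, 0, out, 0, item.len);
        return out;
    }

    public static void main(String[] args) {
        int limit = 1024; // arbitrary allocation limit for the model

        // A plausible size: allocation succeeds and the copy works.
        System.out.println("copied " + copyOut(allocItem(65, limit)).length + " bytes");

        // A garbage size: allocation fails, the source is null, the copy fails.
        try {
            copyOut(allocItem(4096, limit));
        } catch (NullPointerException e) {
            System.out.println("copy from null source failed");
        }
    }
}
```

In Java the bad copy surfaces as a NullPointerException; in native code the equivalent memcpy from a null source is what would produce a SIGSEGV.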

Comment 14 Cesar Eduardo Barros 2017-01-24 18:05:09 UTC
So, one possible solution to this particular issue would be to add a dependency on the latest NSS and set that field manually to whichever value it was supposed to have.

There's no telling, however, which other crazy things the Java code was doing which would break with that change. Someone who understands both code bases should take a careful look.

Comment 15 Andrew John Hughes 2017-01-24 22:43:35 UTC

*** This bug has been marked as a duplicate of bug 1415137 ***