Bug 1209451 - openjdk: SIGSEGV after glibc update from 2.21.90-8.fc23 to 2.21.90-9.fc23
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: java-1.8.0-openjdk
Version: 23
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Severin Gehwolf
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1209252 1209973
Depends On:
Blocks:
Reported: 2015-04-07 11:49 UTC by Michael Simacek
Modified: 2016-12-20 13:29 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-20 13:29:59 UTC
Type: Bug
Embargoed:


Description Michael Simacek 2015-04-07 11:49:50 UTC
Description of problem:
Java applications started randomly crashing with SIGSEGV after a minor version update of glibc from 2.21.90-8.fc23 to 2.21.90-9.fc23.
Example of such crash:
https://kojipkgs.fedoraproject.org/work/tasks/7108/9427108/build.log

According to koschei, the only dependency changes since the previous successful build were glibc and libpng. I assume that libpng is not relevant here.
http://koschei.cloud.fedoraproject.org/package/httpcomponents-core/466855

Version-Release number of selected component (if applicable):
1:java-1.8.0-openjdk-1.8.0.40-19.b12.fc22
glibc-2.21.90-9.fc23

How reproducible:
The failures are random, but tend to be reproducible for any build that runs for more than a few seconds.

Steps to Reproduce:
1. Rebuild any Java package in mock or koji.

Additional info:
I don't know whether the bug is in openjdk or glibc. But if it's not trivial to fix, I'd suggest untagging the latest glibc as a temporary workaround. CC'ing the glibc maintainer.

Comment 1 Mikolaj Izdebski 2015-04-07 13:08:37 UTC
Reproducible for me as well.

Comment 2 Siddhesh Poyarekar 2015-04-07 15:33:56 UTC
I have reverted the last glibc rebase for now:

http://koji.fedoraproject.org/koji/taskinfo?taskID=9429482

So once that build finishes, you should be able to get on with your build.  I'll figure out what's broken later.

Comment 3 Severin Gehwolf 2015-04-08 15:25:59 UTC
So java-1.8.0-openjdk-1.8.0.40-19.b12.fc22 was the last one built with GCC 4.9, while glibc was built with GCC 5. We might run into ABI incompatibility issues now until bug 1208369 is fixed.

Comment 4 Mat Booth 2015-04-08 16:00:09 UTC
FWIW, updating to glibc-2.21.90-10.fc23 fixes the seg faults for me -- I am now able to build java packages.

Comment 5 Vít Ondruch 2015-04-09 05:56:40 UTC
*** Bug 1209973 has been marked as a duplicate of this bug. ***

Comment 6 Siddhesh Poyarekar 2015-04-09 06:37:15 UTC
(In reply to Mat Booth from comment #4)
> FWIW, updating to glibc-2.21.90-10.fc23 fixes the seg faults for me -- I am
> now able to build java packages.

Thanks, I'll try to find out what broke it.  Assigning it to glibc.

Comment 7 Severin Gehwolf 2015-04-09 08:23:28 UTC
*** Bug 1209252 has been marked as a duplicate of this bug. ***

Comment 8 Siddhesh Poyarekar 2015-04-13 15:09:50 UTC
I thought this was due to this commit:

commit c26efef9798914e208329c0e8c3c73bb1135d9e3
Author: Mel Gorman <mgorman>
Date:   Thu Apr 2 12:14:14 2015 +0530

    malloc: Consistently apply trim_threshold to all heaps [BZ #17195]
    
and the segfault indeed seemed to go away at first glance.  However, looking closely, it looks like the segfault does not always happen and is likely some kind of race condition.

I ran this in an infinite loop with the patch reverted (and in fact, with -10.fc23):

while javac -classpath /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java/plplotjavacJNI.java -d /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java; do true; done

and it crashed eventually, indicating that the patch only seems to make the crash more frequent.  Running the above compilation command under valgrind spews out thousands of errors and it also seemed to cause the crash a bit more consistently, so it might be a good way to observe this behaviour.
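
The "crashed eventually" observation can be made measurable by recording how many iterations the loop survives before the first failure. The following is a minimal sketch, assuming a POSIX shell; `first_failure` is a hypothetical helper name that does not appear in this bug, and the iteration cap is an arbitrary choice.

```shell
# Hypothetical helper (not part of this bug report): run a command repeatedly
# until it fails or the cap is reached. Prints the iteration number of the
# first failure, or 0 if the command never failed within the cap.
first_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    echo 0
}

# Example invocation with the command from this comment (not executed here):
#   dir=/root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java
#   first_failure 1000 javac -classpath "$dir" "$dir/plplotjavacJNI.java" -d "$dir"
```

Comparing the printed iteration count across glibc builds would give a rough indication of whether a given patch makes the crash more frequent, although a single run of a racy failure is only weak evidence.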

I'll reassign this to openjdk.  Let me know if you want to keep this patch out till you figure out what's going on.  Otherwise I'll rebase by the end of the week.

Comment 9 Siddhesh Poyarekar 2015-04-13 15:11:22 UTC
(In reply to Siddhesh Poyarekar from comment #8)
> while javac -classpath
> /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java
> /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java/plplotjavacJNI.java
> -d /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java; do true; done

To be clear, while I tested the crash with this specific command, the build itself was failing on any one of the many example program compile commands.

Comment 10 Severin Gehwolf 2015-04-15 09:33:03 UTC
(In reply to Siddhesh Poyarekar from comment #8)

Thanks for the analysis Siddhesh. With bug 1208369 fixed I think I'll be able to make some progress on this one. I'd rather reproduce this problem with a GCC 5 compiled openjdk in order to rule out ABI problems. I'll let you know later today as to how to proceed with the glibc rebase from an openjdk perspective.

Comment 11 Severin Gehwolf 2015-04-15 18:10:57 UTC
Siddhesh, I've reproduced crashes with glibc-2.21.90-9.fc23.x86_64 and java-1.8.0-openjdk-devel-1.8.0.45-31.b13.fc23.x86_64

If you rebase glibc it would be good to leave out that patch which makes the SEGVs more frequent for the time being. It'll take a while to track down what's causing this.

glibc-2.21.90-10.fc23.x86_6 and java-1.8.0-openjdk-devel-1.8.0.45-31.b13.fc23.x86_64 seem to cooperate better.

Comment 12 Siddhesh Poyarekar 2015-04-16 02:20:09 UTC
OK, I'll leave out the patch till you figure out a fix for this.

Comment 13 Severin Gehwolf 2015-04-24 14:15:57 UTC
$ rpm -q java-1.8.0-openjdk
java-1.8.0-openjdk-1.8.0.45-31.b13.fc23.x86_64
$ rpm -q glibc
glibc-2.21.90-9.fc23.x86_64

There seems to be a simpler reproducer:

$ cat HelloWorld.java 
public class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}
$ while true; do rm -f HelloWorld.class; javac HelloWorld.java; done

A couple of compiles seem to work, then it segfaults as in comment 0.

Contrast this to

$ while true; do rm -f HelloWorld.class; javac -J-Xint HelloWorld.java; done

where compiles run fine for much longer. Not sure if it ever triggers the segfault (I wasn't that patient).
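
The contrast between the default (JIT) mode and `-J-Xint` could be put into numbers by counting failures over a fixed number of runs for each variant. This is a sketch only, assuming a POSIX shell; `count_failures` is a hypothetical helper name and the run count of 200 is arbitrary.

```shell
# Hypothetical helper (not part of this bug report): run a command a fixed
# number of times and print how many of those runs exited with a failure.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example comparison (not executed here):
#   count_failures 200 javac HelloWorld.java
#   count_failures 200 javac -J-Xint HelloWorld.java
```

Unlike the loops above, this sketch does not remove HelloWorld.class between runs; javac overwrites the class file on each compile.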

Comment 14 Severin Gehwolf 2015-04-24 14:18:18 UTC
For reference, the JVM segfault snippet looks like this:
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f5c5e26b065, pid=1541, tid=140034464122624
#
# JRE version: OpenJDK Runtime Environment (8.0_45-b13) (build 1.8.0_45-b13)
# Java VM: OpenJDK 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x84065]
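
When many crash reports need to be compared, the library and offset from the "Problematic frame" entry can be pulled out mechanically. A minimal sketch, assuming the report has the layout shown above and is fed on standard input; `problematic_frame` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this bug report): read a JVM crash report
# on standard input and print the "library+offset" part of the line that
# follows "# Problematic frame:". Prints nothing if no bracketed frame exists.
problematic_frame() {
    sed -n '/^# Problematic frame:/{n;s/^# [A-Za-z]*  *\[\([^]]*\)].*$/\1/p;}'
}

# Example, using the two relevant lines from the snippet above:
printf '%s\n' '# Problematic frame:' '# C  [libc.so.6+0x84065]' | problematic_frame
```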

Comment 15 Account closed by the user 2015-06-25 17:10:05 UTC
Was this bug reported to upstream openjdk and glibc?

Comment 16 Carlos O'Donell 2015-07-02 20:12:16 UTC
Interestingly, when I talked to Christine Flood at Red Hat Summit, she indicated that the JVM doesn't use malloc all that much; instead it allocates large blocks and then uses those. It's odd that this crashes at all, but at some point we're going to have to put the code back so that Fedora's malloc behaviour matches upstream, i.e. malloc tunables being applied to all stacks equally.

Comment 17 Siddhesh Poyarekar 2015-07-15 12:11:23 UTC
I'm putting the code back in for rawhide because there have been a few more fixes around this code.  I will have to eventually do this for F23 as well, so it would be good to know the way forward.

Comment 18 Severin Gehwolf 2015-07-15 12:21:25 UTC
(In reply to Siddhesh Poyarekar from comment #17)
> I'm putting the code back in for rawhide because there have been a few more
> fixes around this code.  I will have to eventually do this for F23 as well,
> so it would be good to know the way forward.

Feel free to move the code back into rawhide and we'll see what happens. This seems to be a bug triggered by the latest glibc + JIT (-Xint does not seem to trigger the bug). Unfortunately, I wasn't able to look further into this due to lack of cycles and it being hard to reproduce.

Comment 19 Jan Kurik 2015-07-15 14:18:42 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Comment 20 Fedora End Of Life 2016-11-24 11:39:51 UTC
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 23 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 21 Fedora End Of Life 2016-12-20 13:29:59 UTC
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

