Description of problem:
Java applications started randomly crashing with SIGSEGV after a minor version update of glibc from 2.21.90-8.fc23 to 2.21.90-9.fc23.

Example of such a crash:
https://kojipkgs.fedoraproject.org/work/tasks/7108/9427108/build.log

According to koschei, the only dependency changes since the previous successful build were glibc and libpng. I assume that libpng is not relevant here.
http://koschei.cloud.fedoraproject.org/package/httpcomponents-core/466855

Version-Release number of selected component (if applicable):
1:java-1.8.0-openjdk-1.8.0.40-19.b12.fc22
glibc-2.21.90-9.fc23

How reproducible:
The failures are random, but tend to be reproducible for any build that runs for more than a few seconds.

Steps to Reproduce:
1. Rebuild any Java package in mock or koji.

Additional info:
I don't know whether the bug is in openjdk or glibc. But if it's not trivial to fix, I'd suggest untagging the latest glibc as a temporary workaround. CC'ing glibc maintainer.
Reproducible for me as well.
I have reverted the last glibc rebase for now: http://koji.fedoraproject.org/koji/taskinfo?taskID=9429482 So once that build finishes, you should be able to get on with your build. I'll figure out what's broken later.
So java-1.8.0-openjdk-1.8.0.40-19.b12.fc22 was the last one built with GCC 4.9, while glibc was built with GCC 5. We might run into ABI incompatibility issues until bug 1208369 is fixed.
FWIW, updating to glibc-2.21.90-10.fc23 fixes the seg faults for me -- I am now able to build java packages.
*** Bug 1209973 has been marked as a duplicate of this bug. ***
(In reply to Mat Booth from comment #4)
> FWIW, updating to glibc-2.21.90-10.fc23 fixes the seg faults for me -- I am
> now able to build java packages.

Thanks, I'll try to find out what broke it. Assigning it to glibc.
*** Bug 1209252 has been marked as a duplicate of this bug. ***
I thought this was due to this commit:

commit c26efef9798914e208329c0e8c3c73bb1135d9e3
Author: Mel Gorman <mgorman>
Date: Thu Apr 2 12:14:14 2015 +0530

    malloc: Consistently apply trim_threshold to all heaps [BZ #17195]

and the segfault indeed seemed to go away at first glance. However, looking closely, it looks like the segfault does not always happen and is likely some kind of race condition.

I ran this in an infinite loop with the patch reverted (and in fact, with -10.fc23):

    while javac -classpath /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java/plplotjavacJNI.java -d /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java; do true; done

and it crashed eventually, indicating that the patch only seems to make the crash more frequent. Running the above compilation command under valgrind spews out thousands of errors and it also seemed to cause the crash a bit more consistently, so it might be a good way to observe this behaviour.

I'll reassign this to openjdk. Let me know if you want to keep this patch out till you figure out what's going on. Otherwise I'll rebase by the end of the week.
(In reply to Siddhesh Poyarekar from comment #8)
> while javac -classpath
> /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java
> /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java/plplotjavacJNI.java
> -d /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java; do true; done

To be clear, while I tested the crash with this specific command, the build itself was failing on any one of the many example program compile commands.
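Since the crash is intermittent, a bounded variant of the loop from comment #8 makes it easier to compare glibc builds. The following is only a sketch: the helper name `runs_until_failure` and the iteration cap are illustrative assumptions, not something used in this bug. It prints how many consecutive runs succeeded before the first non-zero exit, or the cap if no run failed.

```shell
# Hypothetical helper (not from this bug report): run a command repeatedly
# until it exits non-zero or the cap is reached, then print the number of
# successful runs that preceded the first failure.
runs_until_failure() {
    max=$1; shift
    count=0
    while [ "$count" -lt "$max" ]; do
        # Discard the command's own output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || break
        count=$((count + 1))
    done
    echo "$count"
}
```

It could be invoked as `runs_until_failure 1000 javac -classpath ... plplotjavacJNI.java -d ...` with the same arguments as the original loop. A result equal to the cap only means that no crash was seen within that many runs, not that the bug is absent.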
(In reply to Siddhesh Poyarekar from comment #8)
> I thought this was due to this commit:
>
> commit c26efef9798914e208329c0e8c3c73bb1135d9e3
> Author: Mel Gorman <mgorman>
> Date: Thu Apr 2 12:14:14 2015 +0530
>
> malloc: Consistently apply trim_threshold to all heaps [BZ #17195]
>
> and the segfault indeed seemed to go away at first glance. However, looking
> closely, it looks like the segfault does not always happen and is likely
> some kind of race condition.
>
> I ran this in an infinite loop with the patch reverted (and in fact, with
> -10.fc23):
>
> while javac -classpath
> /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java
> /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java/plplotjavacJNI.java
> -d /root/rpmbuild/BUILD/plplot-5.10.0/fedora/bindings/java; do true; done
>
> and it crashed eventually, indicating that the patch only seems to make the
> crash more frequent. Running the above compilation command under valgrind
> spews out thousands of errors and it also seemed to cause the crash a bit
> more consistently, so it might be a good way to observe this behaviour.
>
> I'll reassign this to openjdk. Let me know if you want to keep this patch
> out till you figure out what's going on. Otherwise I'll rebase by the end
> of the week.

Thanks for the analysis, Siddhesh. With bug 1208369 fixed I think I'll be able to make some progress on this one. I'd rather reproduce this problem with a GCC 5 compiled openjdk in order to rule out ABI problems. I'll let you know later today how to proceed with the glibc rebase from an openjdk perspective.
Siddhesh, I've reproduced crashes with glibc-2.21.90-9.fc23.x86_64 and java-1.8.0-openjdk-devel-1.8.0.45-31.b13.fc23.x86_64 If you rebase glibc it would be good to leave out that patch which makes the SEGVs more frequent for the time being. It'll take a while to track down what's causing this. glibc-2.21.90-10.fc23.x86_6 and java-1.8.0-openjdk-devel-1.8.0.45-31.b13.fc23.x86_64 seem to cooperate better.
OK, I'll leave out the patch till you figure out a fix for this.
$ rpm -q java-1.8.0-openjdk
java-1.8.0-openjdk-1.8.0.45-31.b13.fc23.x86_64
$ rpm -q glibc
glibc-2.21.90-9.fc23.x86_64

There seems to be a simpler reproducer:

$ cat HelloWorld.java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
$ while true; do rm -f HelloWorld.class; javac HelloWorld.java; done

A couple of compiles seem to work, then it segfaults as in comment 0. Contrast this with

$ while true; do rm -f HelloWorld.class; javac -J-Xint HelloWorld.java; done

where compiles run fine for much longer. Not sure if it ever triggers the segfault (I wasn't that patient).
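To put numbers on the difference between the two loops above, the failures can be counted over a fixed number of trials instead of waiting for the first crash. This is a rough sketch: `count_failures` is a hypothetical helper, and the trial count is arbitrary.

```shell
# Hypothetical helper (not from this bug report): run a command a fixed
# number of times and print how many of those runs exited non-zero.
count_failures() {
    trials=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$trials" ]; do
        # A JVM that dies with SIGSEGV makes javac exit non-zero.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

Comparing `count_failures 200 javac HelloWorld.java` with `count_failures 200 javac -J-Xint HelloWorld.java` would give a crude failure rate for the JIT-enabled and interpreter-only cases. Because the crash looks like a race, the counts will vary from run to run and from machine to machine.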
For reference, the JVM segfault snippet looks like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f5c5e26b065, pid=1541, tid=140034464122624
#
# JRE version: OpenJDK Runtime Environment (8.0_45-b13) (build 1.8.0_45-b13)
# Java VM: OpenJDK 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libc.so.6+0x84065]
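When collecting many such reports from repeated runs, the line of interest is the one after "# Problematic frame:", which names the library and the offset of the faulting instruction. A small sketch for pulling it out follows. The function name `problematic_frame` is an assumption for illustration, and the report is assumed to arrive on standard input in the layout shown above.

```shell
# Hypothetical helper (not from this bug report): print the frame line
# that follows "# Problematic frame:" in a HotSpot fatal error report
# read from standard input, with the leading "# " removed.
problematic_frame() {
    sed -n '/^# Problematic frame:/{n;s/^# *//;p;}'
}
```

Piping each saved report through `problematic_frame` and then through `sort | uniq -c` would show whether the crashes always land at the same libc offset or are scattered, which helps distinguish a single faulting code path from more general heap corruption.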
Was this bug reported to upstream openjdk and glibc?
Interestingly, when I talked to Christine Flood at Red Hat Summit, she indicated that the JVM doesn't use malloc all that much, and instead does block allocations and then uses those. It's odd that this crashes at all, but at some point we're going to have to put the code back to ensure that Fedora's malloc behaviour matches upstream, i.e. malloc tunables being applied to all heaps equally.
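One experiment, offered only as a suggestion and not something tried in this bug, is to vary the trim threshold at run time and see whether the crash frequency follows it. glibc reads the `MALLOC_TRIM_THRESHOLD_` environment variable (documented in mallopt(3)) at startup, so no rebuild is needed. The wrapper name `with_trim_threshold` below is hypothetical.

```shell
# Hypothetical wrapper (not from this bug report): run a command with
# glibc's MALLOC_TRIM_THRESHOLD_ environment variable set to the given
# number of bytes. The variable itself is documented in mallopt(3).
with_trim_threshold() {
    threshold=$1; shift
    env MALLOC_TRIM_THRESHOLD_="$threshold" "$@"
}
```

For example, `with_trim_threshold 134217728 javac HelloWorld.java` would run the compiler with a 128 MiB threshold, so that trimming happens much less often. Whether that changes the crash rate is an open question; if it does, that would support the idea that the trimming path exposes the race rather than causing it.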
I'm putting the code back in for rawhide because there have been a few more fixes around this code. I will have to eventually do this for F23 as well, so it would be good to know the way forward.
(In reply to Siddhesh Poyarekar from comment #17)
> I'm putting the code back in for rawhide because there have been a few more
> fixes around this code. I will have to eventually do this for F23 as well,
> so it would be good to know the way forward.

Feel free to put the code back in for rawhide and we'll see what happens. This seems to be a bug triggered by the latest glibc together with the JIT (-Xint does not seem to trigger the bug). Unfortunately, I wasn't able to look further into this due to lack of cycles and it being hard to reproduce.
This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle. Changing version to '23'. (As we did not run this process for some time, it could affect also pre-Fedora 23 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23
This message is a reminder that Fedora 23 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 23 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.