Bug 808501 - Java VM crashing with SIGSEGV on unzip
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: glibc
Version: 6.0
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assigned To: Sanne Grinovero
QA Contact: qe-baseos-tools
Reported: 2012-03-30 10:41 EDT by Sanne Grinovero
Modified: 2016-11-24 11:01 EST
Doc Type: Bug Fix
Last Closed: 2012-05-04 10:47:45 EDT
Attachments
Dump from Fedora16 (36.93 KB, text/x-log)
2012-03-30 10:41 EDT, Sanne Grinovero
Crash on RHEL 4 - a (33.53 KB, text/x-log)
2012-03-30 10:47 EDT, Sanne Grinovero
Crash on RHEL 4 - b (33.78 KB, text/x-log)
2012-03-30 10:48 EDT, Sanne Grinovero
Crash on RHEL 4 - c (33.44 KB, text/x-log)
2012-03-30 10:50 EDT, Sanne Grinovero

Description Sanne Grinovero 2012-03-30 10:41:05 EDT
Created attachment 574001 [details]
Dump from Fedora16

Description of problem:
We have a reproducible test case in one of our open source projects (in this case, Hibernate OGM) which crashes the JVM on several platforms, including:

Red Hat Enterprise Linux 6
Red Hat Enterprise Linux 6 - CBS version
Red Hat Enterprise Linux 4
Fedora 16
Apple OSX (just FYI - makes us think it's Glibc related)

On Fedora 16 I had this problem with both Oracle's distribution of the JVM, and with OpenJDK as distributed in the Fedora repositories.


Version-Release number of selected component (if applicable):
Since we have several stacks with different errors reported, please see the details in each log. I'm attaching logs from several platforms, mostly from JBoss QA testing nodes.

How reproducible:
Requires: Maven, Git and Java:

1. git clone git://github.com/hibernate/hibernate-ogm.git
2. git checkout 20db8d61dc791aced51a1db33e48b2b987954882
3. mvn clean install

Unfortunately it does not fail on every run, but it fails fairly often.
  
Actual results:

JVM crashes during the run of functional tests

Expected results:

The project builds and all tests run to completion.

Additional info:
Comment 2 Sanne Grinovero 2012-03-30 10:47:08 EDT
Created attachment 574003 [details]
Crash on RHEL 4 - a
Comment 3 Sanne Grinovero 2012-03-30 10:48:02 EDT
Created attachment 574007 [details]
Crash on RHEL 4 - b
Comment 4 Sanne Grinovero 2012-03-30 10:50:12 EDT
Created attachment 574012 [details]
Crash on RHEL 4 - c
Comment 5 Sanne Grinovero 2012-03-30 11:35:56 EDT
Looks like this is related to 560232

https://bugzilla.redhat.com/show_bug.cgi?id=560232

But in that case the problematic frame is in libzip.so; in this case it's in libc.so.
Comment 6 Jeff Law 2012-03-30 16:08:16 EDT
What version of glibc is in use on the Fedora 16 crash?  I may be able to get some information from the faulting address, but I need to know the precise version of glibc to do that.


I assume this is not the failure you want me to look at:

[ERROR] Plugin org.jboss.maven.plugins:maven-injection-plugin:1.0.2 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.jboss.maven.plugins:maven-injection-plugin:jar:1.0.2: Failure to find org.jboss.maven.plugins:maven-injection-plugin:pom:1.0.2 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
org.apache.maven.plugin.PluginResolutionException: Plugin org.jboss.maven.plugins:maven-injection-plugin:1.0.2 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.jboss.maven.plugins:maven-injection-plugin:jar:1.0.2
        at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve(DefaultPluginDependenciesResolver.java:129)
        at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getPluginDescriptor(DefaultMavenPluginManager.java:142)
        at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getMojoDescriptor(DefaultMavenPluginManager.java:261)
        at org.apache.maven.plugin.DefaultBuildPluginManager.getMojoDescriptor(DefaultBuildPluginManager.java:185)
        at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecution(DefaultLifecycleExecutionPlanCalculator.java:152)
        at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecutions(DefaultLifecycleExecutionPlanCalculator.java:139)
        at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:116)
        at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:129)
        at org.apache.maven.lifecycle.internal.BuilderCommon.resolveBuildPlan(BuilderCommon.java:92)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
        at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
        at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:321)
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:158)
        at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.sonatype.aether.resolution.ArtifactDescriptorException: Failed to read artifact descriptor for org.jboss.maven.plugins:maven-injection-plugin:jar:1.0.2
        at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:282)
        at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.readArtifactDescriptor(DefaultArtifactDescriptorReader.java:172)
        at org.sonatype.aether.impl.internal.DefaultRepositorySystem.readArtifactDescriptor(DefaultRepositorySystem.java:316)
        at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve(DefaultPluginDependenciesResolver.java:115)
        ... 25 more
Caused by: org.sonatype.aether.resolution.ArtifactResolutionException: Failure to find org.jboss.maven.plugins:maven-injection-plugin:pom:1.0.2 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced
        at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:541)
        at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:220)
        at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:197)
        at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:267)
        ... 28 more
Caused by: org.sonatype.aether.transfer.ArtifactNotFoundException: Failure to find org.jboss.maven.plugins:maven-injection-plugin:pom:1.0.2 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced
        at org.sonatype.aether.impl.internal.DefaultUpdateCheckManager.checkArtifact(DefaultUpdateCheckManager.java:190)
        at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:430)
        ... 31 more
[ERROR] 
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
Comment 7 Jeff Law 2012-03-30 17:35:40 EDT
I'm pretty sure something is passing a bogus value to memmove.

Looking at the F16 data you provided we have:

# Problematic frame:
# C  [libc.so.6+0x1481a3]  __tls_get_addr@@GLIBC_2.3+0x1481a3

Ignore __tls_get_addr; we're clearly not in that function (note the 0x1481a3 offset).  So first we want to know what function that offset corresponds to.  Luckily we can get that from the register dump, particularly RIP:  RIP=0x00000037faf481a3

So given RIP we first want to map that to a DSO from the DSO list in the BZ we have:

37fae00000-37fafad000 r-xp 00000000 08:04 524745                         /lib64/libc-2.14.90.so

So we're in glibc.  Good.  Now subtracting the starting address of the mapping above from RIP we get an offset of 0x1481a3.  That's the offset within the glibc text segment of the faulting instruction.  Now, my glibc may be at a different prelink address, etc, but the offset is still useful.

This is the text segment info for my glibc.  Of particular note is the virtual address field, 0x3a2d400000.  That's the virtual address where my glibc will be loaded.

 LOAD           0x0000000000000000 0x0000003a2d400000 0x0000003a2d400000
                 0x00000000001ac9fc 0x00000000001ac9fc  R E    200000

So if I start gdb on my glibc and examine that address I get:

(gdb) x/10i 0x0000003a2d400000 + 0x1481a3
   0x3a2d5481a3 <__memmove_ssse3_back+5571>:    movaps 0x22(%rsi),%xmm4
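
The address arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration of the calculation, not a tool used in this bug; the helper names (parse_maps_line, offset_in_mapping, rebase) are made up for the example, and the input values are the ones quoted in this comment.

```python
# Sketch: turn a faulting RIP into an offset inside its mapping, then
# rebase that offset onto another copy of the same library.
# Helper names are hypothetical; values come from the crash log above.

def parse_maps_line(line):
    """Parse one line of the memory map into (start, end, perms, path)."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    path = fields[5] if len(fields) > 5 else ""
    return int(start_s, 16), int(end_s, 16), fields[1], path

def offset_in_mapping(line, addr):
    """Return addr minus the start of the mapping; fail if addr is outside it."""
    start, end, _perms, _path = parse_maps_line(line)
    if not (start <= addr < end):
        raise ValueError("address is not inside this mapping")
    return addr - start

def rebase(offset, load_base):
    """Address of the same offset in a copy of the library loaded at load_base."""
    return load_base + offset

LIBC_LINE = ("37fae00000-37fafad000 r-xp 00000000 08:04 524745"
             "                         /lib64/libc-2.14.90.so")
RIP = 0x00000037faf481a3

offset = offset_in_mapping(LIBC_LINE, RIP)
local_addr = rebase(offset, 0x0000003a2d400000)
print(hex(offset), hex(local_addr))
```

The second value is the address to hand to gdb's x/10i on a machine whose glibc is loaded at 0x3a2d400000.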


Ohhhh....  Now things are getting interesting.  Looking at %rsi in the register dump we have: RSI=0x00007fc063e00fde.  If we add 0x22 to that value to form the effective address we get:

(gdb) p/x 0x00007fc063e00fde + 0x22
$2 = 0x7fc063e01000

Hmmm, a page boundary.  Interesting.  Now going back to the DSO map we have:

7fc063e00000-7fc063e01000 rwxp 00000000 00:00 0 

Nothing is mapped immediately after those addresses.  So 0x7fc063e01000 is unmapped and hitting it will cause a segfault.
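
The same check can be written down as a small Python sketch: form the effective address of the faulting load and look it up in the mapping list. The function names are hypothetical, the 4096-byte page size is an assumption about the machine, and only the two mappings quoted in this comment are included (the full list is in the attached log).

```python
# Sketch: is the effective address of "movaps 0x22(%rsi),%xmm4" mapped?
# find_mapping and parse_range are hypothetical helper names.

PAGE_SIZE = 4096  # assumed x86_64 page size

MAPS = [
    "37fae00000-37fafad000 r-xp 00000000 08:04 524745 /lib64/libc-2.14.90.so",
    "7fc063e00000-7fc063e01000 rwxp 00000000 00:00 0",
]

def parse_range(line):
    """Return (start, end) of a memory map line; end is exclusive."""
    start_s, end_s = line.split()[0].split("-")
    return int(start_s, 16), int(end_s, 16)

def find_mapping(lines, addr):
    """Return the map line that contains addr, or None if it is unmapped."""
    for line in lines:
        start, end = parse_range(line)
        if start <= addr < end:
            return line
    return None

RSI = 0x00007fc063e00fde
effective = RSI + 0x22

print(hex(effective))                   # address the load touches
print(effective % PAGE_SIZE == 0)       # does it sit on a page boundary?
print(find_mapping(MAPS, RSI))          # mapping holding the source pointer
print(find_mapping(MAPS, effective))    # None when the address is unmapped
```

The source pointer itself is inside the anonymous one-page mapping, but the load at +0x22 starts exactly at its exclusive end, which is why it faults.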

The backtrace says this was called on the following chain:

Stack: [0x00007fc062fcf000,0x00007fc0630d0000],  sp=0x00007fc0630cbf58,  free space=1011k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x1481a3]  __tls_get_addr@@GLIBC_2.3+0x1481a3

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J  java.util.zip.ZipFile.getEntry(JLjava/lang/String;Z)J
J  java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;
J  java.util.jar.JarFile.getJarEntry(Ljava/lang/String;)Ljava/util/jar/JarEntry;
[ ... ]

Which, if I'm interpreting correctly, means that we came into the memmove code from a compiled Java code frame, i.e. Java calling into the glibc memmove routine.

This really looks like a bug somewhere in the Java code since it appears that the Java code is calling memmove with bogus arguments.  I'm going to have to hand the bug back to you since I don't know who would own this hunk of Java code or how to proceed debugging it.


You might try valgrind or using your own LD_PRELOADed memmove which checks for bogus arguments.  Both would be slow, but would verify the problem is in the arguments the Java code is passing to memmove rather than memmove itself.
Comment 8 Sanne Grinovero 2012-04-02 05:37:24 EDT
Hi Jeff,
thank you for the detailed analysis, much appreciated!
I'll ask someone from the OpenJDK team to have a look.

> This really looks like bug somewhere in the java code since it appears that the
> java code is calling memmove with bogus arguments.  I'm going to have to hand
> the bug back to you since I don't know who would own this hunk of java code or
> how to proceed debugging it.

I think indeed the root cause is that the zips/jars are memory-mapped and we change them at runtime.
It doesn't affect us anymore as we fixed our own very dumb coding mistake. Still, I think it should not be possible to crash the VM via Java itself.
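
As a rough, safe analogy for the hypothesis above (not the JVM's actual code path, and it uses plain reads rather than a memory mapping), the following Python sketch shows how an entry offset cached from an archive stops being valid once the file is rewritten underneath the reader. The file names and sizes are made up for the example.

```python
# Sketch: a cached ZIP entry offset goes stale when the archive is replaced.
# Analogy only; the JVM maps the file, this example just seeks and reads.
import os
import tempfile
import zipfile

LOCAL_HEADER_MAGIC = b"PK\x03\x04"  # signature of a ZIP local file header

def read_at(path, offset, size=4):
    """Read size bytes at offset; returns fewer bytes (or none) past EOF."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def stale_offset_demo():
    fd, path = tempfile.mkstemp(suffix=".jar")
    os.close(fd)
    try:
        # Original archive: b.txt sits after roughly 1 KB of data.
        with zipfile.ZipFile(path, "w") as z:
            z.writestr("a.txt", b"A" * 1000)
            z.writestr("b.txt", b"B")
        # A reader caches where b.txt's local header lives.
        with zipfile.ZipFile(path) as z:
            cached = z.getinfo("b.txt").header_offset
        before = read_at(path, cached)
        # The archive is rewritten at runtime with much smaller content.
        with zipfile.ZipFile(path, "w") as z:
            z.writestr("c.txt", b"C")
        after = read_at(path, cached)
        return cached, before, after
    finally:
        os.remove(path)

cached, before, after = stale_offset_demo()
print(cached, before == LOCAL_HEADER_MAGIC, after == LOCAL_HEADER_MAGIC)
```

With ordinary reads the stale offset just yields wrong or missing bytes. With a memory mapping and native code copying from a cached pointer, the equivalent mistake can touch an unmapped page, which would be consistent with the fault analysed in comment 7.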
Comment 9 Jeff Law 2012-04-30 17:58:21 EDT
Sanne, can you reset the component of this bug appropriately?  I didn't see OpenJDK in the components list and as long as it's in the glibc pile, my boss is going to bug me about it.
Comment 10 RHEL Product and Program Management 2012-05-04 00:07:32 EDT
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.
Comment 11 Sanne Grinovero 2012-05-04 10:47:45 EDT
Hi Jeff,
sorry for the delay. I'm closing this; I'll open an OpenJDK issue once I'm able to isolate a simpler test.
Thanks again!
