Bug 808501
Summary: | Java VM crashing with SIGSEGV on unzip | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Sanne Grinovero <sanne>
Component: | glibc | Assignee: | Sanne Grinovero <sanne>
Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-tools-bugs
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 6.0 | CC: | fweimer, law, mfranc
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2012-05-04 14:47:45 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: |
|
Created attachment 574003 [details]
Crash on RHEL 4 - a
Created attachment 574007 [details]
Crash on RHEL 4 - b
Created attachment 574012 [details]
Crash on RHEL 4 - c
Looks like this is related to 560232 https://bugzilla.redhat.com/show_bug.cgi?id=560232
But in that case the problematic frame is libzip.so, in this case it's libc.so

What version of glibc is in use on the Fedora 16 crash? I may be able to get some information from the faulting address, but I need to know the precise version of glibc to do that.

I assume this is not the failure you want me to look at:

[ERROR] Plugin org.jboss.maven.plugins:maven-injection-plugin:1.0.2 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.jboss.maven.plugins:maven-injection-plugin:jar:1.0.2: Failure to find org.jboss.maven.plugins:maven-injection-plugin:pom:1.0.2 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
org.apache.maven.plugin.PluginResolutionException: Plugin org.jboss.maven.plugins:maven-injection-plugin:1.0.2 or one of its dependencies could not be resolved: Failed to read artifact descriptor for org.jboss.maven.plugins:maven-injection-plugin:jar:1.0.2
    at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve(DefaultPluginDependenciesResolver.java:129)
    at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getPluginDescriptor(DefaultMavenPluginManager.java:142)
    at org.apache.maven.plugin.internal.DefaultMavenPluginManager.getMojoDescriptor(DefaultMavenPluginManager.java:261)
    at org.apache.maven.plugin.DefaultBuildPluginManager.getMojoDescriptor(DefaultBuildPluginManager.java:185)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecution(DefaultLifecycleExecutionPlanCalculator.java:152)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.setupMojoExecutions(DefaultLifecycleExecutionPlanCalculator.java:139)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:116)
    at org.apache.maven.lifecycle.internal.DefaultLifecycleExecutionPlanCalculator.calculateExecutionPlan(DefaultLifecycleExecutionPlanCalculator.java:129)
    at org.apache.maven.lifecycle.internal.BuilderCommon.resolveBuildPlan(BuilderCommon.java:92)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:321)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:158)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.sonatype.aether.resolution.ArtifactDescriptorException: Failed to read artifact descriptor for org.jboss.maven.plugins:maven-injection-plugin:jar:1.0.2
    at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:282)
    at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.readArtifactDescriptor(DefaultArtifactDescriptorReader.java:172)
    at org.sonatype.aether.impl.internal.DefaultRepositorySystem.readArtifactDescriptor(DefaultRepositorySystem.java:316)
    at org.apache.maven.plugin.internal.DefaultPluginDependenciesResolver.resolve(DefaultPluginDependenciesResolver.java:115)
    ... 25 more
Caused by: org.sonatype.aether.resolution.ArtifactResolutionException: Failure to find org.jboss.maven.plugins:maven-injection-plugin:pom:1.0.2 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced
    at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:541)
    at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifacts(DefaultArtifactResolver.java:220)
    at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolveArtifact(DefaultArtifactResolver.java:197)
    at org.apache.maven.repository.internal.DefaultArtifactDescriptorReader.loadPom(DefaultArtifactDescriptorReader.java:267)
    ... 28 more
Caused by: org.sonatype.aether.transfer.ArtifactNotFoundException: Failure to find org.jboss.maven.plugins:maven-injection-plugin:pom:1.0.2 in http://repo1.maven.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced
    at org.sonatype.aether.impl.internal.DefaultUpdateCheckManager.checkArtifact(DefaultUpdateCheckManager.java:190)
    at org.sonatype.aether.impl.internal.DefaultArtifactResolver.resolve(DefaultArtifactResolver.java:430)
    ... 31 more
[ERROR]
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException

I'm pretty sure something is passing a bogus value to memmove. Looking at the F16 data you provided we have:

    # Problematic frame:
    # C [libc.so.6+0x1481a3] __tls_get_addr@@GLIBC_2.3+0x1481a3

Ignore __tls_get_addr; we're clearly not in that function (note the 0x1481a3 offset). So first we want to know what function that offset corresponds to. Luckily we can get that from the register dump, particularly RIP:

    RIP=0x00000037faf481a3

Given RIP, we first want to map it to a DSO. From the DSO list in the BZ we have:

    37fae00000-37fafad000 r-xp 00000000 08:04 524745 /lib64/libc-2.14.90.so

So we're in glibc. Good. Now subtracting the starting address of that mapping from RIP we get an offset of 0x1481a3. That's the offset of the faulting instruction within the glibc text segment.

Now, my glibc may be at a different prelink address, etc., but the offset is still useful. This is the text segment info for my glibc. Of particular note is the 3rd entry, 0x3a2d400000. That's the virtual address where my glibc will be loaded.

    LOAD 0x0000000000000000 0x0000003a2d400000 0x0000003a2d400000 0x00000000001ac9fc 0x00000000001ac9fc R E 200000

So if I start gdb on my glibc and examine that address I get:

    (gdb) x/10i 0x0000003a2d400000 + 0x1481a3
    0x3a2d5481a3 <__memmove_ssse3_back+5571>: movaps 0x22(%rsi),%xmm4

Ohhhh.... Now things are getting interesting. Looking at %rsi in the register dump we have:

    RSI=0x00007fc063e00fde

If we add 0x22 to that value to form the effective address we get:

    (gdb) p/x 0x00007fc063e00fde + 0x22
    $2 = 0x7fc063e01000

Hmmm, a page boundary. Interesting. Now going back to the DSO map we have:

    7fc063e00000-7fc063e01000 rwxp 00000000 00:00 0

Nothing is mapped immediately after those addresses. So 0x7fc063e01000 is unmapped and hitting it will cause a segfault.
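For anyone who wants to replay the arithmetic in this analysis without gdb, here is a minimal sketch in Python. The addresses are copied from the Fedora 16 dump quoted in this comment; the 4 KiB page size and the helper name `in_mapping` are assumptions made for illustration and do not come from the dump.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86-64 Linux

# Values copied from the Fedora 16 crash dump quoted in this bug.
RIP = 0x00000037FAF481A3
RSI = 0x00007FC063E00FDE
LIBC_TEXT = (0x37FAE00000, 0x37FAFAD000)           # /lib64/libc-2.14.90.so, r-xp
SOURCE_MAPPING = (0x7FC063E00000, 0x7FC063E01000)  # anonymous rwxp mapping


def in_mapping(addr, mapping):
    """Return True if addr lies inside the half-open range [start, end)."""
    start, end = mapping
    return start <= addr < end


# Offset of the faulting instruction inside libc: RIP minus the mapping start.
offset = RIP - LIBC_TEXT[0]

# Effective address of the faulting load: movaps 0x22(%rsi),%xmm4
effective = RSI + 0x22

print("RIP inside libc text:", in_mapping(RIP, LIBC_TEXT))
print("offset into libc:", hex(offset))
print("effective address:", hex(effective))
print("on a page boundary:", effective % PAGE_SIZE == 0)
print("RSI mapped:", in_mapping(RSI, SOURCE_MAPPING))
print("effective address mapped:", in_mapping(effective, SOURCE_MAPPING))
```

The last two lines are the interesting ones: the source pointer itself is inside the anonymous mapping, but the address the instruction actually reads is the first byte past its end.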
The backtrace says this was called on the following chain:

    Stack: [0x00007fc062fcf000,0x00007fc0630d0000], sp=0x00007fc0630cbf58, free space=1011k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C [libc.so.6+0x1481a3] __tls_get_addr@@GLIBC_2.3+0x1481a3

    Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
    J java.util.zip.ZipFile.getEntry(JLjava/lang/String;Z)J
    J java.util.jar.JarFile.getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry;
    J java.util.jar.JarFile.getJarEntry(Ljava/lang/String;)Ljava/util/jar/JarEntry;
    [ ... ]

Which, if I'm interpreting correctly, means that we came into the memmove code from a compiled Java code frame, i.e., Java calling into the glibc memmove routine.

This really looks like a bug somewhere in the Java code, since it appears that the Java code is calling memmove with bogus arguments. I'm going to have to hand the bug back to you since I don't know who would own this hunk of Java code or how to proceed debugging it.

You might try valgrind, or your own LD_PRELOADed memmove which checks for bogus arguments. Both would be slow, but would verify that the problem is in the arguments the Java code is passing to memmove rather than in memmove itself.

Hi Jeff,
thank you for the detailed analysis, much appreciated!
I'll ask someone from the OpenJDK team to have a look.
> This really looks like a bug somewhere in the java code since it appears that the
> java code is calling memmove with bogus arguments. I'm going to have to hand
> the bug back to you since I don't know who would own this hunk of java code or
> how to proceed debugging it.
I think indeed the root cause is that the zips/jars are memory-mapped and we change them at runtime.
It doesn't affect us anymore, as we fixed our own very dumb coding mistake. Still, I think it should not be possible to crash the VM via Java itself.
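As an illustration of the memory-mapping point, here is a small Python sketch (not the fix that was applied in the project, which is not described in this bug). It assumes a POSIX system, where renaming a new file over an old one leaves existing mappings attached to the old data, so a reader holding a mapping never sees the file change underneath it. The function names and the `example.jar` file name are hypothetical.

```python
import mmap
import os
import tempfile


def replace_atomically(path, new_bytes):
    """Write new_bytes to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def demo():
    """Map a file, replace it by rename, and compare what each reader sees."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "example.jar")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"old archive contents")

        # An existing reader holds a read-only mapping of the old file.
        with open(path, "rb") as f:
            mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            replace_atomically(path, b"new")
            seen_by_mapping = mapping[:]          # still the old inode's data
            with open(path, "rb") as f:
                seen_by_new_reader = f.read()     # the replacement file
        finally:
            mapping.close()
        return seen_by_mapping, seen_by_new_reader


if __name__ == "__main__":
    print(demo())
```

Overwriting or truncating the same file in place, by contrast, changes the pages behind the existing mapping, which is the kind of situation suspected here.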
Sanne, can you reset the component of this bug appropriately? I didn't see OpenJDK in the components list and as long as it's in the glibc pile, my boss is going to bug me about it.

Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Hi Jeff, sorry for the delay. I'm closing it; I'll open an OpenJDK issue once I'm able to isolate a simpler test. Thanks again! |
Created attachment 574001 [details]
Dump from Fedora16

Description of problem:
We have a reproducible testcase in one of our open source projects (in this case, Hibernate OGM) which is crashing the JVM on several platforms, including:
Red Hat Enterprise Linux 6
Red Hat Enterprise Linux 6 - CBS version
Red Hat Enterprise Linux 4
Fedora 16
Apple OSX (just FYI - makes us think it's Glibc related)
On Fedora 16 I had this problem with both Oracle's distribution of the JVM and with OpenJDK as distributed in the Fedora repositories.

Version-Release number of selected component (if applicable):
Since we have several stacks with different errors reported, please see the details in each log. I'm attaching logs from several platforms, mostly from JBoss QA testing nodes.

How reproducible:
Requires: Maven, Git and Java:
1. git clone git://github.com/hibernate/hibernate-ogm.git
2. git checkout 20db8d61dc791aced51a1db33e48b2b987954882
3. mvn clean install
Unfortunately it does not fail on every run, but it fails fairly often.

Actual results:
JVM crashes during the run of functional tests

Expected results:
Project is built and all tests run

Additional info: