Bug 841057

Summary: stap can't locate java markers when $vars are used (i686 debuginfo packages contain STABS not DWARF)
Product: [Fedora] Fedora
Reporter: Josh Stone <jistone>
Component: java-1.7.0-openjdk
Assignee: Jiri Vanek <jvanek>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 17
CC: ahughes, dbhole, dsmith, fche, jan.kratochvil, jistone, jvanek, lberk, mjw, omajid, scox, wcohen
Hardware: i686
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-08-01 20:42:34 UTC
Type: Bug

Description Josh Stone 2012-07-18 02:14:21 UTC
Description of problem:
SystemTap reports "semantic error: no match" on libjvm markers when the handler is using any $var access, even though the markers are found without $var use.

This has only been seen as an issue on i686; x86_64 seems fine.

Version-Release number of selected component (if applicable):
systemtap-1.8-1.fc17.i686
elfutils-0.154-1.fc17.i686
java-1.7.0-openjdk-1.7.0.3-2.2.1.fc17.8.i686

How reproducible:
100%

Steps to Reproduce:
1. stap -e 'probe process(@1).mark("*") { println($foo) }' -p2 $JVM
(The fact that $foo is bad here is irrelevant, just me being lazy.  The result should look more like the error x86_64 prints in "Expected results".)
  
Actual results:
> $ JVM=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.3/jre/lib/i386/server/libjvm.so
> $ stap -e 'probe process(@1).mark("*") { println($foo) }' -p2 $JVM
> semantic error: while resolving probe point: identifier 'process' at <input>:1:7
>         source: probe process(@1).mark("*") { println($foo) }
>                       ^
> 
> semantic error: no match
> Pass 2: analysis failed.  Try again with another '--vp 01' option.


Expected results:
> $ JVM=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.3.x86_64/jre/lib/amd64/server/libjvm.so
> $ stap -e 'probe process(@1).mark("*") { println($foo) }' -p2 $JVM
> semantic error: unable to find local 'foo' near pc 0x37342d in <unknown> /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/services/classLoadingService.cpp ( (alternatives: $data $len $name): identifier '$foo' at <input>:1:39
>         source: probe process(@1).mark("*") { println($foo) }
>                                                       ^
> 
> Pass 2: analysis failed.  Try again with another '--vp 01' option.

It's still an error, because I used invalid $foo instead of bothering to find a common $var across all markers, or to isolate a single marker.  But it's the *right* error.


Additional info:

I've started this bug against systemtap, but only because I don't know the root cause yet.  There seems to be a real problem in the debuginfo that leads us into this corner.

Certainly systemtap is wrong in the way it reacts to this failure, with a completely misleading "not found" error message.  I'll file a bug upstream to address that on its own.

When we locate an SDT marker, stap takes different paths depending on the presence of $vars in the handler.  Without any $var, we just build the probe directly, and life is good.  With a $var, we decide we need debuginfo, so we go down the query_addr path until the call to elfutils dwfl_module_addrdie() fails, even though the given address is within the range of the queried module.
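
To tell the two paths apart on a given system, something like the following could be used.  This is only a sketch: check_marker_paths is a made-up helper name, pp() is used merely as a handler body that touches no $var, and the "no match" text is the message quoted in this report, so it may differ in other stap versions.

```shell
# Hypothetical helper (not part of systemtap): run pass 2 twice against the
# same library, once without and once with a $var in the handler.
check_marker_paths() {
    lib=$1

    # No $var: stap builds the marker probes directly, without debuginfo.
    if ! stap -p2 -e 'probe process(@1).mark("*") { println(pp()) }' "$lib" \
            >/dev/null 2>&1; then
        echo "markers-not-found"
        return 0
    fi

    # With a $var: stap needs debuginfo.  $foo is deliberately bogus, as in
    # the report; only the kind of error matters, so the exit status is ignored.
    out=$(stap -p2 -e 'probe process(@1).mark("*") { println($foo) }' "$lib" 2>&1 || true)
    case "$out" in
        *"semantic error: no match"*) echo "debuginfo-lookup-failed" ;;
        *)                            echo "debuginfo-path-reached" ;;
    esac
}
```

Usage would be check_marker_paths "$JVM"; on the i686 package above I would expect "debuginfo-lookup-failed", and on x86_64 "debuginfo-path-reached".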

Then mjw asked me if elfutils is lying, i.e. is there a DIE/CU that ought to be returned here?  And now I find that something is really suspicious with the debuginfo.  The first clue is simply in size:

> $ ls -sh /usr/lib/debug/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.3/jre/lib/i386/server/libjvm.so.debug
> 32M /usr/lib/debug/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.3/jre/lib/i386/server/libjvm.so.debug
vs.
> $ ls -sh /usr/lib/debug/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.3.x86_64/jre/lib/amd64/server/libjvm.so.debug
> 200M /usr/lib/debug/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.3.x86_64/jre/lib/amd64/server/libjvm.so.debug

And looking at those with eu-readelf shows only *three* CUs in i386, vs. 555 CUs in amd64.  The only three survivors in i386 are:
> /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/memory/compactingPermGenGen.cpp
> /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/runtime/sharedRuntimeTrans.cpp
> /usr/src/debug/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/runtime/sharedRuntimeTrig.cpp

Surely there's more to libjvm.so than that! ;)  But whether this is an issue in compiler generation or debuginfo stripping or something else, I don't know...
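
For anyone wanting to repeat the CU count, a minimal sketch follows.  It assumes eu-readelf prints one "Compilation unit at offset" header line per CU in its --debug-dump=info output; count_cus is just an illustrative name.

```shell
# Count compilation units in "eu-readelf --debug-dump=info" output read from
# stdin.  Assumption: each CU header line contains the text
# "Compilation unit at offset".
count_cus() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -c 'Compilation unit at offset' || true
}
```

Usage: eu-readelf --debug-dump=info libjvm.so.debug | count_cus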

Comment 1 Josh Stone 2012-07-18 02:54:34 UTC
(In reply to comment #0)
> Certainly systemtap is wrong in the way it reacts to this failure, with a
> completely misleading "not found" error message.  I'll file a bug upstream
> to address that on its own.

http://sourceware.org/bugzilla/show_bug.cgi?id=14369

Comment 2 Jan Kratochvil 2012-07-30 11:22:52 UTC
Both:
java-1.7.0-openjdk-1.7.0.3-2.2.1.fc17.8.i686
java-1.7.0-openjdk-1.7.0.5-2.2.1.fc18.10.i686

are built with -gstabs:
readelf -Wa java-1.7.0-openjdk-debuginfo-1.7.0.5-2.2.1.fc18.10.i686/usr/lib/debug/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.5/jre/lib/i386/server/libjvm.so.debug
  [28] .stab             PROGBITS        00000000 000120 143e9f8 0c     29   0  4
  [29] .stabstr          STRTAB          00000000 143eb18 83a505 00      0   0  1

x86_64 does not contain a .stab section, and its debuginfo rpm is also significantly larger:
15683585 java-1.7.0-openjdk-debuginfo-1.7.0.3-2.2.1.fc17.8.i686.rpm
57968721 java-1.7.0-openjdk-debuginfo-1.7.0.3-2.2.1.fc17.8.x86_64.rpm
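
A quick way to check other debuginfo files for the same problem is to look at the section names.  The sketch below classifies "readelf -WS" output by the presence of .stab and .debug_info; debug_format is an illustrative name, and the check says nothing about how complete the DWARF is.

```shell
# Classify a debuginfo file by its section names, given "readelf -WS" output
# on stdin.  The surrounding spaces keep .stabstr from matching as .stab.
debug_format() {
    sections=$(cat)
    stabs=no
    dwarf=no
    case "$sections" in *" .stab "*) stabs=yes ;; esac
    case "$sections" in *" .debug_info "*) dwarf=yes ;; esac
    case "$stabs/$dwarf" in
        yes/yes) echo "stabs+dwarf" ;;
        yes/no)  echo "stabs" ;;
        no/yes)  echo "dwarf" ;;
        *)       echo "none" ;;
    esac
}
```

Usage: readelf -WS libjvm.so.debug | debug_format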

http://kojipkgs.fedoraproject.org/packages/java-1.7.0-openjdk/1.7.0.5/2.2.1.fc18.10/data/logs/i686/build.log
g++ [...] -m32 -march=i586 -pipe -g -O3 -fno-strict-aliasing  -gstabs -DVM_LITTLE_ENDIAN [...] /builddir/build/BUILD/java-1.7.0-openjdk/openjdk/hotspot/src/share/vm/services/classLoadingService.cpp

because:
java-1.7.0-openjdk/openjdk/hotspot/make/linux/makefiles/gcc.make
DEBUG_CFLAGS/ia64  = -g
DEBUG_CFLAGS/amd64 = -g
DEBUG_CFLAGS/arm   = -g
DEBUG_CFLAGS/ppc   = -g
DEBUG_CFLAGS += $(DEBUG_CFLAGS/$(BUILDARCH))
ifeq ($(DEBUG_CFLAGS/$(BUILDARCH)),)
DEBUG_CFLAGS += -gstabs
endif

But i486 is missing from this list, so the 32-bit x86 build falls through to -gstabs!
(The same probably applies to FASTDEBUG_CFLAGS and OPT_CFLAGS below it.)
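
The selection logic in that gcc.make fragment can be mimicked in shell to see what each BUILDARCH ends up with.  This is a sketch of the fragment quoted above only, not of the rest of the build; debug_cflags_for is a made-up name.

```shell
# Mimic the quoted gcc.make fragment: architectures with an explicit
# DEBUG_CFLAGS/<arch> entry get -g, everything else falls back to -gstabs.
debug_cflags_for() {
    case "$1" in
        ia64|amd64|arm|ppc) echo "-g" ;;
        *)                  echo "-gstabs" ;;
    esac
}
```

With this, debug_cflags_for amd64 prints -g, while debug_cflags_for i486 prints -gstabs, matching the flags seen in the i686 build log.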

Comment 3 Deepak Bhole 2012-07-30 19:12:18 UTC
Assigning to jvanek (maintainer) and adding ahughes (our buildsys expert, for comments) to cc.

Comment 5 Andrew John Hughes 2012-07-30 19:57:47 UTC
There's already a patch for this, but it may need some work before going upstream to OpenJDK.  However, it should be possible to add it to the Fedora spec file straight away in order to resolve this issue.

Comment 6 Andrew John Hughes 2012-08-06 13:40:24 UTC
Fixed in OpenJDK8:

http://hg.openjdk.java.net/hsx/hotspot-rt/hotspot/rev/282abd0fd878

and committed to IcedTea7:

http://hg.openjdk.java.net/icedtea/jdk7/hotspot/rev/c5430c659d54

I'll get it backported to 7u.

Comment 8 Josh Stone 2012-09-07 18:30:30 UTC
I just tried java-1.7.0-openjdk-1.7.0.6-2.3.1.fc17.2.i686, looks good - thanks!

Comment 9 Fedora End Of Life 2013-07-04 08:04:34 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Fedora End Of Life 2013-08-01 20:42:39 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.