Bug 804632 - systemtap jstack() support broken
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: java-1.6.0-openjdk
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: jiri vanek
QA Contact: Lukas Zachar
Depends On:
Blocks: 805204 825824
Reported: 2012-03-19 09:35 EDT by Mark Wielaard
Modified: 2012-06-20 09:51 EDT

Fixed In Version: java-1.6.0-openjdk-1.6.0.0-1.43.1.11.1
Doc Type: Bug Fix
Clones: 825824
Last Closed: 2012-06-20 09:51:26 EDT
Description Mark Wielaard 2012-03-19 09:35:28 EDT
Description of problem:

Using the jstack() systemtap support in java-1.6.0-openjdk sometimes
breaks and aborts the stap script.

Version-Release number of selected component (if applicable):

java-1.6.0-openjdk-devel-1.6.0.0-1.43.1.10.6.el6_2.x86_64

How reproducible:

Almost always.

Steps to Reproduce:
1. Make sure systemtap, java-1.6.0-openjdk-devel and java-1.6.0-openjdk are installed.

2. Have some simple java program around. The following Hello.java will do:

public class Hello
{
  public static void main(String args[])
  {
    System.out.println("Hello World!");
  }
}

javac Hello.java

3. Run a stap script that uses jstack(), e.g.:

stap -v -e 'probe hotspot.jni.GetStringUTFChars { log(probestr); print_jstack_full(); log(" === "); }' -c 'java Hello' 2>&1 | c++filt
  
Actual results:

Some backtraces, but then...
ERROR: kernel read fault at 0x0000000000000018 (addr) near identifier '@cast' at /usr/share/systemtap/tapset/x86_64/jstack.stp:362:29
WARNING: Number of errors: 1, skipped probes: 0
Warning: /usr/bin/staprun exited with status: 1

Expected results:

No ERRORs. Lots more backtraces.

Additional info:

This is a bug in the jstack.stp tapset, already fixed upstream:
http://thread.gmane.org/gmane.comp.java.openjdk.distro-packaging.devel/17667
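
For re-testing, the check in step 3 can be scripted. This is only a sketch: the function name count_stap_errors is made up for illustration, and it assumes a failing run prints lines starting with "ERROR:" as in the actual results above.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemtap or the openjdk packages).
# Reads stap output on stdin and prints the number of lines that start
# with "ERROR:", the marker seen in the actual results above.
count_stap_errors() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function usable under "set -e" while still printing 0.
    grep -c '^ERROR:' || true
}
```

It would be used by piping the stap command from step 3 (with 2>&1) into count_stap_errors; a count of 0 corresponds to the expected results.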
Comment 7 jiri vanek 2012-03-26 09:40:23 EDT
Patch approved to be included in specfile
Comment 9 Mark Wielaard 2012-05-25 06:36:28 EDT
Sadly after a systemtap upgrade from systemtap-1.6-4.el6.x86_64 (RHEL 6.2) to systemtap-1.7-5.el6.x86_64 (RHEL 6.3 Beta) this has turned into a different bug:

Pass 1: parsed user script and 85 library script(s) using 198312virt/26832res/3028shr kb, in 110usr/10sys/132real ms.
Pass 2: analyzed script: 3 probe(s), 42 function(s), 3 embed(s), 16 global(s) using 426332virt/198740res/126360shr kb, in 960usr/100sys/1101real ms.
Pass 3: translated to C into "/tmp/stapi1CKcr/stap_4366_src.c" using 423224virt/77760res/6072shr kb, in 90usr/60sys/157real ms.
cc1: warnings being treated as errors
/tmp/stapi1CKcr/stap_4366_src.c: In function ‘function_print_jstack’:
/tmp/stapi1CKcr/stap_4366_src.c:5201: error: the frame size of 272 bytes is larger than 256 bytes
make[1]: *** [/tmp/stapi1CKcr/stap_4366_src.o] Error 1
make: *** [_module_/tmp/stapi1CKcr] Error 2
WARNING: make exited with status: 2
Pass 4: compiled C into "stap_4366.ko" in 6500usr/770sys/7448real ms.
Pass 4: compilation failed.  Try again with another '--vp 0001' option.
Keeping temporary directory "/tmp/stapi1CKcr"
Comment 10 Mark Wielaard 2012-05-25 06:46:16 EDT
Just confirmed the bug from comment #9 is also in systemtap git trunk. Even slightly worse:

/tmp/stap4JLjXj/stap_10562_src.c: In function ‘function_print_jstack’:
/tmp/stap4JLjXj/stap_10562_src.c:5183: error: the frame size of 288 bytes is larger than 256 bytes

The generated frame size grew from 272 bytes to 288 bytes with systemtap git.
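
When comparing systemtap builds it can help to pull the two numbers out of the compiler message. A minimal sketch, assuming the gcc wording shown above; frame_overshoot is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: reads a build log on stdin and, for every
# "the frame size of N bytes is larger than M bytes" message, prints
# "N M D", where D = N - M is how far the frame exceeds the limit.
frame_overshoot() {
    sed -n 's/.*the frame size of \([0-9][0-9]*\) bytes is larger than \([0-9][0-9]*\) bytes.*/\1 \2/p' |
    while read -r size limit; do
        echo "$size $limit $((size - limit))"
    done
}
```

Fed the error lines quoted in comment #9 and in this comment, it would report an overshoot of 16 bytes for systemtap 1.7 and 32 bytes for git trunk.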
Comment 11 Mark Wielaard 2012-05-25 06:57:11 EDT
Further confirmation: upstream systemtap 1.6 works fine against the jstack.stp from the new java-1.6.0-openjdk-devel-1.6.0.0-1.44.1.11.1.el6.x86_64 package, while upstream systemtap 1.7 is broken (error: the frame size of 272 bytes is larger than 256 bytes).
Comment 12 Frank Ch. Eigler 2012-05-25 07:26:41 EDT
Please try again with -DSTP_LEGACY_PRINT
Comment 13 Mark Wielaard 2012-05-25 07:43:46 EDT
(In reply to comment #12)
> Please try again with -DSTP_LEGACY_PRINT

That makes no difference (error: the frame size of 272 bytes is larger than 256 bytes)
Comment 14 Mark Wielaard 2012-05-25 08:46:25 EDT
I created a new bug against systemtap-1.7-5.el6.x86_64 for the issue mentioned in comment #9. https://bugzilla.redhat.com/show_bug.cgi?id=825244
Comment 19 Mark Wielaard 2012-05-25 13:34:33 EDT
systemtap 1.7 trips over the added try-catch construct. This makes it "work" again:

--- jstack.stp.newpkg	2012-05-25 17:25:16.761481432 +0200
+++ /usr/share/systemtap/tapset/x86_64/jstack.stp	2012-05-25 17:31:41.809479955 +0200
@@ -311,7 +311,7 @@
 
           // Some of this is "fuzzy" so catch any read error in case we
           // "guessed" wrong.
-          try
+          /* try */
             {
 
               // Do some sanity checking.
@@ -440,6 +440,7 @@
                 }
 
             }
+/*
           catch
             {
               // Some assumption above totally failed and we got an address
@@ -447,6 +448,7 @@
               frame = sprintf("<unknown_frame@0x%x>", pc);
               trust_fp = 0;
             }
+*/
         }
       else
         {

Sadly, that try-catch is precisely the part of the fix that is needed...
Comment 20 Mark Wielaard 2012-05-25 13:36:46 EDT
That catch part didn't come out well in the diff, so here it is in full to show it isn't anything "fancy":

/*
          catch
            {
              // Some assumption above totally failed and we got an address
              // read error. Give up and mark frame pointer as suspect.
              frame = sprintf("<unknown_frame@0x%x>", pc);
              trust_fp = 0;
            }
*/
Comment 21 Mark Wielaard 2012-05-25 13:57:21 EDT
Also confirmed the other way around. The old jstack.stp version "works" with systemtap 1.6, but adding just the try/catch (and none of the other tweaks between the old and new/fixed jstack.stp) makes it fail with error: the frame size of 272 bytes is larger than 256 bytes

--- jstack.stp	2012-05-25 17:21:22.375931710 +0200
+++ /usr/share/systemtap/tapset/x86_64/jstack.stp	2012-05-25 17:54:26.938477606 +0200
@@ -299,7 +299,7 @@
               segments++;
             }
           block = CodeCache_low + (segment << CodeHeap_log2_segment_size);
-
+try {
           // Do some sanity checking.
           used = @cast(block, "HeapBlock",
                        "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server/libjvm.so")->_header->_used;
@@ -424,6 +424,13 @@
               frame_size = @cast(blob, "CodeBlob",
                                  "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/lib/amd64/server/libjvm.so")->_frame_size;
             }
+} catch
+{
+ // Some assumption above totally failed and we got an address
+ frame = sprintf("<unknown_frame@0x%x>", pc);
+ trust_fp = 0;
+}
+
         }
       else
         {
Comment 22 Mark Wielaard 2012-05-25 13:58:25 EDT
(In reply to comment #21)
> Also confirmed the other way around. The old jstack.stp version "works" with
> systemtap 1.6, but adding just the try/catch (and none of the other tweaks
> between the old and new/fixed jstack.stp) makes it fail with error: the frame
> size of 272 bytes is larger than 256 bytes

Sorry, that should say systemtap 1.7.
Comment 23 Mark Wielaard 2012-05-25 14:00:02 EDT
To summarize: the added try-catch is what makes the build fail (produce the error) with systemtap 1.7, but that same try-catch is what fixes jstack.stp under systemtap 1.6 (where the new jstack.stp works just fine).
Comment 24 Deepak Bhole 2012-05-25 14:42:02 EDT
So then, since RHEL-6 has 1.7, do we need to remove the try/catch to fix this bug?
Comment 25 Mark Wielaard 2012-05-25 15:17:40 EDT
(In reply to comment #24)
> So then since RHEL-6 has 1.7, we need to remove the try/catch to fix this
> bug?

Except that would bring back the original bug that the try/catch is supposed to fix.

But it seems that tweaking the script to make it slightly less informative in case of other "issues" makes it work again with systemtap 1.7:

--- jstack.stp.newpkg	2012-05-25 17:25:16.761481432 +0200
+++ /usr/share/systemtap/tapset/x86_64/jstack.stp	2012-05-25 19:13:31.777479659 +0200
@@ -320,7 +320,7 @@
               if (used != 1)
                 {
                   // Something very odd has happened.
-                  frame = sprintf("<unused_code_block@0x%x>", pc);
+                  frame = "<unused_code_block>";
                   blob_name = "unused";
                   trust_fp = 0;
                   frame_size = 0;
@@ -444,7 +444,7 @@
             {
               // Some assumption above totally failed and we got an address
               // read error. Give up and mark frame pointer as suspect.
-              frame = sprintf("<unknown_frame@0x%x>", pc);
+              frame = "<unknown_frame>";
               trust_fp = 0;
             }
         }
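
For trying the same tweak on a copy of the tapset without hand-editing, the two substitutions from the diff above could be expressed with sed. This is a sketch only: simplify_frame_strings is a hypothetical name, and it assumes the two sprintf lines appear exactly as in the diff.

```shell
#!/bin/sh
# Hypothetical helper: applies the two substitutions from the diff above
# to tapset text read on stdin and writes the result to stdout.
# It does not modify any installed file.
simplify_frame_strings() {
    sed -e 's/sprintf("<unused_code_block@0x%x>", pc)/"<unused_code_block>"/' \
        -e 's/sprintf("<unknown_frame@0x%x>", pc)/"<unknown_frame>"/'
}
```

It would be run with a copy of jstack.stp on stdin and the output redirected to a new file, leaving the packaged tapset untouched until the result has been reviewed.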
Comment 29 jiri vanek 2012-05-28 03:53:10 EDT
rebuilding http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4453614
Comment 36 jiri vanek 2012-05-28 10:45:40 EDT
I can just confirm...
Comment 40 errata-xmlrpc 2012-06-20 09:51:26 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0836.html
