Bug 825244 - frame size error regression
Status: CLOSED DUPLICATE of bug 1008567
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: systemtap
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Frank Ch. Eigler
QA Contact: qe-baseos-tools
Depends On:
Blocks:
Reported: 2012-05-25 08:44 EDT by Mark Wielaard
Modified: 2013-09-25 20:58 EDT

Doc Type: Bug Fix
Last Closed: 2013-09-25 20:58:51 EDT
Type: Bug

Attachments
Generated C source file that shows the error (208.42 KB, text/plain)
2012-05-25 08:44 EDT, Mark Wielaard
Difference between stap -p3 output before/after the tweak (58.13 KB, text/plain)
2012-05-25 15:58 EDT, Mark Wielaard

Description Mark Wielaard 2012-05-25 08:44:58 EDT
Created attachment 586855
Generated C source file that shows the error

Description of problem:

Some scripts now fail because compiling the generated C source file produces a frame size error.

Version-Release number of selected component (if applicable):

systemtap-1.7-5.el6.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. install systemtap java-1.6.0-openjdk-devel and java-1.6.0-openjdk-debuginfo
2. stap -p4 -e 'probe hotspot.jni.GetStringUTFChars { print_jstack(); }'
  
Actual results:

cc1: warnings being treated as errors
/tmp/stap8lklnr/stap_b4ebc82307f06034c19c8e212fd62bfb_32140_src.c: In function ‘function_print_jstack’:
/tmp/stap8lklnr/stap_b4ebc82307f06034c19c8e212fd62bfb_32140_src.c:5114: error: the frame size of 288 bytes is larger than 256 bytes
make[1]: *** [/tmp/stap8lklnr/stap_b4ebc82307f06034c19c8e212fd62bfb_32140_src.o] Error 1
make: *** [_module_/tmp/stap8lklnr] Error 2
WARNING: make exited with status: 2
Pass 4: compilation failed.  Try again with another '--vp 0001' option.

Expected results:

The script compiles without errors.

Additional info:

This was originally spotted in https://bugzilla.redhat.com/show_bug.cgi?id=804632, which is about a different bug. The fix for that bug should work, and does work against the systemtap in RHEL 6.2, but fails against the systemtap in RHEL 6.3.

Trying to git bisect this issue between systemtap 1.6 and 1.7 pinpoints this commit:

b0209e91577f4d026172c67ab4dc561425fc21fa is the first bad commit
commit b0209e91577f4d026172c67ab4dc561425fc21fa
Author: Mark Wielaard <mjw@redhat.com>
Date:   Thu Oct 20 13:06:54 2011 +0200

    Don't try to do any lookup when addr is zero in _stp_kallsyms_lookup().

That commit looks totally harmless. However, it sits inside a giant knot of 10 unmerged branches in the git log --graph output, so it might just have been the last straw that pushed the frame size over the limit for some reason.
Comment 2 Mark Wielaard 2012-05-25 15:58:59 EDT
Created attachment 586938
Difference between stap -p3 output before/after the tweak

We found a workaround that seems to work for bug #804632.

If we tweak the script as follows it seems to not trigger the error:

@@ -320,7 +320,7 @@
               if (used != 1)
                 {
                   // Something very odd has happened.
-                  frame = sprintf("<unused_code_block@0x%x>", pc);
+                  frame = "<unused_code_block>";
                   blob_name = "unused";
                   trust_fp = 0;
                   frame_size = 0;
@@ -444,7 +444,7 @@
             {
               // Some assumption above totally failed and we got an address
               // read error. Give up and mark frame pointer as suspect.
-              frame = sprintf("<unknown_frame@0x%x>", pc);
+              frame = "<unknown_frame>";
               trust_fp = 0;
             }
         }

Attached is the diff between the stap -p3 -e 'probe hotspot.jni.GetStringUTFChars { print_jstack_full() }' output.
Comment 3 Mark Wielaard 2012-05-25 16:36:02 EDT
The relevant hunks of the diff seem to be:

@@ -3940,21 +3866,8 @@
                   {
                     (void) 
                     ({
-                      strlcpy (l->__tmp75, 
-                      ({
-                        l->__tmp77 = l->pc;
-                        #ifndef STP_LEGACY_PRINT
-                          c->printf_locals.stp_sprintf_1.arg0 = l->__tmp77;
-                          c->printf_locals.stp_sprintf_1.__retvalue = l->__tmp78;
-                          stp_sprintf_1 (c);
-                        #else // STP_LEGACY_PRINT
-                          _stp_snprintf (l->__tmp78, MAXSTRINGLEN, "<unused_code_block@0x%llx>", l->__tmp77);
-                        #endif // STP_LEGACY_PRINT
-                        if (unlikely(c->last_error)) goto out;
-                        l->__tmp78;
-                      }), MAXSTRINGLEN);
-                      strlcpy (l->frame, l->__tmp75, MAXSTRINGLEN);
-                      l->__tmp75;
+                      strlcpy (l->frame, "<unused_code_block>", MAXSTRINGLEN);
+                      "<unused_code_block>";
                     });
                     
                     (void) 

@@ -4575,21 +4488,8 @@
             {
               (void) 
               ({
-                strlcpy (l->__tmp189, 
-                ({
-                  l->__tmp191 = l->pc;
-                  #ifndef STP_LEGACY_PRINT
-                    c->printf_locals.stp_sprintf_4.arg0 = l->__tmp191;
-                    c->printf_locals.stp_sprintf_4.__retvalue = l->__tmp192;
-                    stp_sprintf_4 (c);
-                  #else // STP_LEGACY_PRINT
-                    _stp_snprintf (l->__tmp192, MAXSTRINGLEN, "<unknown_frame@0x%llx>", l->__tmp191);
-                  #endif // STP_LEGACY_PRINT
-                  if (unlikely(c->last_error)) goto out;
-                  l->__tmp192;
-                }), MAXSTRINGLEN);
-                strlcpy (l->frame, l->__tmp189, MAXSTRINGLEN);
-                l->__tmp189;
+                strlcpy (l->frame, "<unknown_frame>", MAXSTRINGLEN);
+                "<unknown_frame>";
               });
               
               (void) 

What surprises me is that -DSTP_LEGACY_PRINT doesn't seem to make a difference in this case.
Comment 4 Mark Wielaard 2012-05-25 16:43:45 EDT
So it might be the two calls to strlcpy(), which is an exported kernel symbol and so cannot be inlined or optimized. Maybe we should have a static _stp_strlcpy() instead? Or would the optimization/inlining possibly lead to more stack usage?
Comment 5 Mark Wielaard 2012-05-27 17:15:50 EDT
This seems RHEL specific. The issue doesn't occur on Fedora 17. The following update is needed to the openjdk package on F17: https://admin.fedoraproject.org/updates/FEDORA-2012-8424/java-1.7.0-openjdk-1.7.0.3-2.1.fc17.7

Note that Fedora 17 has different versions of everything, of course: a newer gcc (4.7 instead of 4.4) and openjdk (1.7 instead of 1.6). So the two might not be comparable at all.
Comment 7 RHEL Product and Program Management 2012-07-10 04:38:13 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 8 RHEL Product and Program Management 2012-07-10 21:51:49 EDT
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
Comment 9 RHEL Product and Program Management 2012-12-14 03:35:50 EST
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 10 Frank Ch. Eigler 2013-09-16 15:16:05 EDT
As per bug #1008567, an upstream patch is available that raises
the 256-byte frame size safety limit to 512 bytes.
Comment 12 Frank Ch. Eigler 2013-09-25 20:58:51 EDT
Closing as DUP due to raising the warning threshold, but the code generation differences might merit further study.

*** This bug has been marked as a duplicate of bug 1008567 ***
