Bug 825244 - frame size error regression
Summary: frame size error regression
Keywords:
Status: CLOSED DUPLICATE of bug 1008567
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: systemtap
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Frank Ch. Eigler
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-05-25 12:44 UTC by Mark Wielaard
Modified: 2013-09-26 00:58 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-26 00:58:51 UTC
Target Upstream Version:
Embargoed:


Attachments
Generated C source file that shows the error (208.42 KB, text/plain)
2012-05-25 12:44 UTC, Mark Wielaard
Difference between stap -p3 output before/after the tweak (58.13 KB, text/plain)
2012-05-25 19:58 UTC, Mark Wielaard

Description Mark Wielaard 2012-05-25 12:44:58 UTC
Created attachment 586855
Generated C source file that shows the error

Description of problem:

Some scripts now fail to compile because the generated C source file triggers a frame size error.

Version-Release number of selected component (if applicable):

systemtap-1.7-5.el6.x86_64

How reproducible:

Always.

Steps to Reproduce:
1. Install systemtap, java-1.6.0-openjdk-devel, and java-1.6.0-openjdk-debuginfo.
2. stap -p4 -e 'probe hotspot.jni.GetStringUTFChars { print_jstack(); }'
  
Actual results:

cc1: warnings being treated as errors
/tmp/stap8lklnr/stap_b4ebc82307f06034c19c8e212fd62bfb_32140_src.c: In function ‘function_print_jstack’:
/tmp/stap8lklnr/stap_b4ebc82307f06034c19c8e212fd62bfb_32140_src.c:5114: error: the frame size of 288 bytes is larger than 256 bytes
make[1]: *** [/tmp/stap8lklnr/stap_b4ebc82307f06034c19c8e212fd62bfb_32140_src.o] Error 1
make: *** [_module_/tmp/stap8lklnr] Error 2
WARNING: make exited with status: 2
Pass 4: compilation failed.  Try again with another '--vp 0001' option.

Expected results:

The script compiles without errors.

Additional info:

This was originally spotted in https://bugzilla.redhat.com/show_bug.cgi?id=804632, which is about a different bug. The fix for that bug works against the systemtap in RHEL 6.2, as expected, but fails against the systemtap in RHEL 6.3.

Running git bisect on this issue between systemtap 1.6 and 1.7 pinpoints this commit:

b0209e91577f4d026172c67ab4dc561425fc21fa is the first bad commit
commit b0209e91577f4d026172c67ab4dc561425fc21fa
Author: Mark Wielaard <mjw>
Date:   Thu Oct 20 13:06:54 2011 +0200

    Don't try to do any lookup when addr is zero in _stp_kallsyms_lookup().

That commit looks totally harmless. However, it sits inside a giant knot of 10 unmerged branches in the git log --graph output, so it may simply have been the last straw that pushed the frame size over the limit.
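
For context on the error above: GCC's -Wframe-larger-than= option warns when a function's stack frame exceeds the given number of bytes, and the build log shows warnings being treated as errors. The following is a minimal sketch of that mechanism, not the generated systemtap code. The function name format_frame_label is hypothetical, and the 288-byte buffer is chosen only to echo the number in the log. The frame size GCC actually reports depends on the compiler version, target, and optimization level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the generated module.
 * The on-stack scratch buffer dominates this function's frame, so a build
 * with `gcc -Wframe-larger-than=256 -Werror` is expected to reject it. */
size_t format_frame_label(char *out, size_t outlen, unsigned long long pc)
{
    char scratch[288]; /* lives in this function's stack frame */
    size_t len;
    int n;

    assert(out != NULL);
    if (outlen == 0)
        return 0;

    n = snprintf(scratch, sizeof scratch, "<unknown_frame@0x%llx>", pc);
    if (n < 0) {
        out[0] = '\0';
        return 0;
    }

    /* Clamp to what snprintf stored, then to what the caller can hold. */
    len = (size_t)n;
    if (len >= sizeof scratch)
        len = sizeof scratch - 1;
    if (len >= outlen)
        len = outlen - 1;

    memcpy(out, scratch, len);
    out[len] = '\0';
    return len;
}
```

Compiling this file with gcc -c -Wframe-larger-than=256 -Werror should produce the same class of diagnostic. Shrinking scratch, or moving it out of the stack frame, makes the diagnostic go away.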

Comment 2 Mark Wielaard 2012-05-25 19:58:59 UTC
Created attachment 586938
Difference between stap -p3 output before/after the tweak

We found a workaround that seems to work for bug #804632.

If we tweak the script as follows, the error no longer seems to trigger:

@@ -320,7 +320,7 @@
               if (used != 1)
                 {
                   // Something very odd has happened.
-                  frame = sprintf("<unused_code_block@0x%x>", pc);
+                  frame = "<unused_code_block>";
                   blob_name = "unused";
                   trust_fp = 0;
                   frame_size = 0;
@@ -444,7 +444,7 @@
             {
               // Some assumption above totally failed and we got an address
               // read error. Give up and mark frame pointer as suspect.
-              frame = sprintf("<unknown_frame@0x%x>", pc);
+              frame = "<unknown_frame>";
               trust_fp = 0;
             }
         }

Attached is the diff of the stap -p3 -e 'probe hotspot.jni.GetStringUTFChars { print_jstack_full() }' output before and after the tweak.

Comment 3 Mark Wielaard 2012-05-25 20:36:02 UTC
The relevant hunks of the diff seem to be:

@@ -3940,21 +3866,8 @@
                   {
                     (void) 
                     ({
-                      strlcpy (l->__tmp75, 
-                      ({
-                        l->__tmp77 = l->pc;
-                        #ifndef STP_LEGACY_PRINT
-                          c->printf_locals.stp_sprintf_1.arg0 = l->__tmp77;
-                          c->printf_locals.stp_sprintf_1.__retvalue = l->__tmp78;
-                          stp_sprintf_1 (c);
-                        #else // STP_LEGACY_PRINT
-                          _stp_snprintf (l->__tmp78, MAXSTRINGLEN, "<unused_code_block@0x%llx>", l->__tmp77);
-                        #endif // STP_LEGACY_PRINT
-                        if (unlikely(c->last_error)) goto out;
-                        l->__tmp78;
-                      }), MAXSTRINGLEN);
-                      strlcpy (l->frame, l->__tmp75, MAXSTRINGLEN);
-                      l->__tmp75;
+                      strlcpy (l->frame, "<unused_code_block>", MAXSTRINGLEN);
+                      "<unused_code_block>";
                     });
                     
                     (void) 

@@ -4575,21 +4488,8 @@
             {
               (void) 
               ({
-                strlcpy (l->__tmp189, 
-                ({
-                  l->__tmp191 = l->pc;
-                  #ifndef STP_LEGACY_PRINT
-                    c->printf_locals.stp_sprintf_4.arg0 = l->__tmp191;
-                    c->printf_locals.stp_sprintf_4.__retvalue = l->__tmp192;
-                    stp_sprintf_4 (c);
-                  #else // STP_LEGACY_PRINT
-                    _stp_snprintf (l->__tmp192, MAXSTRINGLEN, "<unknown_frame@0x%llx>", l->__tmp191);
-                  #endif // STP_LEGACY_PRINT
-                  if (unlikely(c->last_error)) goto out;
-                  l->__tmp192;
-                }), MAXSTRINGLEN);
-                strlcpy (l->frame, l->__tmp189, MAXSTRINGLEN);
-                l->__tmp189;
+                strlcpy (l->frame, "<unknown_frame>", MAXSTRINGLEN);
+                "<unknown_frame>";
               });
               
               (void) 

What surprises me is that -DSTP_LEGACY_PRINT doesn't seem to make a difference in this case.

Comment 4 Mark Wielaard 2012-05-25 20:43:45 UTC
So the cause might be the two calls to strlcpy(), which is an exported kernel symbol and so cannot be inlined or optimized. Maybe we should have a static _stp_strlcpy() instead? Or would the resulting optimizations/inlining possibly lead to more stack usage?
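
To make the question above concrete, here is a minimal sketch of what a static copy helper with strlcpy() semantics could look like. It is an assumption for illustration only: nothing in this report shows that _stp_strlcpy() exists in the runtime, so the sketch is named stp_strlcpy_sketch, and it is written as plain user-space C rather than kernel code. Whether inlining such a helper would lower or raise stack usage is exactly the open question, and this sketch does not answer it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical static replacement with strlcpy() semantics:
 * copies at most size-1 bytes, always NUL-terminates when size > 0,
 * and returns strlen(src) so callers can detect truncation. */
static inline size_t stp_strlcpy_sketch(char *dst, const char *src, size_t size)
{
    size_t srclen;

    assert(dst != NULL && src != NULL);
    srclen = strlen(src);

    if (size > 0) {
        /* Leave room for the terminating NUL. */
        size_t n = (srclen >= size) ? size - 1 : srclen;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Because the helper is static, the compiler sees its body at every call site and is free to inline it, which is not possible for a call into an exported kernel symbol.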

Comment 5 Mark Wielaard 2012-05-27 21:15:50 UTC
This seems to be RHEL-specific: the issue doesn't occur on Fedora 17. The following update to the openjdk package is needed on F17: https://admin.fedoraproject.org/updates/FEDORA-2012-8424/java-1.7.0-openjdk-1.7.0.3-2.1.fc17.7

Note that Fedora 17 has different versions of everything, of course: a newer gcc (4.7 instead of 4.4) and a newer openjdk (1.7 instead of 1.6). So the two might not be comparable at all.

Comment 7 RHEL Program Management 2012-07-10 08:38:13 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 8 RHEL Program Management 2012-07-11 01:51:49 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 9 RHEL Program Management 2012-12-14 08:35:50 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 10 Frank Ch. Eigler 2013-09-16 19:16:05 UTC
As per bug #1008567, an upstream patch is available that raises
the 256-byte frame size safety limit to 512 bytes.

Comment 12 Frank Ch. Eigler 2013-09-26 00:58:51 UTC
Closing as a duplicate because the warning threshold is being raised, but the code generation differences might merit further study.

*** This bug has been marked as a duplicate of bug 1008567 ***

