Bug 596933 - [RHEL6] stap segfaults on PPC
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: systemtap
Version: 6.0
Hardware: All Linux
Priority: high Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Frank Ch. Eigler
QA Contact: Petr Muller
https://beaker.engineering.redhat.com...
Depends On: 602359 640321
Blocks:
Reported: 2010-05-27 15:11 EDT by Jeff Burke
Modified: 2016-09-19 22:06 EDT (History)

See Also:
Fixed In Version: systemtap-1.2-9.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-11-10 16:44:29 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Jeff Burke 2010-05-27 15:11:03 EDT
Description of problem:
 When trying to run the tracepoint tests for RHEL6 on ppc, stap fails with a segmentation fault.

Version-Release number of selected component (if applicable):
 Kernel Package = kernel-2.6.32-30.el6.ppc64
 systemtap version = systemtap-1.2-3.el6.ppc64

How reproducible:
 Always

Steps to Reproduce:
1. Install the RHEL6.0-Snapshot-5 PPC Server variant onto a PPC host 
2. Run the following command:
/usr/bin/stap -L 'kernel.trace("*")' | grep -o "\".*\"" | xargs -I [] -n1 stap -c "sleep 1" -vvvve 'probe kernel.trace("[]") { exit() }'
  
Actual results:
Segmentation fault (core dumped) stap -c "sleep 1" -vvvve 'probe kernel.trace('$i') { exit() }' > $VERBOSETRACELOG 2>&1

Expected results:
 Should pass

Additional info:
Comment 2 Petr Muller 2010-06-02 10:15:51 EDT
We hit this too in our Tier Tests. Basically everything you run on ppc ends with a segfault (or abort).

With certain probes, stap seems to be a bit more verbose when going down, so I'm putting it here:

# stap -e 'probe begin{printf("PWN!\n")}' 
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_S_create
Aborted (core dumped)

glibc even gave a backtrace in one case:

*** glibc detected *** stap: free(): invalid next size (fast): 0x000000004a121180 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x807a11f704)[0xfff9d83f704]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv-0x5e5a4)[0xfff9dbce46c]
/usr/lib64/libstdc++.so.6(+0x807a70b3e4)[0xfff9dbcb3e4]
/usr/lib64/libstdc++.so.6(_ZSt9terminatev-0x611c0)[0xfff9dbcb430]
/usr/lib64/libstdc++.so.6(__cxa_throw-0x6104c)[0xfff9dbcb5e4]
/usr/lib64/libstdc++.so.6(_ZSt20__throw_length_errorPKc-0xde140)[0xfff9db45980]
/usr/lib64/libstdc++.so.6(_ZNSs4_Rep9_S_createEmmRKSaIcE-0x9412c)[0xfff9db94e14]
/usr/lib64/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm-0x92ab4)[0xfff9db9674c]
/usr/lib64/libstdc++.so.6(_ZNSs7reserveEm-0x92404)[0xfff9db96eac]
/usr/lib64/libstdc++.so.6(_ZNSs6appendERKSs-0x91da0)[0xfff9db97580]
stap(_ZStplIcSt11char_traitsIcESaIcEESbIT_T0_T1_ERKS6_S8_-0x196200)[0x233d3110]
stap(+0x12d234)[0x234ad234]
stap(+0x12b074)[0x234ab074]
stap(+0x51058)[0x233d1058]
/lib64/libc.so.6(+0x807a0bbc38)[0xfff9d7dbc38]
/lib64/libc.so.6(__libc_start_main-0x184ea0)[0xfff9d7dbe30]
Comment 3 Frank Ch. Eigler 2010-06-10 09:32:11 EDT
From http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44492,
it seems like this was an unforeseen interaction between
our sys/sdt.h style of inline assembly and the gcc powerpc
optimizer.  stap will adopt a more restricted inline-asm
constraint for STAP_PROBE(...), which should make it work
even if the compiler issue is not settled.
Comment 4 Issue Tracker 2010-06-16 16:33:35 EDT
Event posted on 06-16-2010 02:34pm EDT by Glen Johnson

------- Comment From anibalca@linux.ibm.com 2010-06-16 14:24 EDT-------
I hit this bug on RHEL6 pre-beta2


This event sent from IssueTracker by jkachuck 
 issue 989583
Comment 5 Frank Ch. Eigler 2010-06-16 16:38:23 EDT
Upstream bug http://sourceware.org/bugzilla/show_bug.cgi?id=11708
is a corequisite, as the powerpc fix listed above unfortunately
breaks i686.  Work in progress.  scox, please append here the commit#
for the PR11708 fix, when that is ready.
Comment 6 Frank Ch. Eigler 2010-06-23 12:11:54 EDT
Let me backtrack a bit from comment #5.  rhel6's systemtap does not have
the SDT_V2 stuff yet, so scox's patch for sdt.h wouldn't apply there directly
anyhow.  And as it only has SDT_V1, the $XXX translator support from
http://sourceware.org/PR11708 is not actually needed.

So for RHEL6, it may be sufficient to change all the "g" constraints to "ron"
in sys/sdt.h and make no other change.
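
For readers unfamiliar with the constraint letters, here is a minimal sketch of what the "g" to "ron" change means. It assumes GCC-style extended asm. The macro and function names (DEMO_PROBE_ARG, demo_arg_general and so on) are hypothetical and are not taken from sys/sdt.h, and the sketch does not reproduce the powerpc miscompilation itself. It only shows that both constraints accept the same kind of integer argument, with "ron" allowing a narrower set of operand forms.

```c
#include <assert.h>

/* Hypothetical stand-in for a probe-argument macro; this is NOT the real
   sys/sdt.h. The asm template is empty, so no instructions are emitted, but
   the compiler must still place `val` in a location the constraint allows. */
#define DEMO_PROBE_ARG(constraint, val) \
    __asm__ __volatile__("" : : constraint(val))

/* "g": any general register, any memory operand, or any immediate integer. */
static long demo_arg_general(long value)
{
    DEMO_PROBE_ARG("g", value);
    return value;
}

/* "ron": a general register, offsettable memory, or an immediate integer
   with a known numeric value; a narrower set of operand forms than "g". */
static long demo_arg_restricted(long value)
{
    DEMO_PROBE_ARG("ron", value);
    return value;
}

/* A literal operand: with "ron" the compiler may pick the "n" (immediate) form. */
static long demo_arg_literal(void)
{
    DEMO_PROBE_ARG("ron", 42L);
    return 42L;
}

/* Sanity check: passing a value to the empty asm as an input must not alter it. */
static int demo_self_check(void)
{
    assert(demo_arg_general(7L) == 7L);
    assert(demo_arg_restricted(7L) == 7L);
    assert(demo_arg_literal() == 42L);
    return 1;
}
```

Calling demo_self_check() from a main() is enough to exercise the sketch. Compiling it with -S and a non-empty asm template is one way to see which operand form the compiler actually chose for each constraint.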
Comment 7 Frank Ch. Eigler 2010-06-28 22:45:10 EDT
See bug #608768 for ppc code negatively impacted by "nro" (the same three constraint letters as the "ron" in comment #6), however.
Comment 8 Frank Ch. Eigler 2010-06-29 10:42:52 EDT
Our current plan is to revert the current fix for this bug, and
instead prereq/buildreq the version of gcc (4.4.4-9) that includes
an alternate fix for the "g" assembly constraint.  That way, 
bug #608768 is fixed too.
Comment 11 releng-rhel@redhat.com 2010-11-10 16:44:29 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
