Bug 640097 - Segfaults while probing shell spawnfest
Summary: Segfaults while probing shell spawnfest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: systemtap
Version: 6.1
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: David Smith
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-10-04 19:37 UTC by Petr Muller
Modified: 2016-09-20 02:07 UTC (History)

Fixed In Version: systemtap-1.4-2.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-19 13:54:43 UTC
Target Upstream Version:
Embargoed:


Attachments
stap script to run (268 bytes, text/plain)
2010-10-04 19:37 UTC, Petr Muller
gzipped reproducer tarball (532 bytes, application/x-gzip)
2010-10-04 19:40 UTC, Petr Muller


Links
System: Red Hat Product Errata
ID: RHBA-2011:0651
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: systemtap bug fix and enhancement update
Last Updated: 2011-05-19 09:37:25 UTC

Description Petr Muller 2010-10-04 19:37:08 UTC
Created attachment 451504
stap script to run

Description of problem:
We have an automated stress test for userspace apps which probes a gazillion shells, each spawning new shells that spawn further shells, and probes all the functions in those shells. During the execution, several segmentation faults occur. They may well be a shell bug, with stap slowing execution down enough to widen a race condition window, but I'm initially filing this against the systemtap component: I have to file it somewhere :)


Version-Release number of selected component (if applicable):
# rpm -q systemtap bash zsh ksh
systemtap-1.2-9.el6.x86_64
bash-4.1.2-3.el6.x86_64
zsh-4.3.10-4.1.el6.x86_64
ksh-20100621-2.el6.x86_64

How reproducible:
seems like always

Steps to Reproduce:
1. install bash, ksh and zsh shells and their debuginfos
2. # stap -v shelly-stressor.stp -c "bash bash-spawner.bash" `which bash` `which zsh` `which ksh` > /dev/null
3. observe the segfaults 
  
Actual results:
segfaults

Expected results:
no segfaults

Additional info:
reproducing files attached

Comment 1 Petr Muller 2010-10-04 19:40:13 UTC
Created attachment 451505
gzipped reproducer tarball

Comment 2 Petr Muller 2010-10-04 19:42:02 UTC
If this is confirmed, it should be cloned for RHEL5 too, because it has the same bug

Comment 3 David Smith 2010-10-07 21:40:57 UTC
Here's an update.  I've confirmed this with RHEL6 (kernel 2.6.32-71.el6.x86_64).  From looking at some debug output, it appears the segfault only occurs when exec'ing a different shell.

I managed to simplify the testcase down by changing bash-spawner.bash to just "zsh -c ls", which makes things a bit easier.  I'm still parsing the debug output, trying to figure out what is causing this.

Comment 4 David Smith 2010-10-13 17:40:22 UTC
Here's a further update.  It appears that the segfaults are coming from ksh/zsh, which both use vfork().  Bash, which uses fork(), doesn't seem to cause the segfaults.  I cross-checked the fork() case by also testing with csh (which, like bash, uses fork()).

At this point the culprit seems to be the way uprobes handles vfork'ed processes.  There is a new upstream test (dtrace_vfork_exec.exp) which shows a similar problem when using vfork() that might be related to this bug.
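
For readers unfamiliar with the distinction, here is a minimal sketch of the vfork()-then-exec pattern that ksh and zsh are described as using above. The helper name run_via_vfork is hypothetical and this is not code from either shell or from uprobes; it only illustrates that the child runs in the parent's address space until it calls exec or _exit, which is the window in which inserted probe breakpoints would be shared between the two processes.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] (looked up via PATH) using
 * vfork() + exec and return its exit status, or -1 on failure.
 */
int run_via_vfork(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;                /* vfork itself failed */

    if (pid == 0) {
        /*
         * Child: shares the parent's memory and suspends the parent
         * until it execs or exits, so it may only call an exec
         * function or _exit here.
         */
        execvp(argv[0], argv);
        _exit(127);               /* exec failed */
    }

    /* Parent: resumes once the child has exec'ed or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Replacing vfork() with fork() in this sketch gives the child its own copy of the address space, which corresponds to the bash/csh case that did not segfault.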

Comment 5 David Smith 2010-11-19 18:46:56 UTC
This one seems to be fixed now.  There were 2 separate problems:

1) Uprobes problems with vfork.  This problem was fixed in upstream commits:

4ed03d392476139c36d1959400909c80f533f651:
http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commit;h=4ed03d392476139c36d1959400909c80f533f651

and

3b0b3f85bcc89fccf141a4617453b09710c62578:
<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commit;h=3b0b3f85bcc89fccf141a4617453b09710c62578>

2) Uprobes problems with empty functions/newer gcc's.  Newer gcc's emit "rep ret" (a return instruction carrying a rep prefix) for empty functions, which the uprobes instruction handler wasn't expecting.  Fixed in upstream commit:

33d60a821c49313350cf6e575697600004567f6f:
<http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commit;h=33d60a821c49313350cf6e575697600004567f6f>
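
As an illustration of the second problem, the sketch below shows how an instruction classifier might recognize both forms of return on x86. It relies only on the standard encodings (0xC3 for a near ret, 0xF3 for the rep prefix); the function name ret_insn_length is hypothetical, and this is not the code from the commit above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical classifier: return the length in bytes of the return
 * instruction starting at code[0], or 0 if the bytes are not a
 * plain "ret" or a "rep ret".
 */
size_t ret_insn_length(const uint8_t *code, size_t len)
{
    if (len >= 1 && code[0] == 0xC3)
        return 1;                 /* ret */
    if (len >= 2 && code[0] == 0xF3 && code[1] == 0xC3)
        return 2;                 /* rep ret */
    return 0;                     /* something else */
}
```

A handler that checks only for the single byte 0xC3 would see 0xF3 first and fail to treat the two-byte sequence as a return.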


With these fixes, the reproducer test runs to completion without any segmentation faults.  Note that the reproducer test will take a long time to run to completion.  Without systemtap, the reproducer test takes about 11 seconds.  With systemtap probing every function in bash & zsh & ksh, the reproducer test takes over 14 minutes.  The reproducer test causes 8.75 million probe hits.
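
To put those timings in perspective, here is a small sketch of the arithmetic. The function names are hypothetical, and the inputs are the figures quoted above, taking "over 14 minutes" as 840 seconds.

```c
/* Ratio of probed run time to baseline run time. */
double slowdown_factor(double baseline_s, double probed_s)
{
    return probed_s / baseline_s;
}

/* Average number of probe hits handled per second of probed run time. */
double hits_per_second(double probe_hits, double probed_s)
{
    return probe_hits / probed_s;
}
```

With 11 seconds, 840 seconds and 8.75 million hits, this gives a slowdown of roughly 76x and roughly 10,400 probe hits per second. Because the probed run took more than 14 minutes, the slowdown figure is a lower bound and the hit rate an upper bound.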

As noted in comment #2 above, this bug should be cloned for RHEL5, because it also has the same problem.  However, the list of commits will be different for RHEL5, since RHEL6 uses uprobes2 while RHEL5 uses the original uprobes.

Comment 9 errata-xmlrpc 2011-05-19 13:54:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0651.html

