Bug 640097
| Summary: | Segfaults while probing shell spawnfest | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Petr Muller <pmuller> | ||||||
| Component: | systemtap | Assignee: | David Smith <dsmith> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | qe-baseos-tools-bugs | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 6.1 | CC: | dsmith, ebachalo, fche, mjw, ohudlick | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | systemtap-1.4-2.el6 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2011-05-19 13:54:43 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
Created attachment 451505 [details]
gzipped reproducer tarball
If this is confirmed, it should be cloned for RHEL5 too, because it has the same bug.

Here's an update. I've confirmed this with RHEL6 (kernel 2.6.32-71.el6.x86_64). From looking at some debug output, it appears the segfault only occurs when exec'ing a different shell. I managed to simplify the testcase by changing bash-spawner.bash to just "zsh -c ls", which makes things a bit easier. I'm still parsing the debug output, trying to figure out what is causing this.

Here's a further update. It appears that the segfaults are coming from ksh/zsh, which both use vfork(). Bash, which uses fork(), doesn't seem to cause the segfaults. I tested bash by using csh (which also uses fork()). At this point the culprit seems to be the way uprobes handles vfork'ed processes.

There is a new upstream test (dtrace_vfork_exec.exp) which shows a similar problem when using vfork() that might be related to this bug.

This one seems to be fixed now. There were 2 separate problems:

1) Uprobes problems with vfork. This problem was fixed in upstream commits:
4ed03d392476139c36d1959400909c80f533f651: <http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commit;h=4ed03d392476139c36d1959400909c80f533f651>
and 3b0b3f85bcc89fccf141a4617453b09710c62578: <http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commit;h=3b0b3f85bcc89fccf141a4617453b09710c62578>

2) Uprobes problems with empty functions/newer gcc's. Newer gcc's emit conditional returns for empty functions ("rep ret"), which the uprobes instruction handler wasn't expecting. Fixed in upstream commit:
33d60a821c49313350cf6e575697600004567f6f: <http://sources.redhat.com/git/gitweb.cgi?p=systemtap.git;a=commit;h=33d60a821c49313350cf6e575697600004567f6f>

With these fixes, the reproducer test runs to completion without any segmentation faults.

Note that the reproducer test will take a long time to run to completion. Without systemtap, the reproducer test takes about 11 seconds. With systemtap probing every function in bash, zsh, and ksh, the reproducer test takes over 14 minutes. The reproducer test causes 8.75 million probe hits.

As noted in comment #2 above, this bug should be cloned for RHEL5, because it also has the same problem. However, the list of commits will be different for RHEL5, since RHEL6 uses uprobes2 while RHEL5 uses the original uprobes.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0651.html
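For anyone trying to reproduce the vfork() side of this without the full tarball, the spawn pattern discussed in the comments above can be sketched in C roughly as follows. This is only an illustration of a vfork()+exec sequence of the kind ksh and zsh are said to use; the helper name `spawn_with_vfork` is hypothetical and is not taken from the reproducer, the shells, or systemtap.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the reproducer): start `path` with
 * vfork() + execv() and wait for it to finish.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int spawn_with_vfork(const char *path, char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: borrows the parent's address space until it execs or
         * exits, so it must do nothing here except exec or _exit. */
        execv(path, argv);
        _exit(127); /* exec failed; use _exit(), not exit(), after vfork() */
    }

    /* Parent: suspended until the child has exec'ed or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `"/bin/sh"` and the argument vector `{"sh", "-c", "ls", NULL}` roughly mirrors the simplified "zsh -c ls" testcase, with sh substituted for zsh as an assumption. Swapping `vfork()` for `fork()` in the helper gives the bash/csh-style variant for comparison.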
Created attachment 451504 [details]
stap script to run

Description of problem:
We have an automated stress test for userspace apps which spawns a gazillion shells that spawn new shells in turn, while probing all the functions in those shells. During the execution, several segmentation faults occur. I know this may well be a shell bug, where stap slows the execution down enough to widen the race condition window, but I'm initially filing it against the systemtap component: I have to file it somewhere :)

Version-Release number of selected component (if applicable):
    # rpm -q systemtap bash zsh ksh
    systemtap-1.2-9.el6.x86_64
    bash-4.1.2-3.el6.x86_64
    zsh-4.3.10-4.1.el6.x86_64
    ksh-20100621-2.el6.x86_64

How reproducible:
seems like always

Steps to Reproduce:
1. install bash, ksh and zsh shells and their debuginfos
2. # stap -v shelly-stressor.stp -c "bash bash-spawner.bash" `which bash` `which zsh` `which ksh` > /dev/null
3. observe the segfaults

Actual results:
segfaults

Expected results:
no segfaults

Additional info:
reproducing files attached