Bug 510282
| Summary: | systemtap panics kernel when killing concurrent staprun processes | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Marc Milgram <mmilgram> |
| Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> |
| Status: | CLOSED ERRATA | QA Contact: | BaseOS QE <qe-baseos-auto> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 5.3 | CC: | dsmith, fche, mjw, mmilgram, pmuller, tao |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 490234 | Environment: | |
| Last Closed: | 2010-03-30 09:05:37 UTC | Type: | --- |
| Bug Depends On: | 515829 | ||
| Bug Blocks: | 499522 | ||
Description
Marc Milgram
2009-07-08 15:08:49 UTC
A customer hit this issue in Red Hat Enterprise Linux 5.3:
After several stap processes that use procfs probe points are stopped,
a kernel panic sometimes occurs.
The panic occurs in proc_match at fs/proc/generic.c:755 because of an invalid pointer dereference.
The kernel panic message is attached as panic_message.log.
In addition, the following BUG messages are printed to the console.
This happens because one of the running stap processes removes the /proc/systemtap directory
while another stap process is still using it.
The BUG messages are always printed in this situation, but the kernel panic does not always occur.
-------------------------------------------------------------------------------
BUG: warning at fs/proc/generic.c:764/remove_proc_entry() (Tainted: G )
Call Trace:
[<ffffffff80101d38>] remove_proc_entry+0x17d/0x1e3
[<ffffffff88498bc5>] :stap_6c500b418cf00cac628a4a64c97f2716_312:_stp_rmdir_proc_module+0x79/0xca
[<ffffffff88498cb0>] :stap_6c500b418cf00cac628a4a64c97f2716_312:systemtap_module_exit+0x63/0xcc
[<ffffffff88498d95>] :stap_6c500b418cf00cac628a4a64c97f2716_312:_stp_cleanup_and_exit+0x77/0x79
[<ffffffff88498e28>] :stap_6c500b418cf00cac628a4a64c97f2716_312:_stp_work_queue+0x91/0x96
[<ffffffff8004d139>] run_workqueue+0x94/0xe4
[<ffffffff800499ba>] worker_thread+0x0/0x122
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80049aaa>] worker_thread+0xf0/0x122
[<ffffffff8008a461>] default_wake_function+0x0/0xe
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
-------------------------------------------------------------------------------
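Because the warning is printed every time while the panic is only intermittent, counting the warning in the kernel log is a more dependable signal when testing. The following is a minimal sketch, not part of the original report: the function name count_proc_warnings is hypothetical, and the pattern is taken from the BUG line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count remove_proc_entry() warnings in kernel log
# text read from stdin. The pattern mirrors the BUG line in this report;
# the line number inside generic.c is matched loosely because it may
# differ between kernel builds (an assumption).
count_proc_warnings() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the status at zero.
    grep -c 'BUG: warning at fs/proc/generic\.c:[0-9]*/remove_proc_entry()' || true
}
```

One way to use it is to pipe the kernel log into the function, for example `dmesg | count_proc_warnings`, before and after a test run and compare the two counts.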
Version-Release number of selected component:
RHEL5.3 GA
kernel-2.6.18-128.el5
systemtap-0.7.2-2.el5
How reproducible:
After stopping several running stap processes that use procfs probe points,
the issue can be reproduced within several tries.
Steps to Reproduce:
1. Start two stap scripts that contain procfs probe points, such as the following:
---- procfs.stp -------
probe procfs("a").read{
$value = "a";
}
-----------------------
2. Stop both stap processes (with kill or Ctrl-C).
The following script reproduces the issue:
--- procfs_panic_reproduce.sh -----------------------------
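#!/bin/sh
# Repeatedly start two stap sessions with procfs probes, then stop them.
while :
do
    # Each session creates its own entry under /proc/systemtap.
    stap -e 'probe procfs("a").read{$value="a"}' &
    sleep 1
    stap -e 'probe procfs("b").read{$value="a"}' &
    sleep 1
    # Send SIGINT to every process whose name matches "stap".
    for i in $(pgrep stap)
    do
        kill -INT "$i"
        sleep 1
    done
done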
-------------------------------------------------
Actual results:
Kernel panic sometimes occurs.
Expected results:
Kernel panic does not occur.
Business impact:
SystemTap is a very useful tool for analyzing kernel behavior, and supporting it is
important for enterprise users.
This problem undermines the convenience of SystemTap.
Hardware info:
Hardware independent
Upstream systemtap includes a fix for this, which may be backported/rebased for 5.5.

Event posted on 09-30-2009 10:53am EDT by mmilgram
Hi Furuta-san,
We are waiting for approval. The BZ is on the GSS 5.5 proposed list. I asked jwest for more information, and he indicated that he would look into it. There is not much more that I can do. As far as I can tell, there is not even a decision between rebasing and porting the individual fix.
Sorry that I can't be more helpful.
Marc Milgram
Internal Status set to 'Waiting on Engineering'
This event sent from IssueTracker by mmilgram
issue 314430

I've tested the testcase in comment #1 against the fix in commit 83eaf9b. Everything worked correctly. This fix is not present in systemtap-0.9.7-5.el5, but could be backported.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2010-0308.html