Bug 510282

Summary:	systemtap panics kernel when killing concurrent staprun processes
Product:	Red Hat Enterprise Linux 5	Reporter:	Marc Milgram <mmilgram>
Component:	systemtap	Assignee:	Frank Ch. Eigler <fche>
Status:	CLOSED ERRATA	QA Contact:	BaseOS QE <qe-baseos-auto>
Severity:	high	Docs Contact:
Priority:	high
Version:	5.3	CC:	dsmith, fche, mjw, mmilgram, pmuller, tao
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:	490234	Environment:
Last Closed:	2010-03-30 09:05:37 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	515829
Bug Blocks:	499522

Description Marc Milgram 2009-07-08 15:08:49 UTC

+++ This bug was initially created as a clone of Bug #490234 +++

Description of problem: apparent race condition in runtime/procfs.c:_stp_rmdir_proc_module when killing two staprun processes with a kill -9.


Version-Release number of selected component (if applicable): 

RHEL4U6
systemtap-0.6.2-1.el4
kernel-2.6.9-67.0.15.ELsmp


How reproducible: Unsure


Steps to Reproduce:
1. Start two systemtap scripts at the same time
2. Use kill -9 with multiple process ids on the command line to kill at the same time
3. See if you get the panic
  
Actual results:
Panic

Expected results:
No panic

Additional info:

Crash occurs in fs/proc/generic.c:remove_proc_entry, calling the BUG_ON(de->subdir) macro. 

Here is the code fragment:

        for (p = &parent->subdir; *p; p=&(*p)->next ) {
                if (!proc_match(len, fn, *p))
                        continue;
                de = *p;
                *p = de->next;
                de->next = NULL;
                if (S_ISDIR(de->mode))
                        parent->nlink--;
                proc_kill_inodes(de);
                de->nlink = 0;
                BUG_ON(de->subdir);
                if (!atomic_read(&de->count))
                        free_proc_entry(de);
                else {
                        de->deleted = 1;
                        printk("remove_proc_entry: %s/%s busy, count=%d\n",
                                parent->name, de->name, atomic_read(&de->count)\
);
                }
                break;


 SYSTEM MAP:
/var/crash/crash_analysis/debug_x86_64/boot/System.map-2.6.9-67.0.15.ELsmp
DEBUG KERNEL:
/var/crash/crash_analysis/debug_x86_64/usr/lib/debug/lib/modules/2.6.9-67.0.15.ELsmp/vmlinux
(2.6.9-67.0.15.ELsmp)
   DUMPFILE: /var/crash/crash_dumps/ap12.1236890303.dmp
       CPUS: 8
       DATE: Thu Mar 12 16:38:23 2009
     UPTIME: 25 days, 02:31:04
LOAD AVERAGE: 7.48, 8.65, 8.42
      TASKS: 916
   NODENAME: chip6.acs.census.gov
    RELEASE: 2.6.9-67.0.15.ELsmp
    VERSION: #1 SMP Tue Apr 22 13:58:43 EDT 2008
    MACHINE: x86_64  (2193 Mhz)
     MEMORY: 33 GB
      PANIC: "vmdump"
        PID: 2868
    COMMAND: "staprun"
       TASK: 104bf77a7f0  [THREAD_INFO: 1040baf0000]
        CPU: 2
      STATE: TASK_RUNNING (PANIC)

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at generic:712
invalid operand: 0000 [1] SMP
CPU 2
Modules linked in: stap_a5058ef9256374d79b204f5489fce41b_6002(U)
stap_034dd3d28d1d58dff817bd97d399e87f_3085(U) loop sr_mod ip_vs nfsd
exportfs parport_pc lp parport autofs4 i2c_dev i2c_core lock_dlm(U)
gfs(U) lock_harness(U) nfs lockd dlm(U) cman(U) md5 ipv6 ipt_state
ip_conntrack iptable_filter ip_tables hw_random k8_edac edac_mc nfs_acl
sunrpc egenera_nmi(U) egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod
egenera_vmdump(U) egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U)
egenera_virtual_bus(U) egenera_fs(U) dm_snapshot dm_zero dm_mirror ext3
jbd dm_mod
Pid: 2868, comm: staprun Tainted: PF     2.6.9-67.0.15.ELsmp
RIP: 0010:[<ffffffff801b0f4e>] <ffffffff801b0f4e>{remove_proc_entry+340}
RSP: 0000:000001040baf1e48  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000102381de500 RCX: 0000000000000000
RDX: 000000000000000a RSI: 000001021c68b219 RDI: ffffffff804f4700
RBP: 000001021c68b180 R08: ffffffffa03c71b5 R09: 0000000000000080
R10: 0000000000000080 R11: ffffffff801d232b R12: 00000102381de400
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000002a9557f440(0000) GS:ffffffff804f3680(0000) knlGS:00000000f6e03ba0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003c105142d8 CR3: 0000000438fc0000 CR4: 00000000000006e0
Process staprun (pid: 2868, threadinfo 000001040baf0000, task
00000104bf77a7f0)
Stack: ffffffffa03cdb00 0000000000000000 ffffffff803f0c60 ffffffffa03c71b5
      ffffffff803ed1a0 ffffffffa03cdb00 0000000000000000 ffffffffa03c2875
      0000010440000000 ffffffffa03c28b2
Call
Trace:<ffffffffa03c2875>{:stap_034dd3d28d1d58dff817bd97d399e87f_3085:_stp_rmdir_proc_module+182}

     
<ffffffffa03c28b2>{:stap_034dd3d28d1d58dff817bd97d399e87f_3085:_stp_unregister_ctl_channel+48}

     
<ffffffffa03c6e13>{:stap_034dd3d28d1d58dff817bd97d399e87f_3085:_stp_transport_close+30}

      <ffffffff8014ed74>{sys_delete_module+487}
<ffffffff8011036e>{tracesys+113}
      <ffffffff801103ce>{tracesys+209}

Code: 0f 0b 72 de 32 80 ff ff ff ff c8 02 8b 85 80 00 00 00 85 c0
RIP <ffffffff801b0f4e>{remove_proc_entry+340} RSP <000001040baf1e48>

crash> bt 2868
PID: 2868   TASK: 104bf77a7f0       CPU: 2   COMMAND: "staprun"
#0 [1040baf1c40] dump_execute at ffffffffa010c483
#1 [1040baf1c80] __do_vmdump at ffffffffa010c556
#2 [1040baf1ca0] try_crashdump at ffffffff8014c7dd
#3 [1040baf1cb0] die at ffffffff80111c04
#4 [1040baf1cd0] do_invalid_op at ffffffff80111fcc
#5 [1040baf1d90] error_exit at ffffffff80110d91
   [exception RIP: remove_proc_entry+340]
   RIP: ffffffff801b0f4e  RSP: 000001040baf1e48  RFLAGS: 00010202
   RAX: 0000000000000000  RBX: 00000102381de500  RCX: 0000000000000000
   RDX: 000000000000000a  RSI: 000001021c68b219  RDI: ffffffff804f4700
   RBP: 000001021c68b180   R8: ffffffffa03c71b5   R9: 0000000000000080
   R10: 0000000000000080  R11: ffffffff801d232b  R12: 00000102381de400
   R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
   ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
#6 [1040baf1e40] remove_proc_entry at ffffffff801b0f3f
#7 [1040baf1e80] _stp_rmdir_proc_module at ffffffffa03c2875
#8 [1040baf1e90] _stp_unregister_ctl_channel at ffffffffa03c28b2
#9 [1040baf1ec0] _stp_transport_close at ffffffffa03c6e13
#10 [1040baf1ee0] sys_delete_module at ffffffff8014ed74
#11 [1040baf1f80] tracesys at ffffffff801103ce (via system_call)
   RIP: 0000003c0fbc9e79  RSP: 0000007fbffff768  RFLAGS: 00000246
   RAX: ffffffffffffffda  RBX: ffffffff801103ce  RCX: ffffffffffffffff
   RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000509060
   RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000200
   R10: 0000000000000004  R11: 0000000000000246  R12: 0000000000000000
   R13: 0000000000000003  R14: 0000000000000000  R15: 0000000000000000
   ORIG_RAX: 00000000000000b0  CS: 0033  SS: 002b

--- Additional comment from mmilgram on 2009-03-26 15:23:19 EDT ---

The fix for http://sourceware.org/bugzilla/show_bug.cgi?id=5042 seemed to cause this problem.

Currently, the code only checks to see that /proc/systemtap is not being used before deleting /proc/systemtap.  But, it may be unused while another module has a directory under /proc/systemtap.

The solution is to add a guard to ensure that there are no other files under /proc/systemtap.

diff --git a/runtime/procfs.c b/runtime/procfs.c
index d6b7533..dfbe2de 100644
--- a/runtime/procfs.c
+++ b/runtime/procfs.c
@@ -59,7 +59,8 @@ static void _stp_rmdir_proc_module(void)
                /* modules to use it, even though it will not show up */
                /* in directory listings. */
 
-               if (atomic_read(&_stp_proc_stap->count) == 0) {
+               if ((atomic_read(&_stp_proc_stap->count) == 0) &&
+                   (!_stp_proc_stap->subdir)) {
                        remove_proc_entry("systemtap", NULL);
                        _stp_proc_stap = NULL;
                }

--- Additional comment from dsmith on 2009-07-06 10:18:35 EDT ---

Fixed in upstream commit 83eaf9b.  Besides checking to make sure there are no other files in /proc/systemtap before removal, the transport directory is locked before removal of '/proc/systemtap/${MODULE}' also.

Comment 1 Marc Milgram 2009-07-08 15:11:53 UTC

A customer hit this issue in Red Hat Enterprise Linux 5.3:
After several stap processes including procfs probe points are stopped,
kernel panic sometimes occurs.

Kernel panic occurred at proc_match@fs/proc/generic.c:755 because of invalid pointer dereference.
We attach the file(panic_message.log) of kernel panic message.

In addition, the following BUG messages are produced to console.
This occurs because one of running stap processes remove /proc/systemtap directory,
while another stap process uses it.
This BUG messages are always produced in this situation, but kernel panic does not always occur.

-------------------------------------------------------------------------------
BUG: warning at fs/proc/generic.c:764/remove_proc_entry() (Tainted: G     )

Call Trace:
[<ffffffff80101d38>] remove_proc_entry+0x17d/0x1e3
[<ffffffff88498bc5>] :stap_6c500b418cf00cac628a4a64c97f2716_312:_stp_rmdir_proc_module+0x79/0xca
[<ffffffff88498cb0>] :stap_6c500b418cf00cac628a4a64c97f2716_312:systemtap_module_exit+0x63/0xcc
[<ffffffff88498d95>] :stap_6c500b418cf00cac628a4a64c97f2716_312:_stp_cleanup_and_exit+0x77/0x79
[<ffffffff88498e28>] :stap_6c500b418cf00cac628a4a64c97f2716_312:_stp_work_queue+0x91/0x96
[<ffffffff8004d139>] run_workqueue+0x94/0xe4
[<ffffffff800499ba>] worker_thread+0x0/0x122
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80049aaa>] worker_thread+0xf0/0x122
[<ffffffff8008a461>] default_wake_function+0x0/0xe
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032360>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff8009d909>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032262>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
-------------------------------------------------------------------------------

Version-Release number of selected component:
RHEL5.3 GA
 kernel-2.6.18-128.el5
 systemtap-0.7.2-2.el5

How reproducible:
After stopping several running stap processes including procfs probe points,
it can be reproduced in several tries.

Steps to Reproduce:
1. Start two stap scripts including procfs probe points like as follows:

  ---- procfs.stp -------
  probe procfs("a").read{
  $value = "a";
  }
  -----------------------

2. Stop (by kill or Ctrl-C) two stap processes.

  This issue is reproduced by following script.
  --- procfs_panic_reproduce.sh -----------------------------
  #!/bin/sh

  while :
  do
    stap -e 'probe procfs("a").read{$value="a"}' &
    sleep 1
    stap -e 'probe procfs("b").read{$value="a"}' &
    sleep 1
    for i in `pgrep stap`
    do
      kill -INT $i
      sleep 1
    done
  done
  -------------------------------------------------

Actual results:
Kernel panic sometimes occurs.

Expected results:
Kernel panic does not occur.

Business impact:
SystemTap is very usefull tool to analyze kernel behavior and it's important to
support for enterprise user.
This problem loses the convenience of SystemTap.

Hardware info:
Hardware independent

Comment 2 Frank Ch. Eigler 2009-09-09 18:23:24 UTC

Upstream systemtap includes a fix for this,
which may be backported/rebased for 5.5.

Comment 4 Issue Tracker 2009-09-30 14:53:22 UTC

Event posted on 09-30-2009 10:53am EDT by mmilgram

Hi Furuta-san,

We are waiting for approval.  The BZ is on the GSS 5.5 proposed list.  I
asked jwest for more information, and he indicated that he would look into
it.

There is not much more that I can do.

As far as I can tell, there is not even a decision between rebasing, or
porting the individual fix.

Sorry that I can't be more helpful.

Marc Milgram


Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by mmilgram 
 issue 314430

Comment 5 David Smith 2009-10-01 21:21:48 UTC

I've tested the testcase in comment #1 against the fix in commit 83eaf9b.  Everything worked correctly.

This fix is not present in systemtap-0.9.7-5.el5, but could be backported.

Comment 10 errata-xmlrpc 2010-03-30 09:05:37 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0308.html