483814 – kernel BUG at kernel/ptrace.c:1068

Bug 483814 - kernel BUG at kernel/ptrace.c:1068

Summary: kernel BUG at kernel/ptrace.c:1068

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.3
Hardware:	All
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Jerome Marchand
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	487394
TreeView+	depends on / blocked

Reported:	2009-02-03 18:29 UTC by Denys Vlasenko
Modified:	2018-10-20 01:32 UTC (History)
CC List:	13 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-09-02 08:08:20 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Reproducer for ia64 (92.95 KB, application/octet-stream) 2009-02-03 18:34 UTC, Denys Vlasenko	no flags	Details
Reproducer for ia64, this time with correct strace binary (old one had x86 one!) (127.87 KB, application/octet-stream) 2009-02-06 13:20 UTC, Denys Vlasenko	no flags	Details
Show Obsolete (1) View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1243	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update	2009-09-01 08:53:34 UTC

Description Denys Vlasenko 2009-02-03 18:29:59 UTC

While working on bug 478419, Fujitsu stumbled across this kernel panic:

[root@localhost test_ia64]# ./tstrun/run2.sh.changed 
starting strace 1...
...end of iteration 1
starting strace 2...
...end of iteration 2
starting strace 3...
...end of iteration 3
starting strace 4...
...end of iteration 4
starting strace 5...
...end of iteration 5
starting strace 6...
kernel BUG at kernel/ptrace.c:1068!
strace[12847]: bugcheck! 0 [1]
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink
iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables
x_tables ipv6 xfrm_nalgo crypto_api vfat fat dm_mirror dm_multipath
scsi_dh button parport_pc lp parport joydev e100 mii sr_mod tg3 cdrom
libphy sg dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache
usb_storage lpfc scsi_transport_fc shpchp mptspi mptscsih mptbase
scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 12847, CPU 1, comm:               strace
psr : 00001010085a6010 ifs : 8000000000000918 ip  : [<a000000100104e30>]  
 Not tainted
ip is at sys_ptrace+0x4d0/0x1120
unat: 0000000000000000 pfs : 0000000000000918 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 000000000055a559
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100104e30 b6  : a000000100011060 b7  : a0000001002de0a0
f6  : 1003e00000000000000a0 f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000004e2 f9  : 1003e000000000fa00000
f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
r1  : a000000100c12950 r2  : a000000100a2ae38 r3  : a00000010095a1e0
r8  : 0000000000000027 r9  : a000000100a2ae68 r10 : a000000100a2ae68
r11 : 0000000000000000 r12 : e00000013914fe10 r13 : e000000139148000
r14 : a000000100a2ae38 r15 : 0000000000000000 r16 : a00000010095a1e8
r17 : e0000001254dfe18 r18 : 0000000000000000 r19 : 0000000000000000
r20 : a00000010085d280 r21 : a000000100a12fa8 r22 : a000000100a2ae40
r23 : a000000100a2ae40 r24 : a0000001007e1054 r25 : 0000000000000000
r26 : a0000001007e105c r27 : a0000001007e1040 r28 : a0000001007e0008
r29 : 0000005000000078 r30 : 0000000000000000 r31 : 0000000000000000

Call Trace:
 [<a000000100013ba0>] show_stack+0x40/0xa0
                                sp=e00000013914f9a0 bsp=e000000139149338
 [<a0000001000144a0>] show_regs+0x840/0x880
                                sp=e00000013914fb70 bsp=e0000001391492e0
 [<a000000100037c00>] die+0x1c0/0x2c0
                                sp=e00000013914fb70 bsp=e000000139149298
 [<a000000100037d50>] die_if_kernel+0x50/0x80
                                sp=e00000013914fb90 bsp=e000000139149268
 [<a000000100650310>] ia64_bad_break+0x270/0x4a0
                                sp=e00000013914fb90 bsp=e000000139149240
 [<a00000010000c040>] __ia64_leave_kernel+0x0/0x280
                                sp=e00000013914fc40 bsp=e000000139149240
 [<a000000100104e30>] sys_ptrace+0x4d0/0x1120
                                sp=e00000013914fe10 bsp=e000000139149178
 [<a00000010000bdd0>] __ia64_trace_syscall+0xd0/0x110
                                sp=e00000013914fe30 bsp=e000000139149178
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e000000139150000 bsp=e000000139149178
 <0>Kernel panic - not syncing: Fatal exception
--------------------------------------------------------------

Reproducible on ia64, failed to reproduce on x86_64 and i386.
Will attach the reproducer.

Comment 1 Denys Vlasenko 2009-02-03 18:34:21 UTC

Created attachment 330763 [details]
Reproducer for ia64

Recipe: on an ia64 machine:

* run ./mk.sh to compile test_select.c to test_select binary
* run ./run.sh
* watch "starting strace N, pid P..." / "...end of iteration N" messages
* after ~4000 iterations, machine freezes (tested twice on different machines)

NB: tarball includes a new strace binary, not yet released in RHEL.

Comment 3 Denys Vlasenko 2009-02-06 00:02:04 UTC

I saw it on:

intel-s6e5132-01.rhts.bos.redhat.com

hp-diablo-01.rhts.bos.redhat.com

Comment 5 Denys Vlasenko 2009-02-06 13:20:07 UTC

Created attachment 331125 [details]
Reproducer for ia64, this time with correct strace binary (old one had x86 one!)

Comment 6 Prarit Bhargava 2009-02-06 14:50:50 UTC

Un-private-ing comment #3 for dchapman (on-site HP partner engineer).

P.

Comment 7 Prarit Bhargava 2009-02-06 14:52:26 UTC

Denys, what kernel rev was this seen on?

P.

Comment 8 Doug Chapman 2009-02-06 15:27:20 UTC

I was finally able to reproduce on 2.6.18-130, it took over 6000 iters, I have other systems running the test that are over 8000 iters so evidently this is a rather narrow race condition.  I will investigate.

Comment 9 Veaceslav Falico 2009-02-06 15:38:43 UTC

Easily reproduced on 2.6.18-128 (63 iters).

Comment 10 Doug Chapman 2009-02-06 16:22:20 UTC

(In reply to comment #9)
> Easily reproduced on 2.6.18-128 (63 iters).

Interesting, which system are you using that hits this so quickly?  I have not seen it in less than 6000 iters and in bug 478419 they mention it usually taking at least 4000 iters.  It would be easier to experiment if I could find the magic system that hits this quickly.

Comment 11 Veaceslav Falico 2009-02-06 16:50:37 UTC

Using itanium 2 with 32 cores. AFAICS, as more cpu's you have, as quicker it hits.
I think it's coming form the following code:
utrace.c:760
 760                 spin_lock(&utrace->lock);
 761                 if (unlikely(rcu_dereference(target->utrace) != utrace)
 762                     || unlikely(rcu_dereference(engine->ops)
 763                                 == &dead_engine_ops)) {
 764                         /*
 765                          * By the time we got the utrace lock,
 766                          * it had been reaped or detached already.
 767                          */
 768                         spin_unlock(&utrace->lock);
 769                         utrace = ERR_PTR(-ESRCH);
 770                 }
And while getting the spinlock, something happens with the utrace. Will examine it further, if any updates appear - I'll write here.

And, veeery strange, but the vmcore's bt shows that it's not exiting from ptrace:1068, but from ptrace:449...

Comment 12 Doug Chapman 2009-02-06 18:03:31 UTC

(In reply to comment #11)
> Using itanium 2 with 32 cores. AFAICS, as more cpu's you have, as quicker it
> hits.

That's what I expected also but when I ran on the 64 cpu system back at HP it took 35000 iterations, very odd.

> 
> And, veeery strange, but the vmcore's bt shows that it's not exiting from
> ptrace:1068, but from ptrace:449...

Ok, not just me.  I have had the vmcore point to other places than expected also.  however each time it was pointing to a BUG_ON(ret != -ESRCH)

Comment 14 Doug Chapman 2009-02-06 21:04:51 UTC

I added some printk's to ptrace.c to see what ret was set to (since it is obvioulsy not -ESRCH)

I found that it is set to -114 or -EALREADY

this comes from ptrace_update which calls utrace_set_flags.  It appears -EALREADY is the "default" return value for utrace_set_flags.  So... not sure why this is an error, and if it is an error I don't understand why it is something we would panic on.

I see there are a few places where utrace_set_flags can do the return, adding some code to figure out which one we are hitting.

Comment 15 Doug Chapman 2009-02-06 22:32:37 UTC

What appears to be happening is we are trying to do a PTRACE_SINGLESTEP just as the target process exits.

in kernel/utrace.c in utrace_set_flags we have

 981         int ret = -EALREADY;
....
1042                 if (unlikely(target->exit_state)) {
1043                         read_unlock(&tasklist_lock);
1044                         spin_unlock(&utrace->lock);
1045                         return ret;
1046                 }

So, if the target process has exited we return -EALREADY

then in kernel/ptrace.c:

1066                 ret = ptrace_update(child, state, flags, 1);
1067                 if (ret)
1068                         BUG_ON(ret != -ESRCH);

(ptrace_update gets its return value from utrace_set_flags)

So, my guess is this BUG_ON is invalid, unless there is some reason why this state really is invalid.  It seems we should just change:

1067                 if (ret)
1068                         BUG_ON(ret != -ESRCH);

with

1067                 if (ret)
1068                         return ret;

or something along those lines.

Comment 16 Denys Vlasenko 2009-02-07 21:54:47 UTC

until people who actually understand utrace comment on it, I propose replacing BUG_ON with WARN_ON(ret != ESRCH && ret != EALREADY), and making ret == EALREADY act the same as ret == ESRCH.

Comment 17 Doug Chapman 2009-02-09 19:22:45 UTC

I have confirmed that this is not ia64 specific.  I was able to reproduce this on an x86_64 system running kernel-2.6.18-128.el5


Kernel BUG at kernel/ptrace.c:1068
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/class
CPU 1 
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp sg floppy i2c_i801 i2c_core e1000 ide_cd parport_pc cdrom shpchp parport serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 23377, comm: strace Not tainted 2.6.18-128.el5 #1
RIP: 0010:[<ffffffff800bfa61>]  [<ffffffff800bfa61>] ptrace_common+0x146/0x19c
RSP: 0018:ffff810004a9bf08  EFLAGS: 00010283
RAX: 00000000ffffff8e RBX: 0000000000000018 RCX: 0000000000000000
RDX: 0000000000000410 RSI: 0000000000000028 RDI: ffffffff803c3e80
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff81001ca400c0 R11: ffff81001cc76b20 R12: ffff81001a38b840
R13: ffff8100050b6fc0 R14: ffff8100035cb860 R15: 000000000e1e6920
FS:  00002b7991a49250(0000) GS:ffff81001f31a7c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002aaaf420d008 CR3: 000000001c2dc000 CR4: 00000000000006e0
Process strace (pid: 23377, threadinfo ffff810004a9a000, task ffff810003e700c0)
Stack:  0000000000000001 0000000000000018 000000000e1e6910 0000000000000000
 0000000000000000 ffffffff800c0570 0000000000000000 0000000000000018
 000000000e1e6920 ffffffff8005d229 0000000000000000 ffff81001a38b840
Call Trace:
 [<ffffffff800c0570>] sys_ptrace+0xf0/0x1f7
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 10 b6 29 80 c2 2c 04 eb 3f 48 f7 c1 80 ff ff ff bd 
RIP  [<ffffffff800bfa61>] ptrace_common+0x146/0x19c
 RSP <ffff810004a9bf08>
 <0>Kernel panic - not syncing: Fatal exception

Comment 19 Michal Nowak 2009-02-11 12:08:12 UTC

Also on i386, with -128.el5 kernel.

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6730459 (please see the EXTERNALWATCHDOG.log)

Last sub-test or RHTS test '/tools/strace/Regression/478419-threads-stop-when-strace--f-is-executed-on-multi-thread-process' caused that panic.

Comment 21 Keiichiro Tokunaga 2009-02-13 19:52:48 UTC

Any updates on this?  This is an urgent bug, but I've not seen any patch posted to rhkernel.

By the way, looking at kernel/ptrace.c, -EALREADY is handled in some places like this:

  if (unlikely(ret == -EALREADY)) // some are not wrapped with unlikely()
    ret = -ESRCH;

I'm not familiar with the ptrace code, but I'd imagine we could use the same lines for this bug case as well?  We might also need it for "case PTRACE_SETOPTIONS".

In essence, this approach does the same thing as the Doug's idea unless there is a possibility ptrace_update() returns neither 0, -EALREADY nor -ESRCH.

Comment 22 Jerome Marchand 2009-02-16 10:48:01 UTC

(In reply to comment #11)
> And, veeery strange, but the vmcore's bt shows that it's not exiting from
> ptrace:1068, but from ptrace:449...
Strange indeed as here -EALREADY is treated like -ESEARCH. Do you know which value was returned.

Comment 23 RHEL Program Management 2009-02-16 15:23:08 UTC

Updating PM score.

Comment 24 Veaceslav Falico 2009-02-17 13:29:56 UTC

(In reply to comment #17)
> I have confirmed that this is not ia64 specific.  I was able to reproduce this
> on an x86_64 system running kernel-2.6.18-128.el5

Did you use the same testcase, or something more specific?

Comment 25 Doug Chapman 2009-02-17 14:58:14 UTC

(In reply to comment #24)
> (In reply to comment #17)
> > I have confirmed that this is not ia64 specific.  I was able to reproduce this
> > on an x86_64 system running kernel-2.6.18-128.el5
> 
> Did you use the same testcase, or something more specific?

I used the test case attached here that has the x86_64 version of strace.

Comment 26 Doug Chapman 2009-02-17 14:59:32 UTC

(In reply to comment #22)
> (In reply to comment #11)
> > And, veeery strange, but the vmcore's bt shows that it's not exiting from
> > ptrace:1068, but from ptrace:449...
> Strange indeed as here -EALREADY is treated like -ESEARCH. Do you know which
> value was returned.

Look at comment #15 where I explain this in detail including showing line numbers and values returned.

Comment 27 Jerome Marchand 2009-02-17 15:53:44 UTC

(In reply to comment #26)
> (In reply to comment #22)
> > (In reply to comment #11)
> > > And, veeery strange, but the vmcore's bt shows that it's not exiting from
> > > ptrace:1068, but from ptrace:449...
> > Strange indeed as here -EALREADY is treated like -ESEARCH. Do you know which
> > value was returned.
> 
> Look at comment #15 where I explain this in detail including showing line
> numbers and values returned.

That explain the bug at ptrace:1068, not at ptrace:449 where the case -EALREADY is handled correctely.

Comment 28 Veaceslav Falico 2009-02-18 13:03:00 UTC

(In reply to comment #25)
> (In reply to comment #24)
> > (In reply to comment #17)
> > > I have confirmed that this is not ia64 specific.  I was able to reproduce this
> > > on an x86_64 system running kernel-2.6.18-128.el5
> > 
> > Did you use the same testcase, or something more specific?
> 
> I used the test case attached here that has the x86_64 version of strace.

How many cores did you use? Did you start several strace's in paralel?

...end of iteration 679334

And nothing :( on a 2.6.18-128 x86_64.

And, btw, did you get SEGFAULT's on strace binary in strace?
something like
Feb 18 13:46:00 dell-pe1950-03 kernel: test_select[32024]: segfault at 0000000041bb4a08 rip 00000038414074f0 rsp 00007fffbfc36300 error 4
RIP is always the same, and the offset is always ending with a08 (didn't dig into it).

Thank you!

Comment 29 Doug Chapman 2009-02-18 13:33:45 UTC

(In reply to comment #28)
> 
> How many cores did you use? Did you start several strace's in paralel?

I just asked rhts to give me a random system and tried it so I don't know the details.  The system was nec-em3.

I don't recall if I did several test instances in parallel but I have found that reproduces the failure more quickly on ia64 so I may have.

> 
> ...end of iteration 679334
> 
> And nothing :( on a 2.6.18-128 x86_64.
> 
> And, btw, did you get SEGFAULT's on strace binary in strace?
> something like
> Feb 18 13:46:00 dell-pe1950-03 kernel: test_select[32024]: segfault at
> 0000000041bb4a08 rip 00000038414074f0 rsp 00007fffbfc36300 error 4
> RIP is always the same, and the offset is always ending with a08 (didn't dig
> into it).

I don't recall seeing this.

Comment 30 Jerome Marchand 2009-02-19 13:26:00 UTC

Patch posted:
http://post-office.corp.redhat.com/archives/rhkernel-list/2009-February/msg00476.html

Comment 32 Veaceslav Falico 2009-03-02 13:50:58 UTC

Sorry for a long and somewhat stupid comment, but it's really hard to describe utrace workflow with my english level...

The bug occurs only when the process is in EXIT_ZOMBIE exit_state. Current implementation of utrace doesn't properly handle it.

Feb 20 20:13:25 dell-pe1950-03 kernel: 1) child->exit_state = 16, request = 24, data = 0
Feb 20 20:13:25 dell-pe1950-03 kernel: 2) child->exit_state = 16, request = 24, data = 0
Feb 20 20:13:25 dell-pe1950-03 kernel: 3) child->exit_state = 16, request = 24, data = 0
Feb 20 20:13:25 dell-pe1950-03 kernel: ZOMBIE detected.
Feb 20 20:13:25 dell-pe1950-03 kernel: set_utrace_flags = 523624, old_utrace_flags = 40, set_utrace_flags &~ old_utrace_flags = 523584, target->exit_state = 16
Feb 20 20:13:25 dell-pe1950-03 kernel: BUG_ON 1068, ret=-114, request=24, EEE=2, exit_state before ptrace_induce_signal = 16, signal number = 0
Feb 21 20:13:25 dell-pe1950-03 kernel: 
Feb 21 20:13:25 dell-pe1950-03 kernel: 4) child->exit_state = 16, request = 24, data = 0

request = 24 = PTRACE_SYSCALL
exit_state = 16 = EXIT_ZOMBIE
(set_utrace_flags &~ old_state_flags) & _UTRACE_EVENT_DEATH == 1
where DEATH_EVENTS = (UTRACE_EVENT(DEATH) | UTRACE_EVENT(QUIESCE) = _UTRACE_EVENT_DEATH | _UTRACE_EVENT_QUIESCE

Relevant parts of code:

kernel/utrace.c
 973 int EEE;
 974 int
 975 utrace_set_flags(struct task_struct *target,
...
 994         if (unlikely(IS_ERR(utrace))){
 995                 EEE = 3;
 996                 return PTR_ERR(utrace);
 997         }
...
1004         if (target->exit_state
1005             && (((flags &~ old_flags) & (UTRACE_ACTION_QUIESCE
1006                                          | UTRACE_ACTION_NOREAP
1007                                          | DEATH_EVENTS))
1008                 || ((utrace->u.exit.flags & EXIT_FLAG_DEATH)
1009                     && ((old_flags &~ flags) & DEATH_EVENTS))
1010                 || ((utrace->u.exit.flags & EXIT_FLAG_REAP)
1011                     && ((old_flags &~ flags) & UTRACE_EVENT(REAP))))) {
1012                 spin_unlock(&utrace->lock);
1013                 EEE = 1;
1014                 return ret;
...
1046                 if (unlikely(target->exit_state)) {
1047                         read_unlock(&tasklist_lock);
1048                         spin_unlock(&utrace->lock);
1049                         printk("set_utrace_flags = %lu, old_utrace_flags = %lu, set_utrace_flags &~ old_utrace_flags = %lu, target->exit_state = %ld\n",set_utrace_flags,old_utrace_flags,set_utrace_flags &~ old_utrace_flags,target->exit_state);
1050                         EEE = 2;
1051                         return ret;
...
1138         EEE = 4;
1139         return ret;
1140 }


kernel/ptrace.c
 995 
 996 extern int EEE;
 997 
 998 static int
 999 ptrace_common(long request, struct task_struct *child,
...
1050                 if(pis_exit_state == EXIT_ZOMBIE){
1051                         printk("ZOMBIE detected.\n");
1052                 }
...
1071                 if (ret){
1072                         printk("BUG_ON 1068, ret=%d, request=%ld, EEE=%d, exit_state before ptrace_induce_signal = %ld, signal number = %ld\n\n",ret,request,EEE,pis_exit_state,data);
1073                         /* BUG_ON(ret != -ESRCH); */
1074                 }
...
1095         return ret;
1096 }
1097 
1098 
1099 asmlinkage long sys_ptrace(long request, long pid, long addr, long data)
1100 {
1101         struct task_struct *child = NULL;
1102         struct utrace_attached_engine *engine = NULL;
1103         struct ptrace_state *state = NULL;
1104         long ret, val;
1105 
1106         pr_debug("%d sys_ptrace(%ld, %ld, %lx, %lx)\n",
1107                  current->pid, request, pid, addr, data);
1108 
1109         ret = ptrace_start(pid, request, &child, &engine, &state);
1110         if (ret != -EIO)
1111                 goto out;
1112         if(child->exit_state){
1113                 printk("1) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1114         }
1115         val = 0;
1116         ret = arch_ptrace(&request, child, engine, addr, data, &val);
1117         if(child->exit_state){
1118                 printk("2) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1119         }
...
1128         switch (request) {
1129         default:
1130                 if(child->exit_state){
1131                         printk("3) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1132                 }
1133                 ret = ptrace_common(request, child, engine, state, addr, data);
1134                 if(child->exit_state){
1135                         printk("4) child->exit_state = %ld, request = %ld, data = %ld\n",child->exit_state, request, data);
1136                 }
1137                 break;

There are also a lot of other small cases, when the process to which we attach is already EXIT_ZOMBIE or EXIT_DEAD, but they are handled smoothly. 

So, it happens only when, in sys_ptrace, the process we get is already in EXIT_ZOMBIE state and there is a PTRACE_SYSCALL request.

From the ptrace_common, we already see that a bug was fixed against PTRACE_DETACH event. However, there is a possibility that we ptrace_update from PTRACE_SETOPTIONS and PTRACE_OLDSETOPTIONS, where also the bug will be triggered (ptrace_common -> ptrace_update -> utrace_update -> return -EALREADY on zombie).

So, from how I understand the code, we should also change the BUG_ON from PTRACE_SETOPTIONS and PTRACE_OLDSETOPTIONS. The best option, from my point of view, would be to verify if the process is in EXIT_ZOMBIE state in sys_ptrace, and to call ptrace_update on EXIT_ZOMBIE only when there is PTRACE_KILL request. However, I don't know the code enough to say it for sure. 
Could somebody, who knows better the code, affirm it?

Comment 33 Roland McGrath 2009-03-02 20:26:43 UTC

ptrace_update has no useful effect when the task is already dead (EXIT_ZOMBIE).

Comment 34 Don Zickus 2009-03-04 20:01:38 UTC

in kernel-2.6.18-133.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 41 errata-xmlrpc 2009-09-02 08:08:20 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Note You need to log in before you can comment on or make changes to this bug.