Red Hat Bugzilla – Bug 469754
kernel panic seen in ptrace_induce_signal in run of rhts test /tools/gdb/gdb-any/
Last modified: 2009-01-20 15:03:09 EST
Description of problem:
after the gdb-any test was running for approximately 1.5 hours, received this panic in the -121 kernel.
Unable to handle kernel NULL pointer dereference at 0000000000000041 RIP:
PGD 18fa1067 PUD 1c9b6067 PMD 0
Oops: 0000  SMP
last sysfs file: /devices/pci0000:00/0000:00:1c.5/0000:04:00.0/irq
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i2c_i801 ide_cd tg3 i3000_edac i2c_core shpchp libphy edac_mc floppy cdrom serio_raw pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 30075, comm: gdb Not tainted 2.6.18-121.el5 #1
RIP: 0010:[<ffffffff800bf655>] [<ffffffff800bf655>] ptrace_induce_signal+0x1b/0x8f
RSP: 0018:ffff810016f39ef8 EFLAGS: 00010287
RAX: 00000000fffffffb RBX: 0000000000000008 RCX: ffff8100243b2240
RDX: 0000000000000000 RSI: ffff810016b3f540 RDI: ffff810031d7f820
RBP: 00002b1a00896fd0 R08: 0000000000000009 R09: 0000000000000000
R10: ffff81003a9aa400 R11: ffff8100243b2240 R12: ffff810016b3f540
R13: ffff8100243b2240 R14: ffff810031d7f820 R15: 0000000000000000
FS: 00002b1a00897020(0000) GS:ffff81003fe017c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000041 CR3: 0000000015424000 CR4: 00000000000006e0
Process gdb (pid: 30075, threadinfo ffff810016f38000, task ffff81003f12c7a0)
Stack: 0000000000000000 ffffffff800bf79f 0000000000000000 00002b1a00896fd0
0000000000000000 000000000045ba50 0000000000000000 ffffffff800c0315
0000000000000000 0000000000000008 0000000000000000 ffffffff8005d229
Code: 8a 42 41 a8 01 74 12 48 89 fe ba 01 00 00 00 44 89 c7 e8 93
RIP [<ffffffff800bf655>] ptrace_induce_signal+0x1b/0x8f
<0>Kernel panic - not syncing: Fatal exception
Version-Release number of selected component (if applicable):
RHEL5.3-Server-20081103.nightly w/ -121 kernel
unknown, I do not think this panic has happened before.
Steps to Reproduce:
logs available at:
rhts jobid 34586
RHTS Job 34946
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000021
*pde = 3c23a067
Oops: 0000 [#1]
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport joydev sr_mod ide_cd cdrom floppy e752x_edac edac_mc e1000 serio_raw pata_sil680 sg pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd uhci_hcd ohci_hcd ehci_hcd
EIP: 0060:[<c0453bdd>] Not tainted VLI
EFLAGS: 00010287 (2.6.18-121.el5 #1)
EIP is at ptrace_induce_signal+0x1a/0x86
eax: fffffffb ebx: 00000009 ecx: 00000009 edx: 00000000
esi: f60f79a0 edi: f7037aa0 ebp: c94a3b00 esp: f54d0f8c
ds: 007b es: 007b ss: 0068
Process gdb (pid: 29353, ti=f54d0000 task=f7f4eaa0 task.ti=f54d0000)
Stack: c94a3b00 00000008 f7037aa0 c0454901 f60f79a0 c0407f46 00000000 00000008
00000000 00267ff4 f54d0000 c0404f17 00000008 000072c2 00000000 00000000
00267ff4 bfedc778 ffffffda 0000007b 0000007b 0000001a b7fe5402 00000073
Code: 00 b0 01 86 82 04 05 00 00 fb 59 89 d8 5b 5e 5f c3 57 85 c9 56 89 c7 53 89 d6 8b 52 18 89 cb 74 70 83 f9 40 b8 fb ff ff ff 77 68 <8a> 42 21 a8 01 74 10 b9 01 00 00 00 89 fa 89 d8 e8 f3 9f fd ff
EIP: [<c0453bdd>] ptrace_induce_signal+0x1a/0x86 SS:ESP 0068:f54d0f8c
<0>Kernel panic - not syncing: Fatal exception
First, I must say again I know nothing, nothing about utrace...
Looking at the disassembled i386 "Code:" from comment #1,
I think that it crashes because engine->data == NULL when
ptrace_induce_signal() checks "if (state->syscall)".
ptrace_start() checks engine->data != NULL, so it was changed
Perhaps the patch I'll send in a minute can help, but anyway I
can't understand how can we use *state outside of ptrace_start()
once it drops rcu_read_lock().
Created attachment 322638 [details]
perhaps a fix.
The patch is not tested, not even compiled, and I don't
understand this code. Good luck to everyone who might want
to test it.
I'm affraid that won't help. First I think you missed some calls to ptrace_induce_signal(). Moreover, if engine->data is NULL, there probably is a race with an other function (probably ptrace_report_death() or ptrace_report_reap() ) and the ptrace_state structure has probably already been freed.
> I'm affraid that won't help.
me too ;)
> First I think you missed some calls to
I don't think so. It is static, and unless /usr/bin/grep is
buggy we have only 2 callsites from ptrace_common().
> Moreover, if engine->data is NULL, there probably is a
> race with an other function (probably ptrace_report_death() or
> ptrace_report_reap() ) and the ptrace_state structure has probably already been
Yes. That was my point when I said (comment #2) "anyway I can't
understand how can we use *state outside of ptrace_start() once
it drops rcu_read_lock()."
However, please note that ptrace_common does use the
"struct ptrace_state *state" arg, this __probabl__ means
ptrace_induce_signal() can use it too.
(In reply to comment #5)
> > First I think you missed some calls to
> > ptrace_induce_signal().
> I don't think so. It is static, and unless /usr/bin/grep is
> buggy we have only 2 callsites from ptrace_common().
Mea culpa. For some reason I can not explain to myself, I've seen more than two calls, but you're right.
> > Moreover, if engine->data is NULL, there probably is a
> > race with an other function (probably ptrace_report_death() or
> > ptrace_report_reap() ) and the ptrace_state structure has probably already been
> > freed.
> Yes. That was my point when I said (comment #2) "anyway I can't
> understand how can we use *state outside of ptrace_start() once
> it drops rcu_read_lock()."
> However, please note that ptrace_common does use the
> "struct ptrace_state *state" arg, this __probabl__ means
> ptrace_induce_signal() can use it too.
Since rcu_read_lock() is not held, I see nothing preventing ptrace_report_death() or ptrace_report_reap() to free the structure.
> > However, please note that ptrace_common does use the
> > "struct ptrace_state *state" arg, this __probabl__ means
> > ptrace_induce_signal() can use it too.
> Since rcu_read_lock() is not held, I see nothing preventing
> ptrace_report_death() or ptrace_report_reap() to free the structure.
I am afraid we misunderstood each other.
Yes! This exactly was my point, you should not explain this to me ;)
sys_ptrace()->ptrace_start(..., &state, ...) reads engine->data and
populates "struct ptrace_state *state" and drops rcu lock. After that
this "state" is used by sys_ptrace(), ptrace_common(), etc. And I
don't understand why this is safe too. Yes, ptrace_start() waits
for TASK_TRACED/TASK_STOPPED but I can't see how this can help because
I am absolutely knew to utrace.
So, either the whole code is completely broken, or the patch makes
There is certainly nothing wrong with that patch.
But I think you are right that there is a general problem with races between sys_ptrace and a dying target thread doing ptrace_done.
In the usual case, the quiescence checks in ptrace_start make sure that the target thread can't do anything. The only exception should be SIGKILL (including MT exec et al). That can lead to ptrace_report_death or ptrace_report_reap calling ptrace_done while sys_ptrace runs. In ptrace_detach, the utrace_detach return value check guarantees those won't callbacks won't run. But other paths using *state are vulnerable.
It might be necessary to switch struct ptrace_state to using an atomic_t ref count instead of just RCU.
> It might be necessary to switch struct ptrace_state to using an atomic_t ref
> count instead of just RCU.
I guess you mean using both rcu and refcnt, otherwise it is not easy
to convert all users. In that case the patch looks very simple, something
--- /kernel/ptrace.c~ 2008-10-30 12:07:45.000000000 +0100
+++ kernel/ptrace.c 2008-11-12 19:09:19.000000000 +0100
@@ -199,12 +199,18 @@ ptrace_setup(struct task_struct *target,
+static void __ptrace_state_free(struct ptrace_state *state)
+ if (state && atomic_dec_and_test(&state->refcnt)
ptrace_state_free(struct rcu_head *rhead)
struct ptrace_state *state = container_of(rhead,
struct ptrace_state, rcu);
@@ -835,7 +841,7 @@ ptrace_start(long pid, long request,
struct task_struct *child;
struct utrace_attached_engine *engine;
- struct ptrace_state *state;
+ struct ptrace_state *state = NULL;
@@ -906,6 +912,7 @@ ptrace_start(long pid, long request,
ret = -ESRCH; /* Return value for exit_state bail-out. */
@@ -949,6 +956,7 @@ out_tsk_rcu:
@@ -1157,6 +1165,7 @@ out_tsk:
pr_debug("%d ptrace -> %lx\n", current->pid, ret);
and of course we still need the previous patch I sent.
But what about ->engine? It is also owned by tracee, not by tracer,
Right, something about like that is what I meant.
wrt the engine pointer, you need to extract engine->data under rcu_read_lock (your previous patch covers that). Aside from that, the only uses of the engine pointer are in calls to utrace_* (ptrace_induce_signal and ptrace_update). To be safe, we should be doing each of those utrace_* calls under rcu_read_lock. Inside that rcu_read_lock, we must check state to see if the ptrace_report_death or ptrace_report_reap path to ptrace_done has been taken (could have them clear state->engine), meaning the engine pointer might already be bad.
grep, grep, grep... No, still don't understand this code.
But we seem to never clear state->engine? How can we check that
engine is still valid?
Perhaps we can change ptrace_done() to clear state->engine? In
that case, can't we just kill the cached "engine" which is used
in sys_ptrace() pathes? We can always use state->engine under
As for ptrace_induce_signal/ptrace_detach, yes it looks like
we can call them under rcu. But this only covers ptrace_common().
What about arch_ptrace() ? rcu is not an option here, but unless
I missed something engine is not actually used, right?
> But we seem to never clear state->engine? How can we check that
> engine is still valid?
> Perhaps we can change ptrace_done() to clear state->engine?
No, we can't afaics :(
ptrace_detach() calls rcu_engine_free(engine) first, then it
does ptrace_done(state). This means we can't validate ->engine
Well, we can add another counter, but this is soooo ugly...
> Well, we can add another counter, but this is soooo ugly...
Still I am sending the untested and extremely ugly patches,
on top of the first one. sys_compact_ptrace() is not changed
Perhaps we can avoid ptrace_state->refcnt and use get_ptrace_state()
in sys_ptrace() pathes, but I don't understand the code enough.
Created attachment 324194 [details]
2/3 make ptrace_state refcountable
make ptrace_state refcountable
Created attachment 324197 [details]
3/3 make make utrace_attached_engine refcountable
make make utrace_attached_engine refcountable
> Still I am sending the untested and extremely ugly patches,
For some reason Jerome dislikes the memory leak which
was added as a bonus. Pedant.
Updated and hopefully fixed patches:
Tested with /tools/gdb/gdb-any (thanks Anton!), no crashes.
The test has failures, but there are not realated.
You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.