Bug 469754
| Summary: | kernel panic seen in ptrace_induce_signal in run of rhts test /tools/gdb/gdb-any/ | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Mike Gahagan <mgahagan> | ||||||||
| Component: | kernel | Assignee: | Oleg Nesterov <onestero> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | Martin Jenner <mjenner> | ||||||||
| Severity: | high | Docs Contact: | |||||||||
| Priority: | high | ||||||||||
| Version: | 5.3 | CC: | dzickus, jan.kratochvil, jmarchan, mgahagan, roland, syeghiay | ||||||||
| Target Milestone: | rc | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | All | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2009-01-20 20:03:09 UTC | Type: | --- | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Attachments: |
|
||||||||||
|
Description
Mike Gahagan
2008-11-03 21:30:36 UTC
http://rhts.redhat.com/testlogs/34946/123078/1041331/5004118-test_log--tools-gdb-gdb-any-EXTERNALWATCHDOG.log RHTS Job 34946 i386 RHEL5.3-Server-20081103.nightly kernel-2.6.18-121.el5.i686 /tools/gdb/gdb-any BUG: unable to handle kernel NULL pointer dereference at virtual address 00000021 printing eip: c0453bdd *pde = 3c23a067 Oops: 0000 [#1] SMP last sysfs file: /devices/pci0000:00/0000:00:00.0/irq Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport joydev sr_mod ide_cd cdrom floppy e752x_edac edac_mc e1000 serio_raw pata_sil680 sg pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd uhci_hcd ohci_hcd ehci_hcd CPU: 3 EIP: 0060:[<c0453bdd>] Not tainted VLI EFLAGS: 00010287 (2.6.18-121.el5 #1) EIP is at ptrace_induce_signal+0x1a/0x86 eax: fffffffb ebx: 00000009 ecx: 00000009 edx: 00000000 esi: f60f79a0 edi: f7037aa0 ebp: c94a3b00 esp: f54d0f8c ds: 007b es: 007b ss: 0068 Process gdb (pid: 29353, ti=f54d0000 task=f7f4eaa0 task.ti=f54d0000) Stack: c94a3b00 00000008 f7037aa0 c0454901 f60f79a0 c0407f46 00000000 00000008 00000000 00267ff4 f54d0000 c0404f17 00000008 000072c2 00000000 00000000 00267ff4 bfedc778 ffffffda 0000007b 0000007b 0000001a b7fe5402 00000073 Call Trace: [<c0454901>] sys_ptrace+0x49a/0x64f [<c0407f46>] do_syscall_trace+0xab/0xb1 [<c0404f17>] syscall_call+0x7/0xb ======================= Code: 00 b0 01 86 82 04 05 00 00 fb 59 89 d8 5b 5e 5f c3 57 85 c9 56 89 c7 53 89 d6 8b 52 18 89 cb 74 70 83 f9 40 b8 fb ff ff ff 77 68 <8a> 42 21 a8 01 74 10 b9 01 00 00 00 89 fa 89 d8 e8 f3 9f fd ff EIP: [<c0453bdd>] ptrace_induce_signal+0x1a/0x86 SS:ESP 0068:f54d0f8c <0>Kernel panic - not syncing: Fatal exception First, I must say again I know nothing, nothing about utrace... Looking at the disassembled i386 "Code:" from comment #1, I think that it crashes because engine->data == NULL when ptrace_induce_signal() checks "if (state->syscall)". ptrace_start() checks engine->data != NULL, so it was changed in between. Perhaps the patch I'll send in a minute can help, but anyway I can't understand how can we use *state outside of ptrace_start() once it drops rcu_read_lock(). Created attachment 322638 [details]
perhaps a fix.
The patch is not tested, not even compiled, and I don't
understand this code. Good luck to everyone who might want
to test it.
I'm affraid that won't help. First I think you missed some calls to ptrace_induce_signal(). Moreover, if engine->data is NULL, there probably is a race with an other function (probably ptrace_report_death() or ptrace_report_reap() ) and the ptrace_state structure has probably already been freed. > I'm affraid that won't help. me too ;) > First I think you missed some calls to > ptrace_induce_signal(). I don't think so. It is static, and unless /usr/bin/grep is buggy we have only 2 callsites from ptrace_common(). > Moreover, if engine->data is NULL, there probably is a > race with an other function (probably ptrace_report_death() or > ptrace_report_reap() ) and the ptrace_state structure has probably already been > freed. Yes. That was my point when I said (comment #2) "anyway I can't understand how can we use *state outside of ptrace_start() once it drops rcu_read_lock()." However, please note that ptrace_common does use the "struct ptrace_state *state" arg, this __probabl__ means ptrace_induce_signal() can use it too. (In reply to comment #5) > > First I think you missed some calls to > > ptrace_induce_signal(). > > I don't think so. It is static, and unless /usr/bin/grep is > buggy we have only 2 callsites from ptrace_common(). Mea culpa. For some reason I can not explain to myself, I've seen more than two calls, but you're right. > > > Moreover, if engine->data is NULL, there probably is a > > race with an other function (probably ptrace_report_death() or > > ptrace_report_reap() ) and the ptrace_state structure has probably already been > > freed. > > Yes. That was my point when I said (comment #2) "anyway I can't > understand how can we use *state outside of ptrace_start() once > it drops rcu_read_lock()." > > However, please note that ptrace_common does use the > "struct ptrace_state *state" arg, this __probabl__ means > ptrace_induce_signal() can use it too. Since rcu_read_lock() is not held, I see nothing preventing ptrace_report_death() or ptrace_report_reap() to free the structure. > > However, please note that ptrace_common does use the > > "struct ptrace_state *state" arg, this __probabl__ means > > ptrace_induce_signal() can use it too. > > Since rcu_read_lock() is not held, I see nothing preventing > ptrace_report_death() or ptrace_report_reap() to free the structure. I am afraid we misunderstood each other. Yes! This exactly was my point, you should not explain this to me ;) sys_ptrace()->ptrace_start(..., &state, ...) reads engine->data and populates "struct ptrace_state *state" and drops rcu lock. After that this "state" is used by sys_ptrace(), ptrace_common(), etc. And I don't understand why this is safe too. Yes, ptrace_start() waits for TASK_TRACED/TASK_STOPPED but I can't see how this can help because I am absolutely knew to utrace. So, either the whole code is completely broken, or the patch makes sense. There is certainly nothing wrong with that patch. But I think you are right that there is a general problem with races between sys_ptrace and a dying target thread doing ptrace_done. In the usual case, the quiescence checks in ptrace_start make sure that the target thread can't do anything. The only exception should be SIGKILL (including MT exec et al). That can lead to ptrace_report_death or ptrace_report_reap calling ptrace_done while sys_ptrace runs. In ptrace_detach, the utrace_detach return value check guarantees those won't callbacks won't run. But other paths using *state are vulnerable. It might be necessary to switch struct ptrace_state to using an atomic_t ref count instead of just RCU. > It might be necessary to switch struct ptrace_state to using an atomic_t ref
> count instead of just RCU.
I guess you mean using both rcu and refcnt, otherwise it is not easy
to convert all users. In that case the patch looks very simple, something
like
--- /kernel/ptrace.c~ 2008-10-30 12:07:45.000000000 +0100
+++ kernel/ptrace.c 2008-11-12 19:09:19.000000000 +0100
@@ -199,12 +199,18 @@ ptrace_setup(struct task_struct *target,
return state;
}
^
+static void __ptrace_state_free(struct ptrace_state *state)
+{
+ if (state && atomic_dec_and_test(&state->refcnt)
+ kfree(state);
+}
+
static void
ptrace_state_free(struct rcu_head *rhead)
{
struct ptrace_state *state = container_of(rhead,
struct ptrace_state, rcu);
- kfree(state);
+ do_ptrace_state_free(state);
}
^
static void
@@ -835,7 +841,7 @@ ptrace_start(long pid, long request,
{
struct task_struct *child;
struct utrace_attached_engine *engine;
- struct ptrace_state *state;
+ struct ptrace_state *state = NULL;
int ret;
^
NO_LOCKS;
@@ -906,6 +912,7 @@ ptrace_start(long pid, long request,
ret = -ESRCH; /* Return value for exit_state bail-out. */
}
^
+ atomic_inc(&state->refcnt);
rcu_read_unlock();
^
NO_LOCKS;
@@ -949,6 +956,7 @@ out_tsk_rcu:
out_tsk:
NO_LOCKS;
put_task_struct(child);
+ __ptrace_state_free(state);
out:
return ret;
}
@@ -1157,6 +1165,7 @@ out_tsk:
NO_LOCKS;
put_task_struct(child);
out:
+ __ptrace_state_free(state);
pr_debug("%d ptrace -> %lx\n", current->pid, ret);
return ret;
}
and of course we still need the previous patch I sent.
But what about ->engine? It is also owned by tracee, not by tracer,
right?
Right, something about like that is what I meant. wrt the engine pointer, you need to extract engine->data under rcu_read_lock (your previous patch covers that). Aside from that, the only uses of the engine pointer are in calls to utrace_* (ptrace_induce_signal and ptrace_update). To be safe, we should be doing each of those utrace_* calls under rcu_read_lock. Inside that rcu_read_lock, we must check state to see if the ptrace_report_death or ptrace_report_reap path to ptrace_done has been taken (could have them clear state->engine), meaning the engine pointer might already be bad. grep, grep, grep... No, still don't understand this code. But we seem to never clear state->engine? How can we check that engine is still valid? Perhaps we can change ptrace_done() to clear state->engine? In that case, can't we just kill the cached "engine" which is used in sys_ptrace() pathes? We can always use state->engine under rcu... As for ptrace_induce_signal/ptrace_detach, yes it looks like we can call them under rcu. But this only covers ptrace_common(). What about arch_ptrace() ? rcu is not an option here, but unless I missed something engine is not actually used, right? > But we seem to never clear state->engine? How can we check that > engine is still valid? > > Perhaps we can change ptrace_done() to clear state->engine? No, we can't afaics :( ptrace_detach() calls rcu_engine_free(engine) first, then it does ptrace_done(state). This means we can't validate ->engine under RCU. Well, we can add another counter, but this is soooo ugly... > Well, we can add another counter, but this is soooo ugly...
Still I am sending the untested and extremely ugly patches,
on top of the first one. sys_compact_ptrace() is not changed
yet.
Perhaps we can avoid ptrace_state->refcnt and use get_ptrace_state()
in sys_ptrace() pathes, but I don't understand the code enough.
Created attachment 324194 [details]
2/3 make ptrace_state refcountable
make ptrace_state refcountable
Created attachment 324197 [details]
3/3 make make utrace_attached_engine refcountable
make make utrace_attached_engine refcountable
> Still I am sending the untested and extremely ugly patches, For some reason Jerome dislikes the memory leak which was added as a bonus. Pedant. Updated and hopefully fixed patches: http://post-office.corp.redhat.com/archives/rhkernel-list/2008-November/msg00807.html http://post-office.corp.redhat.com/archives/rhkernel-list/2008-November/msg00808.html http://post-office.corp.redhat.com/archives/rhkernel-list/2008-November/msg00809.html Tested with /tools/gdb/gdb-any (thanks Anton!), no crashes. The test has failures, but there are not realated. See: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=37755 in kernel-2.6.18-126.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html |