Bug 469754 - kernel panic seen in ptrace_induce_signal in run of rhts test /tools/gdb/gdb-any/
kernel panic seen in ptrace_induce_signal in run of rhts test /tools/gdb/gdb-...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
5.3
All Linux
high Severity high
: rc
: ---
Assigned To: Oleg Nesterov
Martin Jenner
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-11-03 16:30 EST by Mike Gahagan
Modified: 2009-01-20 15:03 EST (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-01-20 15:03:09 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
perhaps a fix. (936 bytes, patch)
2008-11-05 15:01 EST, Oleg Nesterov
no flags Details | Diff
2/3 make ptrace_state refcountable (1.91 KB, patch)
2008-11-20 11:41 EST, Oleg Nesterov
no flags Details | Diff
3/3 make make utrace_attached_engine refcountable (2.16 KB, patch)
2008-11-20 11:42 EST, Oleg Nesterov
no flags Details | Diff

  None (edit)
Description Mike Gahagan 2008-11-03 16:30:36 EST
Description of problem:
after the gdb-any test was running for approximately 1.5 hours, received this panic in the -121 kernel.

Unable to handle kernel NULL pointer dereference at 0000000000000041 RIP: 
 [<ffffffff800bf655>] ptrace_induce_signal+0x1b/0x8f
PGD 18fa1067 PUD 1c9b6067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:1c.5/0000:04:00.0/irq
CPU 1 
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i2c_i801 ide_cd tg3 i3000_edac i2c_core shpchp libphy edac_mc floppy cdrom serio_raw pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 30075, comm: gdb Not tainted 2.6.18-121.el5 #1
RIP: 0010:[<ffffffff800bf655>]  [<ffffffff800bf655>] ptrace_induce_signal+0x1b/0x8f
RSP: 0018:ffff810016f39ef8  EFLAGS: 00010287
RAX: 00000000fffffffb RBX: 0000000000000008 RCX: ffff8100243b2240
RDX: 0000000000000000 RSI: ffff810016b3f540 RDI: ffff810031d7f820
RBP: 00002b1a00896fd0 R08: 0000000000000009 R09: 0000000000000000
R10: ffff81003a9aa400 R11: ffff8100243b2240 R12: ffff810016b3f540
R13: ffff8100243b2240 R14: ffff810031d7f820 R15: 0000000000000000
FS:  00002b1a00897020(0000) GS:ffff81003fe017c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000041 CR3: 0000000015424000 CR4: 00000000000006e0
Process gdb (pid: 30075, threadinfo ffff810016f38000, task ffff81003f12c7a0)
Stack:  0000000000000000 ffffffff800bf79f 0000000000000000 00002b1a00896fd0
 0000000000000000 000000000045ba50 0000000000000000 ffffffff800c0315
 0000000000000000 0000000000000008 0000000000000000 ffffffff8005d229
Call Trace:
 [<ffffffff800bf79f>] ptrace_common+0xd6/0x193
 [<ffffffff800c0315>] sys_ptrace+0xf0/0x1ed
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 8a 42 41 a8 01 74 12 48 89 fe ba 01 00 00 00 44 89 c7 e8 93 
RIP  [<ffffffff800bf655>] ptrace_induce_signal+0x1b/0x8f
 RSP <ffff810016f39ef8>
CR2: 0000000000000041
 <0>Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
RHEL5.3-Server-20081103.nightly w/ -121 kernel

How reproducible:
unknown, I do not think this panic has happened before.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
logs available at:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=4944911
rhts jobid 34586
Comment 1 Jan Kratochvil 2008-11-05 10:03:36 EST
http://rhts.redhat.com/testlogs/34946/123078/1041331/5004118-test_log--tools-gdb-gdb-any-EXTERNALWATCHDOG.log
RHTS Job 34946
i386
RHEL5.3-Server-20081103.nightly
kernel-2.6.18-121.el5.i686
/tools/gdb/gdb-any

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000021
 printing eip:
c0453bdd
*pde = 3c23a067
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Modules linked in: nfs lockd fscache nfs_acl autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video backlight sbs i2c_ec i2c_core button battery asus_acpi ac parport_pc lp parport joydev sr_mod ide_cd cdrom floppy e752x_edac edac_mc e1000 serio_raw pata_sil680 sg pcspkr dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata megaraid_mbox sd_mod scsi_mod megaraid_mm ext3 jbd uhci_hcd ohci_hcd ehci_hcd
CPU:    3
EIP:    0060:[<c0453bdd>]    Not tainted VLI
EFLAGS: 00010287   (2.6.18-121.el5 #1) 
EIP is at ptrace_induce_signal+0x1a/0x86
eax: fffffffb   ebx: 00000009   ecx: 00000009   edx: 00000000
esi: f60f79a0   edi: f7037aa0   ebp: c94a3b00   esp: f54d0f8c
ds: 007b   es: 007b   ss: 0068
Process gdb (pid: 29353, ti=f54d0000 task=f7f4eaa0 task.ti=f54d0000)
Stack: c94a3b00 00000008 f7037aa0 c0454901 f60f79a0 c0407f46 00000000 00000008 
       00000000 00267ff4 f54d0000 c0404f17 00000008 000072c2 00000000 00000000 
       00267ff4 bfedc778 ffffffda 0000007b 0000007b 0000001a b7fe5402 00000073 
Call Trace:
 [<c0454901>] sys_ptrace+0x49a/0x64f
 [<c0407f46>] do_syscall_trace+0xab/0xb1
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: 00 b0 01 86 82 04 05 00 00 fb 59 89 d8 5b 5e 5f c3 57 85 c9 56 89 c7 53 89 d6 8b 52 18 89 cb 74 70 83 f9 40 b8 fb ff ff ff 77 68 <8a> 42 21 a8 01 74 10 b9 01 00 00 00 89 fa 89 d8 e8 f3 9f fd ff 
EIP: [<c0453bdd>] ptrace_induce_signal+0x1a/0x86 SS:ESP 0068:f54d0f8c
 <0>Kernel panic - not syncing: Fatal exception
Comment 2 Oleg Nesterov 2008-11-05 14:57:15 EST
First, I must say again I know nothing, nothing about utrace...

Looking at the disassembled i386 "Code:" from comment #1,
I think that it crashes because engine->data == NULL when
ptrace_induce_signal() checks "if (state->syscall)".

ptrace_start() checks engine->data != NULL, so it was changed
in between.

Perhaps the patch I'll send in a minute can help, but anyway I
can't understand how can we use *state outside of ptrace_start()
once it drops rcu_read_lock().
Comment 3 Oleg Nesterov 2008-11-05 15:01:13 EST
Created attachment 322638 [details]
perhaps a fix.

The patch is not tested, not even compiled, and I don't
understand this code. Good luck to everyone who might want
to test it.
Comment 4 Jerome Marchand 2008-11-06 06:03:12 EST
I'm affraid that won't help. First I think you missed some calls to ptrace_induce_signal(). Moreover, if engine->data is NULL, there probably is a race with an other function (probably ptrace_report_death() or ptrace_report_reap() ) and the ptrace_state structure has probably already been freed.
Comment 5 Oleg Nesterov 2008-11-06 06:17:16 EST
> I'm affraid that won't help.

me too ;)

> First I think you missed some calls to
> ptrace_induce_signal().

I don't think so. It is static, and unless /usr/bin/grep is
buggy we have only 2 callsites from ptrace_common().

> Moreover, if engine->data is NULL, there probably is a
> race with an other function (probably ptrace_report_death() or
> ptrace_report_reap() ) and the ptrace_state structure has probably already been
> freed.

Yes. That was my point when I said (comment #2) "anyway I can't
understand how can we use *state outside of ptrace_start() once
it drops rcu_read_lock()."

However, please note that ptrace_common does use the
"struct ptrace_state *state" arg, this __probabl__ means
ptrace_induce_signal() can use it too.
Comment 6 Jerome Marchand 2008-11-06 10:41:24 EST
(In reply to comment #5)
> > First I think you missed some calls to
> > ptrace_induce_signal().
> 
> I don't think so. It is static, and unless /usr/bin/grep is
> buggy we have only 2 callsites from ptrace_common().
Mea culpa. For some reason I can not explain to myself, I've seen more than two calls, but you're right.

> 
> > Moreover, if engine->data is NULL, there probably is a
> > race with an other function (probably ptrace_report_death() or
> > ptrace_report_reap() ) and the ptrace_state structure has probably already been
> > freed.
> 
> Yes. That was my point when I said (comment #2) "anyway I can't
> understand how can we use *state outside of ptrace_start() once
> it drops rcu_read_lock()."
> 
> However, please note that ptrace_common does use the
> "struct ptrace_state *state" arg, this __probabl__ means
> ptrace_induce_signal() can use it too.

Since rcu_read_lock() is not held, I see nothing preventing ptrace_report_death() or ptrace_report_reap() to free the structure.
Comment 7 Oleg Nesterov 2008-11-06 10:58:11 EST
> > However, please note that ptrace_common does use the
> > "struct ptrace_state *state" arg, this __probabl__ means
> > ptrace_induce_signal() can use it too.
>
> Since rcu_read_lock() is not held, I see nothing preventing
> ptrace_report_death() or ptrace_report_reap() to free the structure.

I am afraid we misunderstood each other.

Yes! This exactly was my point, you should not explain this to me ;)

sys_ptrace()->ptrace_start(..., &state, ...) reads engine->data and
populates "struct ptrace_state *state" and drops rcu lock. After that
this "state" is used by sys_ptrace(), ptrace_common(), etc. And I
don't understand why this is safe too. Yes, ptrace_start() waits
for TASK_TRACED/TASK_STOPPED but I can't see how this can help because
I am absolutely knew to utrace.

So, either the whole code is completely broken, or the patch makes
sense.
Comment 8 Roland McGrath 2008-11-11 19:46:39 EST
There is certainly nothing wrong with that patch.

But I think you are right that there is a general problem with races between sys_ptrace and a dying target thread doing ptrace_done.

In the usual case, the quiescence checks in ptrace_start make sure that the target thread can't do anything.  The only exception should be SIGKILL (including MT exec et al).  That can lead to ptrace_report_death or ptrace_report_reap calling ptrace_done while sys_ptrace runs.  In ptrace_detach, the utrace_detach return value check guarantees those won't callbacks won't run.  But other paths using *state are vulnerable.

It might be necessary to switch struct ptrace_state to using an atomic_t ref count instead of just RCU.
Comment 9 Oleg Nesterov 2008-11-12 12:21:28 EST
> It might be necessary to switch struct ptrace_state to using an atomic_t ref
> count instead of just RCU.

I guess you mean using both rcu and refcnt, otherwise it is not easy
to convert all users. In that case the patch looks very simple, something
like

        --- /kernel/ptrace.c~   2008-10-30 12:07:45.000000000 +0100
        +++ kernel/ptrace.c     2008-11-12 19:09:19.000000000 +0100
        @@ -199,12 +199,18 @@ ptrace_setup(struct task_struct *target,
                return state;
         }
        ^
        +static void __ptrace_state_free(struct ptrace_state *state)
        +{
        +       if (state && atomic_dec_and_test(&state->refcnt)
        +               kfree(state);
        +}
        +
         static void
         ptrace_state_free(struct rcu_head *rhead)
         {
                struct ptrace_state *state = container_of(rhead,
                                                          struct ptrace_state, rcu);
        -       kfree(state);
        +       do_ptrace_state_free(state);
         }
        ^
         static void
        @@ -835,7 +841,7 @@ ptrace_start(long pid, long request,
         {
                struct task_struct *child;
                struct utrace_attached_engine *engine;
        -       struct ptrace_state *state;
        +       struct ptrace_state *state = NULL;
                int ret;
        ^
                NO_LOCKS;
        @@ -906,6 +912,7 @@ ptrace_start(long pid, long request,
                        ret = -ESRCH;  /* Return value for exit_state bail-out.  */
                }
        ^
        +       atomic_inc(&state->refcnt);
                rcu_read_unlock();
        ^
                NO_LOCKS;
        @@ -949,6 +956,7 @@ out_tsk_rcu:
         out_tsk:
                NO_LOCKS;
                put_task_struct(child);
        +       __ptrace_state_free(state);
         out:
                return ret;
         }
        @@ -1157,6 +1165,7 @@ out_tsk:
                NO_LOCKS;
                put_task_struct(child);
         out:
        +       __ptrace_state_free(state);
                pr_debug("%d ptrace -> %lx\n", current->pid, ret);
                return ret;
         }

and of course we still need the previous patch I sent.

But what about ->engine? It is also owned by tracee, not by tracer,
right?
Comment 10 Roland McGrath 2008-11-14 06:16:50 EST
Right, something about like that is what I meant.

wrt the engine pointer, you need to extract engine->data under rcu_read_lock (your previous patch covers that).  Aside from that, the only uses of the engine pointer are in calls to utrace_* (ptrace_induce_signal and ptrace_update).  To be safe, we should be doing each of those utrace_* calls under rcu_read_lock.  Inside that rcu_read_lock, we must check state to see if the ptrace_report_death or ptrace_report_reap path to ptrace_done has been taken (could have them clear state->engine), meaning the engine pointer might already be bad.
Comment 11 Oleg Nesterov 2008-11-18 10:01:42 EST
grep, grep, grep... No, still don't understand this code.

But we seem to never clear state->engine? How can we check that
engine is still valid?

Perhaps we can change ptrace_done() to clear state->engine? In
that case, can't we just kill the cached "engine" which is used
in sys_ptrace() pathes? We can always use state->engine under
rcu...

As for ptrace_induce_signal/ptrace_detach, yes it looks like
we can call them under rcu. But this only covers ptrace_common().

What about arch_ptrace() ? rcu is not an option here, but unless
I missed something engine is not actually used, right?
Comment 12 Oleg Nesterov 2008-11-19 12:32:14 EST
> But we seem to never clear state->engine? How can we check that
> engine is still valid?
>
> Perhaps we can change ptrace_done() to clear state->engine?

No, we can't afaics :(

ptrace_detach() calls rcu_engine_free(engine) first, then it
does ptrace_done(state). This means we can't validate ->engine
under RCU.

Well, we can add another counter, but this is soooo ugly...
Comment 13 Oleg Nesterov 2008-11-20 11:35:44 EST
> Well, we can add another counter, but this is soooo ugly...

Still I am sending the untested and extremely ugly patches,
on top of the first one. sys_compact_ptrace() is not changed
yet.

Perhaps we can avoid ptrace_state->refcnt and use get_ptrace_state()
in sys_ptrace() pathes, but I don't understand the code enough.
Comment 14 Oleg Nesterov 2008-11-20 11:41:06 EST
Created attachment 324194 [details]
2/3 make ptrace_state refcountable

make ptrace_state refcountable
Comment 15 Oleg Nesterov 2008-11-20 11:42:14 EST
Created attachment 324197 [details]
3/3 make make utrace_attached_engine refcountable

make make utrace_attached_engine refcountable
Comment 16 Oleg Nesterov 2008-11-28 07:59:43 EST
> Still I am sending the untested and extremely ugly patches,

For some reason Jerome dislikes the memory leak which
was added as a bonus. Pedant.

Updated and hopefully fixed patches:

http://post-office.corp.redhat.com/archives/rhkernel-list/2008-November/msg00807.html
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-November/msg00808.html
http://post-office.corp.redhat.com/archives/rhkernel-list/2008-November/msg00809.html

Tested with /tools/gdb/gdb-any (thanks Anton!), no crashes.
The test has failures, but there are not realated.
See: http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=37755
Comment 18 Don Zickus 2008-12-09 16:04:36 EST
in kernel-2.6.18-126.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 21 errata-xmlrpc 2009-01-20 15:03:09 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

Note You need to log in before you can comment on or make changes to this bug.