Bug 227693 - ptrace SINGLESTEP resets SIGTRAP handler
Summary: ptrace SINGLESTEP resets SIGTRAP handler
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On: 205659
Blocks: 173278
Reported: 2007-02-07 16:49 UTC by Mark Wielaard
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2007-03-09 00:54:46 UTC
Type: ---
Embargoed:


Attachments
testcase (5.28 KB, text/x-csrc)
2007-02-07 16:49 UTC, Mark Wielaard


Links
System      ID    Private  Priority  Status  Summary  Last Updated
Sourceware  3997  0        None      None    None     Never

Description Mark Wielaard 2007-02-07 16:49:22 UTC
Description of problem:

When using ptrace() SINGLESTEP to go through a SIGTRAP handler of a child
process, the child's signal handler is reset to SIG_DFL.

Version-Release number of selected component (if applicable):

2.6.19-1.2895.fc6

How reproducible:

Always

Steps to Reproduce:
1. gcc -Werror -Wall -g -O -o ptrace_step_sig ptrace_step_sig.c
2. ./ptrace_step_sig
Actual results:

SIGTRAP handler reset (2)!
child not properly exited

Expected results:

child properly exiting and SIGTRAP handler not being reset.

Additional info:

This comes from the Frysk project.
http://sourceware.org/bugzilla/show_bug.cgi?id=3997

This does not happen with some older kernels like 2.6.17-1.2174_FC5.
This does not happen to other handlers like SIGHUP.
This does not happen when using ptrace() CONT into the signal handler (as the
attached testcase shows).

Comment 1 Mark Wielaard 2007-02-07 16:49:23 UTC
Created attachment 147568
testcase

Comment 2 Chuck Ebbert 2007-02-07 17:17:25 UTC
Happens on vanilla 2.6.18.6 from kernel.org, too:

$ gcc -o ptrace_step_sig ptrace_step_sig.c
$ ./ptrace_step_sig
SIGTRAP handler reset (2)!
child not properly exited
$ uname -a
Linux ac.ebbert.com 2.6.18.6-32smp #6 SMP Thu Dec 21 20:46:58 EST 2006 i686
athlon i386 GNU/Linux

Does not happen on 2.6.16.35


Comment 3 Chris Moller 2007-03-08 14:15:08 UTC
Just as a bit of a blog, and as notes to myself, here's what's happening so far:

Presumably (I haven't checked yet, so it's "presumably") as a result of the
ptrace (PTRACE_SINGLESTEP, pid, 0, SIGTRAP); in the testcase,
kernel/utrace.c:utrace_signal_handler_singlestep() is called.  Something in
there (again, I haven't followed that path yet) results in a call to 

    arch/i386/kernel/traps.c:do_debug()

which calls 

    arch/i386/kernel/ptrace.c:send_sigtrap(SIGTRAP,...)

which calls 

    kernel/signal.c:force_sig_info()

which then sets 

    action->sa.sa_handler = SIG_DFL;

if the signal is currently blocked.  Up to that point the handler was correctly
pointing at the testcase handler.

A comment in kernel/signal.c reads:

/*
 * Force a signal that the process can't ignore: if necessary
 * we unblock the signal and change any SIG_IGN to SIG_DFL.
 *
 * Note: If we unblock the signal, we always reset it to SIG_DFL,
 * since we do not want to have a signal handler that was blocked
 * be invoked when user space had explicitly blocked it.
 *
 * We don't want to have recursive SIGSEGV's etc, for example.
 */

so I guess the behaviour is deliberate.

It will take me more poking to figure out what, if anything, should be done
about this.  I'm going to guess though that since PTRACE_SINGLESTEP results in
the child looking like it's been stopped by a SIGTRAP, and in the testcase a
non-SIG_DFL handler is being set by the child on SIGTRAP, there's a bit of
confusion.

Comment 4 Andrew Cagney 2007-03-08 14:23:18 UTC
Nice research, yes, very much sounds like the change was deliberate.  What's the
history of that change?

Was the testcase ever posted to lkml with a heads-up this is broken?


Comment 5 Mark Wielaard 2007-03-08 15:36:12 UTC
(In reply to comment #4)
> Nice research, yes, very much sounds like the change was deliberate.  What's the
> history of that change?
> 
> Was the testcase ever posted to lkml with a heads-up this is broken?

I don't believe anybody did yet.

Comment 6 Roland McGrath 2007-03-09 00:54:46 UTC
Sorry, I did not look at this test case before now.
ptrace (PTRACE_SINGLESTEP, pid, 0, SIGTRAP) is never going to work with a
SIGTRAP signal handler unless it uses the SA_NODEFER bit in sa_flags.  Entering
a signal handler blocks the signal being handled (unless you use SA_NODEFER).

Blocked signals do not get reported to ptrace, or dealt with any other way,
until they are unblocked.  For this reason, machine traps that generate signals
use the force_sig* calls in the kernel to ensure that if the trap's signal won't
be handled by a debugger or signal handler because it's blocked, it doesn't just
resume executing the machine code that caused the trap, but resets the handler
and unblocks the signal to ensure the process crashes (unless the debugger
swallows the signal later).  This is the only safe thing to do: even when there
is a debugger that might well swallow the signal, the kernel cannot be sure that
it won't be part of a spinning loop including the debugger instead of just a
spinning loop with the signal handler.  So even when ptrace would in fact eat the
signal later, the kernel has already reset the handler (and unblocked the signal)
before it could take that risk.

This will never change with the ptrace interface, because the signal is the only
way that debugger-requested stepping is delivered.  Even with the current utrace
world via a different interface, the signal is used the same way and the same
issues apply.

In newer utrace refinements, still a while off, debugger-requested
single-stepping will be reported without using a signal, so things using the new
style will not
be affected by this (and by some other problems involved with debugger-induced
signals).  This still won't affect ptrace, which will use the signal as it
always has.  
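
The blocking rule described at the top of this comment can be checked from user
space without ptrace.  The sketch below assumes a single-threaded process in
which the caller has not already blocked the signal; record_mask() and
signal_blocked_in_handler() are hypothetical helper names, not part of the
attached testcase.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler: 1 if the signal was blocked while it ran, 0 if not. */
static volatile sig_atomic_t blocked_in_handler = -1;

static void record_mask(int signo)
{
    sigset_t cur;

    /* With a NULL new set, sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) == 0)
        blocked_in_handler = sigismember(&cur, signo);
}

/*
 * Install a handler for signo with the given sa_flags, raise the signal,
 * and return whether signo was blocked while the handler ran
 * (1 = blocked, 0 = not blocked, -1 = error or handler did not run).
 */
int signal_blocked_in_handler(int signo, int flags)
{
    struct sigaction action, old;

    assert(signo > 0);
    memset(&action, 0, sizeof action);
    action.sa_handler = record_mask;
    sigemptyset(&action.sa_mask);
    action.sa_flags = flags;

    blocked_in_handler = -1;
    if (sigaction(signo, &action, &old) != 0)
        return -1;

    raise(signo);                   /* handler runs before raise() returns */
    sigaction(signo, &old, NULL);   /* restore the previous disposition */
    return blocked_in_handler;
}
```

signal_blocked_in_handler(SIGTRAP, 0) reports the signal as blocked inside the
handler, while signal_blocked_in_handler(SIGTRAP, SA_NODEFER) reports it as not
blocked.  Only in the second case can a SIGTRAP forced by a step taken inside
the handler arrive without the handler being reset.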

Comment 7 Roland McGrath 2007-03-09 00:56:37 UTC
If you still see any issues, e.g. using SA_NODEFER, then you might be seeing bug
205659.  But the behavior of your test case as described is not a bug.

Comment 8 Mark Wielaard 2007-03-09 10:26:49 UTC
Indeed setting action.sa_flags = SA_NODEFER makes the stepping into the sig trap
handler work as expected. So the fact that this worked on older kernels was a bug?

Comment 9 Roland McGrath 2007-03-09 11:35:48 UTC
Yes, older kernels would unblock a signal in force_sig_info and then run the
user handler, which is wrong as the user handler should never be called when the
user blocked the signal.  

Comment 10 Mark Wielaard 2007-03-09 12:24:13 UTC
That does make sense.

So I guess the only way around this is figuring out the trap signal handler we
want to step into and setting a breakpoint there. But that might then also not
work because we are then already in the user handler, so the trap event
generated by it will be blocked.

Would it make sense to treat the sig trap handler as if it was registered with
SA_NODEFER if the process is traced?

Comment 11 Andrew Cagney 2007-03-09 15:16:34 UTC
That doesn't work.

Signal delivery using ptrace needs to be atomic.  That is why ptrace was
modified so that ptrace(step,signal) would stop at the first instruction of the
handler - previously it behaved like ptrace(continue,signal).  The atomic
requirement is to ensure that the handler state isn't modified by another thread
while the debugger is attempting to query it.

For this specific case, I gather that the testcase didn't set up the handler
correctly.  With that fixed the behavior is as expected and no further changes
are required.
(In reply to comment #10)
> That does make sense.
> 
> So I guess the only way around this is figuring out the trap signal handler we
> want to step into and setting a breakpoint there. But that might then also not
> work because we are then already in the user handler, so the trap event
> generated by it will be blocked.
> 
> Would it make sense to treat the sig trap handler as if it was registered with
> SA_NODEFER if the process is traced?



Comment 12 Mark Wielaard 2007-03-09 16:35:33 UTC
(In reply to comment #11)
> For this specific case, I gather that the testcase didn't set up the handler
> correctly.  With that fixed the behavior is as expected and no further changes
> are required.

The testcase did set up the handler correctly. It just didn't specify SA_NODEFER
(which a user program being traced would most likely also not do).

So the real question now is how do we simulate stepping into the user trap
handler using ptrace. As described in comment #10:

> So I guess the only way around this is figuring out the trap signal handler we
> want to step into and setting a breakpoint there. But that might then also not
> work because we are then already in the user handler, so the trap event
> generated by it will be blocked.
> 
> Would it make sense to treat the sig trap handler as if it was registered with
> SA_NODEFER if the process is traced?

And that last sentence should probably be amended by: "and if a ptrace cont,
step or signal is done into the trap handler"

