Description of problem:
The ptrace(PT_STEP,SIGALRM) system call instead implements ptrace(PT_CONTINUE,SIGALRM).

Version-Release number of selected component (if applicable):
Roland says this is present in all i386 kernels.

How reproducible:
Always.

Steps to Reproduce:
In the below, target_resume(...) corresponds directly to a ptrace call.

cagney@tomago$ gdb ./a.out
[...]
(gdb) b handler
Breakpoint 1 at 0x80483bb: file sigstep.c, line 31.
(gdb) list main
39          itimer_real = ITIMER_REAL,
40          itimer_virtual = ITIMER_VIRTUAL
41        } itimer = ITIMER_REAL; /* ITIMER_VIRTUAL; */
42
43      main ()
44      {
45
46        /* Set up the signal handler.  */
47        memset (&action, 0, sizeof (action));
48        action.sa_handler = handler;
(gdb)
49        sigaction (SIGVTALRM, &action, NULL);
50        sigaction (SIGALRM, &action, NULL);
51
52        /* The values needed for the itimer.  This needs to be at least long
53           enough for the setitimer() call to return.  */
54        memset (&itime, 0, sizeof (itime));
55        itime.it_value.tv_usec = 250 * 1000;
56
57        /* Loop for ever, constantly taking an interrupt.  */
58        while (1)
(gdb)
59          {
60            /* Set up a one-off timer.  A timer, rather than SIGSEGV, is
61               used as after a timer handler finishes the interrupted code
62               can safely resume.  */
63            setitimer (itimer, &itime, NULL);
64            /* Wait.  */
65            while (!done);
66            done = 0;
67          }
68      }
(gdb) break 65
Breakpoint 2 at 0x8048456: file sigstep.c, line 65.
(gdb) set debug target 1
(gdb) run
Starting program: /home/cagney/tmp/sigstep/a.out
[...]
Breakpoint 2, main () at sigstep.c:65
65            while (!done);
(gdb) step
target_terminal_inferior ()
target_xfer_memory (0x8048456, xxx, 2, read, xxx) = 2, bytes = a1 64

Try to step off breakpoint, get back SIGALRM (ok), EIP doesn't change (ok).

target_resume (27256, step, 0)
target_wait (-1, status) = 27256, status->kind = stopped, signal = SIGALRM
target_fetch_registers (eip) = 56840408 0x8048456 134513750
target_terminal_inferior ()
target_xfer_memory (0x8048456, xxx, 2, read, xxx) = 2, bytes = a1 64

Try to deliver SIGALRM, get back SIGTRAP (ok), EIP didn't change (not ok, should have been handler or signal trampoline).

target_resume (27256, step, SIGALRM)
target_wait (-1, status) = 27256, status->kind = stopped, signal = SIGTRAP
target_fetch_registers (eip) = 56840408 0x8048456 134513750
target_xfer_memory (0x80483bb, xxx, 1, read, xxx) = 1, bytes = c7
target_xfer_memory (0x80483bb, xxx, 1, write, xxx) = 1, bytes = cc
target_insert_breakpoint (0x80483bb, xxx) = 0
target_xfer_memory (0x8048456, xxx, 1, read, xxx) = 1, bytes = a1
target_xfer_memory (0x8048456, xxx, 1, write, xxx) = 1, bytes = cc
target_insert_breakpoint (0x8048456, xxx) = 0
target_xfer_memory (0x4e02b0, xxx, 1, read, xxx) = 1, bytes = 55
target_xfer_memory (0x4e02b0, xxx, 1, write, xxx) = 1, bytes = cc
target_insert_breakpoint (0x4e02b0, xxx) = 0
target_xfer_memory (0x4e3310, xxx, 1, read, xxx) = 1, bytes = 55
target_xfer_memory (0x4e3310, xxx, 1, write, xxx) = 1, bytes = cc
target_insert_breakpoint (0x4e3310, xxx) = 0
target_xfer_memory (0x320820, xxx, 1, read, xxx) = 1, bytes = 55
target_xfer_memory (0x320820, xxx, 1, write, xxx) = 1, bytes = cc
target_insert_breakpoint (0x320820, xxx) = 0
target_terminal_inferior ()
target_xfer_memory (0x8048457, xxx, 1, read, xxx) = 1, bytes = 64

GDB thinks it's still single stepping in main() so tries to do another single-step but with breakpoints inserted (it assumed the above managed to step off the breakpoint at 65). Since the PC didn't change, it re-hits the breakpoint at 65 causing a SIGTRAP and an off-by-one PC - decremented so that it matches the breakpoint.

target_resume (-1, step, 0)
target_wait (-1, status) = 27256, status->kind = stopped, signal = SIGTRAP
target_fetch_registers (eip) = 57840408 0x8048457 134513751
target_prepare_to_store ()
target_store_registers (eip) = 56840408 0x8048456 134513750
target_xfer_memory (0x80483bb, xxx, 1, write, xxx) = 1, bytes = c7
target_remove_breakpoint (0x80483bb, xxx) = 0
target_xfer_memory (0x8048456, xxx, 1, write, xxx) = 1, bytes = a1
target_remove_breakpoint (0x8048456, xxx) = 0
target_xfer_memory (0x4e02b0, xxx, 1, write, xxx) = 1, bytes = 55
target_remove_breakpoint (0x4e02b0, xxx) = 0
target_xfer_memory (0x4e3310, xxx, 1, write, xxx) = 1, bytes = 55
target_remove_breakpoint (0x4e3310, xxx) = 0
target_xfer_memory (0x320820, xxx, 1, write, xxx) = 1, bytes = 55
target_remove_breakpoint (0x320820, xxx) = 0
target_terminal_ours ()

Breakpoint 2, main () at sigstep.c:65
65            while (!done);
(gdb)

The ptrace(PT_STEP,SIGNAL) call should set up the signal and then (arguably) execute no instructions.
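
The same check can be made without gdb. What follows is only a minimal sketch, not the attached test program: it assumes Linux on i386 or x86-64, uses the Linux request name PTRACE_SINGLESTEP in place of PT_STEP, and the names step_with_signal_stops_at_handler, handler and REG_PC are made up for illustration. A traced child raises SIGALRM, the parent resumes it with a single-step that delivers SIGALRM, and the resulting stop PC is compared with the handler address.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define REG_PC(regs) ((unsigned long) (regs).rip)
#elif defined(__i386__)
#define REG_PC(regs) ((unsigned long) (regs).eip)
#endif

/* Empty handler; only its entry address matters here.  */
static void handler (int sig) { (void) sig; }

/* Hypothetical helper: returns 1 if stepping with SIGALRM stops at the
   handler's first instruction, 0 if it stops anywhere else, and -1 if
   the experiment could not be run (other architecture, ptrace denied).  */
int step_with_signal_stops_at_handler (void)
{
#if defined(__linux__) && defined(REG_PC)
  int status, result = -1;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      struct sigaction action;
      memset (&action, 0, sizeof (action));
      action.sa_handler = handler;
      if (ptrace (PTRACE_TRACEME, 0, NULL, NULL) == -1
          || sigaction (SIGALRM, &action, NULL) == -1)
        _exit (1);
      raise (SIGALRM);          /* Parent sees a stop with SIGALRM.  */
      _exit (0);
    }

  /* If the child is not stopped it has exited and is already reaped.  */
  if (waitpid (pid, &status, 0) != pid || !WIFSTOPPED (status))
    return -1;

  /* Equivalent of target_resume (pid, step, SIGALRM).  */
  if (WSTOPSIG (status) == SIGALRM
      && ptrace (PTRACE_SINGLESTEP, pid, NULL, (void *) (long) SIGALRM) == 0
      && waitpid (pid, &status, 0) == pid)
    {
      struct user_regs_struct regs;
      if (!WIFSTOPPED (status))
        return -1;              /* Child died; nothing left to clean up.  */
      if (WSTOPSIG (status) == SIGTRAP
          && ptrace (PTRACE_GETREGS, pid, NULL, &regs) == 0)
        result = (REG_PC (regs) == (unsigned long) handler);
    }

  kill (pid, SIGKILL);          /* Tear down the stopped child.  */
  waitpid (pid, &status, 0);
  return result;
#else
  return -1;
#endif
}
```

Calling this from main() and printing the result should give 0 on a kernel with the behaviour reported here (the PC is still in the interrupted code) and 1 on a kernel where the step stops at the handler entry.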
Created attachment 103122
C program used in example
What's happening here is that signal handler setup (in all extant Linux kernels) clears the TF (single-step) bit in the processor flags. If TF was set by PTRACE_SINGLESTEP, the handler therefore runs unchecked (unless it hits a breakpoint or another signal arrives) until it returns; the return restores TF, and only then is the single-step trap taken. To gdb it looks as though the single-step did nothing, when in fact the whole handler ran.
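
As a rough illustration of that sequence, here is a toy model in C. It is not kernel code: the struct, the function name and the clear_tf_on_entry switch are all invented for this sketch, and the only real-world fact it relies on is that TF is bit 8 (0x100) of EFLAGS. It counts how many handler instructions run before the single-step trap is seen.

```c
#include <stdint.h>

#define TOY_EFLAGS_TF 0x100u    /* x86 trap flag, bit 8 of EFLAGS.  */

struct toy_cpu
{
  uint32_t eflags;              /* Live flags while the handler runs.  */
  uint32_t saved_eflags;        /* Copy kept in the signal frame.  */
};

/* Toy model: number of handler instructions executed before a
   single-step trap fires.  clear_tf_on_entry != 0 models the handler
   setup described above; 0 models a hypothetical setup that leaves TF
   alone.  */
unsigned toy_handler_insns_before_trap (uint32_t eflags,
                                        unsigned handler_len,
                                        int clear_tf_on_entry)
{
  struct toy_cpu cpu = { eflags, 0 };
  unsigned executed = 0;

  /* Handler setup: save the flags, then optionally clear TF.  */
  cpu.saved_eflags = cpu.eflags;
  if (clear_tf_on_entry)
    cpu.eflags &= ~TOY_EFLAGS_TF;

  /* Run the handler; the CPU traps after any instruction run with TF set.  */
  while (executed < handler_len)
    {
      executed++;
      if (cpu.eflags & TOY_EFLAGS_TF)
        return executed;
    }

  /* Returning from the handler restores the saved flags, TF included, so
     the trap is only taken now, back at the interrupted PC.  */
  cpu.eflags = cpu.saved_eflags;
  return executed;
}
```

With TF set on entry and a five-instruction handler, the model runs all five instructions when TF is cleared at setup, and only one when it is not.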
Created attachment 103280
kernel patch

I've just submitted this patch upstream for 2.6, and have not yet gotten feedback. The same fixes apply to 2.4/RHEL3 as well, but we can consider that only once upstream has agreed to change 2.6 behavior.
My fix (amended from what's attached here) has gone into 2.6 upstream. I will produce a RHEL3 backport of the fix.
Created attachment 103740
patch for 2.4.21-20.EL to fix single-step behavior for signal handlers

This is the backport of the changes that have already gone into 2.6 upstream.
A fix for this problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.8.EL).
Does this fix cover all architectures, or do I need to go through each in turn? I noticed that a recent ppc64 kernel was exhibiting the same bug.
This issue is architecture-specific. You must determine whether the problem exists on each architecture, and file a bug for each.
Did this fix cover amd64?
The patch here covers i386 native, x86-64 native, and x86-64's i386 emulation.
I believe this fix is already in Taroon, though I am not sure in which updates. Ernie, please update here with the merge status.
The information is in comment #8. The fix has already been committed to U4.
An erratum has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html
Removing the blocking of bug 117972 on this bugzilla entry to keep this bugzilla entry, bug 130995, from appearing on the RHEL3 U7 proposed list.