Bug 126089
Summary: | Kernel doesn't let GDB stepi out of a signal trampoline | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 3 | Reporter: | Andrew Cagney <cagney> |
Component: | kernel | Assignee: | Peter Martuccelli <peterm> |
Status: | CLOSED WONTFIX | QA Contact: | |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 3.0 | CC: | dwmw2, ezannoni, jbaron, mingo, peterm, petrides, riel, roland, zaitcev |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2007-10-19 19:24:09 UTC | Type: | --- |
Bug Depends On: | 126095, 126699, 126911, 126913, 127384 | ||
Bug Blocks: | 116894, 117972, 127692, 127693 | ||
Attachments: |
Description
Andrew Cagney
2004-06-15 21:33:23 UTC
I have determined that this is not a kernel bug at all; the x86-64 kernel behaves consistently with the i386 kernel here. The way this works on i386 is that gdb does a ptrace POKEDATA on the part of the signal handler frame that contains the saved EFLAGS register, and sets the trace flag (0x100 bit) in the word that was already saved there. On x86-64, gdb is not doing this. The code in gdb/i386-linux-nat.c:child_resume can be copied and tweaked only slightly to treat x86-64 the same way.

Hack is a polite term for that code. It assumes a specific trap mechanism, and the whole point of vsyscall is that we get away from that assumption - let the kernel determine an arbitrary trap mechanism. Given the kernel is (well, I _hope_ it is ...) already masking out other bits in that restored register state, there's nothing stopping it from also setting information.

Both the hardware architecture and the kernel behavior in this area are exactly the same on x86-64 and on i386. Any change in kernel behavior should be done consistently on both platforms. Given that gdb has always coped with the situation on i386 before, upstream acceptance of changes in this area may not be forthcoming.

PTRACE_SINGLESTEP works by setting the TF bit in the tracee's flags register and that is all. The very same effect is achieved if a thread sets its own TF bit--if someone is tracing it, it sees a SIGTRAP with all other details identical to having used PTRACE_SINGLESTEP. The proposal, then, is that when the TF bit is set in the flags register when the system call instruction executes, the TF bit should be set again after the action of the sigreturn system call, which changes all the registers including the flags, has taken place. Thus, the restored thread will immediately stop with a single-step trap at the PC restored by the system call. This might be restricted to the case when the thread is traced, which is the only one gdb is concerned with.
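For readers unfamiliar with the i386 workaround mentioned above, here is a minimal sketch of the idea, not the actual code from gdb/i386-linux-nat.c. The function names and the `eflags_addr` parameter are hypothetical; the sketch assumes the debugger has already located the saved-EFLAGS slot in the tracee's signal frame, and it only uses the standard Linux `ptrace` requests PTRACE_PEEKDATA and PTRACE_POKEDATA.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* x86 trace flag (TF): bit 8 of EFLAGS, the 0x100 bit mentioned above. */
#define X86_TF 0x100UL

/* Pure helper: return the saved flags word with TF set. */
unsigned long with_trace_flag(unsigned long saved_flags)
{
    return saved_flags | X86_TF;
}

/*
 * Set TF in the saved-EFLAGS slot of a stopped tracee's signal frame.
 * `eflags_addr` is hypothetical: the caller must already know where the
 * slot lives (for example from the stack pointer and the frame layout).
 * Returns 0 on success, -1 on failure with errno set by ptrace.
 */
int poke_trace_flag(pid_t pid, uintptr_t eflags_addr)
{
    /* PEEKDATA returns the word itself, so errno must be checked too. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)eflags_addr, NULL);
    if (word == -1 && errno != 0)
        return -1;

    unsigned long patched = with_trace_flag((unsigned long)word);
    if (ptrace(PTRACE_POKEDATA, pid, (void *)eflags_addr,
               (void *)(uintptr_t)patched) == -1)
        return -1;
    return 0;
}
```

Because sigreturn reloads the flags from that slot, the restored thread comes back with TF already set, which is why the workaround depends on the exact frame layout and trap mechanism.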
I wonder what other platforms do about this issue; I am not familiar offhand with the hardware mechanisms used for single-stepping on other processors, so the issue may be different. Does gdb either fail to handle the problem, as on x86-64, or have to address it specially, as on i386, on any other Linux platform? I need the whole picture about how this is and should be handled across processors because I can pursue a change in behavior upstream.

Unfortunately, as more careful testing has revealed, that i386 hack didn't actually fix the problems - it just lessened the impact :-( There look to be two underlying problems here, and they both appear to be present on all architectures (at least the ones I've looked at - ia64, amd64, i386, PPC64) (and for that matter many OSs):

- When single-stepping a system call, an extra instruction after the system call is executed.
  This occurs because the kernel/ISA does not realise that the trapped system-call instruction should also be counted as a single-stepped instruction, and consequently resumes the process, allowing a further instruction to be executed. The easiest way to fix this is to add code checking for single-step mode in the system-call return path. Something like the pseudo code:

      /* Single stepped the system-call, stop immediately. */
      if (frame->srr1 & PSL_SE) {
          frame->srr1 &= ~PSL_SE;
          trapsignal(p, SIGTRAP, EXC_TRC);
      }

  For the i386, that probably means modifying entry.S:system_call. Ouch!

- When single-stepping sigreturn, the single-step bit isn't propagated.
  The easiest way to do this is to modify sys_sigreturn so that it either sets/propagates the single-step bit or updates single-step according to ptrace-sstep. Something like the pseudo code:

      sstep = (tf->srr1 & PSL_SE);
      *tf = sc.sc_frame;
      /* Propagate the single-step bit. */
      tf->srr1 = (tf->srr1 & ~PSL_SE) | sstep;

  The i386 would probably need to add a check of ptrace-sstep to restore_sigcontext.
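The two pseudo code fragments above can be exercised outside the kernel with a small user-space model. This is only a sketch: `struct trapframe`, the value chosen for `PSL_SE`, and both function names are assumptions made for illustration, and the first function returns a boolean where the pseudo code calls `trapsignal`.

```c
#include <stdbool.h>

/* Single-step bit; the name follows the pseudo code, the value is an
 * assumption made for this sketch. */
#define PSL_SE 0x400UL

/* Hypothetical, minimal stand-in for a kernel trap frame. */
struct trapframe {
    unsigned long srr0;  /* saved program counter */
    unsigned long srr1;  /* saved machine state, including PSL_SE */
};

/*
 * Fix 1: on the system-call return path, if the call was single-stepped,
 * clear the bit and tell the caller to deliver SIGTRAP right away
 * instead of letting one more user instruction run.
 */
bool syscall_return_should_trap(struct trapframe *frame)
{
    if (frame->srr1 & PSL_SE) {
        frame->srr1 &= ~PSL_SE;
        return true;
    }
    return false;
}

/*
 * Fix 2: sigreturn restores the saved frame but keeps whatever
 * single-step state was in effect when sigreturn itself was called.
 */
void sigreturn_restore(struct trapframe *tf, const struct trapframe *saved)
{
    unsigned long sstep = tf->srr1 & PSL_SE;

    *tf = *saved;
    /* Propagate the single-step bit. */
    tf->srr1 = (tf->srr1 & ~PSL_SE) | sstep;
}
```

In this model, a frame restored by `sigreturn_restore` while stepping still has `PSL_SE` set, so the check in `syscall_return_should_trap` would then fire before any restored user code runs.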
I'd note that restore_sigcontext already contains:

    regs->eflags = (regs->eflags & ~0x40DD5) | (tmpflags & 0x40DD5);

which I think should propagate the TF flag, but it doesn't. My reading of the ISA manual is that the TF bit gets cleared during the system-call trap? Can you please CC me in any upstream discussion.

Created attachment 101282 [details]
New GDB tests sigbpt.exp that illustrate the problems
The test case tries to single-step out of a signal handler back to the
instruction that caused the segv. Because, for sigreturn, two instructions are
executed (sigreturn and fault), the fault recurs :-(
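A note on the restore_sigcontext expression quoted in the description: the mask 0x40DD5 does include the TF bit (0x100), so the expression takes TF from the saved word `tmpflags` and discards the TF value that was live when sigreturn was entered. The sketch below models only that one expression; the function name and the sample flag values are made up for illustration, and it assumes nothing else in the kernel touches the flags on this path.

```c
/* Mask from the quoted restore_sigcontext expression. */
#define RESTORE_MASK 0x40DD5UL
/* x86 trace flag (TF), bit 8 of EFLAGS. */
#define X86_TF 0x100UL

/*
 * Model of:
 *   regs->eflags = (regs->eflags & ~0x40DD5) | (tmpflags & 0x40DD5);
 * Bits inside the mask come from the saved word, bits outside it are
 * kept from the live register.
 */
unsigned long restore_eflags(unsigned long live_eflags, unsigned long tmpflags)
{
    return (live_eflags & ~RESTORE_MASK) | (tmpflags & RESTORE_MASK);
}
```

With TF set in the live flags but clear in the saved word, the result has TF clear, so the step state is not carried across sigreturn. With TF set in the saved word, the result has TF set, which is the case the i386 gdb workaround creates by poking the signal frame.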
*** Bug 85327 has been marked as a duplicate of this bug. ***
*** Bug 85328 has been marked as a duplicate of this bug. ***
*** Bug 85326 has been marked as a duplicate of this bug. ***

You said there are two problems and one of them is that single-stepping the syscall instruction in general executes the following instruction. I cannot reproduce this for an arbitrary system call. Unless you have an example where the system call is not sigreturn, then I don't see any reason to think that this independent problem exists at all. Please report exactly what evidence you saw on each architecture you tested. From the evidence I know of, the only issue of concern is what happens on single-stepping into the sigreturn/rt_sigreturn system call.

On the x86 & x86-64, setting the TF bit indeed means to execute the following one instruction before trapping; I don't know other architectures but probably their single-step flags are similar. This means that what a single-stepped sigreturn wants to do is not modify the restored state at all, but in fact just restore the given state as the saved trap state for an immediate single-step stop. This may be easy to implement in the same way for all machines, i.e. the sigreturn syscall code just takes the SIGTRAP directly before returning to user mode at all. Though the plan is the same across machines, a change in a machine-dependent function using that machine's appropriate single-step flag check is required for each one.

Created attachment 101317 [details]
upstream 2.6 patch for i386 sigreturn to trap on return when single-stepped into
I have tested this i386 patch with the sigbpt.c program and it avoids the
second SIGSEGV being taken on stepi through the signal return.
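The plan described before this attachment (leave the restored state untouched and take the SIGTRAP directly in sigreturn) can be modelled as follows. This is a user-space sketch of the idea only, not the attached patch; `struct user_regs`, `struct sigreturn_result`, and `model_sigreturn` are hypothetical names.

```c
#include <stdbool.h>

/* x86 trace flag (TF), bit 8 of EFLAGS. */
#define X86_TF 0x100UL

/* Hypothetical, minimal stand-in for the user register state. */
struct user_regs {
    unsigned long eip;
    unsigned long eflags;
};

/* Outcome of the modelled sigreturn. */
struct sigreturn_result {
    struct user_regs regs;  /* state restored from the signal frame */
    bool trap_now;          /* report SIGTRAP before any user code runs */
};

/*
 * If sigreturn was entered with TF set (single-stepped into), restore
 * the saved state exactly as given and flag an immediate trap, rather
 * than editing TF in the restored flags.
 */
struct sigreturn_result model_sigreturn(struct user_regs at_syscall,
                                        struct user_regs saved)
{
    struct sigreturn_result r;

    r.trap_now = (at_syscall.eflags & X86_TF) != 0;
    r.regs = saved;  /* restored state is not modified */
    return r;
}
```

The difference from propagating the single-step bit is that the debugger sees the stop at the restored PC with the program's own flags intact, and the faulting instruction is not executed a second time before the stop.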
Created attachment 101319 [details]
Second example of instruction being skipped
As previous testcase and below example illustrate, "ret" is skipped.
Compile with: gcc -static -g -o tomago nothing.c
Red Hat Enterprise Linux AS release 3 (Taroon Update 1)
(gdb)
0x0804d870 in getpid ()
1: x/i $pc 0x804d870 <getpid>: mov $0x14,%eax
(gdb) disassemble
Dump of assembler code for function getpid:
0x0804d870 <getpid+0>: mov $0x14,%eax
0x0804d875 <getpid+5>: int $0x80
0x0804d877 <getpid+7>: ret
End of assembler dump.
(gdb) stepi
0x0804d875 in getpid ()
1: x/i $pc 0x804d875 <getpid+5>: int $0x80
(gdb)
0x0804820d in main () at nothing.c:7
7 kill (getpid (), 0);
1: x/i $pc 0x804820d <main+29>: add $0x4,%esp
(gdb)
Can you please CC me on any upstream e-mail.
Previously you mentioned mostly the x86-64 case. This last comment shows the stepped-after-syscall problem on i386, where I see it in upstream 2.6 as well. I do not see any such problem on x86-64. As I asked before, please give info about all architectures you can test.

Let's concentrate on i386 then. Once it's all working for that architecture, arguing similar changes in the others will hopefully be easier (I'll find out about the other architectures shortly).

I have created bug 126699 for the i386 issue with single-stepping system calls generally. We still need to know what the issues are on other architectures, and if any are questionable then those should have their own bugs for each specific architecture's problems.

A fix for this problem was committed to the RHEL3 U4 patch pool yesterday (in kernel version 2.4.21-20.8.EL).

Each architecture requires a separate fix - the i386 fix being tracked by BZ 126699 (which this bug depends on). Until that is done, this bug isn't fixed. It's in NEEDINFO since that is as close as we can get to need-other-bugs-fixed.

i386 is fixed.
ppc64 is fixed.
x86-64 is fixed.
ia64 is not fixed.
s390 is not fixed.
What Andrew meant is that until they are all fixed, this bug cannot be closed.

There was some discussion back and forth. The patch that Ernie refers to in Comment #21 took care of all x86_64 syscalls except sigreturn; thus single-stepping out of a signal handler was still problematic. I posted a patch to rhkernel-list that achieved the desired result, but there was some question as to whether it was the *correct* thing to do. Upon closer study, I determined that it was and posted a clarification to rhkernel-list, along with a reiteration of my patch.

A fix for the final part of the problem has just been committed to the RHEL3 U4 patch pool this evening (in kernel version 2.4.21-20.12.EL).

Hang on ... it looks like Jim incorrectly listed this bug as being fixed by his x86_64 patch. I think he should have listed bug 126911.
So, I'm reverting this one back to ASSIGNED and will modify the other. Sorry about that. -ernie

Okay, it looks like fixes for all archs have now been committed as of the U4 beta-candidate kernel (version 2.4.21-21.EL).

We should double-check that sigreturn is working as expected on all architectures we care about.

Right, and it's not. ia64 is still buggered.

I am reopening this bug, since it's a catchall. It should be closed/modified only when all the arches are working. Please look at the list of bugs that this bug depends on; they should be fixed before this can be closed.

Elena, I *did* just what you said in comment #30. All 5 bugs are in MODIFIED state. Should the ia64 bug (126913) also be reverted to ASSIGNED state now?

Sorry, bug 126913 depends on 126095 which was postponed. I added the dependency here explicitly, instead of relying on nested dependencies.

User jparadis's account has been closed

This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed. For more information on the RHEL errata support policy, please visit: http://www.redhat.com/security/updates/errata/ If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.