Bug 194570
Summary: | PPC rt_sigreturn does not restore context after interrupted syscall | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Neil Campbell <neilc> | ||||||||||||
Component: | kernel | Assignee: | Scott Moser <smoser> | ||||||||||||
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> | ||||||||||||
Severity: | medium | Docs Contact: | |||||||||||||
Priority: | medium | ||||||||||||||
Version: | 4.6 | CC: | bergner, jbaron, poelstra | ||||||||||||
Target Milestone: | --- | Keywords: | OtherQA | ||||||||||||
Target Release: | --- | ||||||||||||||
Hardware: | powerpc | ||||||||||||||
OS: | Linux | ||||||||||||||
Whiteboard: | |||||||||||||||
Fixed In Version: | RHBA-2007-0791 | Doc Type: | Bug Fix | ||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||
Clone Of: | Environment: | ||||||||||||||
Last Closed: | 2007-11-15 16:14:16 UTC | Type: | --- | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Bug Depends On: | |||||||||||||||
Bug Blocks: | 234547 | ||||||||||||||
Attachments: |
|
Description
Neil Campbell
2006-06-14 05:43:58 UTC
Created attachment 130802 [details]
Test case which shows the behaviour described.
The attached test case is for ppc32. I can provide one for ppc64 if required. For clarity, I'll describe what the test case is trying to do: It registers handlers for both SIGALRM and SIGSEGV. It then requests an alarm signal in 1 second's time and enters a read() which will block. In the handler for the signal, it changes the context so that all the registers from r13 to r31 contain 0xdeadbeef. It sets the ip on the context to 0, so that after the sigreturn a segfault will occur, and and the context which was restored after the alarm handler will be available in the segfault handler. Finally it loads some of the registers with dummy values and sigreturns. When the segfault handler prints out the register state, it shows that r13 was updated with the 0xdeadbeef value, and the ip was set to 0, but the other registers were not restored at all - they contain the dummy values loaded in the alarm handler. Here's the output when running the test case: Setting up an alarm signal to arrive in 1s. Starting a read syscall that will block and get interrupted. alarmHandler: setting regs 13-31 to 0xdeadbeef in the context alarmHandler: loading r13-31 with bogus values and jumping to 0 to force SIGSEGV. In segv handler, this is the context that received the SIGSEGV: IP: 0x0 r0: 0x0 r1: 0xffffe2a0 r2: 0xf7fd8480 r3: 0x4 r4: 0x10000798 r5: 0x52000422 r6: 0x4000 r7: 0x0 r8: 0xd032 r9: 0x20 r10: 0x1030 r11: 0x0 r12: 0x2 r13: 0xdeadbeef r14: 0x1414 r15: 0x1515 r16: 0x1616 r17: 0x1717 r18: 0x1818 r19: 0x1919 r20: 0x2020 r21: 0x2121 r22: 0x2222 r23: 0x2323 r24: 0x2424 r25: 0x2525 r26: 0x2626 r27: 0x2727 r28: 0x2828 r29: 0x2929 r30: 0x3030 r31: 0xffffe2a0 Note that some registers aren't restored at all during the sigreturn. If you pass -DRAISE to gcc when you build the test, it calls raise(SIGALRM) rather than receiving the signal during the syscall. In this case you can see that all the relevant registers are updated. This is the corresponding output from the test with the -DRAISE option: Calling raise(SIGALRM) alarmHandler: setting regs 13-31 to 0xdeadbeef in the context alarmHandler: loading r13-31 with bogus values and jumping to 0 to force SIGSEGV. In segv handler, this is the context that received the SIGSEGV: IP: 0x0 r0: 0xfa r1: 0xffffe670 r2: 0xf7fd8480 r3: 0x0 r4: 0x5a25 r5: 0xe r6: 0x4000 r7: 0xfe50c18 r8: 0xf932 r9: 0x0 r10: 0x0 r11: 0x0 r12: 0x0 r13: 0xdeadbeef r14: 0xdeadbeef r15: 0xdeadbeef r16: 0xdeadbeef r17: 0xdeadbeef r18: 0xdeadbeef r19: 0xdeadbeef r20: 0xdeadbeef r21: 0xdeadbeef r22: 0xdeadbeef r23: 0xdeadbeef r24: 0xdeadbeef r25: 0xdeadbeef r26: 0xdeadbeef r27: 0xdeadbeef r28: 0xdeadbeef r29: 0xdeadbeef r30: 0xdeadbeef r31: 0xdeadbeef I don't know whether this is relevant or not, but running the original test through strace causes all the registers to be restored as normal. IBM, I failed to mention this in my previous update. This issue came in after the 4.5 list was closed. It is unlikely that we will have enough progress on this within the 4.5 time frame. Is there a strong business reason,( be it a high customer impact, or the number of support calls this may generate etc.) you would want this to get fixed in 4.5? Regards, Archana This event sent from IssueTracker by araghavan issue 99307 ----- Additional Comments From sglass.com 2006-08-31 11:31 EDT ------- Archana, Here is the business case of why we need this fixed in RHEL 4.5: This fix is needed for our Transitive solution. The business case for Transitive is as follow: Transitive business case: - What is QuickTransit for LoP? - A Dynamic Binary Translator that generates code on the fly - based on translation of a Linux/x86 binary - Executes directly on a POWER platform running Linux - no recompilation or code changes required - Optimizes and caches frequently executed code - conceptually similar to a Just-in-Time (JIT) compiler for Java - Problem: - Breadth of pLinux application portfolio still lagging - Making progress on attacking major ISV applications - Takes time to fill in matrix, especially for the ancillary applications/utilities - Approach - QuickTransit -- Dynamic Binary Translation technology Linux/x86 to Linux/POWER translator product - Capable of running Linux/x86 applications transparently on the Linux/POWER platform - QuickTransit runs applications in a special environment mirrors the Linux/x86 environment of an actual x86 install ensures Linux/x86 applications able to access correct versions of libraries, other applications and other files new x86 software is installed into the environment without interfering with the Linux/POWER system - Results: - Considerable increase in applications availability to the Linux/POWER platfroms with minimal to no cost to the ISV's Please reconsider getting this into RHEL 4.5 Thanks This event sent from IssueTracker by araghavan issue 99307 Created attachment 144549 [details]
sysctl.txt
file attachment from LTC:sysctl.txt This event sent from IssueTracker by araghavan issue 99307 ----- Additional Comments From mjwolf.com 2006-12-29 10:46 EDT ------- the sysctl -a output This event sent from IssueTracker by araghavan issue 99307 file attachment from LTC:strace-25990.txt This event sent from IssueTracker by araghavan issue 99307 ----- Additional Comments From mjwolf.com 2006-12-29 11:36 EDT ------- strace output from program attaching the strace from the program. When running without strace the registers are wrong. When running with strace the registers are preserved This event sent from IssueTracker by araghavan issue 99307 Created attachment 144550 [details]
strace-25990.txt
Archana, is the kernel build you used to generate this trace available for me to download and test? Sorry, ignore that comment. Since I raised the bug, I'd forgotten that stracing the test caused it to change the behaviour. For what it's worth, this bug doesn't seem to occur on RHEL5 beta 2. Moving out to R4.6. ----- Additional Comments From mjwolf.com 2007-02-01 15:43 EDT ------- ok using git bisect I found the patch that fixes the problem. The sha1 id for the git trees is: 1bd79336a426c5e4f3bab142407059ceb12cadf9 The description of the patch is: " A careful reading of the recent changes to the system call entry/exit paths revealed several problems, plus some things that could be simplified and improved: * 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast exit path, so it was only doing anything with it once it saw some other bit being set. In other words, the noerror behaviour would apply to the next system call where we had to reschedule or deliver a signal, which is not necessarily the current system call. * 32-bit wasn't doing the call to ptrace_notify in the syscall exit path when the _TIF_SINGLESTEP bit was set. * _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and _TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only set by system calls. I took it out of _TIF_USER_WORK_MASK. * On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile registers to be restored (unless perhaps a signal was delivered or the syscall was traced or single-stepped). Thus the non-volatile registers weren't restored on exit from a signal handler. We probably got away with it mostly because signal handlers written in C wouldn't alter the non-volatile registers. * On 32-bit I simplified the code and made it more like 64-bit by making the syscall exit path jump to ret_from_except to handle preemption and signal delivery. * 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was set - but I think because of that 32-bit was actually restoring the non-volatile registers on exit from a signal handler. * I changed the order of enabling interrupts and saving the non-volatile registers before calling do_syscall_trace_leave; now we enable interrupts first. Signed-off-by: Paul Mackerras <paulus>" I'm working on back port the relevant parts of the patch This event sent from IssueTracker by araghavan issue 99307 IBM, This touches common code and to even be considered for 4.5, we need a patch immediately. This touches common code and must go into 4.5 beta if it is to make 4.5 ----- Additional Comments From mjwolf=40us.ibm.com 2007-02-08 14:35 EDT = ------- the patch is dependant on 4 or so other patches. I'm trying to see if just parts of one patch would be enough to fix the problem. To back port all wo= uld potentially make it large and invasive =20 This event sent from IssueTracker by araghavan issue 99307 No patch available, moving out to 4.6. ----- Additional Comments From mjwolf.com 2007-02-20 16:04 EDT ------- I believe the relevent portion of the patches needed will be powerpc only it may not touch other code. still working on getting the patch to work properly without backporting tons of code. I'm concerned that backporting too much will require LOTS of testing to assure that the signalling code is still working properly This event sent from IssueTracker by araghavan issue 99307 changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |pmac.com ------- Additional Comments From mjwolf.com 2007-02-21 10:15 EDT ------- adding Paul Mackerras to the cc list. He is looking at this also and believes we will be able to come up with a patch that only touches the Power architecture code. This event sent from IssueTracker by araghavan issue 99307 File uploaded: sigfix.patch This event sent from IssueTracker by araghavan issue 99307 it_file 83635 file attachment from LTC:sigfix.patch This event sent from IssueTracker by araghavan issue 99307 ----- Additional Comments From pmac.com (prefers email at paulus.com) 2007-02-22 01:15 EDT ------- Patch to fix signal return to restore all registers This patch fixes the problem. I tested it with the test program compiled both in 32-bit and 64-bit mode (to run correctly in 64-bit mode, I had to modify the test program to avoid changing r13, since r13 is the thread-local storage pointer). I tested on a POWER5+ partition with RHEL4.4 installed. Previously, if a signal was delivered on return from a system call, the sigreturn code (invoked when the handler returns via the sys_sigreturn or sys_rt_sigreturn system call) would return the result value from the original system code, i.e. regs->result. This would be negative for an interrupted system call. Consequently the code starting at the "80:" label in arch/ppc64/kernel/entry.S would return to userspace via the system call return path rather than the general exception return path (i.e. ret_from_except). This meant two things; first, r3 (the return value) was set to the negative of regs->result, and the SO bit was set in CR0 to indicate an error. Secondly, the non-volatile registers were not restored (which is our bug). Now, setting r3 and CR0.SO again is mostly unnecessary, since r3 and CR0.SO were already set appropriately before the signal was delivered. The only time we really needed to set them again was if regs->result was changed since then. The only time that happens is in syscall_restart() (in arch/ppc64/kernel/signal.c) or do_signal32 (arch/ppc64/kernel/signal32.c). Thus, if we make sure that regs->gpr[3] is set appropriately whenever regs->result is changed to some negative value, we then don't need to reexecute the syscall return path when doing a [rt_]sigreturn. Note that CR0.SO is already correct (i.e. set) for those cases where regs->result is being changed from one negative value to another negative value (which is the only case we have). Thus, this patch makes sure that the places where a negative regs->result is set to -EINTR also set regs->gpr[3] to EINTR. Then, it makes the [rt_]sigreturn functions return 0 rather than regs->result, so that signal return will restore all registers in all circumstances. This event sent from IssueTracker by araghavan issue 99307 File uploaded: syscallInterrupt.c This event sent from IssueTracker by araghavan issue 99307 it_file 83698 file attachment from LTC:syscallInterrupt.c This event sent from IssueTracker by araghavan issue 99307 changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #23362|0 |1 is obsolete| | ------- Additional Comments From mjwolf.com 2007-02-22 11:47 EDT ------- modified test case attaching the modified test case that doesnt touch r13 This event sent from IssueTracker by araghavan issue 99307 changed: What |Removed |Added ---------------------------------------------------------------------------- Status|FIXEDAWAITINGTEST |TESTED ------- Additional Comments From mjwolf.com 2007-02-22 12:18 EDT ------- it has been tested on power5 systems and works. moving the bug to the submitted state This event sent from IssueTracker by araghavan issue 99307 Created attachment 148610 [details]
proposed patch
Created attachment 148611 [details]
modified test case
Given the scope of the change, the fact that IBM has tested and that a test case is available, I would be OK taking in this change after the 4.5 beta is released. Posted as a 4.6 bug. See http://post-office.corp.redhat.com/archives/rhkernel-list/2007-February/msg00422.html Changing the version field to 4.6.0 to reflect the fact that we are pursuing this as a 4.6 fix based on the following comment from Paul Mackerras: - The reason why this hasn't been seen before is that it is very unusual for a signal handler to modify the saved processor state (the state of the processor at the point where the signal was delivered---this state is stored on the stack). The Transitive pAVE application is the only program I know of that does this and expects it to work. More specifically, the ABI requires C functions to preserve the contents of non-volatile registers (r13 - r31). Thus these registers have the same state on exit from a signal handler function as on entry, which is the same state as at the point where the signal was delivered. Thus the fact that the kernel was not restoring the state of those registers on signal return doesn't matter in almost all circumstances (since those registers are unchanged anyway). It's only if the signal handler function modifies the state stored on the stack and expects those modifications to show up in the processor state after the signal return that you notice that the kernel didn't actually restore those registers from the state stored on the stack. committed in stream U6 build 55.13. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/ A fix for this issue should have been included in the packages contained in the RHEL4.6 Beta released on RHN (also available at partners.redhat.com). Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot1 on partners.redhat.com. Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Confirmed as fixed in 4.6 beta, thanks! An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html |