Bug 194570

Summary: PPC rt_sigreturn does not restore context after interrupted syscall
Product: Red Hat Enterprise Linux 4 Reporter: Neil Campbell <neilc>
Component: kernelAssignee: Scott Moser <smoser>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.6CC: bergner, jbaron, poelstra
Target Milestone: ---Keywords: OtherQA
Target Release: ---   
Hardware: powerpc   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0791 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-15 16:14:16 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 234547    
Attachments:
Description Flags
Test case which shows the behaviour described.
none
sysctl.txt
none
strace-25990.txt
none
proposed patch
none
modified test case none

Description Neil Campbell 2006-06-14 05:43:58 UTC
Description of problem:

After a syscall is interrupted by a signal, the context passed to the signal
handler is not restored when the signal handler returns (ie when rt_sigreturn is
called).

Version-Release number of selected component (if applicable):

RHEL4 U3, ppc/ppc64.


How reproducible:

Always.

Steps to Reproduce:

1. Register a signal handler for SIGALRM.
2. Request an alarm signal with alarm(1).
3. Enter a syscall which will block, eg a read of stdin.
4. In the alarm signal handler, modify all the user registers in the ucontext
which is passed as the third argument.
5. Return from the signal handler.
  
Actual results:

Only certain registers are restored from the ucontext during the sigreturn.  The
IP, and registers 1, 2 and 13 are restored, the others are not.

Expected results:

All changes to the context are reflected after the sigreturn.

Additional info:
I ported the test case to x86, and it appears that all registers may be restored
in the same situation on that platform.

(I'm reraising this after the problem with the bugzilla servers.  It was
formerly bug 195013.)

Comment 1 Neil Campbell 2006-06-14 05:43:59 UTC
Created attachment 130802 [details]
Test case which shows the behaviour described.

Comment 2 Neil Campbell 2006-06-14 05:45:45 UTC
The attached test case is for ppc32.  I can provide one for ppc64 if required. 
For clarity, I'll describe what the test case is trying to do:

It registers handlers for both SIGALRM and SIGSEGV.  It then requests an alarm
signal in 1 second's time and enters a read() which will block.  In the handler
for the signal, it changes the context so that all the registers from r13 to r31 
contain 0xdeadbeef.  It sets the ip on the context to 0, so that after the
sigreturn a segfault will occur, and and the context which was restored after
the alarm handler will be available in the segfault handler.  Finally it loads
some of the registers with dummy values and sigreturns.

When the segfault handler prints out the register state, it shows that r13 was 
updated with the 0xdeadbeef value, and the ip was set to 0, but the other 
registers were not restored at all - they contain the dummy values loaded in 
the alarm handler.

Here's the output when running the test case:

Setting up an alarm signal to arrive in 1s.
Starting a read syscall that will block and get interrupted.
alarmHandler: setting regs 13-31 to 0xdeadbeef in the context
alarmHandler: loading r13-31 with bogus values and jumping to 0 to force SIGSEGV.
In segv handler, this is the context that received the SIGSEGV:
IP: 0x0
r0: 0x0
r1: 0xffffe2a0
r2: 0xf7fd8480
r3: 0x4
r4: 0x10000798
r5: 0x52000422
r6: 0x4000
r7: 0x0
r8: 0xd032
r9: 0x20
r10: 0x1030
r11: 0x0
r12: 0x2
r13: 0xdeadbeef
r14: 0x1414
r15: 0x1515
r16: 0x1616
r17: 0x1717
r18: 0x1818
r19: 0x1919
r20: 0x2020
r21: 0x2121
r22: 0x2222
r23: 0x2323
r24: 0x2424
r25: 0x2525
r26: 0x2626
r27: 0x2727
r28: 0x2828
r29: 0x2929
r30: 0x3030
r31: 0xffffe2a0

Note that some registers aren't restored at all during the sigreturn.

If you pass -DRAISE to gcc when you build the test, it calls raise(SIGALRM)
rather than receiving the signal during the syscall.  In this case you can see
that all the relevant registers are updated.  This is the corresponding output
from the test with the -DRAISE option:

Calling raise(SIGALRM)
alarmHandler: setting regs 13-31 to 0xdeadbeef in the context
alarmHandler: loading r13-31 with bogus values and jumping to 0 to force SIGSEGV.
In segv handler, this is the context that received the SIGSEGV:
IP: 0x0
r0: 0xfa
r1: 0xffffe670
r2: 0xf7fd8480
r3: 0x0
r4: 0x5a25
r5: 0xe
r6: 0x4000
r7: 0xfe50c18
r8: 0xf932
r9: 0x0
r10: 0x0
r11: 0x0
r12: 0x0
r13: 0xdeadbeef
r14: 0xdeadbeef
r15: 0xdeadbeef
r16: 0xdeadbeef
r17: 0xdeadbeef
r18: 0xdeadbeef
r19: 0xdeadbeef
r20: 0xdeadbeef
r21: 0xdeadbeef
r22: 0xdeadbeef
r23: 0xdeadbeef
r24: 0xdeadbeef
r25: 0xdeadbeef
r26: 0xdeadbeef
r27: 0xdeadbeef
r28: 0xdeadbeef
r29: 0xdeadbeef
r30: 0xdeadbeef
r31: 0xdeadbeef



Comment 3 Neil Campbell 2006-06-14 05:46:22 UTC
I don't know whether this is relevant or not, but running the original test
through strace causes all the registers to be restored as normal.

Comment 4 Issue Tracker 2006-08-30 20:24:01 UTC
IBM,

I failed to mention this in my previous update. This issue came in after
the 4.5 list was closed. It is unlikely that we will have enough progress
on this within the 4.5 time frame. Is there a strong business reason,( be
it a high customer impact, or the number of support calls this may generate
etc.) you would want this to get fixed in 4.5?

Regards,
Archana


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 5 Issue Tracker 2006-08-31 16:03:54 UTC
----- Additional Comments From sglass.com  2006-08-31 11:31 EDT
-------
Archana,
Here is the business case of why we need this fixed in RHEL 4.5:

This fix is needed for our Transitive solution.  The business case for 
Transitive is as follow:

Transitive business case:

 - What is QuickTransit for LoP?
   - A Dynamic Binary Translator that generates code on the fly - based on 
   
translation of a Linux/x86 binary
   - Executes directly on a POWER platform running Linux - no recompilation
or 
code changes required
   - Optimizes and caches frequently executed code - conceptually similar
to a 
Just-in-Time (JIT) compiler for Java

- Problem:
  - Breadth of pLinux application portfolio still lagging
  - Making progress on attacking major ISV applications
  - Takes time to fill in matrix, especially for the ancillary 
applications/utilities

- Approach
  - QuickTransit  -- Dynamic Binary Translation technology Linux/x86 to 
Linux/POWER translator product
  - Capable of running Linux/x86 applications transparently on the
Linux/POWER 
platform
  - QuickTransit runs applications in a special environment mirrors the 
Linux/x86 environment of an actual x86 install ensures Linux/x86
applications 
able to access correct versions of libraries, other applications and other

files new x86 software is installed into the environment without
interfering 
with the Linux/POWER system

- Results:
  - Considerable increase in applications availability to the Linux/POWER

platfroms with minimal to no cost  to the ISV's

Please reconsider getting this into RHEL 4.5  Thanks 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 8 Archana K. Raghavan 2006-12-29 15:54:35 UTC
Created attachment 144549 [details]
sysctl.txt

Comment 9 Issue Tracker 2006-12-29 16:44:14 UTC
file attachment from LTC:sysctl.txt


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 10 Issue Tracker 2006-12-29 16:44:18 UTC
----- Additional Comments From mjwolf.com  2006-12-29 10:46 EDT
-------
 
the sysctl -a output 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 11 Issue Tracker 2006-12-29 16:44:21 UTC
file attachment from LTC:strace-25990.txt


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 12 Issue Tracker 2006-12-29 16:44:25 UTC
----- Additional Comments From mjwolf.com  2006-12-29 11:36 EDT
-------
 
strace output from program

attaching the strace from the program.	When running without strace the
registers are wrong.  When running with strace the registers are preserved



This event sent from IssueTracker by araghavan 
 issue 99307

Comment 13 Archana K. Raghavan 2006-12-29 16:45:37 UTC
Created attachment 144550 [details]
strace-25990.txt

Comment 14 Neil Campbell 2007-01-03 18:31:20 UTC
Archana, is the kernel build you used to generate this trace available for me to
download and test?

Comment 15 Neil Campbell 2007-01-03 18:41:03 UTC
Sorry, ignore that comment.  Since I raised the bug, I'd forgotten that stracing
the test caused it to change the behaviour.

For what it's worth, this bug doesn't seem to occur on RHEL5 beta 2.

Comment 16 Peter Martuccelli 2007-01-30 18:42:24 UTC
Moving out to R4.6.

Comment 17 Issue Tracker 2007-02-01 21:00:15 UTC
----- Additional Comments From mjwolf.com  2007-02-01 15:43 EDT
-------
ok using git bisect I found the patch that fixes the problem.  The sha1 id
for
the git trees is:
1bd79336a426c5e4f3bab142407059ceb12cadf9

The description of the patch is:
"   
    A careful reading of the recent changes to the system call entry/exit
    paths revealed several problems, plus some things that could be
    simplified and improved:

    * 32-bit wasn't testing the _TIF_NOERROR bit in the syscall fast
exit
      path, so it was only doing anything with it once it saw some other
      bit being set.  In other words, the noerror behaviour would apply
to
      the next system call where we had to reschedule or deliver a
signal,
      which is not necessarily the current system call.

    * 32-bit wasn't doing the call to ptrace_notify in the syscall exit
      path when the _TIF_SINGLESTEP bit was set.

    * _TIF_RESTOREALL was in both _TIF_USER_WORK_MASK and
      _TIF_PERSYSCALL_MASK, which is odd since _TIF_RESTOREALL is only
set
      by system calls.  I took it out of _TIF_USER_WORK_MASK.

    * On 64-bit, _TIF_RESTOREALL wasn't causing the non-volatile
registers
      to be restored (unless perhaps a signal was delivered or the
syscall
      was traced or single-stepped).  Thus the non-volatile registers
      weren't restored on exit from a signal handler.  We probably got
      away with it mostly because signal handlers written in C wouldn't
      alter the non-volatile registers.

    * On 32-bit I simplified the code and made it more like 64-bit by
      making the syscall exit path jump to ret_from_except to handle
      preemption and signal delivery.

    * 32-bit was calling do_signal unnecessarily when _TIF_RESTOREALL was
      set - but I think because of that 32-bit was actually restoring the
      non-volatile registers on exit from a signal handler.

    * I changed the order of enabling interrupts and saving the
      non-volatile registers before calling do_syscall_trace_leave; now
we
      enable interrupts first.

    Signed-off-by: Paul Mackerras <paulus>"


I'm working on back port the relevant parts of the patch 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 19 John Jarvis 2007-02-08 20:17:29 UTC
IBM, This touches common code and to even be considered for 4.5, we need a patch
immediately.  This touches common code and must go into 4.5 beta if it is to
make 4.5

Comment 20 Issue Tracker 2007-02-08 22:51:07 UTC
----- Additional Comments From mjwolf=40us.ibm.com  2007-02-08 14:35 EDT
=
-------
the patch is dependant on 4 or so other patches.  I'm trying to see if
just
parts of one patch would be enough to fix the problem.  To back port all
wo=
uld
potentially make it large and invasive
=20 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 21 Peter Martuccelli 2007-02-20 15:24:03 UTC
No patch available, moving out to 4.6.

Comment 22 Issue Tracker 2007-02-20 22:06:17 UTC
----- Additional Comments From mjwolf.com  2007-02-20 16:04 EDT
-------
I believe the relevent portion of the patches needed will be powerpc only
it may
not touch other code.  still working on getting the patch to work
properly
without backporting tons of code.  I'm concerned that backporting too much
will
require LOTS of testing to assure that the signalling code is still working
properly 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 24 Issue Tracker 2007-02-21 15:22:17 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pmac.com




------- Additional Comments From mjwolf.com  2007-02-21 10:15 EDT
-------
adding Paul Mackerras to the cc list.  He is looking at this also and
believes
we will be able to come up with a patch that only touches the Power
architecture
code. 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 25 Issue Tracker 2007-02-22 19:14:05 UTC
File uploaded: sigfix.patch

This event sent from IssueTracker by araghavan 
 issue 99307
it_file 83635

Comment 26 Issue Tracker 2007-02-22 19:14:10 UTC
file attachment from LTC:sigfix.patch


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 27 Issue Tracker 2007-02-22 19:14:13 UTC
----- Additional Comments From pmac.com (prefers email at
paulus.com)  2007-02-22 01:15 EDT -------
 
Patch to fix signal return to restore all registers

This patch fixes the problem.  I tested it with the test program compiled
both
in 32-bit and 64-bit mode (to run correctly in 64-bit mode, I had to modify
the
test program to avoid changing r13, since r13 is the thread-local storage
pointer).  I tested on a POWER5+ partition with RHEL4.4 installed.

Previously, if a signal was delivered on return from a system call, the
sigreturn code (invoked when the handler returns via the sys_sigreturn or
sys_rt_sigreturn system call) would return the result value from the
original
system code, i.e. regs->result.  This would be negative for an
interrupted
system call.  Consequently the code starting at the "80:" label in
arch/ppc64/kernel/entry.S would return to userspace via the system call
return
path rather than the general exception return path (i.e. ret_from_except).

This meant two things; first, r3 (the return value) was set to the negative
of
regs->result, and the SO bit was set in CR0 to indicate an error. 
Secondly,
the non-volatile registers were not restored (which is our bug).

Now, setting r3 and CR0.SO again is mostly unnecessary, since r3 and
CR0.SO
were already set appropriately before the signal was delivered.  The only
time
we really needed to set them again was if regs->result was changed since
then. 
The only time that happens is in syscall_restart() (in
arch/ppc64/kernel/signal.c) or do_signal32
(arch/ppc64/kernel/signal32.c).

Thus, if we make sure that regs->gpr[3] is set appropriately whenever
regs->result is changed to some negative value, we then don't need to
reexecute
the syscall return path when doing a [rt_]sigreturn.  Note that CR0.SO is
already correct (i.e. set) for those cases where regs->result is being
changed
from one negative value to another negative value (which is the only case
we
have).

Thus, this patch makes sure that the places where a negative regs->result
is
set to -EINTR also set regs->gpr[3] to EINTR.  Then, it makes the
[rt_]sigreturn functions return 0 rather than regs->result, so that
signal
return will restore all registers in all circumstances. 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 28 Issue Tracker 2007-02-22 19:14:19 UTC
File uploaded: syscallInterrupt.c

This event sent from IssueTracker by araghavan 
 issue 99307
it_file 83698

Comment 29 Issue Tracker 2007-02-22 19:14:25 UTC
file attachment from LTC:syscallInterrupt.c


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 30 Issue Tracker 2007-02-22 19:14:29 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #23362|0                           |1
        is obsolete|                            |




------- Additional Comments From mjwolf.com  2007-02-22 11:47 EDT
-------
 
modified test case

attaching the modified test case that doesnt touch r13 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 31 Issue Tracker 2007-02-22 19:14:34 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|FIXEDAWAITINGTEST           |TESTED




------- Additional Comments From mjwolf.com  2007-02-22 12:18 EDT
-------
it has been tested on power5 systems and works.  moving the bug to the
submitted
state 


This event sent from IssueTracker by araghavan 
 issue 99307

Comment 33 Archana K. Raghavan 2007-02-22 19:16:11 UTC
Created attachment 148610 [details]
proposed patch

Comment 34 Archana K. Raghavan 2007-02-22 19:17:00 UTC
Created attachment 148611 [details]
modified test case

Comment 36 Jay Turner 2007-02-23 12:21:54 UTC
Given the scope of the change, the fact that IBM has tested and that a test case
is available, I would be OK taking in this change after the 4.5 beta is released.

Comment 37 Janice Girouard - IBM on-site partner 2007-02-23 22:49:46 UTC
Posted as a 4.6 bug.  See
http://post-office.corp.redhat.com/archives/rhkernel-list/2007-February/msg00422.html

Comment 38 Janice Girouard - IBM on-site partner 2007-02-23 22:52:50 UTC
Changing the version field to 4.6.0 to reflect the fact that we are pursuing
this as a 4.6 fix based on the following comment from Paul Mackerras:

-
The reason why this hasn't been seen before is that it is very unusual for a
signal handler to modify the saved processor state (the state of the processor
at the point where the signal was delivered---this state is stored on the
stack).  The Transitive pAVE application is the only program I know of that does
this and expects it to work.

More specifically, the ABI requires C functions to preserve the contents of
non-volatile registers (r13 - r31).  Thus these registers have the same state on
exit from a signal handler function as on entry, which is the same state as at
the point where the signal was delivered.  Thus the fact that the kernel was not
restoring the state of those registers on signal return doesn't matter in almost
all circumstances (since those registers are unchanged anyway).  It's only if
the signal handler function modifies the state stored on the stack and expects
those modifications to show up in the processor state after the signal return
that you notice that the kernel didn't actually restore those registers from the
state stored on the stack.

Comment 41 Jason Baron 2007-06-27 14:45:00 UTC
committed in stream U6 build 55.13. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 43 John Poelstra 2007-08-29 16:31:46 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6 Beta released on RHN (also available at partners.redhat.com).  

Requested action: Please verify that your issue is fixed to ensure that it is
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 44 John Poelstra 2007-09-05 22:24:39 UTC
A fix for this issue should have been included in the packages contained in 
the RHEL4.6-Snapshot1 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed to ensure that it is 
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, 
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent 
symptoms of the problem you are having and change the status of the bug to 
FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test 
results to Issue Tracker.  If you need assistance accessing 
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 45 Neil Campbell 2007-09-10 16:14:38 UTC
Confirmed as fixed in 4.6 beta, thanks!

Comment 47 errata-xmlrpc 2007-11-15 16:14:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html