126908 – single-stepping system call executes two instructions on powerpc

Summary: single-stepping system call executes two instructions on powerpc

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	powerpc
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	David Woodhouse
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2004-06-28 22:40 UTC by Andrew Cagney
Modified:	2007-11-30 22:07 UTC
CC List:	8 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-12-20 20:55:23 UTC
Target Upstream Version:
Embargoed:

Attachments
Patch using same method as i386/x86_64 (1.73 KB, text/plain) 2004-09-20 20:57 UTC, David Woodhouse
Updated patch (2.12 KB, patch) 2004-09-20 22:41 UTC, David Woodhouse
Correct patch. (2.75 KB, patch) 2004-09-21 00:54 UTC, David Woodhouse

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:550	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4	2004-12-20 05:00:00 UTC

Description Andrew Cagney 2004-06-28 22:40:39 UTC

Description of problem:

See PR 117972.

Version-Release number of selected component (if applicable):


How reproducible:

Always.

Steps to Reproduce:

$ cat > nothing.c
#include <signal.h>
#include <unistd.h>
int main (void)
{
  while (1) 
    {
      kill (getpid (), 0);
    }
}

$ cc -g -o to-rhaps7 nothing.c
$ gdb ./to-rhaps7
(gdb) run
Starting program: /home/cagney/tmp/to-rhaps 

Program received signal SIGINT, Interrupt.
0x0ff49144 in getpid () from /lib/tls/libc.so.6
(gdb) disassemble 
Dump of assembler code for function getpid:
0x0ff4913c <getpid+0>:  li      r0,20
0x0ff49140 <getpid+4>:  sc
0x0ff49144 <getpid+8>:  blr
End of assembler dump.
(gdb) break 0x0ff49140
Function "0x0ff49140" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) break *0x0ff49140
Breakpoint 1 at 0xff49140
(gdb) c
Continuing.

Breakpoint 1, 0x0ff49140 in getpid () from /lib/tls/libc.so.6
(gdb) display/i $pc
1: x/i $pc  0xff49140 <getpid+4>:       sc
(gdb) del 1
(gdb) disassemble 
Dump of assembler code for function getpid:
0x0ff4913c <getpid+0>:  li      r0,20
0x0ff49140 <getpid+4>:  sc
0x0ff49144 <getpid+8>:  blr
End of assembler dump.
(gdb) stepi
0x1000046c in main () at nothing.c:7
7             kill (getpid (), 0);
1: x/i $pc  0x1000046c <main+28>:       mr      r0,r3

Notice how the STEPI executed both:
0x0ff49140 <getpid+4>:  sc
0x0ff49144 <getpid+8>:  blr

Comment 9 Roland McGrath 2004-09-20 18:40:58 UTC

I don't know PPC in detail, so we may need to consult on whether this
approach makes sense there.  The issue on x86/x86-64 is that the
hardware single-step flag set in the processor flags when returning
from a system call means to execute one user instruction before stopping,
so the instruction immediately after the system call entry instruction
doesn't get traced by single-step.  The approach to fix that is a
software bit PT_SINGLESTEP that's set by PTRACE_SINGLESTEP and that
system call return notices to mean it should simulate a single-step
trap with the PC of the first user instruction to be run after the
syscall.
If the meaning of the PPC's MSR_SE bit is the same as x86's TF, then
copying that method should be fine.

The patch looks incomplete, because it doesn't change
syscall_enter_leave to actually do the tracing in the PT_SINGLESTEP case.

Comment 11 David Woodhouse 2004-09-20 20:57:55 UTC

Created attachment 104031
Patch using same method as i386/x86_64

This attempts to fix the problem in the same way we fix it for i386 and x86_64,
in bug #126699.

Comment 12 Roland McGrath 2004-09-20 21:07:54 UTC

That patch looks to me like it will work, not knowing PPC myself.
For the x86 changes, I felt it appropriate to get the change of
behavior incorporated in 2.6 upstream before we committed to changing
the RHEL3 behavior.

Comment 14 David Woodhouse 2004-09-20 22:41:20 UTC

Created attachment 104042
Updated patch

This patch has more chance of working -- the previous one had the set/clear of
PT_SINGLESTEP in set_single_step() and clear_single_step() the wrong way round.


But it doesn't actually seem to work. I'm not entirely sure why. More
investigation required.

Comment 15 Ernie Petrides 2004-09-20 23:23:18 UTC

Reassigning to DavidW and reverting to ASSIGNED state.  (David,
I change bugs to MODIFIED state when the associated patches are
actually committed to CVS.)

Comment 16 David Woodhouse 2004-09-21 00:54:01 UTC

Created attachment 104050
Correct patch.

The previous patch works only in 64-bit mode. It'll work a little better for
32-bit gdb if I put the same changes into ptrace32.c as I have in ptrace.c.

This doesn't look like it should be an issue for x86_64 ptrace32.c, because
that one just calls through to the 64-bit functions.

Comment 17 Andrew Cagney 2004-09-23 20:24:08 UTC

Using p630.lab.boston.redhat.com and the sources in
/tmp/cagney/gdb+dejagnu-20040607/ configured in /tmp/cagney/native using:

$ cd /tmp/cagney/native/
$ CC='gcc -m64' /tmp/cagney/gdb+dejagnu-20040607/configure
$ make
$ file gdb/gdb
<something about 64-bit elf>

(which gives a 64-bit GDB), and tested using:

$ cd /tmp/cagney/native/gdb/testsuite
$ make check RUNTESTFLAGS='--target_board=unix/-m32\ unix/-m64 sigstep.exp'
$ less gdb.log

(assuming no typos) I'm seeing that:

- 32-bit stepping of sigreturn works
For GDB, this is the critical system call that must not double-step. 
This can be seen with the sigstep.exp test where it stepi's the "sc"
instruction.

- 64-bit stepping of sigreturn works when tested by hand
GDB is scrambling its backtrace when single-stepping through an
epilogue, but that is a separate GDB problem.

- 32-bit and 64-bit stepi with a pending signal doesn't work
It would appear to free run.  This is a related and known problem, see
130995.

So I think this bug is fixed.

Comment 18 Ernie Petrides 2004-09-24 09:36:16 UTC

A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.11.EL).

Comment 19 John Flanagan 2004-12-20 20:55:23 UTC

An erratum has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html
