Description of problem:
A GDB testcase revealed that an i386 debugger running on an x86_64 kernel accidentally causes ERESTARTSYS to be returned in errno in the process being debugged.
It is not reproducible with an i386 debugger on an i386 kernel.
It is not reproducible with an x86_64 debugger on an x86_64 kernel.

Version-Release number of selected component (if applicable):
kernel-2.6.23.15-137.fc8.x86_64 (F8)
kernel-2.6.25-0.65.rc2.git7.fc9.x86_64 (Rawhide kernel on F8)

How reproducible:
Always, reliably.

Steps to Reproduce:
1. wget -O erestartsys.c 'http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/erestartsys.c?cvsroot=systemtap'
2. gcc -o erestartsys erestartsys.c -Wall -ggdb2 -D_GNU_SOURCE -m32 -lutil; ./erestartsys; echo $?

Actual results:
1

Expected results:
0

Additional info:
Correct (0):
/* kernel-2.6.23.15-137.fc8.x86_64 -m64. */
/* kernel.org 2.6.22-rc4-git7 x86_64 -m64. */
/* kernel-2.6.23.15-137.fc8.i686 (-m32). */
Broken (1):
/* kernel.org 2.6.22-rc4-git7 x86_64 on -m32. */
/* kernel-2.6.23.15-137.fc8.x86_64 -m32. */
Unsupported restarting (77):
/* kernel.org 2.4.33 i686. */
/* kernel-2.6.18-53.el5.s390x -m64. */

The GDB case:
cd /home/jkratoch/redhat/fedora/gdb/devel-m32/gdb-6.7.1/build-i386-redhat-linux-gnu/gdb/testsuite
$ file ../gdb gdb.base/interrupt
../gdb: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
gdb.base/interrupt: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
$ ../gdb gdb.base/interrupt
GNU gdb Red Hat Linux (6.7.1-15.fc8rh)
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run
Starting program: /home/jkratoch/redhat/fedora/gdb/devel-m32/gdb-6.7.1/build-i386-redhat-linux-gnu/gdb/testsuite/gdb.base/interrupt
Missing separate debuginfo for /lib/ld-linux.so.2
Try: yum --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ac/2eeb206486bb7315d6ac4cd64de0cb50838ff6.debug
Missing separate debuginfo for /lib/libm.so.6
Try: yum --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/92/8ab51a53627c59877a85dd9afecc1619ca866c.debug
Missing separate debuginfo for /lib/libc.so.6
Try: yum --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/ba/4ea1118691c826426e9410cafb798f25cefad5.debug
talk to me baby
<--- Press CTRL-C at the console here.
Program received signal SIGINT, Interrupt.
0xffffe410 in __kernel_vsyscall ()
(gdb) p func1 ()
$1 = 4
(gdb) cont
Continuing.
Unknown error 512
^^^ The message `Unknown error 512' should never have been seen.
The setting of orig_eax to -1 in compat mode on x86_64 seems unnecessary; the program works the same with this patch applied:

 #ifdef __i386__
-  user.orig_eax = -1L;
+  printf("%X\n", (unsigned int)user.orig_eax);
+//  user.orig_eax = -1L;
 #endif
ptrace (PTRACE_SETREGS) could be truncating the original internal 64-bit register values to 32 bits.
It is, and that was my first thought about orig_ax checks. But a hack for that did not solve the bug. I'm still investigating.
got it, fixing upstream
Just a reason why the Comment 1 "simplification" is not acceptable:

(In reply to comment #1)
> The setting of orig_eax to -1 in compat mode on x86_64 seems unnecessary; the
> program works the same with this patch applied:

It "works" only in the sense that the problem is still reproducible. But the testcase would not correctly PASS even after the kernel bug gets fixed. The testcase fails for native x86_64-on-x86_64 with the `orig_rax' reset removed. Also, the code does not make much sense without the `orig_[re]ax' reset.

Thanks as always, Roland.
(In reply to comment #5)
> Just a reason why the Comment 1 "simplification" is not acceptable:
>
> (In reply to comment #1)
> > The setting of orig_eax to -1 in compat mode on x86_64 seems unnecessary; the
> > program works the same with this patch applied:
>
> "works" that the problem is reproducible.
> But it would not correctly PASS even after the kernel bug gets fixed.
> The testcase fails for native x86_64-on-x86_64 with the `orig_rax' reset
> removed. And also the code does not make much sense without the `orig_[re]ax'
> reset.

Well of course... it was just a clue that something was so wrong that setting orig_eax to -1 had no effect one way or the other for 32-on-64 code. :)
Is it necessary to fix this very obscure bug in F8 or is just fixing rawhide good enough?
This bug has gone unnoticed at least since 2.6.9 and probably since the dawn of time. It is not urgent AFAIK.
Upstream fix from Roland: http://www.ussg.iu.edu/hypermail/linux/kernel/0802.3/2516.html
Building the patch in rawhide.
Verified as fixed for kernel-2.6.25.4-10.fc8.{x86_64,i686}.