Bug 832534

Summary: ptrace: call inferior segfaults
Product: [Fedora] Fedora
Reporter: Nathan Sidwell <nathan>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 17
CC: gansalmon, gbenson, itamar, jan.kratochvil, jonathan, kernel-maint, madhu.chinakonda, onestero, pmuldoon, sergiodj, tromey
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Fixed In Version: kernel-3.4.2-4.fc17.i686
Doc Type: Bug Fix
Last Closed: 2012-06-18 12:56:21 UTC
Type: Bug
Attachments:
  reproducer (flags: none)

Description Nathan Sidwell 2012-06-15 16:50:59 UTC
Created attachment 592186 [details]
reproducer

Description of problem:
Calling a function in the inferior ('call foo ()') results in a segfault.
This failure mode is annoying for programs that provide introspection functions meant to be called from the debugger, for instance gcc's 'debug_rtx' function.  FC16 was fine.

Version-Release number of selected component (if applicable):
GNU gdb (GDB) Fedora (7.4.50.20120120-42.fc17)

How reproducible:
reproducible with simple example.

Steps to Reproduce:
1. compile example program with 'gcc -g hello.c'
2. invoke gdb with 'gdb a.out'
3. start program, stopping at main with 'start'
4. call 'foo' function with 'call foo ()'
  
Actual results:
(gdb) call foo ()
b

Program received signal SIGSEGV, Segmentation fault.
0x08048413 in foo () at hello.c:6
6	}
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on".
Evaluation of the expression containing the function
(foo) will be abandoned.
When the function is done executing, GDB will silently stop.
(gdb) <NEXT PROMPT>

Expected results:
(gdb) call foo ()
b
(gdb) <NEXT PROMPT>


Additional info:
The location of the fault is the final 'ret' instruction.  The stack contents are:
(gdb) x/4x $sp
0xbffff0fc:	0xbffff100	0x00000001	0xbffff1c4	0xbffff1cc
which looks to me like the return address is 0xbffff100.  I guess if the stack's not executable, that'll result in a segfault.  I don't know i686 well enough to know whether trying to jump to the stack reports the segfault at the source jumping instruction or at the target location, though.  Also, 0x00000001 doesn't disassemble to a brk instruction.
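
One way to check the guess about a non-executable stack is to look the address up in the process's memory map.  What follows is only a minimal sketch, assuming the Linux /proc/<pid>/maps line format ("start-end perms ..."); the helper name addr_is_executable is made up for illustration and is not part of gdb or the kernel.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if addr lies in an executable mapping
   listed in maps_path (e.g. "/proc/self/maps" or "/proc/<pid>/maps"),
   0 if it is mapped without execute permission, and -1 if it is not
   mapped or the file cannot be read.  */
int addr_is_executable(const char *maps_path, unsigned long addr)
{
    FILE *f = fopen(maps_path, "r");
    char line[512];
    int result = -1;

    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char perms[5];

        /* Each mapping line begins with "start-end perms", where perms
           is a four-character string such as "rw-p" or "r-xp".  */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (addr >= start && addr < end) {
            result = (perms[2] == 'x');
            break;
        }
    }
    fclose(f);
    return result;
}
```

For the session above one would pass the maps file of the inferior (its pid is shown by gdb's 'info proc') together with 0xbffff100.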

Historically I thought gdb implemented call-inferior by placing an internal breakpoint at _start and then initializing the stack such that the function return ends up at that breakpoint.  But perhaps that's (just?) changed.

Comment 1 Nathan Sidwell 2012-06-15 17:22:02 UTC
Sigh, it's really annoying when you have a trap handler.  Continuing ends up in that handler and your debugging is over :(

Comment 2 Nathan Sidwell 2012-06-15 18:16:34 UTC
a freshly built gdb from FSF HEAD exhibits the same problem :(

Comment 3 Nathan Sidwell 2012-06-15 18:27:03 UTC
yeah, definitely looking like a non-executable stack problem. Here's the maint info breakpoint snippet:
0       call dummy     del  y   0xbffff100  inf 1 thread 1
	stop only in thread 1

Comment 4 Jan Kratochvil 2012-06-16 20:39:30 UTC
Sorry, but I cannot reproduce the problem.
I am using a fresh F17 i386 install (with updates, not updates-testing) in KVM.

kernel-PAE-3.4.0-1.fc17.i686
glibc-2.15-37.fc17.i686
gdb-7.4.50.20120120-42.fc17.i686
[jkratoch@f17-i386 ~]$ getenforce 
Enforcing
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
(gdb) call foo()
b
(gdb) _


(In reply to comment #0)
> Historically I thought gdb implemented call-inferior by placing an internal
> breakpoint at _start and then initializing the stack such that the function
> return ends up at that breakpoint.  But perhaps that's (just?) changed.

Yes, it has changed in Fedora
  http://pkgs.fedoraproject.org/gitweb/?p=gdb.git;a=blob_plain;f=gdb-x86-onstack-2of2.patch;hb=f17
  +  set_gdbarch_call_dummy_location (gdbarch, ON_STACK);

and it is going to change in FSF GDB HEAD:
  [patch 3/3] Use ON_STACK for i386/amd64 (gdb2495.exp regression)
  http://sourceware.org/ml/gdb-patches/2012-06/msg00419.html

(In reply to comment #2)
> a freshly built gdb from FSF HEAD exhibits the same problem :(

FSF HEAD still does not use ON_STACK so it should behave very differently.

(In reply to comment #3)
> yeah, definitely looking like a non-executable stack problem. Here's the
> maint info breakpoint snippet:
> 0       call dummy     del  y   0xbffff100  inf 1 thread 1
> 	stop only in thread 1

I analysed this before and found neither the non-executable stack nor SELinux to be a problem for ON_STACK:
http://sourceware.org/ml/gdb-patches/2011-12/msg00902.html


I am interested in this issue because the ON_STACK change is going to be upstreamed into FSF GDB HEAD soon, for gdb-7.5.  But I really cannot reproduce the problem.

Comment 5 Nathan Sidwell 2012-06-18 06:25:10 UTC
Thanks for looking at this.

Hm, I suspect I must have muddled paths between the installed gdb and the freshly built FSF HEAD gdb.  Retrying, I find an FSF gdb does NOT exhibit this problem, and indeed places the dummy breakpoint at _start:
0       call dummy     del  y   0x08048300 <_start> inf 1 thread 1

What would you like me to check further to help you reproduce it?  It's an XFCE spin and I've installed the usual developer tools on it.  There's a vmware VM installed for running windows (the vm itself wasn't running).

Any kernel config data that would be useful?

I did confirm that 'execstack -s' or linking with '-z execstack' resolved the problem for me.
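
Both 'execstack -s' and '-z execstack' mark the binary as wanting an executable stack, which is recorded in the flags of its PT_GNU_STACK program header (the GNU_STACK line that 'readelf -l' prints).  A minimal sketch of reading that flag directly, assuming a Linux system with <elf.h> and <link.h> and an ELF file of the same class and byte order as the program doing the check; gnu_stack_is_executable is a hypothetical helper name:

```c
#include <elf.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF file at path requests an
   executable stack (PT_GNU_STACK with PF_X set), 0 if it requests a
   non-executable one, and -1 on error, if no PT_GNU_STACK header is
   present, or if the file's ELF class differs from the native one.  */
int gnu_stack_is_executable(const char *path)
{
    FILE *f = fopen(path, "rb");
    ElfW(Ehdr) ehdr;
    int result = -1;

    if (f == NULL)
        return -1;

    if (fread(&ehdr, sizeof ehdr, 1, f) == 1
        && memcmp(ehdr.e_ident, ELFMAG, SELFMAG) == 0
        && ehdr.e_ident[EI_CLASS]
               == (sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32)
        && ehdr.e_phentsize == sizeof(ElfW(Phdr))) {
        /* Walk the program header table looking for PT_GNU_STACK.  */
        for (unsigned i = 0; i < ehdr.e_phnum; i++) {
            ElfW(Phdr) phdr;
            long off = (long)(ehdr.e_phoff + (unsigned long)i * sizeof phdr);

            if (fseek(f, off, SEEK_SET) != 0
                || fread(&phdr, sizeof phdr, 1, f) != 1)
                break;
            if (phdr.p_type == PT_GNU_STACK) {
                result = (phdr.p_flags & PF_X) != 0;
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

Run against a.out before and after 'execstack -s', the result would be expected to change from 0 to 1, matching the RW versus RWE flags in the readelf output.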

Comment 6 Jan Kratochvil 2012-06-18 06:39:17 UTC
(In reply to comment #5)
> There's a
> vmware VM installed for running windows (the vm itself wasn't running).
> 
> Any kernel config data that would be useful?

What exact kernel version is it?  As it is not in a VM, what exact CPU is it, including its "microcode" level, from /proc/cpuinfo?

Thanks.

Comment 7 Nathan Sidwell 2012-06-18 09:46:46 UTC
uname -a reports:
Linux cartagia 3.4.0-1.fc17.i686 #1 SMP Sun Jun 3 07:16:04 UTC 2012 i686 i686 i386 GNU/Linux

/proc/cpuinfo:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     U9600  @ 1.60GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 800.000
cache size	: 3072 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dts tpr_shadow vnmi flexpriority
bogomips	: 3191.90
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     U9600  @ 1.60GHz
stepping	: 10
microcode	: 0xa0b
cpu MHz		: 800.000
cache size	: 3072 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dts tpr_shadow vnmi flexpriority
bogomips	: 3191.90
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Comment 8 Jan Kratochvil 2012-06-18 11:21:57 UTC
(In reply to comment #0)
> The location of the fault is the final 'ret' instruction.  The stack
> contents are:
> (gdb) x/4x $sp
> 0xbffff0fc:	0xbffff100	0x00000001	0xbffff1c4	0xbffff1cc
> which looks to me like the return address is 0xbffff100.  I guess if the
> stack's not executable, that'll result in a segfault.  I don't know i686
> enough to know whether trying to jump to the stack reports the segfault at
> the source jumping instruction, or at the target location though.

At the target location.


> Also,
> 0x00000001 doesn't disassemble to a brk instruction.

GDB displays memory without the breakpoints it has inserted, so what is really stored at that address can be anything.

Comment 9 Jan Kratochvil 2012-06-18 12:56:21 UTC
New kernel ptrace testcase:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/ret-to-nxpage.c?cvsroot=systemtap

pass kernel-3.4.0-1.fc17.x86_64
pass kernel-3.4.0-1.fc17.x86_64 -m32
FAIL kernel-3.4.0-1.fc17.i686
pass kernel-PAE-3.4.0-1.fc17.i686
pass kernel-3.4.2-4.fc17.i686
pass kernel-PAE-3.4.2-4.fc17.i686
pass kernel-3.5.0-0.rc2.git0.3.fc18.i686
pass kernel-PAE-3.5.0-0.rc2.git0.3.fc18.i686
pass kernel-2.6.32-220.el6.i686
(all in KVM with host kernel-3.3.5-2.fc16.x86_64 and i7-920 CPU)

I did not bisect this to a specific upstream kernel change, but it seems to be fixed already.  Does it ring a bell for you, Oleg?

With the failing kernel it really does trap at that "ret" instruction.  I do not know how the kernel gets the CPU to do that.

Comment 10 Nathan Sidwell 2012-06-19 06:55:56 UTC
I saw there was a kernel update, so I grabbed:
Linux cartagia 3.4.2-4.fc17.i686 #1 SMP Thu Jun 14 22:19:00 UTC 2012 i686 i686 i386 GNU/Linux

this has resolved the problem.

Thanks for your help!