From the glibc-2.28-9.fc29.i686 build log: =====FAIL: stdlib/tst-setcontext9.out===== making contexts swap contexts start f1 swap contexts in f2 set context in f1 set context Didn't expect signal from child: got `Segmentation fault' This was observed with kernel 4.17.3-200.fc28.x86_64. I've seen it on rawhide and on other architectures as well (but not x86_64).
The value of %ebx stored on the stack, used after resuming f2 and calling puts ("end f2") is corrupted. The corruption appears to happen here: Hardware watchpoint 6: *(int *)0xffffd44c Old value = 1448452096 New value = 1448438070 0xf7e613ee in swapcontext () from /lib/libc.so.6 #0 0xf7e613ee in swapcontext () from /lib/libc.so.6 #1 0x56556936 in f1 () at tst-setcontext9.c:47 #2 0xf7e6134f in makecontext () from /lib/libc.so.6 #3 0x5655a100 in ?? () Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) info symb 0x5655a100 ctx in section .bss of …/stdlib/tst-setcontext9 It seems that the allocated stack is reused. I don't know why this happens.
This is specific to the toolchain used here. It doesn't reproduce for i686 using the F28 toolchain. I'm going to dig a bit more here.
(In reply to Carlos O'Donell from comment #2) > This is specific to the toolchain used here. It doesn't reproduce for i686 > using the F28 toolchain. I'm going to dig a bit more here. Did you use the RPM build flags? I think it needs PIE to trigger on i686. The bug will likely reproduce on other architectures which spill the GOT pointer, too.
Fails in an i686 F29 mock chroot, running on F28. So it does indeed seem to be a toolchain issue. Looking at the differences now.
(In reply to Florian Weimer from comment #1) > The value of %ebx stored on the stack, used after resuming f2 and calling > puts ("end f2") is corrupted. I can confirm that using the proper GOT value as computed from the start of the function we can find the .rodata pointer and fined the string constant 'end f2' correctly: (gdb) p/x $edx $29 = 0x804c000 ^^^^^^^^^ $edx is PC, plus offset == .got.plt (gdb) p/x $edx - 0x1fcc $27 = 0x804a034 ^^^^^^^^^ got relative address of rodata constant. (gdb) x/7c 0x804a034 0x804a034: 101 'e' 110 'n' 100 'd' 32 ' ' 102 'f' 50 '2' 0 '\000' However, at the end of f2 we have: 0x0804947a <+122>: mov 0x18(%esp),%esi (gdb) p/x $esi $33 = 0x8049548 Should have been 0x804c000 as above. As you note it looks like the stack is being reused and overwritten. In a "working" version of the test program the compiler picks $ebx, and because this is callee-saves it doesn't have to save and restore this to the stack. Quick review of f2 stack manipulations (bytes): - call __x86.get_pc_thunk.dx (store pc in edx) - push esi (Down 0x4, total 0x4) [save esi] - push ebx (Down 0x4, total 0x8) [save ebx] - sub 32 (Down 0x20, total 0x28) - compute .got.plt address and store to esp+0x18 esp+24 = saved esi esp+20 = saved ebx esp+1c to esp+1f = empty (4 bytes) esp+18 = saved .got.plt address ... esp+0 = nothing yet - .got.plt address located at esp+0x18 - push eax (Down 0x4, total 0x2c) - pop edx (Up 0x4, total 0x28) - pop ecx (Up 0x4, total 0x24) - $ebx / GOT located at $esp+0x14 - push eax (Down 0x4, total 0x28) - push eax (Down 0x4, total 0x2c) - .got.plt address located at $esp+0x1c - swapcontext call - add 0x10 esp (Up 0x10, total 0x1c) - .got.plt address located at $esp+0xc - sub 0xc esp (Down 0xc, total 0x28) - .got.plt address located at esp+0x18 Totals stack manipulations in f2() look correct, and access via 'mov 0x18(%esp),%esi' looks correct. A double check around the swapcontext(): (gdb) c Continuing. swap contexts in f2 Breakpoint 7, 0x0804943c in f2 () at tst-setcontext9.c:34 34 if (swapcontext (&ctx[4], &ctx[2]) != 0) (gdb) x/4x $esp + 0x1c 0xffffc57c: 0x00 0xc0 0x04 0x08 (gdb) c Continuing. set context in f1 set context Breakpoint 6, 0x08049449 in f2 () at tst-setcontext9.c:34 34 if (swapcontext (&ctx[4], &ctx[2]) != 0) (gdb) x/4x $esp + 0x1c 0xffffc57c: 0x48 0x95 0x04 0x08 (gdb) Shows the stack is corrupted during the call and the on-stack value is not preserved. (gdb) p/x *(int *)0xffffc57c $42 = 0x804c000 (gdb) watch *(int *)0xffffc57c Hardware watchpoint 8: *(int *)0xffffc57c (gdb) c Continuing. Hardware watchpoint 8: *(int *)0xffffc57c Old value = 134529024 New value = 134517990 0xf7e2d32a in swapcontext () from /lib/libc.so.6 (gdb) bt #0 0xf7e2d32a in swapcontext () from /lib/libc.so.6 #1 0x080494e6 in f1 () at tst-setcontext9.c:47 #2 0xf7e2d28b in makecontext () from /lib/libc.so.6 #3 0x0804c080 in ?? () #4 0xf7e06211 in __libc_start_main () from /lib/libc.so.6 #5 0x0804c000 in ?? () Backtrace stopped: previous frame inner to this frame (corrupt stack?) (gdb) And this makes perfect sense because... (In reply to Florian Weimer from comment #1) > It seems that the allocated stack is reused. I don't know why this happens. ... the test is buggy. This makes complete sense. We are in f2(), which is a callee of f1(), and when we swapcontext() to f1() via &ctx[2], all we are doing is moving esp back up to the starting value it had when we created &ctx[2], and there is enough work done in f1() after getcontext() that it will clobber all the values on the stack in f2() since f1() need not have made the stack adjustment early in the prologue (if it did then we would have entered f2() with enough stack gap to avoid the clobber). All of this is exacerbated by -fpie which requires lots of saves/restores from the stack to get all the required addresses computed. It was bound to fail. In fact there is only 32-bytes of stack between the start of f1() and the start of f2(). The simplest solution that retains the present test structure is: * Split f1 into f1a and f1b. * In f1a adjust ctx[2] to have it's own stack via makecontext. * From f1a call f2() directly. * Have f1b check for if (done) {...} else exit (EXIT_FAILURE);. The lesson here is that we should not setcontext back to a caller if you don't know if there is enough stack between the caller and the calee. I'll hand this back to you so you can look at this in your timezone.
(In reply to Carlos O'Donell from comment #5) > The lesson here is that we should not setcontext back to a caller if you > don't know if there is enough stack between the caller and the calee. ... if you plan to return to the callee with another setcontext.
Fixed upstream in master. commit 791b350dc725545e3f9b5db0f97ebdbc60c9735f (HEAD -> master, origin/master, origin/HEAD) Author: Carlos O'Donell <carlos> Date: Wed Sep 5 01:16:42 2018 -0400 Fix tst-setcontext9 for optimized small stacks. If the compiler reduces the stack usage in function f1 before calling into function f2, then when we swapcontext back to f1 and continue execution we may overwrite registers that were spilled to the stack while f2 was executing. Later when we return to f2 the corrupt registers will be reloaded from the stack and the test will crash. This was most commonly observed on i686 with __x86.get_pc_thunk.dx and needing to save and restore $edx. Overall i686 has few registers and the spilling to the stack is bound to happen, therefore the solution to making this test robust is to split function f1 into two parts f1a and f1b, and allocate f1b it's own stack such that subsequent execution does not overwrite the stack in use by function f2. Tested on i686 and x86_64. Signed-off-by: Carlos O'Donell <carlos>
The test still fails on POWER in glibc-2.28.9000-7.fc30. The upstream fix is insufficient.
(In reply to Florian Weimer from comment #8) > The test still fails on POWER in glibc-2.28.9000-7.fc30. The upstream fix > is insufficient. I will investigate this. I assume the failure can be seen on ppc64le.
(In reply to Carlos O'Donell from comment #9) > (In reply to Florian Weimer from comment #8) > > The test still fails on POWER in glibc-2.28.9000-7.fc30. The upstream fix > > is insufficient. > > I will investigate this. I assume the failure can be seen on ppc64le. I see Andreas has fixed this already: commit f841c97e515a1673485a2b12b3c280073d737890 Author: Andreas Schwab <schwab> Date: Thu Sep 27 11:12:13 2018 +0200 Fix stack overflow in tst-setcontext9 (bug 23717) The function f1a, executed on a stack of size 32k, allocates an object of size 32k on the stack. Make the stack variables static to reduce excessive stack usage. And cherry picked into 2.28. commit 3a67c72c1512f778304a5644dea2fcf5bdece274 Author: Andreas Schwab <schwab> Date: Thu Sep 27 12:37:06 2018 +0200 Fix stack overflow in tst-setcontext9 (bug 23717) The function f1a, executed on a stack of size 32k, allocates an object of size 32k on the stack. Make the stack variables static to reduce excessive stack usage. (cherry picked from commit f841c97e515a1673485a2b12b3c280073d737890)
Correct, fixes are under way.
glibc-2.28-17.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-0e5874aba9
glibc-2.28-17.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-0e5874aba9
glibc-2.28-17.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.