Description of problem: The compiler option flag -fstack-clash-protection does not check some on-stack allocations of fixed-length local arrays, despite being promoted as "reliably detects all stack exhaustion". Version-Release number of selected component (if applicable): gcc-7.2.1-5.fc28.x86_64 How reproducible: every time Steps to Reproduce: 1. [test case below] Compare generated code that is compiled with versus without -fstack-clash-protection. 2. 3. Actual results: No checking for stack exhaustion with on-stack allocation of small local array when -fstack-clash-protection is specified. Expected results: -fstack-clash-protection checks all stack allocations Additional info: ===== foo.c test case extern int g(int *x, char **y); int main(int argc, char *argv[]) { int x[10]; g(&x[3], argv); return 0; } ===== ===== foo.s generated from "gcc -fstack-clash-protection" -S -O foo.c" .file "foo.c" .text .globl main .type main, @function main: .LFB0: .cfi_startproc subq $56, %rsp /* unchecked */ .cfi_def_cfa_offset 64 leaq 12(%rsp), %rdi call g movl $0, %eax addq $56, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE0: .size main, .-main .ident "GCC: (GNU) 7.2.1 20180101 (Red Hat 7.2.1-5)" .section .note.GNU-stack,"",@progbits ===== There is no check for stack exhaustion by "subq $56, %rsp".
?? The caller pushes the return address to stack, so there is an implicit stack probe at that point, and call g is yet another stack probe just 64 bytes below it. The point of -fstack-clash-protection is that you will not bypass a whole page.
"F28 System Wide Change: Hardening Flags Updates for Fedora 28" claims: As a result, all stack overflows (i.e., situations where the allocated stack is completely exhausted) will reliably result in crashes. That requires checking each stack allocation, because a thread may be started with a "ridiculously small stack" that becomes exhausted very quickly. Else the meaning of "all stack overflows" should be clarified. Perhaps it means "only on threads that are started via glibc and all code is compiled by gcc." Perhaps it means that there is a new thread-local variable (at an eternal-constant offset in thread-local storage) which gives the stack limit, "everybody" (all code generators and runtime libraries) must check it, and gcc implements optimizations based on knowing the underlying page size (which of course does vary on x86* [4Kib, 2MiB, 1GiB] and some other machines.)
In particular, will there be an additional argument to clone(), giving the stack size?
The ABI already requires a guard page at the end of the stack. If an application sets up its own stacks, it has to arrange for such a guard page. This is not a new requirement at all. Note that unlike previous stack probing implementations, -fstack-clash-protection only probes in the actually used area of the stack, so really small stacks with a guard page are still fully supported in this mode.
Jakub and Florian are correct here. The push of the return address by the caller on the stack acts as an implicit probe. As a result the callee can allocate up to 4k before it has to start probing. There may be mis-understanding of what stack clash protection does that are at the heart of these BZs (and the text WRT the system wide F28 changes does not help here). Stack clash protection is designed to prevent stack allocations from jumping the guard page -- it isn't really concerned with overflows. Clearly we need to get the F28 change text fixed to match reality. I plan on doing some work on the F28 change requests and such early next week, assuming I can stop responding to questions WRT the latest CVEs :(