Created attachment 353971 [details] reproducer (C cource) Jumping from alternate stack back confuses the fortify check. See the attached file conftest.c. Version-Release number of selected component (if applicable): 2.10.90-3 (The problem does not appear on Fedora 11's glibc-2.10.1 as it does not check longjmp's.) How reproducible: always, observed on i586, x86_64, and ppc. Steps to Reproduce: 1. gcc -D_FORTIFY_SOURCE=2 -o conftest conftest.c 2. ./conftest 3. observe __fortify_fail called from __longjmp_chk (Thanks to Paolo Bonzini who did the analysis of the problem.)
Created attachment 353972 [details] short patch (Paolo Bonzini) This patch by Paolo Bonzini works, but has a disadvantage of adding to much overhead to every longjmp call.
Created attachment 353973 [details] longer patch (Paolo Bonzini) This longer patch is better, except that it does not work. ;-) Quoting Paolo: "[it] does not work because it lacks pushes/pops to save the caller-save registers across the calls to __longjmp_chk_fail. But at least most of the plumbing is done."
Using longjmp from a signal handler using an alternative stack is very wrong. The kernel keeps track of whether the thread is currently using an alternative stack or not. This is completely thrown over by using longjmp. The error is IMO very appropriate. The application should be fixed.
Created attachment 354136 [details] reproducer with debug output (C source) It's true that on other operating special care is required to reset the flag, for example using siglongjmp or setcontext. In fact, the reproducer that Stepan attached is a configure test that tries to generate *two* SIGSEGVs exactly to detect whether this kind of special care is needed; if it fails, subsequent tests will try siglongjmp or setcontext. The very fact that the testcase Stepan attached passed on Fedora 11 shows that for Linux the kernel can determine autonomously whether the thread is running on the normal stack. I'm attaching another testcase (tested on Fedora 11) that demonstrates this by running sigaltstack right after the longjmp, and showing that SS_ONSTACK is cleared automatically. Having said that (i.e. your comment "This is completely thrown over by using longjmp" is wrong) I might agree that libsigsegv should be fixed if only for the sake of POSIXicity. Would you use siglongjmp or using setcontext or either?
I've fixed this now upstream for x86 and x86-64. PPC waits for IBM to do their part. I've used a different approach than the proposed patch. Should be in 20.10.90-12 once Andreas builds it.
Thanks.
Fully fixed in 2.10.90-15.
(In reply to comment #7) > Fully fixed in 2.10.90-15. Thanks, the bug indeed is fixed on most archs. But, unfortunately, things go south on x86-64, see the following log: http://koji.fedoraproject.org/koji/getfile?taskID=1665971&name=build.log The last line printed is checking whether a signal handler can be left through longjmp... no That should be equivalent to the failure with the test program attached here, so I'm reopening the bug. Moreover, the build hangs forever after printing that line. I guess one of the subsequent checks hangs the build; here is the expected output: checking whether a signal handler can be left through longjmp... yes checking whether a signal handler can be left through longjmp and sigaltstack... yes checking whether a signal handler can be left through longjmp and setcontext... yes checking whether a signal handler can be left through siglongjmp... yes checking whether a signal handler can be left through siglongjmp and sigaltstack... yes checking whether a signal handler can be left through siglongjmp and setcontext... yes
Forgot to say, comment #8 describes my experience with current rawhide, which has glib-2.10.90-21.
Please provide the configure test.
Here's the hanging check that libsigsegv (bug #511723) uses anyway: #include <stdlib.h> #include <signal.h> #include <setjmp.h> #if HAVE_SETRLIMIT # include <sys/types.h> # include <sys/time.h> # include <sys/resource.h> #endif #ifndef SIGSTKSZ # define SIGSTKSZ 16384 #endif jmp_buf mainloop; sigset_t mainsigset; int pass = 0; void stackoverflow_handler (int sig) { pass++; sigprocmask (SIG_SETMASK, &mainsigset, NULL); { } longjmp (mainloop, pass); } volatile int * recurse_1 (volatile int n, volatile int *p) { if (n >= 0) *recurse_1 (n + 1, p) += n; return p; } int recurse (volatile int n) { int sum = 0; return *recurse_1 (n, &sum); } int main () { char mystack[SIGSTKSZ]; stack_t altstack; struct sigaction action; sigset_t emptyset; #if defined HAVE_SETRLIMIT && defined RLIMIT_STACK /* Before starting the endless recursion, try to be friendly to the user's machine. On some Linux 2.2.x systems, there is no stack limit for user processes at all. We don't want to kill such systems. */ struct rlimit rl; rl.rlim_cur = rl.rlim_max = 0x100000; /* 1 MB */ setrlimit (RLIMIT_STACK, &rl); #endif /* Install the alternate stack. */ altstack.ss_sp = mystack; altstack.ss_size = sizeof (mystack); altstack.ss_flags = 0; /* no SS_DISABLE */ if (sigaltstack (&altstack, NULL) < 0) exit (1); /* Install the SIGSEGV handler. */ sigemptyset (&action.sa_mask); action.sa_handler = &stackoverflow_handler; action.sa_flags = SA_ONSTACK; sigaction (SIGSEGV, &action, (struct sigaction *) NULL); sigaction (SIGBUS, &action, (struct sigaction *) NULL); /* Save the current signal mask. */ sigemptyset (&emptyset); sigprocmask (SIG_BLOCK, &emptyset, &mainsigset); /* Provoke two stack overflows in a row. */ if (setjmp (mainloop) < 2) { recurse (0); exit (2); } exit (0); }
You need to define HAVE_SETRLIMIT.
I believe it is already, but I can double-check.
From, a debug scratch build on x86_64, http://koji.fedoraproject.org/koji/taskinfo?taskID=1695350 The defines used in that test include: #define PACKAGE_NAME "" #define PACKAGE_TARNAME "" #define PACKAGE_VERSION "" #define PACKAGE_STRING "" #define PACKAGE_BUGREPORT "" #define PACKAGE "libsigsegv" #define VERSION "2.6" #define STDC_HEADERS 1 #define HAVE_SYS_TYPES_H 1 #define HAVE_SYS_STAT_H 1 #define HAVE_STDLIB_H 1 #define HAVE_STRING_H 1 #define HAVE_MEMORY_H 1 #define HAVE_STRINGS_H 1 #define HAVE_INTTYPES_H 1 #define HAVE_STDINT_H 1 #define HAVE_UNISTD_H 1 #define HAVE_DLFCN_H 1 #define LT_OBJDIR ".libs/" #define HAVE_SYS_SIGNAL_H 1 #define CFG_SIGNALS "signals.h" #define HAVE_UNISTD_H 1 #define HAVE_GETPAGESIZE 1 #define HAVE_SYSCONF_PAGESIZE 1 #define HAVE_MMAP_ANON 1 #define HAVE_MMAP_ANONYMOUS 1 #define HAVE_MMAP_DEVZERO 1 #define CFG_FAULT "fault-posix.h" #define CFG_MACHFAULT "fault-none.h" #define STACK_DIRECTION -1 #define HAVE_MINCORE 1 #define HAVE_STACKVMA 1 #define CFG_STACKVMA "stackvma-linux.c" #define HAVE_GETRLIMIT 1 #define HAVE_SETRLIMIT 1 #define HAVE_SIGALTSTACK 1 #define HAVE_WORKING_SIGALTSTACK 1
Your testcase does not include any of these symbols. Without HAVE_SETRLIMIT it will try to gobble all available memory.
Created attachment 362040 [details] reproducer with HAVE_SETRLIMIT Since Andreas's objection is correct, here is a valid test program, including also the fix for bug 524795. I don't have a Rawhide machine in order to test it.