Bug 512103 - longjmp from sighandle alternate stack causes fortify failure
longjmp from sighandle alternate stack causes fortify failure
Product: Fedora
Classification: Fedora
Component: glibc (Show other bugs)
x86_64 Linux
low Severity medium
: ---
: ---
Assigned To: Andreas Schwab
Fedora Extras Quality Assurance
: Reopened
Depends On:
Blocks: 516995 511723
  Show dependency treegraph
Reported: 2009-07-16 06:52 EDT by Stepan Kasal
Modified: 2009-10-30 14:47 EDT (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-09-22 03:35:12 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
reproducer (C cource) (1.68 KB, text/plain)
2009-07-16 06:52 EDT, Stepan Kasal
no flags Details
short patch (Paolo Bonzini) (809 bytes, patch)
2009-07-16 06:59 EDT, Stepan Kasal
no flags Details | Diff
longer patch (Paolo Bonzini) (8.83 KB, patch)
2009-07-16 07:04 EDT, Stepan Kasal
no flags Details | Diff
reproducer with debug output (C source) (2.30 KB, text/plain)
2009-07-17 08:53 EDT, Paolo Bonzini
no flags Details
reproducer with HAVE_SETRLIMIT (1.71 KB, text/plain)
2009-09-22 04:39 EDT, Paolo Bonzini
no flags Details

  None (edit)
Description Stepan Kasal 2009-07-16 06:52:04 EDT
Created attachment 353971 [details]
reproducer (C cource)

Jumping from alternate stack back confuses the fortify check.
See the attached file conftest.c.

Version-Release number of selected component (if applicable):

(The problem does not appear on Fedora 11's glibc-2.10.1 as it does not check longjmp's.)

How reproducible:
always, observed on i586, x86_64, and ppc.

Steps to Reproduce:
1. gcc -D_FORTIFY_SOURCE=2 -o conftest conftest.c
2. ./conftest
3. observe __fortify_fail called from __longjmp_chk

(Thanks to Paolo Bonzini who did the analysis of the problem.)
Comment 1 Stepan Kasal 2009-07-16 06:59:57 EDT
Created attachment 353972 [details]
short patch (Paolo Bonzini)

This patch by Paolo Bonzini works, but has a disadvantage of adding to much overhead to every longjmp call.
Comment 2 Stepan Kasal 2009-07-16 07:04:06 EDT
Created attachment 353973 [details]
longer patch (Paolo Bonzini)

This longer patch is better, except that it does not work.  ;-)

Quoting Paolo: "[it] does not work because it lacks pushes/pops to save the caller-save registers across the calls to __longjmp_chk_fail.  But at least most of the plumbing is done."
Comment 3 Ulrich Drepper 2009-07-16 23:53:07 EDT
Using longjmp from a signal handler using an alternative stack is very wrong.  The kernel keeps track of whether the thread is currently using an alternative stack or not.  This is completely thrown over by using longjmp.  The error is IMO very appropriate.  The application should be fixed.
Comment 4 Paolo Bonzini 2009-07-17 08:53:39 EDT
Created attachment 354136 [details]
reproducer with debug output (C source)

It's true that on other operating special care is required to reset the flag, for example using siglongjmp or setcontext.  In fact, the reproducer that Stepan attached is a configure test that tries to generate *two* SIGSEGVs exactly to detect whether this kind of special care is needed; if it fails, subsequent tests will try siglongjmp or setcontext.

The very fact that the testcase Stepan attached passed on Fedora 11 shows that for Linux the kernel can determine autonomously whether the thread is running on the normal stack.  I'm attaching another testcase (tested on Fedora 11) that demonstrates this by running sigaltstack right after the longjmp, and showing that SS_ONSTACK is cleared automatically.

Having said that (i.e. your comment "This is completely thrown over by using longjmp" is wrong) I might agree that libsigsegv should be fixed if only for the sake of POSIXicity.  Would you use siglongjmp or using setcontext or either?
Comment 5 Ulrich Drepper 2009-07-31 00:53:53 EDT
I've fixed this now upstream for x86 and x86-64.  PPC waits for IBM to do their part.  I've used a different approach than the proposed patch.  Should be in 20.10.90-12 once Andreas builds it.
Comment 6 Paolo Bonzini 2009-08-01 06:47:06 EDT
Comment 7 Andreas Schwab 2009-08-24 10:15:50 EDT
Fully fixed in 2.10.90-15.
Comment 8 Stepan Kasal 2009-09-09 15:58:50 EDT
(In reply to comment #7)
> Fully fixed in 2.10.90-15.  

Thanks, the bug indeed is fixed on most archs.

But, unfortunately, things go south on x86-64, see the following log:

The last line printed is
  checking whether a signal handler can be left through longjmp... no

That should be equivalent to the failure with the test program attached here, so I'm reopening the bug.

Moreover, the build hangs forever after printing that line.
I guess one of the subsequent checks hangs the build; here is the expected output:

checking whether a signal handler can be left through longjmp... yes
checking whether a signal handler can be left through longjmp and sigaltstack... yes
checking whether a signal handler can be left through longjmp and setcontext... yes
checking whether a signal handler can be left through siglongjmp... yes
checking whether a signal handler can be left through siglongjmp and sigaltstack... yes
checking whether a signal handler can be left through siglongjmp and setcontext... yes
Comment 9 Stepan Kasal 2009-09-09 16:06:52 EDT
Forgot to say, comment #8 describes my experience with current rawhide, which has glib-2.10.90-21.
Comment 11 Andreas Schwab 2009-09-10 05:44:22 EDT
Please provide the configure test.
Comment 12 Rex Dieter 2009-09-20 12:53:10 EDT
Here's the hanging check that libsigsegv (bug #511723) uses anyway:

#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>

# include <sys/types.h>
# include <sys/time.h>
# include <sys/resource.h>
#ifndef SIGSTKSZ
# define SIGSTKSZ 16384
jmp_buf mainloop;
sigset_t mainsigset;
int pass = 0;
void stackoverflow_handler (int sig)
  sigprocmask (SIG_SETMASK, &mainsigset, NULL);
  {  }
  longjmp (mainloop, pass);
volatile int * recurse_1 (volatile int n, volatile int *p)
  if (n >= 0)
    *recurse_1 (n + 1, p) += n;
  return p;
int recurse (volatile int n)
  int sum = 0;
  return *recurse_1 (n, &sum);
int main ()
  char mystack[SIGSTKSZ];
  stack_t altstack;
  struct sigaction action;
  sigset_t emptyset;
#if defined HAVE_SETRLIMIT && defined RLIMIT_STACK
  /* Before starting the endless recursion, try to be friendly to the user's
     machine.  On some Linux 2.2.x systems, there is no stack limit for user
     processes at all.  We don't want to kill such systems.  */
  struct rlimit rl;
  rl.rlim_cur = rl.rlim_max = 0x100000; /* 1 MB */
  setrlimit (RLIMIT_STACK, &rl);
  /* Install the alternate stack.  */
  altstack.ss_sp = mystack;
  altstack.ss_size = sizeof (mystack);
  altstack.ss_flags = 0; /* no SS_DISABLE */
  if (sigaltstack (&altstack, NULL) < 0)
    exit (1);
  /* Install the SIGSEGV handler.  */
  sigemptyset (&action.sa_mask);
  action.sa_handler = &stackoverflow_handler;
  action.sa_flags = SA_ONSTACK;
  sigaction (SIGSEGV, &action, (struct sigaction *) NULL);
  sigaction (SIGBUS, &action, (struct sigaction *) NULL);
  /* Save the current signal mask.  */
  sigemptyset (&emptyset);
  sigprocmask (SIG_BLOCK, &emptyset, &mainsigset);
  /* Provoke two stack overflows in a row.  */
  if (setjmp (mainloop) < 2)
      recurse (0);
      exit (2);
  exit (0);
Comment 13 Andreas Schwab 2009-09-21 06:37:46 EDT
You need to define HAVE_SETRLIMIT.
Comment 14 Rex Dieter 2009-09-21 12:10:46 EDT
I believe it is already, but I can double-check.
Comment 15 Rex Dieter 2009-09-21 13:02:18 EDT
From, a debug scratch build on x86_64,

The defines used in that test include:
#define PACKAGE_NAME ""
#define PACKAGE "libsigsegv"
#define VERSION "2.6"
#define STDC_HEADERS 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define HAVE_DLFCN_H 1
#define LT_OBJDIR ".libs/"
#define CFG_SIGNALS "signals.h"
#define HAVE_UNISTD_H 1
#define HAVE_MMAP_ANON 1
#define CFG_FAULT "fault-posix.h"
#define CFG_MACHFAULT "fault-none.h"
#define HAVE_MINCORE 1
#define CFG_STACKVMA "stackvma-linux.c"
Comment 16 Andreas Schwab 2009-09-22 03:35:12 EDT
Your testcase does not include any of these symbols.  Without HAVE_SETRLIMIT it will try to gobble all available memory.
Comment 17 Paolo Bonzini 2009-09-22 04:39:10 EDT
Created attachment 362040 [details]
reproducer with HAVE_SETRLIMIT

Since Andreas's objection is correct, here is a valid test program, including also the fix for bug 524795.  I don't have a Rawhide machine in order to test it.

Note You need to log in before you can comment on or make changes to this bug.