Bug 512103 - longjmp from sighandle alternate stack causes fortify failure
Summary: longjmp from sighandle alternate stack causes fortify failure
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: glibc
Version: rawhide
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: ---
Assignee: Andreas Schwab
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 511723 516995
TreeView+ depends on / blocked
 
Reported: 2009-07-16 10:52 UTC by Stepan Kasal
Modified: 2009-10-30 18:47 UTC (History)
6 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2009-09-22 07:35:12 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
reproducer (C cource) (1.68 KB, text/plain)
2009-07-16 10:52 UTC, Stepan Kasal
no flags Details
short patch (Paolo Bonzini) (809 bytes, patch)
2009-07-16 10:59 UTC, Stepan Kasal
no flags Details | Diff
longer patch (Paolo Bonzini) (8.83 KB, patch)
2009-07-16 11:04 UTC, Stepan Kasal
no flags Details | Diff
reproducer with debug output (C source) (2.30 KB, text/plain)
2009-07-17 12:53 UTC, Paolo Bonzini
no flags Details
reproducer with HAVE_SETRLIMIT (1.71 KB, text/plain)
2009-09-22 08:39 UTC, Paolo Bonzini
no flags Details

Description Stepan Kasal 2009-07-16 10:52:04 UTC
Created attachment 353971 [details]
reproducer (C cource)

Jumping from alternate stack back confuses the fortify check.
See the attached file conftest.c.

Version-Release number of selected component (if applicable):
2.10.90-3

(The problem does not appear on Fedora 11's glibc-2.10.1 as it does not check longjmp's.)

How reproducible:
always, observed on i586, x86_64, and ppc.

Steps to Reproduce:
1. gcc -D_FORTIFY_SOURCE=2 -o conftest conftest.c
2. ./conftest
3. observe __fortify_fail called from __longjmp_chk

(Thanks to Paolo Bonzini who did the analysis of the problem.)

Comment 1 Stepan Kasal 2009-07-16 10:59:57 UTC
Created attachment 353972 [details]
short patch (Paolo Bonzini)

This patch by Paolo Bonzini works, but has a disadvantage of adding to much overhead to every longjmp call.

Comment 2 Stepan Kasal 2009-07-16 11:04:06 UTC
Created attachment 353973 [details]
longer patch (Paolo Bonzini)

This longer patch is better, except that it does not work.  ;-)

Quoting Paolo: "[it] does not work because it lacks pushes/pops to save the caller-save registers across the calls to __longjmp_chk_fail.  But at least most of the plumbing is done."

Comment 3 Ulrich Drepper 2009-07-17 03:53:07 UTC
Using longjmp from a signal handler using an alternative stack is very wrong.  The kernel keeps track of whether the thread is currently using an alternative stack or not.  This is completely thrown over by using longjmp.  The error is IMO very appropriate.  The application should be fixed.

Comment 4 Paolo Bonzini 2009-07-17 12:53:39 UTC
Created attachment 354136 [details]
reproducer with debug output (C source)

It's true that on other operating special care is required to reset the flag, for example using siglongjmp or setcontext.  In fact, the reproducer that Stepan attached is a configure test that tries to generate *two* SIGSEGVs exactly to detect whether this kind of special care is needed; if it fails, subsequent tests will try siglongjmp or setcontext.

The very fact that the testcase Stepan attached passed on Fedora 11 shows that for Linux the kernel can determine autonomously whether the thread is running on the normal stack.  I'm attaching another testcase (tested on Fedora 11) that demonstrates this by running sigaltstack right after the longjmp, and showing that SS_ONSTACK is cleared automatically.

Having said that (i.e. your comment "This is completely thrown over by using longjmp" is wrong) I might agree that libsigsegv should be fixed if only for the sake of POSIXicity.  Would you use siglongjmp or using setcontext or either?

Comment 5 Ulrich Drepper 2009-07-31 04:53:53 UTC
I've fixed this now upstream for x86 and x86-64.  PPC waits for IBM to do their part.  I've used a different approach than the proposed patch.  Should be in 20.10.90-12 once Andreas builds it.

Comment 6 Paolo Bonzini 2009-08-01 10:47:06 UTC
Thanks.

Comment 7 Andreas Schwab 2009-08-24 14:15:50 UTC
Fully fixed in 2.10.90-15.

Comment 8 Stepan Kasal 2009-09-09 19:58:50 UTC
(In reply to comment #7)
> Fully fixed in 2.10.90-15.  

Thanks, the bug indeed is fixed on most archs.

But, unfortunately, things go south on x86-64, see the following log:
http://koji.fedoraproject.org/koji/getfile?taskID=1665971&name=build.log

The last line printed is
  checking whether a signal handler can be left through longjmp... no

That should be equivalent to the failure with the test program attached here, so I'm reopening the bug.

Moreover, the build hangs forever after printing that line.
I guess one of the subsequent checks hangs the build; here is the expected output:

checking whether a signal handler can be left through longjmp... yes
checking whether a signal handler can be left through longjmp and sigaltstack... yes
checking whether a signal handler can be left through longjmp and setcontext... yes
checking whether a signal handler can be left through siglongjmp... yes
checking whether a signal handler can be left through siglongjmp and sigaltstack... yes
checking whether a signal handler can be left through siglongjmp and setcontext... yes

Comment 9 Stepan Kasal 2009-09-09 20:06:52 UTC
Forgot to say, comment #8 describes my experience with current rawhide, which has glib-2.10.90-21.

Comment 11 Andreas Schwab 2009-09-10 09:44:22 UTC
Please provide the configure test.

Comment 12 Rex Dieter 2009-09-20 16:53:10 UTC
Here's the hanging check that libsigsegv (bug #511723) uses anyway:

#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>

#if HAVE_SETRLIMIT
# include <sys/types.h>
# include <sys/time.h>
# include <sys/resource.h>
#endif
#ifndef SIGSTKSZ
# define SIGSTKSZ 16384
#endif
jmp_buf mainloop;
sigset_t mainsigset;
int pass = 0;
void stackoverflow_handler (int sig)
{
  pass++;
  sigprocmask (SIG_SETMASK, &mainsigset, NULL);
  {  }
  longjmp (mainloop, pass);
}
volatile int * recurse_1 (volatile int n, volatile int *p)
{
  if (n >= 0)
    *recurse_1 (n + 1, p) += n;
  return p;
}
int recurse (volatile int n)
{
  int sum = 0;
  return *recurse_1 (n, &sum);
}
int main ()
{
  char mystack[SIGSTKSZ];
  stack_t altstack;
  struct sigaction action;
  sigset_t emptyset;
#if defined HAVE_SETRLIMIT && defined RLIMIT_STACK
  /* Before starting the endless recursion, try to be friendly to the user's
     machine.  On some Linux 2.2.x systems, there is no stack limit for user
     processes at all.  We don't want to kill such systems.  */
  struct rlimit rl;
  rl.rlim_cur = rl.rlim_max = 0x100000; /* 1 MB */
  setrlimit (RLIMIT_STACK, &rl);
#endif
  /* Install the alternate stack.  */
  altstack.ss_sp = mystack;
  altstack.ss_size = sizeof (mystack);
  altstack.ss_flags = 0; /* no SS_DISABLE */
  if (sigaltstack (&altstack, NULL) < 0)
    exit (1);
  /* Install the SIGSEGV handler.  */
  sigemptyset (&action.sa_mask);
  action.sa_handler = &stackoverflow_handler;
  action.sa_flags = SA_ONSTACK;
  sigaction (SIGSEGV, &action, (struct sigaction *) NULL);
  sigaction (SIGBUS, &action, (struct sigaction *) NULL);
  /* Save the current signal mask.  */
  sigemptyset (&emptyset);
  sigprocmask (SIG_BLOCK, &emptyset, &mainsigset);
  /* Provoke two stack overflows in a row.  */
  if (setjmp (mainloop) < 2)
    {
      recurse (0);
      exit (2);
    }
  exit (0);
}

Comment 13 Andreas Schwab 2009-09-21 10:37:46 UTC
You need to define HAVE_SETRLIMIT.

Comment 14 Rex Dieter 2009-09-21 16:10:46 UTC
I believe it is already, but I can double-check.

Comment 15 Rex Dieter 2009-09-21 17:02:18 UTC
From, a debug scratch build on x86_64,
http://koji.fedoraproject.org/koji/taskinfo?taskID=1695350

The defines used in that test include:
#define PACKAGE_NAME ""
#define PACKAGE_TARNAME ""
#define PACKAGE_VERSION ""
#define PACKAGE_STRING ""
#define PACKAGE_BUGREPORT ""
#define PACKAGE "libsigsegv"
#define VERSION "2.6"
#define STDC_HEADERS 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define HAVE_DLFCN_H 1
#define LT_OBJDIR ".libs/"
#define HAVE_SYS_SIGNAL_H 1
#define CFG_SIGNALS "signals.h"
#define HAVE_UNISTD_H 1
#define HAVE_GETPAGESIZE 1
#define HAVE_SYSCONF_PAGESIZE 1
#define HAVE_MMAP_ANON 1
#define HAVE_MMAP_ANONYMOUS 1
#define HAVE_MMAP_DEVZERO 1
#define CFG_FAULT "fault-posix.h"
#define CFG_MACHFAULT "fault-none.h"
#define STACK_DIRECTION -1
#define HAVE_MINCORE 1
#define HAVE_STACKVMA 1
#define CFG_STACKVMA "stackvma-linux.c"
#define HAVE_GETRLIMIT 1
#define HAVE_SETRLIMIT 1
#define HAVE_SIGALTSTACK 1
#define HAVE_WORKING_SIGALTSTACK 1

Comment 16 Andreas Schwab 2009-09-22 07:35:12 UTC
Your testcase does not include any of these symbols.  Without HAVE_SETRLIMIT it will try to gobble all available memory.

Comment 17 Paolo Bonzini 2009-09-22 08:39:10 UTC
Created attachment 362040 [details]
reproducer with HAVE_SETRLIMIT

Since Andreas's objection is correct, here is a valid test program, including also the fix for bug 524795.  I don't have a Rawhide machine in order to test it.


Note You need to log in before you can comment on or make changes to this bug.