Bug 108634 - Signal handler installation races with signal, glibc-2.3.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-10-30 20:25 UTC by Erik
Modified: 2016-11-24 15:03 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2003-11-14 15:13:55 UTC
Embargoed:


Attachments
A patch for RH AS 2.1 (2.12 KB, patch)
2004-04-13 17:28 UTC, H.J. Lu


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2003:325 | 0 | normal | SHIPPED_LIVE | Updated glibc packages provide security and bug fixes | 2003-11-12 05:00:00 UTC
Red Hat Product Errata | RHSA-2003:334 | 0 | normal | SHIPPED_LIVE | Low: glibc security update | 2003-11-14 05:00:00 UTC

Description Erik 2003-10-30 20:25:49 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030701

Description of problem:
This problem was first noticed in a real-world test while benchmarking
mail delivery through exim.  The following requirements must be met for
the problem to occur:

- Non-NPTL-enabled kernel.  For example, vanilla 2.4.21 (not Red Hat
  patched kernels).

- glibc 2.3.2.  2.3.1 does not exhibit the problem.

- Program must be linked against libpthread (-lpthread).

The example program (listed in full below) can be compiled with:

        gcc -o signal-crash-example signal-crash-example.c -lpthread

When the problem occurs, the process is killed by SIGSEGV (segmentation fault).
The test program should trigger this in under a second.

Running under strace seems to make the problem more likely, and shows this trace:

rt_sigaction(SIGCHLD, {0x804bb68, [CHLD], SA_RESTORER|SA_RESTART, 0x804d8c8}, {SIG_DFL}, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

With the program compiled with -g3, GDB shows this backtrace:

Starting program: /root/signal-crash-example 
[New Thread 16384 (LWP 25497)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16384 (LWP 25497)]
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x400294be in __pthread_sighandler () from /lib/i686/libpthread.so.0
#2  <signal handler called>
#3  0x4009652f in __libc_sigaction () from /lib/i686/libc.so.6
#4  0x40026a8a in sigaction () from /lib/i686/libpthread.so.0
#5  0x40096631 in sigaction () from /lib/i686/libc.so.6
#6  0x400963e3 in ssignal () from /lib/i686/libc.so.6
#7  0x08048567 in main (argc=1, argv=0xbfffee24) at signal-crash-example.c:34
#8  0x40083a07 in __libc_start_main () from /lib/i686/libc.so.6

Is it trying to jump to 0x00000000?  That definitely won't work...

Here is the sample signal-crash-example.c code which reproduces the problem:

#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

void signalhandler(int sig)
{
}

int main(int argc, char *argv[])
{
        pid_t pid;
        int i;

        while (1) {
                for (i = 0; i < 100; i++){
                        if ((pid = fork()) == -1) {
                                perror("Fork error");
                                exit(1);
                        }
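                        if (pid == 0) {
                                usleep(100000);   /* child: exit ~0.1 s later, raising SIGCHLD in the parent */
                                signal(SIGCHLD, SIG_DFL);
                                exit(0);
                        }
                }
                usleep(100000);   /* parent: wait until the children start exiting */
                for (i = 0; i < 10000; i++){   /* toggle the handler while SIGCHLDs arrive */
                        signal(SIGCHLD, SIG_DFL);
                        signal(SIGCHLD, signalhandler);
                        signal(SIGCHLD, SIG_DFL);
                        signal(SIGCHLD, signalhandler);
                }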
                while (wait(NULL) != -1)
                        ;
        }
}


Version-Release number of selected component (if applicable):
glibc-2.3.2-27.9

How reproducible:
Always

Steps to Reproduce:
1. Compile the sample code above with -lpthread.
2. Execute the binary.

Actual Results:  Segmentation fault (core dumped)

Additional info:

This was also tested on Red Hat Linux 8.

Comment 1 Jakub Jelinek 2003-10-31 17:24:00 UTC
Don't know why you claim that 2.3.1 doesn't exhibit the problem, I can very
easily reproduce it on any linuxthreads I've tried (e.g. 2.2.4, 2.2.5, 2.3.2;
the relevant code hasn't changed since at least 1998 when it was added to glibc).
Signal handling is broken in way more ways in linuxthreads than just this one,
which doesn't mean we won't look at this exact case, just that it is certainly
not very high priority.  For usable signal handling there is always NPTL.

Comment 2 Erik 2003-10-31 19:40:16 UTC
Yes, there is always NPTL, but correct me if I'm wrong, you can only use NPTL
with RH's kernels, not a custom kernel from kernel.org.

I know that RH9 ships with a glibc that supports both LinuxThreads and NPTL and
dynamically switches between them depending on the capabilities of the
kernel.

Is there a single patch available that we can apply to a regular kernel.org
kernel so that we can utilize NPTL on custom compiled kernels? If that were the
case perhaps it would 'fix' this issue as well.


Comment 3 Ulrich Drepper 2003-11-04 19:02:31 UTC
You miss the point.  LinuxThreads and signals never mixed, never
worked together, and never will.  If you need signals and threads you have to
use NPTL.  This is not some act of forcing you to use a RH kernel. 
The functionality simply wasn't available before.

Either stop using signals or require NPTL.  There is no other reliable
way.

Comment 4 Erik 2003-11-04 19:24:58 UTC
OK, that makes sense. Now, is there a patch that we can apply to a
vanilla kernel which will enable us to utilize NPTL??

I've tried to look through RH kernel SRPMs to see if I could find a
single NPTL kernel patch but so far have been unable to do so.

TIA

Comment 6 Ulrich Drepper 2003-11-10 21:57:55 UTC
Please try the test version of the RHL9 errata at

  ftp://people.redhat.com/jakub/glibc/errata/2.3.2-27.9.4/

and let us know how it works.

Comment 7 Jakub Jelinek 2003-11-14 15:13:55 UTC
An erratum has been issued which should help the problem described in this bug
report. This report is therefore being closed with a resolution of ERRATA. For
more information on the solution and/or where to find the updated files, please
follow the link below. You may reopen this bug report if the solution does not
work for you.

http://rhn.redhat.com/errata/RHSA-2003-334.html


Comment 8 H.J. Lu 2004-04-13 17:28:14 UTC
Created attachment 99367
A patch for RH AS 2.1

RH AS 2.1 has a similar problem. This patch is backported from
mainline.

