Description of problem:
When we launch a threaded application as a child of an "xterm -e" command on RHEL 4 Update 1 and Update 2, the process exits immediately after the first thread exit.

Version-Release number of selected component (if applicable):
kernel-2.6.9-11.EL and kernel-2.6.9-22.EL

How reproducible:
Every time

Steps to Reproduce:
1. Run test program under "xterm -hold -e"

Actual results:
Test program exits prematurely, immediately after the first child thread exits.

Expected results:
Test program should run to completion.

Additional info:
The premature exit occurs with applications built on RHEL 3 Update 4 as well as ones built on RHEL 4 Update 1. We have seen this on i386 and x86_64 platforms.

If the applications are launched from a command line, from a script, or with "execv" from another program, they do NOT exit prematurely. If the applications built on RHEL 3 Update 4 are launched as a child of an "xterm -e" command on RHEL 3 Update 4, they do NOT exit prematurely. The premature exit only happens when launching an application as a child of an "xterm -e" command on RHEL 4 Update 1 and Update 2.

We have debugged this issue as far as we can. When "xterm -e" is used to run applications, "xterm -e" is the session leader, and when a thread in the child application exits, the kernel sends a SIGHUP to the child. The SIGHUP causes the child process to exit unless SIGHUP is specifically handled or set to SIG_IGN by the application. The way "xterm" sets up its session and launches the child process has not changed between xterm-175 in the RHEL 3 release and xterm-192 in the RHEL 4 release. The difference in behavior between RHEL 3 and RHEL 4 is therefore probably in the kernel's thread exit handling.
In RHEL 4, if the application is not the session leader and a thread exits, the kernel sends a SIGHUP; if the application is the session leader and a thread exits, no SIGHUP is sent.

Our work-around has been to set SIGHUP signal handling to SIG_IGN. We have noticed that this work-around may cause a child process not to exit when its parent dies. Fortunately, we have very few instances where this can occur, and where it does occur, application work-flow is not seriously affected.

This issue also appears to be a regression. A user group thread at Hull Linux User Group -> Kernel Talk described this (or a very similar) issue in October of 2004. The thread was titled "NPTL: Parent thread receives SIGHUP when child thread terminates?" and can be found at
http://www.thisishull.net/showthread.php?t=50930&highlight=NPTL
Roland McGrath published a fix for this issue in that user group thread. Has this bug crept back into the 2.6 kernel, or are we seeing something else?

We have a test program which demonstrates this issue. Compile the program with

    gcc -pthread -o thread_test1 thread_test1.c

On a RHEL 4 U1 or U2 system, run the program from the command line and you will see that threads are created and exit normally and that the program runs to normal completion. Run the program with "xterm -hold -e thread_test1" and you will see that the process exits after the first thread exit. If you add an argument to thread_test1, it sets a signal handler for SIGHUP. Running "xterm -hold -e thread_test1 sig", you will see a SIGHUP immediately after the first thread exit, and the program runs to normal completion.
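For anyone who needs the SIG_IGN work-around described above, here is a minimal sketch of how it could be applied. The helper name ignore_sighup is hypothetical and is not taken from our application or from the attached test program; it only assumes the standard POSIX sigaction() interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper (name chosen for illustration only):
 * ignore SIGHUP for the whole process.
 * Returns 0 on success, -1 on failure (errno is set by sigaction). */
int ignore_sighup(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;    /* discard SIGHUP instead of terminating */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}
```

Calling ignore_sighup() early in main(), before any threads are created, should be enough, since the signal disposition is shared by all threads in the process. As noted above, the side effect is that the process also ignores a genuine hangup when its parent or terminal goes away.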
Created attachment 124322
Thread test program demonstrating premature exit when run under xterm -e
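The attachment itself is not reproduced in this report. As a rough illustration only, a test in the same spirit might look like the sketch below: it creates and joins a few threads one after another and, optionally, installs a SIGHUP handler first. All names here (run_thread_test, on_sighup, worker, sighup_count) are assumptions made for this sketch and are not taken from thread_test1.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Number of SIGHUPs seen by the handler; sig_atomic_t is safe to
 * update from a signal handler. */
static volatile sig_atomic_t sighup_count = 0;

static void on_sighup(int signo)
{
    (void)signo;
    sighup_count++;
}

/* Thread body: returns immediately, so each thread exits right away. */
static void *worker(void *arg)
{
    return arg;
}

/* Hypothetical driver: create and join `nthreads` threads in sequence.
 * If catch_sighup is nonzero, a SIGHUP handler is installed first.
 * Returns the number of threads created and joined, or -1 if the
 * handler could not be installed. */
int run_thread_test(int nthreads, int catch_sighup)
{
    int done = 0;

    if (catch_sighup) {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_sighup;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGHUP, &sa, NULL) != 0)
            return -1;
    }

    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;

        if (pthread_create(&tid, NULL, worker, NULL) != 0)
            break;
        if (pthread_join(tid, NULL) != 0)
            break;
        done++;
        printf("thread %d exited, SIGHUPs seen so far: %d\n",
               i, (int)sighup_count);
        fflush(stdout);
    }
    return done;
}
```

A main() that calls run_thread_test(5, argc > 1), built with gcc -pthread and started under "xterm -hold -e", would be one way to check whether the process survives the first thread exit. On an affected kernel we would expect the run without an argument to stop after the first thread, and the run with an argument to report a SIGHUP and continue.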
Can you please test this on a U3 beta kernel? You can find one at:
http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/
I believe this issue is resolved in U3. Thanks.
Tested the thread_test1 program and our application programs under the U3 beta kernel and this did appear to resolve the issue. Thanks for the quick response.
Sure. Thanks for verifying the fix.

*** This bug has been marked as a duplicate of 166454 ***
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html