Bug 1260525 - perl core dumps on terminating threads that handled a signal
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: perl
Version: 6.7
Hardware: All Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: perl-maint-list
QA Contact: BaseOS QE - Apps
Depends On:
Blocks:
Reported: 2015-09-07 03:16 EDT by Thorsten Scherf
Modified: 2015-12-22 09:14 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-09-09 07:29:34 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
The suggested reproducer (873 bytes, text/plain)
2015-09-07 05:14 EDT, Petr Šabata
Reproducer (895 bytes, text/plain)
2015-09-09 07:06 EDT, Petr Pisar
Fix ported to 5.10.1 with a regression (30.00 KB, application/x-tar)
2015-09-09 07:24 EDT, Petr Pisar
Regression test (156 bytes, text/plain)
2015-09-09 07:27 EDT, Petr Pisar


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2105111 None None None 2015-12-22 09:14 EST

Description Thorsten Scherf 2015-09-07 03:16:25 EDT
Description of problem:
An application has a threaded perl program that core dumps when it is sent signal 15 (SIGTERM) to tell it to shut down. They have given me a reproducer, which I will attach. To get the core dump, you have to run it in the background and then do a kill -15 of that background process (and have ulimit -c set high enough to get a core, of course). I recognize that thread safety is a controversial topic in perl, but http://search.cpan.org/~lbaxter/Sys-SigAction/lib/Sys/SigAction.pm does seem to imply that the sigaction they are using is supposed to work with threads starting with perl 5.8, and this is 5.10.1-141.el6. It seems to work without a core dump in RHEL7 on perl 5.16.3-285.el7.
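
The steps described above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name run_and_term, the 2-second startup delay, and the script name reproducer.pl are placeholders and are not taken from the attached reproducer.

```shell
# Hypothetical helper (not from the attachment): start a command in the
# background, send it signal 15, and print the exit status seen by the shell.
run_and_term() {
    # Allow core files if the hard limit permits it; ignore failure otherwise.
    ulimit -c unlimited 2>/dev/null || true
    "$@" &
    pid=$!
    sleep 2                      # assumed delay so the program can start up
    kill -15 "$pid" || true      # same as: kill -TERM "$pid"
    status=0
    wait "$pid" || status=$?     # collect the status without aborting the shell
    echo "exit status: $status"
    return 0
}

# Usage (reproducer.pl is a placeholder name for the attached script):
# run_and_term perl ./reproducer.pl
```

The shell reports a process that was killed by signal N as exit status 128+N, so a status other than a clean exit hints at how the process ended; a core file should appear only if the core size limit allows it.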

Version-Release number of selected component (if applicable):
perl-5.10.1-141.el6.

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 1 Petr Šabata 2015-09-07 05:14:46 EDT
Created attachment 1070871
The suggested reproducer
Comment 2 Petr Pisar 2015-09-07 12:01:55 EDT
According to my tests, upstream fixed it between versions 5.13.1 and 5.13.2.
Comment 3 Thorsten Scherf 2015-09-07 12:52:33 EDT
Hi Petr, so the statement from the above-mentioned URL is not completely true?

"""
Perl 5.8.0 and later versions implements 'safe' signal handling on platforms which support the POSIX sigaction() function. This is accomplished by having perl note that a signal has arrived, but deferring the execution of the signal handler until such time as it is safe to do so. Unfortunately these changes can break some existing scripts, if they depended on a system routine being interrupted by the signal's arrival. The perl 5.8.0 implementation was modified further in version 5.8.2.
"""
Comment 4 Petr Pisar 2015-09-08 02:07:00 EDT
(In reply to Thorsten Scherf from comment #3)
> HI Petr, so the statement from the above mentioned URL is not completely
> true? 
> 
> """
> Perl 5.8.0 and later versions implements 'safe' signal handling on platforms
> which support the POSIX sigaction() function. This is accomplished by having
> perl note that a signal has arrived, but deferring the execution of the
> signal handler until such time as it is safe to do so. Unfortunately these
> changes can break some existing scripts, if they depended on a system
> routine being interrupted by the signal's arrival. The perl 5.8.0
> implementation was modified further in version 5.8.2.
> """

Safe signals solve the problem that arises when you register a signal handler from Perl code and the operating system then delivers the signal while the perl interpreter is doing internal work unrelated to the Perl code. When safe signals are enabled (the default since 5.8.0), the perl interpreter blocks all signals while doing that internal work and unblocks them only while your Perl code is being evaluated.

The statement says nothing about threads. There were many thread-related improvements and fixes after 5.12.0; for example, a new signal-handling hook designed for the "threads" perl module was introduced.

I'm still locating all the commits that fix it. I worry that they could introduce ABI changes that are not suitable for RHEL-6.
Comment 5 Petr Pisar 2015-09-08 02:34:31 EDT
First upstream's good commit:

commit 05d04d9c74ee968bace5e063c9ded74f94b3df24
Author: Nicholas Clark <nick@ccl4.org>
Date:   Thu Feb 25 21:35:39 2010 +0000

    Don't clone the contents of lexicals in pads.
    
    This stops the values of lexicals in active stack frames in the parent leaking
    into the lexicals in the child thread.
    
    With an exception for lexicals with a reference count of > 1, to cope with the
    implementation of ?{{ ... }} blocks in regexps. :-(
Comment 6 Thorsten Scherf 2015-09-08 03:18:17 EDT
(In reply to Petr Pisar from comment #4)
> (In reply to Thorsten Scherf from comment #3)
> > HI Petr, so the statement from the above mentioned URL is not completely
> > true? 
> > 
> > """
> > Perl 5.8.0 and later versions implements 'safe' signal handling on platforms
> > which support the POSIX sigaction() function. This is accomplished by having
> > perl note that a signal has arrived, but deferring the execution of the
> > signal handler until such time as it is safe to do so. Unfortunately these
> > changes can break some existing scripts, if they depended on a system
> > routine being interrupted by the signal's arrival. The perl 5.8.0
> > implementation was modified further in version 5.8.2.
> > """
> 
> Safe signals solves a problem when you register a signaler handler from a
> Perl code and then the operating system signal delivers the signal while the
> perl interpreter is doing some internals unrelated to the Perl code. When
> safe signals are enabled (which is a default since 5.8.0), perl interpreter
> blocks all signals while doing some internals and unblock them only for the
> time when your Perl code is evaluated.
> 
> There is no word about threads. There were many improvements and fixes after
> 5.12.0 regarding threads, for example there was introduced a new hook for
> dealing with signals designed for "threads" perl module.

Ah, you are right. It was already late when I made the last comment. ;)

> I'm still locating all the commits that fixes it. I worry that they could
> introduce some ABI changes which are not suitable for RHEL-6.

Appreciate your investigations.
Comment 7 Petr Pisar 2015-09-09 07:06:21 EDT
Created attachment 1071687
Reproducer
Comment 8 Petr Pisar 2015-09-09 07:18:41 EDT
The fix:

commit 05d04d9c74ee968bace5e063c9ded74f94b3df24
Author: Nicholas Clark <nick@ccl4.org>
Date:   Thu Feb 25 21:35:39 2010 +0000

    Don't clone the contents of lexicals in pads.
    
requires commits:

commit d5b1589c09b534ccfeb2eae26b3de9339c1bf22b
Author: Nicholas Clark <nick@ccl4.org>
Date:   Wed Feb 10 09:57:23 2010 +0000

    Convert PAD_DUP to a function Perl_padlist_dup().
    
commit 6de654a5795b6f7915432ff16bcdac0688492a9b
Author: Nicholas Clark <nick@ccl4.org>
Date:   Thu Feb 25 14:21:18 2010 +0000

    In Perl_padlist_dup() don't duplicate @_ or pads caused by recursion.

that are based on:

commit a09252eb79f700c93c37322c1ad831cf3193571b
Author: Nicholas Clark <nick@ccl4.org>
Date:   Tue Feb 23 14:48:17 2010 +0000

    Convert Perl_sv_dup_inc() from a macro to a real function.

These patches fix the crash on terminating threads that handled a signal.

However, after porting them to 5.10.1, a new regression arises (a memory leak check in op/threads.t fails). The regression is fixed by:

commit d08d57ef17162c52e2024a3ba6755f778acbc697
Author: Nicholas Clark <nick@ccl4.org>
Date:   Wed Feb 24 17:15:41 2010 +0000

    Better ithreads cloning - add all SVs with a 0 refcnt to the temps stack.

which requires

commit 1db366cc74404c47243e1d86efa59c6559db818e
Author: Nicholas Clark <nick@ccl4.org>
Date:   Mon May 24 15:48:06 2010 +0100

    Cleaner implementations for Perl_clone_params_{new,del}

Commit 1db366cc74404c47243e1d86efa59c6559db818e changes the API and ABI.


Therefore the reported issue cannot be fixed with these patches in RHEL-6. I recommend that the customer use RHEL-7, which includes the fix along with a new threads implementation.

I will attach the ported fix here for those who are interested, but be warned that it introduces a regression.
Comment 9 Petr Pisar 2015-09-09 07:24:43 EDT
Created attachment 1071689
Fix ported to 5.10.1 with a regression
Comment 10 Petr Pisar 2015-09-09 07:27:37 EDT
Created attachment 1071693
Regression test

This demonstrates the regression. Affected perl reports one leaked scalar:

$ LD_PRELOAD=./libperl.so ./perl -Ilib /tmp/regression
Scalars leaked: 1
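
A check like the one above could be scripted roughly as follows. This is a hedged sketch: the helper name check_no_leak is hypothetical, and it captures both stdout and stderr on the assumption that the "Scalars leaked" message may appear on either stream.

```shell
# Hypothetical helper: run a command and fail if its combined output
# contains the "Scalars leaked:" message shown above.
check_no_leak() {
    out=$("$@" 2>&1) || true     # keep the output even if the command fails
    case $out in
        *"Scalars leaked:"*)
            echo "FAIL: $out"
            return 1
            ;;
        *)
            echo "ok"
            return 0
            ;;
    esac
}

# Usage, mirroring the command above:
# LD_PRELOAD=./libperl.so check_no_leak ./perl -Ilib /tmp/regression
```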
