Bug 2007327

Summary: glibc: Regression with pthread_once implementation.
Product: Red Hat Enterprise Linux 8
Reporter: Paulo Andrade <pandrade>
Component: glibc
Assignee: Carlos O'Donell <codonell>
Status: CLOSED ERRATA
QA Contact: Martin Coufal <mcoufal>
Severity: medium
Docs Contact: Petr Hybl <phybl>
Priority: unspecified
Version: 8.3
CC: ashankar, codonell, dj, fweimer, jcoopman, jvaldez, mcoufal, mnewsome, pfrankli, phybl, sipoyare, skolosov, thomas.russell
Target Milestone: rc
Keywords: Bugfix, Triaged
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: glibc-2.28-186.el8
Doc Type: Bug Fix
Doc Text:
.`pthread_once()` in glibc has been fixed to correctly support C++ exceptions

Previously, the `pthread_once()` implementation could result in a hang when using pass:q[`libstdc++`] library functions. For example, a hang could occur when pass:q[`libstdc++`]'s `std::call_once()` called a function that threw an exception. With this update, `pthread_once()` is fixed and no longer hangs when an exception is thrown.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-10 15:17:47 UTC
Type: Bug

Description Paulo Andrade 2021-09-23 15:09:26 UTC
The customer has code that relies on an exception propagating out of a pthread_once init routine.
After the patches for bz#1163509, the code stopped working.

Sample reproducer:
"""
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <stdexcept> // std::runtime_error

namespace
{
std::once_flag flag;
} // close anonymous namespace

void mayThrow(bool willThrow = false)
{
    if (willThrow)
    {
        throw std::runtime_error("Threw error");
    }
}

int main()
{
    try
    {
        std::call_once(flag, mayThrow, true);
    }
    catch (const std::exception& e)
    {
        std::cout << "Threw exception: " << e.what() << '\n';
    }

    std::call_once(flag, mayThrow, false);
    std::cout << "Succeeded" << std::endl;
    return EXIT_SUCCESS;
}
"""

"""
g++ -pthread pthread_once_test.cpp -o pthread-once-test
"""

Expected output, as seen on RHEL 7 before glibc-2.17-288.el7.x86_64:
"""
$ ./pthread-once-test
Threw exception: Threw error
Succeeded
"""

Current behavior is a hang:
"""
$ ./pthread-once-test
Threw exception: Threw error
"""

A likely related GCC bug report gives this explanation:
"""
According to N2447, "If the invocation of func results in an exception being thrown, the exception is propagated to the caller and the effects are as-if this invocation of call_once did not occur."
"""

Comment 1 Florian Weimer 2021-10-01 13:48:41 UTC
This was fixed upstream for good with:

commit f0419e6a10740a672b28e112c409ae24f5e890ab
Author: Jakub Jelinek <jakub>
Date:   Thu Mar 4 15:15:33 2021 +0100

    [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]
    
    This is another attempt at making pthread_once handle throwing exceptions
    from the init routine callback.  As the new testcases show, just switching
    to the cleanup attribute based cleanup does fix the tst-once5 test, but
    breaks the new tst-oncey3 test.  That is because when throwing exceptions,
    only the unwind info registered cleanups (i.e. C++ destructors or cleanup
    attribute), when cancelling threads and there has been unwind info from the
    cancellation point up to whatever needs cleanup both unwind info registered
    cleanups and THREAD_SETMEM (self, cleanup, ...) registered cleanups are
    invoked, but once we hit some frame with no unwind info, only the
    THREAD_SETMEM (self, cleanup, ...) registered cleanups are invoked.
    So, to stay fully backwards compatible (allow init routines without
    unwind info which encounter cancellation points) and handle exception throwing
    we actually need to register the pthread_once cleanups in both unwind info
    and in the THREAD_SETMEM (self, cleanup, ...) way.
    If an exception is thrown, only the former will happen and we in that case
    need to also unregister the THREAD_SETMEM (self, cleanup, ...) registered
    handler, because otherwise after catching the exception the user code could
    call deeper into the stack some cancellation point, get cancelled and then
    a stale cleanup handler would clobber stack and probably crash.
    If a thread calling init routine is cancelled and unwind info ends before
    the pthread_once frame, it will be cleaned up through self->cleanup as
    before.  And if unwind info is present, unwind_stop first calls the
    self->cleanup registered handler for the frame, then it will call the
    unwind info registered handler but that will already see __do_it == 0
    and do nothing.

This is probably not a straightforward backport, but as far as I recall it is backportable in principle.

Comment 2 Carlos O'Donell 2021-10-01 13:57:54 UTC
Paulo,

I'm reviewing this for inclusion in 8.6, but I'd like to know exactly which releases need fixing.

For example, 8.6 will likely be the next EUS branch that customers would move to when they migrate from RHEL 7 EUS releases.

Comment 3 Paulo Andrade 2021-10-01 14:09:14 UTC
Hi Carlos,

I see Thomas is already on CC of the bug, and might provide extra input.

For the moment, I believe the earlier the better. The customer is, I believe,
stuck on the RHEL 7.6 glibc, and now that they are porting to RHEL 8 there is
no option other than a backport of the patch, unless a workaround can be
found.

Comment 4 Thomas Russell 2021-10-01 14:37:37 UTC
Indeed - we are currently targeting RHEL 8.3 as the update target from RHEL 7.6, and this is currently an issue for us in upgrading. So we would be looking for a viable workaround or a backport.

Comment 5 Carlos O'Donell 2021-10-01 15:04:20 UTC
(In reply to Thomas Russell from comment #4)
> Indeed - we are currently targeting RHEL 8.3 as the update target from RHEL
> 7.6, and this is currently an issue for us in upgrading. So we would be
> looking for a viable workaround or a backport.

There isn't an easy workaround for this; we're going to have to backport to get you the fixes.

However, you are out of support on RHEL 7.6 (modulo very specific SAP SKUs), so you'd really want to upgrade to RHEL 7.9 with ELS.

Likewise, RHEL 8.3 has no EUS, so you should be upgrading to RHEL 8.4, where we have a longer runway for getting you the fixes that you need.

Have you discussed these options, and the technical reasons for the releases you have picked, with anyone?

Comment 6 Thomas Russell 2021-10-01 15:29:47 UTC
We won't need any changes made to RHEL 7.6, since that OS has a version of GLIBC which is unaffected by the aforementioned issue.

With regard to RHEL 8.3, I've reached out to our Linux Engineering team to find out more information on why we chose this as our target rather than 8.4. However, as I understand it, RHEL 8.3 is still in support, so we can get a backport of the fix to it, is that correct?

Comment 7 Carlos O'Donell 2021-10-01 16:23:06 UTC
(In reply to Thomas Russell from comment #6)
> We won't need any changes made to RHEL 7.6, since that OS has a version of
> GLIBC which is unaffected by the aforementioned issue.

My primary concern, from an engineering perspective, is that if you upgrade to RHEL 7.9 in the future, because of a CVE or another bug fix, you'll *immediately* hit this issue.

I'm starting to evaluate this for inclusion in RHEL 7.9 and RHEL 8.6 (and we can discuss RHEL 8.4 after the conclusion of your review within your engineering teams).

> With regard to RHEL 8.3, I've reached out to our Linux Engineering team to
> find out more information on why we chose this as our target rather than
> 8.4. However, as I understand it, RHEL 8.3 is still in support, so we can
> get a backport of the fix to it, is that correct?

No, RHEL 8.3 is completely out of support. RHEL 8.4 is the most recent release, and it also happens to be (because it's an even release) a release with Extended Update Support available until mid-2025.[1]

If you're asking for advice, I'd aim for RHEL 8.4 and try hard for RHEL 8.6 (the next EUS). That would give you the longest support window.

Thomas, I've reached out to my technical support product manager to engage with CITADEL to make sure you have the best success given whatever constraints you're operating under.

[1] https://access.redhat.com/support/policy/updates/errata

Comment 29 errata-xmlrpc 2022-05-10 15:17:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2005