Bug 1538776

Summary: Continue to support threads using PTHREAD_STACK_MIN incorrectly.
Product: Red Hat Enterprise Linux 7
Reporter: Paulo Andrade <pandrade>
Component: glibc
Assignee: glibc team <glibc-bugzilla>
Status: CLOSED DUPLICATE
QA Contact: qe-baseos-tools-bugs
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.4
CC: ashankar, codonell, fweimer, jwright, mnewsome, pandrade, pfrankli
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Last Closed: 2018-01-29 16:21:55 UTC Type: Bug

Description Paulo Andrade 2018-01-25 19:44:38 UTC
This should be a known issue, as it is corrected with glibc-2.17-221.el7,
but the latter is not yet available to customers.
The cause appears to be the backport of
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b52b0d793dcb226ecb0ecca1e672ca265973233c
and the latest patches (glibc-2.17-221.el7) appear to correct it.

Comment 3 Florian Weimer 2018-01-25 19:56:13 UTC
Is LD_BIND_NOW=1 a suitable workaround?  Disabling lazy binding avoids the dynamic linker trampoline, and thus its extended stack usage.

Comment 4 Paulo Andrade 2018-01-25 20:04:24 UTC
For the moment, I have told the user to downgrade to glibc-2.17-196.el7.
I will also suggest LD_BIND_NOW=1.

Comment 6 Carlos O'Donell 2018-01-25 20:20:17 UTC
(In reply to Paulo Andrade from comment #0)
> This should be a known issue, as it is corrected with glibc-2.17-221.el7,
> but the latter is not yet available to customers.
> The issue should be the backport of
> https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=b52b0d793dcb226ecb0ecca1e672ca265973233c
> and the latest patches (glibc-2.17-221.el7) appear to correct it.

The issue is more complicated than this.

Customers have used PTHREAD_STACK_MIN incorrectly over the years. The minimum stack size is guaranteed only to be enough to start the thread, and nothing else. Any stack space the thread function itself requires must be added on top of that minimum. We have now seen two cases where small-stack threads failed because they were created with exactly PTHREAD_STACK_MIN, in the hope that it would contain enough slack for them to operate, rather than with a size that accounted for their own stack usage. In particular, pthread_cancel() is not guaranteed to complete in a PTHREAD_STACK_MIN sized stack.

Having said this, we will be fixing this issue in RHEL 7.5 using a two-pronged strategy: first, by changing some of the accounting for guard pages, and second, by shifting some of the normal pthread_cancel() work back to the dynamic loader. Together, those two changes should be enough to allow existing RHEL 7.5 applications like those in the example to keep running. We do this because we value stability and backwards compatibility in RHEL. However, this should be a warning to application authors: they have relied upon leftover space in PTHREAD_STACK_MIN to run their thread routines, and that space is not guaranteed to be available. That leftover space belongs to the implementation to use.

Comment 8 Florian Weimer 2018-01-29 16:21:55 UTC

*** This bug has been marked as a duplicate of bug 1527904 ***