Bug 1740039 - glibc: wrong handling of dlopen() of a nonexistent/broken library, dl_tls_max_dtv_idx incremented too early
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glibc
Version: 7.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 7.8
Assignee: Florian Weimer
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Duplicates: 1670620
Depends On:
Blocks: 1599298 1754591
 
Reported: 2019-08-12 07:27 UTC by Antonio Di Monaco
Modified: 2020-03-31 19:08 UTC (History)

Fixed In Version: glibc-2.17-307.el7
Doc Type: Bug Fix
Doc Text:
Cause: An attempt to call dlopen on an ET_EXEC executable fails as expected, but also leaves the dynamic loader in an inconsistent state.
Consequence: A later call to pthread_create crashes with a segmentation fault, due to inconsistent TLS data structures.
Fix: A check has been added to dlopen to reject ET_EXEC executables earlier during execution, before the TLS data structures become inconsistent.
Result: The dlopen failure is reported, and subsequent pthread_create calls behave as expected.
Clone Of:
Environment:
Last Closed: 2020-03-31 19:08:32 UTC
Target Upstream Version:
Embargoed:
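
The Doc Text above says the fix rejects ET_EXEC executables early in dlopen. As a rough illustration of what that distinction means at the file level, the sketch below reads the e_type field from an ELF header. It is not the glibc check itself; the names elf_type and is_dlopen_candidate are hypothetical helpers introduced only for this example.

```python
import struct

# e_type values as defined by the ELF specification.
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_type(header: bytes) -> str:
    """Return the symbolic e_type for the first 18 or more bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] is byte 5: 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        fmt = "<H"
    elif header[5] == 2:
        fmt = ">H"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_type is the 16-bit field that follows the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return ET_NAMES.get(e_type, f"unknown ({e_type})")


def is_dlopen_candidate(header: bytes) -> bool:
    """Hypothetical helper: treat only ET_DYN objects as loadable via dlopen."""
    return elf_type(header) == "ET_DYN"
```

To inspect a real file, pass its first bytes, for example elf_type(open(path, "rb").read(18)). Note that position-independent executables are also ET_DYN, so an e_type test alone does not cover the PIE case tracked in Sourceware 24930.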



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0989 0 None None None 2020-03-31 19:08:51 UTC
Sourceware 16634 0 P2 RESOLVED Application calling dlopen("./a.out",...) may run into _dl_allocate_tls_init: Assertion `listp != ((void *)0)' failed! 2020-10-28 18:36:55 UTC
Sourceware 24930 0 P2 RESOLVED dlopen of PIE executable can result in _dl_allocate_tls_init assertion failure 2020-10-28 18:36:56 UTC

Description Antonio Di Monaco 2019-08-12 07:27:29 UTC
Description of problem:
A program that repeatedly calls dlopen() to open shared libraries created dynamically on demand can crash after a certain number of dlopen() calls.

Version-Release number of selected component (if applicable):
glibc 2.17

How reproducible:
See test program and instructions in https://sourceware.org/bugzilla/show_bug.cgi?id=16634

Steps to Reproduce:
1. Create and compile the test program as per https://sourceware.org/bugzilla/show_bug.cgi?id=16634
2. Run the program:
# ./a.out


Actual results:
The test program aborts with:
...
61: &x = 0x7feaa003873c
62: &x = 0x7feaa003873c
63: &x = 0x7feaa003873c
Segmentation fault (core dumped)

Expected results:
The program should run for as long as there are dlopen() calls, and either end after a certain number of calls (100 in the case of the test program) or run forever, possibly reporting dlopen() failures once resources are exhausted.

Additional info:
This issue is likely related to https://sourceware.org/bugzilla/show_bug.cgi?id=16634 "Application
calling dlopen("./a.out",...) may run into _dl_allocate_tls_init:
Assertion `listp != ((void *)0)' failed!", which is fixed in glibc 2.20.

RHEL 7 comes with glibc 2.17, and there is currently no open Red Hat bug for this upstream bug.

The upstream bug https://sourceware.org/bugzilla/show_bug.cgi?id=16634 mentions that the application fails after 64 iterations with "Assertion ... failed!" but the same program on RHEL 7 just dumps core after 64 iterations.

The upstream bug mentions that the underlying problem is that an application erroneously tries to
repeatedly call dlopen("a.out", ...). In other words: if the application is flawless, this bug
will never be encountered. SAP HANA, however, performs on-demand code generation,
which effectively generates shared libraries that are then loaded and unloaded, potentially quite often. This behavior can
trigger this bug, potentially leading to lengthy error analysis and unnecessary downtime in production systems.

I could not reproduce the bug on RHEL 8 GA, glibc 2.28, release 42.el8_0.1: the program exits normally even after 10,000 iterations.
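
The failure pattern described here (a series of failing dlopen() calls followed by thread creation) can be approximated with the sketch below. It is not the upstream reproducer, which is the C program attached to Sourceware 16634, and whether it triggers the crash on an affected glibc is not established by this report. It relies on ctypes.CDLL calling dlopen() and on CPython threads being created with pthread_create on Linux. The function name probe and the library paths are made up for illustration.

```python
import ctypes
import threading


def probe(iterations: int = 100) -> tuple:
    """Attempt failing library loads, then start a thread.

    Returns (number of failed loads, whether the thread ran).
    """
    failures = 0
    for i in range(iterations):
        try:
            # Hypothetical path that is assumed not to exist, so the load fails.
            ctypes.CDLL(f"./no-such-library-{i}.so")
        except OSError:
            failures += 1

    # Creating a thread makes libc set up TLS for it, which is where the
    # inconsistent loader state surfaced in this bug.
    ran = threading.Event()
    worker = threading.Thread(target=ran.set)
    worker.start()
    worker.join()
    return failures, ran.is_set()
```

Calling probe() on a fixed system is expected to report every load as failed and the thread as having run. A crash of the interpreter at thread creation would be the analogue of the segmentation fault shown under "Actual results".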

Comment 5 Florian Weimer 2019-08-12 09:43:24 UTC
A repository with a test build is available here:

https://people.redhat.com/~fweimer/IQlrkw5SoVmo/

The repository file for /etc/yum.repos.d is here:

https://people.redhat.com/~fweimer/IQlrkw5SoVmo/glibc-2.17-306.el7.fweimer.bz1740039.1.repo

Would you please verify that this build fixes the original problem?  Thanks.

Comment 6 Florian Weimer 2019-08-12 14:36:03 UTC
Building the upstream test requires some changes.  See bug 1740088.  I submitted a test generalization upstream:

https://sourceware.org/ml/libc-alpha/2019-08/msg00229.html

Comment 8 Gary Case 2019-08-22 14:25:08 UTC
FYI, this bug is being moved to be a 7.9 item as our development work on glibc for RHEL 7.8 has concluded. That being said, depending on when the feedback from SAP arrives we may still be able to include this as a 7.8 item.

Comment 10 Antonio Di Monaco 2019-09-19 08:11:58 UTC
Hi,

I confirm that the patched glibc fixes the issue.

Thanks,

BR,
Antonio

Comment 12 Carlos O'Donell 2019-10-07 05:07:19 UTC
Antonio,

Testing by Red Hat has revealed that the upstream patch to fix this issue is incomplete.

Further changes to the dynamic loader were required to fix the thread-local storage issues seen in your test case scenario.

We want to have the upstream change go through enough operational hours to show that it doesn't have any further impact on dlopen, arriving signals during dlopen, etc. This means that we will not immediately be backporting these changes into a z-stream release.

As we understand it these issues impact only SAP validation, but not customer deployments. Given the limited impact on customers we want to ensure that the riskier but correct fix does not impact our joint customers. Again, we will be doing upstream testing before we do downstream deployment of the fix in RHEL.

We will work to provide a new test fix for SAP that includes what we believe is a more complete set of fixes. Please be patient while we get this ready for testing. Florian will be working on delivering the test fix for SAP.

Comment 13 Florian Weimer 2019-10-16 15:14:11 UTC
I have posted yet another upstream test fix:

  https://sourceware.org/ml/libc-alpha/2019-10/msg00491.html

This splits the TLS modid tests (which we want to backport) from the self-dlopen tests (which are not needed).

Comment 17 Sergey Kolosov 2019-11-08 13:46:34 UTC
Verified with the reproducer and the glibc testsuite test elf/tst-dlopen-tlsmodid.

Comment 18 Carlos O'Donell 2020-01-10 14:53:07 UTC
*** Bug 1670620 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2020-03-31 19:08:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0989

