Bug 2189923 - glibc: Lazy binding during dlclose may fail unnecessarily
Summary: glibc: Lazy binding during dlclose may fail unnecessarily
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: glibc
Version: 9.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Carlos O'Donell
QA Contact: Sergey Kolosov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-26 12:58 UTC by Paulo Andrade
Modified: 2023-11-07 10:10 UTC
CC List: 18 users

Fixed In Version: glibc-2.34-71.el9
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:37:51 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
symbol-resolution-segfault-master.zip (6.07 KB, application/zip)
2023-04-26 12:58 UTC, Paulo Andrade
no flags


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-155849 | 0 | None | None | None | 2023-04-26 13:01:13 UTC
Red Hat Product Errata | RHBA-2023:6582 | 0 | None | None | None | 2023-11-07 08:38:14 UTC
Sourceware | 30425 | 0 | P2 | RESOLVED | Symbol lookup during dlclose may fail unnecessarily | 2023-06-05 16:10:15 UTC

Description Paulo Andrade 2023-04-26 12:58:53 UTC
Created attachment 1960126
symbol-resolution-segfault-master.zip

The customer describes their findings in the README.md file.

In short, the customer has a dlopen'ed shared library that defines a
global C++ object in a "child" library.

One of the source files is compiled without optimization, so the
compiler emits an out-of-line destructor function.

The other is compiled with optimization, which inlines the destructor.
The crash happens when the libraries are unloaded.

It appears there is an incorrect dependency between the destructors, or
the wrong function is being called.

We suggested that the user build everything with the same compiler
options, but they said this is not under their control: the actual
crashes (as opposed to the sample) are caused by STL code, which they do
not control. Because of this, the customer says they would have to
compile with inlining disabled entirely.

For the sample reproducer, one possible workaround would be to add
__attribute__((noinline)) to Member::~Member(), but as the customer
said, they do not control the STL source code.

Do you have any suggestions for possible workarounds, or a better way
to explain the issue to the customer?

Comment 3 Florian Weimer 2023-04-26 17:03:40 UTC
This looks like it needs an upstream discussion. Would the customer be willing to raise this on libc-alpha, along with their reproducer? This is great stuff; it would be a loss to hide it in a private bug somewhere.

It's possible to avoid the problematic interposition with -Wl,-Bsymbolic, not just with -Wl,-z,now. (-flto -fno-semantic-interposition should work as well, but I haven't tested it yet.)

There are a bunch of oddities here. The l_removed check in elf/dl-lookup.c was effectively introduced to fix this upstream bug: https://sourceware.org/bugzilla/show_bug.cgi?id=821 But it does not contain much information about why this was needed. In principle, there should not be any harm in binding references from about-to-be-unloaded modules to definitions in about-to-be-unloaded modules, but there might be some subtle corner cases involving recursive dlclose calls.

It's unfortunate that we flag the definition of _ZN6MemberD1Ev as weak in libparent.so:

$ eu-readelf --symbols=.dynsym libparent.so | grep _ZN6MemberD1Ev
   12: 0000000000001186     21 FUNC    WEAK   DEFAULT       12 _ZN6MemberD1Ev

This comes from the weak symbols in parent_pic.o, which are used to implement C++ vague linkage. If we had a non-weak definition here, at least we'd get a proper ld.so error message, instead of the crash due to a jump to address zero.

We should be clear that given the product life-cycle, this issue will not be addressed in Red Hat Enterprise Linux 7. We need to see what we can do about later releases, but even 8 will be tough, especially if the changes required to support this scenario turn out to be invasive.

Comment 4 Paulo Andrade 2023-04-26 17:24:49 UTC
Many thanks again for the feedback Florian!

Customer allowed making the reproducer public.

I initially made the bug report private because it was unclear whether
any workarounds existed, or whether the answer would simply be not to
mix compiler options.

Indeed, the three suggested workarounds fix the issue, at least for the
sample reproducer.

I will let the customer know about the possible workarounds and the
suggestion to raise it on libc-alpha.

Comment 5 Florian Weimer 2023-05-05 20:20:35 UTC
I'm treating this mostly as a glibc bug. The weak symbol is a GCC issue, but not the core problem.

The patch below (against upstream) fixes the reproducer.  Admittedly it's a bit silly.

diff --git a/elf/dl-lookup.c b/elf/dl-lookup.c
index 05f36a2507..ecc1896c9d 100644
--- a/elf/dl-lookup.c
+++ b/elf/dl-lookup.c
@@ -366,8 +366,10 @@ do_lookup_x (const char *undef_name, unsigned int new_hash,
       if ((type_class & ELF_RTYPE_CLASS_COPY) && map->l_type == lt_executable)
 	continue;
 
-      /* Do not look into objects which are going to be removed.  */
-      if (map->l_removed)
+      /* Do not look into objects which are going to be removed.
+	 References from a map that is under removal are allowed, to
+	 enable lazy binding during dlclose.  */
+      if (map->l_removed && !undef_map->l_removed)
 	continue;
 
       /* Print some debugging info if wanted.  */

I think it's quite low-risk. If we are worried about undef_map being NULL for some reason, we could add a check for that after backporting.

Comment 20 errata-xmlrpc 2023-11-07 08:37:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glibc bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6582

