Bug 2189923
| Summary: | glibc: Lazy binding during dlclose may fail unnecessarily | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Paulo Andrade <pandrade> | ||||
| Component: | glibc | Assignee: | Carlos O'Donell <codonell> | ||||
| Status: | VERIFIED --- | QA Contact: | Sergey Kolosov <skolosov> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 9.3 | CC: | alanm, amike, ashankar, brclark, casantos, codonell, dj, dmalcolm, fweimer, jakub, jwright, mkielian, mkolbas, ohudlick, pandrade, pfrankli, sipoyare, skolosov | ||||
| Target Milestone: | rc | Keywords: | Bugfix, Triaged | ||||
| Target Release: | --- | ||||||
| Hardware: | All | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | glibc-2.34-71.el9 | Doc Type: | No Doc Update | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | Type: | Bug | |||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
This looks like this needs an upstream discussion. Would the customer be willing to raise this on libc-alpha, along with their reproducer? This is great stuff, it would be a loss to hide it in a private bug somewhere. It's possible to avoid the problematic interposition with -Wl,-Bsymbolic, not just with -Wl,-z,now. (-flto -fno-semantic-interposition should work as well, but I haven't tested it yet.) There are a bunch of oddities here. The l_removed check in elf/dl-lookup.c was effectively introduced to fix this upstream bug: https://sourceware.org/bugzilla/show_bug.cgi?id=821 But it does not contain much information why this was needed. There should not be any harm to bind about-to-be unloaded modules to about-to-be-unloaded modules in principle, but there might be some subtle corner-cases involving recursive dlclose calls. It's unfortunate that we flag the definition of _ZN6MemberD1Ev as weak in libparent.so: $ eu-readelf --symbols=.dynsym libparent.so | grep _ZN6MemberD1Ev 12: 0000000000001186 21 FUNC WEAK DEFAULT 12 _ZN6MemberD1Ev This comes from the weak symbols in parent_pic.o, which are used to implement C++ vague linkage. If we had a non-weak definition here, at least we'd get a proper ld.so error message, instead of the crash due to a jump to address zero. We should be clear that given the product life-cycle, this issue will not be addressed in Red Hat Enterprise Linux 7. We need to see what we can do about later releases, but even 8 will be tough, especially if the changes required to support this scenario turn out to be invasive. Many thanks again for the feedback Florian! Customer allowed making the reproducer public. I preferred to make the bug report private at first due to uncertainty of possible workarounds, if any, or if the response would just be to not mix compiler options. Indeed the 3 suggested workarounds correct the issue at least for the sample reproducer. I will let the customer know about the possible workarounds and suggestion to raise it on glibc-alpha. I'm treating this mostly as a glibc bug. The weak symbol is a GCC issue, but not the core problem.
The patch below (against upstream) fixes the reproducer. Admittedly it's a bit silly.
diff --git a/elf/dl-lookup.c b/elf/dl-lookup.c
index 05f36a2507..ecc1896c9d 100644
--- a/elf/dl-lookup.c
+++ b/elf/dl-lookup.c
@@ -366,8 +366,10 @@ do_lookup_x (const char *undef_name, unsigned int new_hash,
if ((type_class & ELF_RTYPE_CLASS_COPY) && map->l_type == lt_executable)
continue;
- /* Do not look into objects which are going to be removed. */
- if (map->l_removed)
+ /* Do not look into objects which are going to be removed.
+ References from a map that is under removal are allowed, to
+ enable lazy binding during dlclose. */
+ if (map->l_removed && !undef_map->l_removed)
continue;
/* Print some debugging info if wanted. */
I think it's quite low-risk. If we are worried about undef_map being NULL for some reason, we could a check for that after backporting.
|
Created attachment 1960126 [details] symbol-resolution-segfault-master.zip Customer describe his findings in the README.md file. Basically, customer has a dlopened shared library that defines a global C++ object in a "child" library. One of the source code files is compiled without optimization, that creates a destructor function. The other is compiled with optimizations, what inlines the destructor. When unloading the libraries. Basically there is a incorrect dependency on destructors, or wrong calls. We suggested he user to make sure to build everything with the same compiler options, but they said it is not under their control. This is because the actual crashes, not the sample, is caused by STL and they do not have control on those. Due to this, customer said they are needing to compile disabling inlining of everything. For the sample reproducer one possible workaround would be to add __attribute__((noinline)) to Member::~Member(), but as customer said, they do not have control of the STL source code. Do you have some suggestion of possible workarounds, or better explain the issue to the customer?