Bug 1621927
| Summary: | glibc: [RFE][LLNL 7.7 Bug] Implement RTLD_PARENT for glibc. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Ben Woodard <woodard> | ||||
| Component: | glibc | Assignee: | glibc team <glibc-bugzilla> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | qe-baseos-tools-bugs | ||||
| Severity: | low | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 8.2 | CC: | ashankar, codonell, dj, fweimer, mgrondona, mnewsome, pfrankli, tgummels, woodard | ||||
| Target Milestone: | rc | Keywords: | FutureFeature, Triaged | ||||
| Target Release: | 8.2 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-01-20 14:45:23 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1599298 | ||||||
Description
Ben Woodard
2018-08-23 22:24:27 UTC
This problem has also cropped up when writing custom PAM modules.
In the hope that it makes the problem easier to understand, here is the original report:
Got a DSO problem that I think there *must* be a better way to solve.
I have a dlopened module in a main program that itself uses a library which links against Lua.
The library is used to open Lua scripts which serve as configuration. The Lua script calls Lua's `require` function, which itself dlopens a C Lua module.
That Lua C module gets an error from ld.so, "can't find symbol lua_gettop", which is a symbol from liblua.so.
liblua is linked to the library which is loading the Lua script.
The only way around this I've found so far is to dlopen(liblua.so) with RTLD_GLOBAL from the first-level module (the one the main program dlopens), forcing the liblua symbols into the program's global scope so that they are visible to libraries it dlopens.
It seems like there should be a simpler way.
I've run into this problem in the past and only figured it out far enough to use the dlopen() trick.
If having real program names helps, it is flux-broker->dlopen("sched.so")->links_with("librdl.so")->lua_loadfile ("rdl.lua")->dlopen("cpuset.so")->"undefined symbol lua_gettop"
librdl.so is linked with liblua.so.
When librdl is used outside of a dlopened module, symbol resolution works fine.
sched.so is dlopened with RTLD_LOCAL|RTLD_NOW|RTLD_DEEPBIND.
We can't change *that* dlopen to RTLD_GLOBAL because the modules loaded by the flux-broker process define the same symbols.
liblua isn't used by the main program; it is only linked to librdl, which is itself linked to sched.so.
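The workaround described in the report (reopening an already-loaded library with RTLD_GLOBAL so later dlopened objects can bind to it) can be sketched in C. This is a minimal sketch, not the reporter's code: `promote_to_global()` and `visible_globally()` are hypothetical helper names. It relies on glibc's documented RTLD_NOLOAD | RTLD_GLOBAL reopen, which promotes an object previously loaded with RTLD_LOCAL without loading a second copy. In the report the soname would be liblua's (its exact file name varies by distribution); libm.so.6 is assumed here as a stand-in.

```c
#define _GNU_SOURCE /* for RTLD_DEFAULT */
#include <dlfcn.h>
#include <stdio.h>

/*
 * Promote an already-loaded library to the global lookup scope.
 * RTLD_NOLOAD: fail instead of loading it if it is not resident yet.
 * RTLD_GLOBAL: make its symbols available to objects dlopened later.
 * Returns the new handle, or NULL on error.
 */
void *promote_to_global(const char *soname)
{
    void *h = dlopen(soname, RTLD_NOW | RTLD_NOLOAD | RTLD_GLOBAL);
    if (h == NULL)
        fprintf(stderr, "promote %s: %s\n", soname, dlerror());
    return h;
}

/* Returns 1 if `sym` resolves through the global lookup scope, else 0. */
int visible_globally(const char *sym)
{
    return dlsym(RTLD_DEFAULT, sym) != NULL;
}
```

Typical use: the library is first opened with RTLD_NOW | RTLD_LOCAL (or pulled in as a dependency), then `promote_to_global("libm.so.6")` is called; afterwards `visible_globally("cos")` returns 1, so plugins dlopened later can bind to those symbols. The cost is exactly what the report wants to avoid: every later object now sees the promoted library's symbols.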
The relevant part of the LD_DEBUG output is:
```
28282: relocation processing: /home/ben/Work/DL-link/test/lib/libd.so
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=./main [0]
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=/home/ben/Work/DL-link/test/lib/libd.so [0]
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=_ITM_deregisterTMCloneTable; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: symbol=__gmon_start__; lookup in file=./main [0]
28282: symbol=__gmon_start__; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=__gmon_start__; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=__gmon_start__; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: symbol=__gmon_start__; lookup in file=/home/ben/Work/DL-link/test/lib/libd.so [0]
28282: symbol=__gmon_start__; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=__gmon_start__; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=__gmon_start__; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=./main [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=/home/ben/Work/DL-link/test/lib/libd.so [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=_ITM_registerTMCloneTable; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: symbol=__cxa_finalize; lookup in file=./main [0]
28282: symbol=__cxa_finalize; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=__cxa_finalize; lookup in file=/lib64/libc.so.6 [0]
28282: binding file /home/ben/Work/DL-link/test/lib/libd.so [0] to /lib64/libc.so.6 [0]: normal symbol `__cxa_finalize' [GLIBC_2.2.5]
28282: symbol=libe_func; lookup in file=./main [0]
28282: symbol=libe_func; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=libe_func; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=libe_func; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: symbol=libe_func; lookup in file=/home/ben/Work/DL-link/test/lib/libd.so [0]
28282: symbol=libe_func; lookup in file=/lib64/libdl.so.2 [0]
28282: symbol=libe_func; lookup in file=/lib64/libc.so.6 [0]
28282: symbol=libe_func; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
28282: /home/ben/Work/DL-link/test/lib/libd.so: error: symbol lookup error: undefined symbol: libe_func (fatal)
28282:
28282: file=/home/ben/Work/DL-link/test/lib/libd.so [0]; destroying link map
```
From that you can see that ld.so never searches the local namespace from which libe_func() is called. That call is made from libd.so's start_libd() function.
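The trace reflects how a handle's lookup scope is defined: dlsym() on an RTLD_LOCAL handle searches only that object and its own dependencies, never the object that happened to dlopen it. A minimal sketch of querying a scope directly; `lookup_in_scope()` is a hypothetical helper, libm.so.6 is assumed as a stand-in for libd.so, and `libe_func` is the symbol from the trace, which is not expected to be found there. Checking dlerror() rather than only the NULL return is the documented way to tell a missing symbol apart from one whose value is NULL.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Look up `sym` only in the lookup scope of `handle`
 * (the object plus its own dependencies).
 * Prints the dynamic linker's message when the symbol is missing,
 * similar to ld.so's "undefined symbol" diagnostic.
 */
void *lookup_in_scope(void *handle, const char *sym)
{
    dlerror(); /* clear any stale error state */
    void *addr = dlsym(handle, sym);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "%s\n", err);
        return NULL;
    }
    return addr;
}
```

With `h = dlopen("libm.so.6", RTLD_NOW | RTLD_LOCAL)`, `lookup_in_scope(h, "cos")` succeeds while `lookup_in_scope(h, "libe_func")` returns NULL and prints the linker's error message.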
(In reply to Ben Woodard from comment #0)
> However, if one of those required libraries dlopen's a library then the
> local linkmap from which this library is searching is not searched.

This is behaving exactly as expected. When libe dlopens libd with RTLD_LOCAL, libd has its own lookup scope that does *not* include libe. These are the semantics of RTLD_LOCAL. If you do not want to link libd against libe, then you must load libe with RTLD_GLOBAL. You don't explain why this is not an option. I expect that you don't want to pollute the global lookup scope with the Lua symbols.

A future alternative here will be dlmopen, since you could open a new namespace, load Lua in that namespace with RTLD_GLOBAL, and still avoid polluting the normal base namespace. I'm reviewing Collabora's patches for dlmopen upstream, so it looks like glibc 2.29 might have some interesting support for this.

The case of the PAM modules is more interesting, but it is still the same case. If librdl.so is going to use Lua and it expects to be loaded with RTLD_LOCAL, then it must *reload* Lua with RTLD_GLOBAL. This is called a "promotion", in which case ld.so should promote Lua to RTLD_GLOBAL binding.

I don't see any problem here. We have global scopes, and we have local scopes. You have to look at how they interact and use them to solve your scoping problems. Here you want to isolate Lua with a local scope, but at the same time the Lua community wants to use global scope binding to avoid Lua modules depending directly on the Lua DSO. So this conflicts with developer usage.

I believe another solution might be to implement RTLD_PARENT and RTLD_GROUP from Solaris to give better control over binding. With RTLD_PARENT, the caller of dlopen has its symbols made available to the loaded scope. So Lua would make its symbols available to plugins, but no deeper. With RTLD_GROUP you can make a closed set of symbol dependencies.
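The dlmopen() alternative amounts to loading the library into a fresh link-map namespace, isolated from the base namespace. A minimal sketch, assuming glibc's `dlmopen()` and `dlinfo()` (both GNU extensions); `open_isolated()` is a hypothetical helper and libm.so.6 is a stand-in for liblua. It uses RTLD_LOCAL because the dlmopen(3) man page notes that, as of glibc 2.24, specifying RTLD_GLOBAL with dlmopen() generated an error; loading Lua with RTLD_GLOBAL inside the new namespace, as the comment suggests, depended on the upstream work it mentions.

```c
#define _GNU_SOURCE /* dlmopen, dlinfo, Lmid_t, LM_ID_NEWLM */
#include <dlfcn.h>
#include <stdio.h>

/*
 * Load `soname` into a brand-new link-map namespace so that neither it
 * nor its dependencies are visible from the base namespace.
 * Stores the new namespace id in *lmid. Returns the handle or NULL.
 */
void *open_isolated(const char *soname, Lmid_t *lmid)
{
    /* RTLD_LOCAL: older glibc rejected RTLD_GLOBAL for dlmopen(). */
    void *h = dlmopen(LM_ID_NEWLM, soname, RTLD_NOW | RTLD_LOCAL);
    if (h == NULL) {
        fprintf(stderr, "dlmopen %s: %s\n", soname, dlerror());
        return NULL;
    }
    /* Ask which namespace the object actually landed in. */
    if (dlinfo(h, RTLD_DI_LMID, lmid) != 0) {
        fprintf(stderr, "dlinfo: %s\n", dlerror());
        dlclose(h);
        return NULL;
    }
    return h;
}
```

Each LM_ID_NEWLM call creates another namespace with its own copy of the library's dependencies (including libc), and glibc supports only a small fixed number of namespaces, so this approach suits a few genuinely isolated components rather than every plugin.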
I'll leave this open for a while in case you want to discuss, but it will be closed as NOTABUG, or you can change it to an RFE for RTLD_PARENT.

I'm retitling this to indicate a desire to have RTLD_PARENT, which would allow the Lua language runtime to load other DSOs and share its own symbols with them for relocation, but not for subsequent dlsym/dlvsym access. If that's not going to help your particular use case with Lua, then please provide a complete example for the use case you're trying to support.

Given the complexity of implementing RTLD_PARENT, this must be tracked upstream and fixed there first. Once those semantics are fixed upstream, they will be included in RHEL. I've filed the following upstream bug for the glibc team to use: https://sourceware.org/bugzilla/show_bug.cgi?id=25421

I'm marking this bug CLOSED/UPSTREAM. We are going to track this upstream.
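Because RTLD_PARENT exists on Solaris but not in glibc, portable loader code has to choose its dlopen() flags at compile time. A hedged sketch: `plugin_open_flags()` is a hypothetical helper, and the Solaris branch reflects the semantics described in this bug (the symbols of the object calling dlopen are made available to the loaded object) rather than tested Solaris code. On glibc the `#else` branch is taken, so a Lua C module can bind to Lua's symbols only if Lua was already promoted to the global scope.

```c
#include <dlfcn.h>

/*
 * Choose dlopen() flags for loading a plugin (e.g. a Lua C module)
 * that needs the loading library's symbols.
 */
int plugin_open_flags(void)
{
#ifdef RTLD_PARENT
    /* Solaris: expose the caller's symbols to the plugin, no deeper. */
    return RTLD_NOW | RTLD_LOCAL | RTLD_PARENT;
#else
    /* glibc has no RTLD_PARENT: the caller's library must already
       have been promoted with RTLD_GLOBAL for the plugin to bind. */
    return RTLD_NOW | RTLD_LOCAL;
#endif
}
```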