Bug 1300049
| Summary: | dlerror () returns NULL after dlsym (RTLD_NEXT) of a non-existent symbol | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Joe Wright <jwright> | |
| Component: | glibc | Assignee: | Florian Weimer <fweimer> | |
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-tools-bugs | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 7.3 | CC: | ashankar, cww, fweimer, mnewsome, pfrankli | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | Bug Fix | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1333945 (view as bug list) | Environment: | ||
| Last Closed: | 2016-05-12 14:14:22 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1333945 | |||
| Bug Blocks: | 1203710 | |||
Looking at _dl_lookup_symbol_x, this may indeed be a bug due to the way RTLD_NEXT is implemented: it continues after lookup errors, but this way, it never signals the error. On the other hand, the error is deliberately masked here for the RTLD_NEXT case (where skip_map != NULL):
if (__glibc_unlikely (current_value.s == NULL))
  {
    if ((*ref == NULL || ELFW(ST_BIND) ((*ref)->st_info) != STB_WEAK)
        && skip_map == NULL
        && !(GLRO(dl_debug_mask) & DL_DEBUG_UNUSED))
      {
        /* We could find no value for a strong reference. */
        const char *reference_name = undef_map ? undef_map->l_name : "";
        const char *versionstr = version ? ", version " : "";
        const char *versionname = (version && version->name
                                   ? version->name : "");

        /* XXX We cannot translate the message. */
        _dl_signal_cerror (0, DSO_FILENAME (reference_name),
                           N_("symbol lookup error"),
                           make_string ("undefined symbol: ", undef_name,
                                        versionstr, versionname));
      }
    *ref = NULL;
    return 0;
  }
This was carried over from _dl_lookup_symbol_skip when the separate function was removed in upstream commit bdf4a4f1eabb2e085b0610b53bb37b5263f4728d. The original implementation of _dl_lookup_symbol_skip in commit 84384f5b6aaa622236ada8c9a7ff51f40b91fc20 did not have error reporting, either. Why this is so is unclear to me.
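The condition that guards _dl_signal_cerror in the excerpt can be restated as a small truth function. This is only a simplified model, assuming the three inputs are reduced to booleans; the function name and parameter names are hypothetical and do not appear in glibc.

```c
#include <stdbool.h>

/* Simplified model (hypothetical names) of the check that guards
   _dl_signal_cerror in the excerpt.  Returns true when a failed
   lookup would be reported as "undefined symbol".  */
static bool
lookup_failure_is_signalled (bool have_ref, bool ref_is_weak,
                             bool have_skip_map, bool debug_unused)
{
  /* Strong reference: either no reference symbol at all, or one
     whose binding is not STB_WEAK.  */
  bool strong_reference = !have_ref || !ref_is_weak;

  /* A non-NULL skip_map or the DL_DEBUG_UNUSED debug flag
     suppresses the error.  */
  return strong_reference && !have_skip_map && !debug_unused;
}
```

In this model, a failed lookup with a skip map set, i.e. lookup_failure_is_signalled (false, false, true, false), yields false, which matches the masked error described for RTLD_NEXT.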
Solaris documentation implies that the dlerror return value changes if dlsym with RTLD_NEXT is unsuccessful. Therefore, I think we should change glibc behavior.
The return of NULL from dlsym or dlvsym is sufficient to indicate the symbol was not found.
Yet, there are two more cases of interest that I can see:
(1) Return alternate errors other than "not found"
This is one of the few good reasons to want this fixed: if the functions run into a serious internal error, it can be reported via dlerror.
(2) Support NULL symbols.
One might argue that the current behavior cannot distinguish a true "null" symbol (a symbol whose address is 0x0) from a not-found symbol, and that's true.
At present, such a symbol can only, as far as I know, be constructed artificially via a linker script (as a NOTYPE symbol via PROVIDE e.g. PROVIDE(null_symbol = 0x0);) or via special section directives and assembly.
Relocations against such symbols will fail today (abort ld.so) because the dynamic loader cannot handle such true "null" symbols.
e.g.
11127: symbol=null_symbol; lookup in file=./test [0]
11127: symbol=null_symbol; lookup in file=./libinterposer.so [0]
11127: symbol=null_symbol; lookup in file=/lib64/libdl.so.2 [0]
11127: symbol=null_symbol; lookup in file=/lib64/libc.so.6 [0]
11127: symbol=null_symbol; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
11127: ./libinterposer.so: error: symbol lookup error: undefined symbol: null_symbol (fatal)
./test: symbol lookup error: ./libinterposer.so: undefined symbol: null_symbol
readelf -a -W libinterposer.so | grep null
0000000000600fd8 0000000d00000006 R_X86_64_GLOB_DAT 0000000000000000 null_symbol + 0
13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT ABS null_symbol
50: 0000000000000000 0 NOTYPE GLOBAL DEFAULT ABS null_symbol
Even if we fix the dynamic loader, the result returned from dlsym will be non-null because it will have the load offset added. Therefore the only way to get a true "null" symbol is to enable low addresses and map the DSO at address zero. I see no useful reason to do this in a sensible application. Therefore, if one sees a null return from dlsym et al., then barring (1), it really means "symbol not found".
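The load-offset argument can be written out as simple arithmetic. This is a hypothetical model, assuming (as stated above) that the loader adds the object's load offset to the symbol value; resolved_address and its parameters are illustrative names, not glibc interfaces.

```c
#include <stdint.h>

/* Hypothetical model: the address handed back for a symbol is the
   object's load offset plus the symbol's st_value.  */
static uintptr_t
resolved_address (uintptr_t load_offset, uintptr_t st_value)
{
  return load_offset + st_value;
}
```

With an st_value of 0, the result is null only when load_offset is 0 as well, that is, when the DSO is mapped at address zero.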
It is most likely a QoI issue that we should fix, with dlerror() returning 'not found' being the highest-quality implementation. However, this has been the case forever on Linux, and I expect the man page text is a holdover from Solaris or AIX, where it might have been possible to get a valid NULL symbol. It's almost never the case that you'll have a valid NULL symbol in Linux (at least not easily), but rather than change the man page we should adjust the dlsym and dlvsym code to improve the implementation.
This has to go through upstream, and it will change the semantic behaviour of dlsym and dlvsym, which might impact some applications. This also needs testing with dlmopen to cover all the code paths. This is not going to fit into a rhel-7.3 timeframe, so this will have to be rhel-7.4 or later.
Patch posted upstream for review: https://sourceware.org/ml/libc-alpha/2016-02/msg00172.html
This bug fix has the potential to break Address Sanitizer: https://llvm.org/bugs/show_bug.cgi?id=27310
I think what ASAN is doing is not really well-defined (you need to have a working malloc when you call dlsym), but the question is whether fixing this bug is worth this kind of breakage. Typical error message:
==10293==AddressSanitizer CHECK failed: ../../../../libsanitizer/asan/asan_rtl.cc:556 "((!asan_init_is_running && "ASan init calls itself!")) != (0)" (0x0, 0x0)
<empty stack>
Unfortunately, we cannot address this issue in Red Hat Enterprise Linux 7 because Address Sanitizer (ASAN) depends on dlsym (RTLD_NEXT) not providing an error message (see comment 12). This affects both the Address Sanitizer version in GCC and the version in LLVM/Clang. There is also at least one more application which is confused by the more accurate dlerror reporting (fakeroot, often used for building software packages). This means that the risk of introducing regressions is just too high to implement this change in Red Hat Enterprise Linux 7. We have already addressed this issue in upstream glibc, so future versions of Red Hat Enterprise Linux will very likely address this issue.
Description of problem:
- Shouldn't dlerror() return something meaningful instead of 0 after dlsym of a nonexistent symbol? The current behavior also contradicts the man page.
Version-Release number of selected component (if applicable):
How reproducible:
- consistently
Steps to Reproduce:
This prints all zeros:
/// a.C
#include <iostream>
#include <dlfcn.h>
int main()
{
  std::cout << (void*)dlerror() << std::endl;
  std::cout << dlsym(RTLD_NEXT, "does_not_exist") << std::endl;
  std::cout << (void*)dlerror() << std::endl;
  std::cout << dlvsym(RTLD_NEXT, "pthread_cond_timedwait", "DOES_NOT_EXIST") << std::endl;
  std::cout << (void*)dlerror() << std::endl;
}
Run commands:
g++ a.C -ldl
./a.out
$ ldd -r ./a.out
linux-vdso.so.1 => (0x00007ffccbebe000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003fa5400000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003fab400000)
libm.so.6 => /lib64/libm.so.6 (0x0000003fa5800000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003fa8c00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003fa4c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003fa4800000)
[alanm@vmw102 C01555171]$ ./a.out
0
0
0
0
0
LD_DEBUG=symbols,bindings ./a.out
57410: symbol=_res; lookup in file=./a.out [0]
57410: symbol=_res; lookup in file=/lib64/libdl.so.2 [0]
57410: symbol=_res; lookup in file=/usr/lib64/libstdc++.so.6 [0]
57410: symbol=_res; lookup in file=/lib64/libm.so.6 [0]
57410: symbol=_res; lookup in file=/lib64/libgcc_s.so.1 [0]
57410: symbol=_res; lookup in file=/lib64/libc.so.6 [0]
57410: binding file /lib64/libc.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `_res' [GLIBC_2.2.5]
57410: symbol=_IO_file_close; lookup in file=./a.out [0]
57410: symbol=_IO_file_close; lookup in file=/lib64/libdl.so.2 [0]
57410: symbol=_IO_file_close; lookup in file=/usr/lib64/libstdc++.so.6 [0]
57410: symbol=_IO_file_close; lookup in file=/lib64/libm.so.6 [0]
57410: symbol=_IO_file_close; lookup in file=/lib64/libgcc_s.so.1 [0]
57410: symbol=_IO_file_close; lookup in file=/lib64/libc.so.6 [0]
57410: binding file /lib64/libc.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `_IO_file_close' [GLIBC_2.2.5]
57410: symbol=rpc_createerr; lookup in file=./a.out [0]
57410: symbol=rpc_createerr; lookup in file=/lib64/libdl.so.2 [0]
57410: symbol=rpc_createerr; lookup in file=/usr/lib64/libstdc++.so.6 [0]
57410: symbol=rpc_createerr; lookup in file=/lib64/libm.so.6 [0]
57410: symbol=rpc_createerr; lookup in file=/lib64/libgcc_s.so.1 [0]
57410: symbol=rpc_createerr; lookup in file=/lib64/libc.so.6 [0]
.....
Where are you experiencing the behavior? What environment?
Shouldn't dlerror() return something meaningful instead of 0 after dlsym of a nonexistent symbol? The current behavior also contradicts the man page.
Actual results:
- returns 0
Expected results:
- The POSIX spec says: If handle does not refer to a valid symbol table handle or if the symbol named by name cannot be found in the symbol table associated with handle, dlsym() shall return a null pointer.
Additional info:
I'm not sure about the expected behavior but I do see RTLD_DEFAULT behaving:
RTLD_NEXT
[...]
11751: binding file /lib64/libdl.so.2 [0] to /lib64/libc.so.6 [0]: normal symbol `_dl_sym' [GLIBC_PRIVATE]
11751: symbol=does_not_exist; lookup in file=/lib64/libdl.so.2 [0]
11751: symbol=does_not_exist; lookup in file=/usr/lib64/libstdc++.so.6 [0]
11751: symbol=does_not_exist; lookup in file=/lib64/libm.so.6 [0]
11751: symbol=does_not_exist; lookup in file=/lib64/libgcc_s.so.1 [0]
11751: symbol=does_not_exist; lookup in file=/lib64/libc.so.6 [0]
11751: symbol=does_not_exist; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
0
0
RTLD_DEFAULT
11790: binding file /lib64/libdl.so.2 [0] to /lib64/libc.so.6 [0]: normal symbol `_dl_sym' [GLIBC_PRIVATE]
11790: symbol=does_not_exist; lookup in file=./a.out [0]
11790: symbol=does_not_exist; lookup in file=/lib64/libdl.so.2 [0]
11790: symbol=does_not_exist; lookup in file=/usr/lib64/libstdc++.so.6 [0]
11790: symbol=does_not_exist; lookup in file=/lib64/libm.so.6 [0]
11790: symbol=does_not_exist; lookup in file=/lib64/libgcc_s.so.1 [0]
11790: symbol=does_not_exist; lookup in file=/lib64/libc.so.6 [0]
11790: symbol=does_not_exist; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
11790: ./a.out: error: symbol lookup error: undefined symbol: does_not_exist (fatal)
0
[...]
11793: symbol=free; lookup in file=/lib64/libc.so.6 [0]
11793: binding file /lib64/libdl.so.2 [0] to /lib64/libc.so.6 [0]: normal symbol `free' [GLIBC_2.2.5]
0x23b80c0
Didn't find anything on quick googling: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430732
Even in the latest version it's still noted that they
> .. are reserved for future use as special values that applications may be allowed to use for handle.
http://pubs.opengroup.org/onlinepubs/9699919799/
The POSIX spec says: If handle does not refer to a valid symbol table handle or if the symbol named by name cannot be found in the symbol table associated with handle, dlsym() shall return a null pointer. More detailed diagnostic information shall be available through dlerror().
and the dlerror() page says: If no dynamic linking errors have occurred since the last invocation of dlerror(), dlerror() shall return NULL. If successful, dlerror() shall return a null-terminated character string; otherwise, NULL shall be returned.
In this specific case, the symbol does not exist, so dlsym() returns 0, which is good. But why dlerror() also returns NULL is a bit confusing.
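The calling pattern that the quoted POSIX text describes can be sketched in C as follows. This is a minimal sketch, not glibc code: lookup_next is a hypothetical helper name, and the fallback message is invented for illustration. It treats a NULL return from dlsym (RTLD_NEXT, ...) as "not found" whether or not dlerror() has anything to say, which covers the behavior reported here. The RTLD_NEXT fallback definition uses glibc's value and is an assumption for any other C library.

```c
#define _GNU_SOURCE            /* RTLD_NEXT is a GNU extension in glibc */
#include <dlfcn.h>
#include <stddef.h>

#ifndef RTLD_NEXT              /* only if _GNU_SOURCE came too late */
# define RTLD_NEXT ((void *) -1l)   /* glibc's value; an assumption elsewhere */
#endif

/* Hypothetical helper: look up NAME in the objects that follow the
   calling object.  On failure, return NULL and set *MESSAGE to a
   diagnostic; on success, set *MESSAGE to NULL.  A string obtained
   from dlerror() should be used before the next dl* call.  */
static void *
lookup_next (const char *name, const char **message)
{
  dlerror ();                          /* discard any stale error state */
  void *addr = dlsym (RTLD_NEXT, name);
  const char *err = dlerror ();        /* may be NULL even on failure */

  if (addr == NULL)
    {
      *message = (err != NULL
                  ? err
                  : "symbol not found (dlerror() gave no message)");
      return NULL;
    }

  *message = NULL;
  return addr;
}
```

RTLD_NEXT is resolved relative to the object that contains the dlsym call, so a helper like this has to be compiled into the object whose successors should be searched. On older glibc versions the program also needs to be linked with -ldl, as in the reproducer.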