Bug 1300049
| Summary: | dlerror () returns NULL after dlsym (RTLD_NEXT) of a non-existent symbol |
|---|---|
| Product: | Red Hat Enterprise Linux 7 |
| Component: | glibc |
| Version: | 7.3 |
| Hardware: | Unspecified |
| OS: | Linux |
| Status: | CLOSED WONTFIX |
| Severity: | medium |
| Priority: | unspecified |
| Target Milestone: | rc |
| Reporter: | Joe Wright <jwright> |
| Assignee: | Florian Weimer <fweimer> |
| QA Contact: | qe-baseos-tools-bugs |
| CC: | ashankar, cww, fweimer, mnewsome, pfrankli |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2016-05-12 14:14:22 UTC |
| Bug Depends On: | 1333945 |
| Bug Blocks: | 1203710 |
Description
Joe Wright
2016-01-19 20:24:48 UTC
Looking at _dl_lookup_symbol_x, this may indeed be a bug due to the way RTLD_NEXT is implemented: it continues after lookup errors, but this way, it never signals the error. On the other hand, the error is deliberately masked here for the RTLD_NEXT case (where skip_map is non-NULL, so the skip_map == NULL condition below fails):

    if (__glibc_unlikely (current_value.s == NULL))
      {
        if ((*ref == NULL || ELFW(ST_BIND) ((*ref)->st_info) != STB_WEAK)
            && skip_map == NULL
            && !(GLRO(dl_debug_mask) & DL_DEBUG_UNUSED))
          {
            /* We could find no value for a strong reference. */
            const char *reference_name = undef_map ? undef_map->l_name : "";
            const char *versionstr = version ? ", version " : "";
            const char *versionname = (version && version->name
                                       ? version->name : "");

            /* XXX We cannot translate the message. */
            _dl_signal_cerror (0, DSO_FILENAME (reference_name),
                               N_("symbol lookup error"),
                               make_string ("undefined symbol: ", undef_name,
                                            versionstr, versionname));
          }
        *ref = NULL;
        return 0;
      }

This was carried over from _dl_lookup_symbol_skip when the separate function was removed in upstream commit bdf4a4f1eabb2e085b0610b53bb37b5263f4728d. The original implementation of _dl_lookup_symbol_skip in commit 84384f5b6aaa622236ada8c9a7ff51f40b91fc20 did not have error reporting, either. Why this is so is unclear to me.

Solaris documentation implies that the dlerror return value changes if dlsym with RTLD_NEXT is unsuccessful. Therefore, I think we should change glibc behavior.

The return of NULL from dlsym or dlvsym is sufficient to indicate that the symbol was not found. Yet there are two more cases of interest that I can see:

(1) Return alternate errors other than "not found".

This is one of the few good reasons to want this fixed. If the functions run into a serious internal error, it can be reported via dlerror.

(2) Support NULL symbols.
One might argue that the current behavior cannot distinguish between a true "null" symbol (a symbol whose address is 0x0) and a symbol that was not found, and that's true. At present, such a symbol can only, as far as I know, be constructed artificially via a linker script (as a NOTYPE symbol via PROVIDE, e.g. PROVIDE(null_symbol = 0x0);) or via special section directives and assembly. Relocations against such symbols fail today (ld.so aborts) because the dynamic loader cannot handle such true "null" symbols, e.g.

    11127: symbol=null_symbol; lookup in file=./test [0]
    11127: symbol=null_symbol; lookup in file=./libinterposer.so [0]
    11127: symbol=null_symbol; lookup in file=/lib64/libdl.so.2 [0]
    11127: symbol=null_symbol; lookup in file=/lib64/libc.so.6 [0]
    11127: symbol=null_symbol; lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
    11127: ./libinterposer.so: error: symbol lookup error: undefined symbol: null_symbol (fatal)
    ./test: symbol lookup error: ./libinterposer.so: undefined symbol: null_symbol

    readelf -a -W libinterposer.so | grep null
    0000000000600fd8 0000000d00000006 R_X86_64_GLOB_DAT 0000000000000000 null_symbol + 0
    13: 0000000000000000 0 NOTYPE GLOBAL DEFAULT ABS null_symbol
    50: 0000000000000000 0 NOTYPE GLOBAL DEFAULT ABS null_symbol

Even if we fix the dynamic loader, the result returned from dlsym will be non-null because it will have the load offset added. Therefore the only way to get a true "null" symbol is to enable low addresses and map the DSO at address zero. I see no useful reason to do this in a sensible application.

Therefore, if one sees a null return from dlsym et al., then, barring (1), it really means "symbol not found". This is most likely a QoI issue that we should fix, with "not found" being reported by dlerror() as the highest-quality implementation.
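The reasoning above can be sketched in C. This is a minimal illustration, not code from glibc: `lookup_next`, `struct lookup_result`, and the status names are hypothetical. It treats a NULL return from dlsym (RTLD_NEXT) as "not found" and records the dlerror() text separately, because on the glibc discussed in this bug that text may be NULL even though the lookup failed. On older glibc the program may need to be linked with -ldl.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* required for RTLD_NEXT in <dlfcn.h> */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
enum lookup_status {
    LOOKUP_FOUND,     /* dlsym returned a non-NULL address */
    LOOKUP_NOT_FOUND  /* NULL address, treated as "symbol not found" */
};

struct lookup_result {
    enum lookup_status status;
    void *address;
    /* dlerror() text, or NULL if the lookup set none.  The string is owned
       by the dynamic loader and may be invalidated by later dl* calls. */
    const char *message;
};

/* Look up NAME in the objects that follow the caller in search order. */
struct lookup_result lookup_next(const char *name)
{
    struct lookup_result r;

    assert(name != NULL);

    (void) dlerror();                    /* discard any stale error state */
    r.address = dlsym(RTLD_NEXT, name);
    r.message = dlerror();               /* may be NULL even on failure */

    /* Decide on the address alone, not on whether a message is present. */
    r.status = (r.address != NULL) ? LOOKUP_FOUND : LOOKUP_NOT_FOUND;
    return r;
}
```

A caller that wants diagnostics can print `message` when it is non-NULL and fall back to a generic "symbol not found" text otherwise.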
However, this has been the case forever on Linux, and I expect the man page text is a holdover from Solaris or AIX, where it might have been possible to get a valid NULL symbol. It's almost never the case that you'll have a valid NULL symbol on Linux (at least not easily), but rather than change the man page we should adjust the dlsym and dlvsym code to improve the implementation.

This has to go through upstream, and it will change the semantic behaviour of dlsym and dlvsym, which might impact some applications. It also needs testing with dlmopen to cover all the code paths. This is not going to fit into a rhel-7.3 timeframe, so this will have to be rhel-7.4 or later.

Patch posted upstream for review: https://sourceware.org/ml/libc-alpha/2016-02/msg00172.html

This bug fix has the potential to break Address Sanitizer: https://llvm.org/bugs/show_bug.cgi?id=27310

I think it's not really defined what ASAN is doing (you need to have a working malloc when you call dlsym), but the question is whether fixing this bug is worth this kind of breakage.

Typical error message:

    ==10293==AddressSanitizer CHECK failed: ../../../../libsanitizer/asan/asan_rtl.cc:556 "((!asan_init_is_running && "ASan init calls itself!")) != (0)" (0x0, 0x0)
    <empty stack>

Unfortunately, we cannot address this issue in Red Hat Enterprise Linux 7 because Address Sanitizer (ASAN) depends on dlsym (RTLD_NEXT) not providing an error message (see comment 12). This affects both the Address Sanitizer version in GCC and the version in LLVM/Clang. There is also at least one more application that is confused by the more accurate dlerror reporting (fakeroot, often used for building software packages).

This means that the risk of introducing regressions is too high to implement this change in Red Hat Enterprise Linux 7. The issue has already been addressed in upstream glibc, so future versions of Red Hat Enterprise Linux will very likely include the fix.
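Code that probes for optional symbols with dlsym (RTLD_NEXT) and only checks for a NULL return can be written so that the rest of the program sees the same dlerror state whether or not a failed lookup sets a message. The following is a minimal sketch under that assumption; `probe_next_symbol` is a hypothetical helper, and this is not how ASAN or fakeroot are implemented. On older glibc the program may need to be linked with -ldl.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* required for RTLD_NEXT in <dlfcn.h> */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: return the address of NAME from the objects after
   the caller, or NULL if it is absent.  The dlerror state is left clean
   either way, so later dlerror() callers do not see this probe's message. */
void *probe_next_symbol(const char *name)
{
    void *address;

    assert(name != NULL);

    (void) dlerror();                  /* start from a clean error state */
    address = dlsym(RTLD_NEXT, name);
    (void) dlerror();                  /* consume any message the lookup set */
    return address;
}
```

With this shape, a caller that later checks dlerror() after its own dlopen or dlsym call sees only errors from that call, on both the old and the changed glibc behaviour.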