Bug 1300049 - dlerror () returns NULL after dlsym (RTLD_NEXT) of a non-existent symbol
Summary: dlerror () returns NULL after dlsym (RTLD_NEXT) of a non-existent symbol
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glibc
Version: 7.3
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Florian Weimer
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On: 1333945
Blocks: 1203710
 
Reported: 2016-01-19 20:24 UTC by Joe Wright
Modified: 2019-09-12 09:47 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1333945 (view as bug list)
Environment:
Last Closed: 2016-05-12 14:14:22 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Sourceware 19509 0 None None None 2016-01-21 18:07:27 UTC

Description Joe Wright 2016-01-19 20:24:48 UTC
Description of problem:
- Shouldn't dlerror() return something meaningful instead of 0 after dlsym() is called for a nonexistent symbol? The current behavior also contradicts the man page.

Version-Release number of selected component (if applicable):


How reproducible:
- consistently

Steps to Reproduce:

This prints all zeros:
std::cout << (void*)dlerror() << std::endl;
std::cout << dlsym(RTLD_NEXT, "does_not_exist") << std::endl;
std::cout << (void*)dlerror() << std::endl;
std::cout << dlvsym(RTLD_NEXT, "pthread_cond_timedwait", "DOES_NOT_EXIST") << std::endl;
std::cout << (void*)dlerror() << std::endl;

/// a.C
#include <iostream>
#include <dlfcn.h>
int main()
{
  std::cout << (void*)dlerror() << std::endl;
  std::cout << dlsym(RTLD_NEXT, "does_not_exist") << std::endl;
  std::cout << (void*)dlerror() << std::endl;
  std::cout << dlvsym(RTLD_NEXT, "pthread_cond_timedwait", "DOES_NOT_EXIST") << std::endl;
  std::cout << (void*)dlerror() << std::endl;
}

Run commands:
  g++ a.C -ldl 
  ./a.out

$ ldd -r ./a.out
        linux-vdso.so.1 =>  (0x00007ffccbebe000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003fa5400000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003fab400000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003fa5800000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003fa8c00000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003fa4c00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003fa4800000)
[alanm@vmw102 C01555171]$ ./a.out
0
0
0
0
0
LD_DEBUG=symbols,bindings ./a.out


     57410:     symbol=_res;  lookup in file=./a.out [0]
     57410:     symbol=_res;  lookup in file=/lib64/libdl.so.2 [0]
     57410:     symbol=_res;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     57410:     symbol=_res;  lookup in file=/lib64/libm.so.6 [0]
     57410:     symbol=_res;  lookup in file=/lib64/libgcc_s.so.1 [0]
     57410:     symbol=_res;  lookup in file=/lib64/libc.so.6 [0]
     57410:     binding file /lib64/libc.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `_res' [GLIBC_2.2.5]
     57410:     symbol=_IO_file_close;  lookup in file=./a.out [0]
     57410:     symbol=_IO_file_close;  lookup in file=/lib64/libdl.so.2 [0]
     57410:     symbol=_IO_file_close;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     57410:     symbol=_IO_file_close;  lookup in file=/lib64/libm.so.6 [0]
     57410:     symbol=_IO_file_close;  lookup in file=/lib64/libgcc_s.so.1 [0]
     57410:     symbol=_IO_file_close;  lookup in file=/lib64/libc.so.6 [0]
     57410:     binding file /lib64/libc.so.6 [0] to /lib64/libc.so.6 [0]: normal symbol `_IO_file_close' [GLIBC_2.2.5]
     57410:     symbol=rpc_createerr;  lookup in file=./a.out [0]
     57410:     symbol=rpc_createerr;  lookup in file=/lib64/libdl.so.2 [0]
     57410:     symbol=rpc_createerr;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     57410:     symbol=rpc_createerr;  lookup in file=/lib64/libm.so.6 [0]
     57410:     symbol=rpc_createerr;  lookup in file=/lib64/libgcc_s.so.1 [0]
     57410:     symbol=rpc_createerr;  lookup in file=/lib64/libc.so.6 [0]

.....




Where are you experiencing the behavior?  What environment?

Shouldn't dlerror() return something meaningful instead of 0 after dlsym() is called for a nonexistent symbol? The current behavior also contradicts the man page.

Actual results:
- returns 0

Expected results:
- The posix spec says:
If handle does not refer to a valid symbol table handle or if the symbol named by name cannot be found in the symbol table associated with handle, dlsym() shall return a null pointer.

Additional info:

I'm not sure about the expected behavior, but I do see RTLD_DEFAULT behaving as expected:


RTLD_NEXT

  [...]
     11751:     binding file /lib64/libdl.so.2 [0] to /lib64/libc.so.6 [0]: normal symbol `_dl_sym' [GLIBC_PRIVATE]
     11751:     symbol=does_not_exist;  lookup in file=/lib64/libdl.so.2 [0]
     11751:     symbol=does_not_exist;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     11751:     symbol=does_not_exist;  lookup in file=/lib64/libm.so.6 [0]
     11751:     symbol=does_not_exist;  lookup in file=/lib64/libgcc_s.so.1 [0]
     11751:     symbol=does_not_exist;  lookup in file=/lib64/libc.so.6 [0]
     11751:     symbol=does_not_exist;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
0
0


RTLD_DEFAULT

     11790:     binding file /lib64/libdl.so.2 [0] to /lib64/libc.so.6 [0]: normal symbol `_dl_sym' [GLIBC_PRIVATE]
     11790:     symbol=does_not_exist;  lookup in file=./a.out [0]
     11790:     symbol=does_not_exist;  lookup in file=/lib64/libdl.so.2 [0]
     11790:     symbol=does_not_exist;  lookup in file=/usr/lib64/libstdc++.so.6 [0]
     11790:     symbol=does_not_exist;  lookup in file=/lib64/libm.so.6 [0]
     11790:     symbol=does_not_exist;  lookup in file=/lib64/libgcc_s.so.1 [0]
     11790:     symbol=does_not_exist;  lookup in file=/lib64/libc.so.6 [0]
     11790:     symbol=does_not_exist;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
     11790:     ./a.out: error: symbol lookup error: undefined symbol: does_not_exist (fatal)
0
   [...]
     11793:     symbol=free;  lookup in file=/lib64/libc.so.6 [0]
     11793:     binding file /lib64/libdl.so.2 [0] to /lib64/libc.so.6 [0]: normal symbol `free' [GLIBC_2.2.5]
0x23b80c0


Didn't find anything on quick googling 

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=430732

Even in the latest version it's still noted that they
> .. are reserved for future use as special values that applications may be allowed to use for handle.
http://pubs.opengroup.org/onlinepubs/9699919799/


The posix spec says:
If handle does not refer to a valid symbol table handle or if the symbol named by name cannot be found in the symbol table associated with handle, dlsym() shall return a null pointer.

More detailed diagnostic information shall be available through dlerror().
and the dlerror() page says:
If no dynamic linking errors have occurred since the last invocation of dlerror(), dlerror() shall return NULL.
If successful, dlerror() shall return a null-terminated character string; otherwise, NULL shall be returned.

In this specific case, the symbol does not exist, so it returns 0, which is good. But why dlerror() also returns NULL is a bit confusing.

Comment 3 Florian Weimer 2016-01-19 21:18:12 UTC
Looking at _dl_lookup_symbol_x, this may indeed be a bug due to the way RTLD_NEXT is implemented: it continues after lookup errors, but this way, it never signals the error.

Comment 4 Florian Weimer 2016-01-19 21:55:44 UTC
On the other hand, the error is deliberately masked here for the RTLD_NEXT case (where skip_map != NULL, so the skip_map == NULL check fails):

    if (__glibc_unlikely (current_value.s == NULL))
      {
        if ((*ref == NULL || ELFW(ST_BIND) ((*ref)->st_info) != STB_WEAK)
            && skip_map == NULL
            && !(GLRO(dl_debug_mask) & DL_DEBUG_UNUSED))
          {
            /* We could find no value for a strong reference.  */
            const char *reference_name = undef_map ? undef_map->l_name : "";
            const char *versionstr = version ? ", version " : "";
            const char *versionname = (version && version->name
                                       ? version->name : "");

            /* XXX We cannot translate the message.  */
            _dl_signal_cerror (0, DSO_FILENAME (reference_name),
                               N_("symbol lookup error"),
                               make_string ("undefined symbol: ", undef_name,
                                            versionstr, versionname));
          }
        *ref = NULL;
        return 0;
      }

This was carried over from _dl_lookup_symbol_skip when the separate function was removed in upstream commit bdf4a4f1eabb2e085b0610b53bb37b5263f4728d.  The original implementation of _dl_lookup_symbol_skip in commit 84384f5b6aaa622236ada8c9a7ff51f40b91fc20 did not have error reporting, either.  Why this is so is unclear to me.

Solaris documentation implies that the dlerror return value changes if dlsym with RTLD_NEXT is unsuccessful.  Therefore, I think we should change glibc behavior.

Comment 8 Carlos O'Donell 2016-01-20 04:32:54 UTC
The return of NULL from dlsym or dlvsym is sufficient to indicate the symbol was not found.

Yet, there are two more cases of interest that I can see:

(1) Return alternate errors other than "not found"

This is one of the only reasonable reasons to want this fixed. The functions have run into a serious internal error and reporting it can be done via dlerror.

(2) Support NULL symbols.

One might argue that this doesn't support distinguishing between a true "null" symbol, a symbol whose address is 0x0, versus a not-found symbol, and that's true. 

At present, such a symbol can only, as far as I know, be constructed artificially via a linker script (as a NOTYPE symbol via PROVIDE e.g. PROVIDE(null_symbol = 0x0);) or via special section directives and assembly.

Relocations against such symbols will fail today (abort ld.so) because the dynamic loader cannot handle such true "null" symbols.

e.g.
     11127:	symbol=null_symbol;  lookup in file=./test [0]
     11127:	symbol=null_symbol;  lookup in file=./libinterposer.so [0]
     11127:	symbol=null_symbol;  lookup in file=/lib64/libdl.so.2 [0]
     11127:	symbol=null_symbol;  lookup in file=/lib64/libc.so.6 [0]
     11127:	symbol=null_symbol;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
     11127:	./libinterposer.so: error: symbol lookup error: undefined symbol: null_symbol (fatal)
./test: symbol lookup error: ./libinterposer.so: undefined symbol: null_symbol

readelf -a -W libinterposer.so | grep null
0000000000600fd8  0000000d00000006 R_X86_64_GLOB_DAT      0000000000000000 null_symbol + 0
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  ABS null_symbol
    50: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  ABS null_symbol

Even if we fix the dynamic loader, the result returned from dlsym will be non-null because it will have the load offset added. Therefore the only way to get a true "null" symbol is to enable low addresses, and map the DSO at address zero for the symbol to exist. I see no useful reason to do this in a sensible application. Therefore if one sees a null return from dlsym et al. then it means the symbol was not found, and barring (1), it really means "symbol not found".

Comment 9 Carlos O'Donell 2016-02-09 04:01:20 UTC
It is most likely a QoI issue that we should fix, with 'not found' being returned in dlerror() being the highest quality implementation. However, this has been the case forever on Linux, and I expect the man page text is a holdover from Solaris or AIX where it might have been possible to get a valid NULL symbol. It's almost never the case that you'll have a valid NULL symbol in Linux (at least not easily), but rather than change the man page we should adjust the dlsym and dlvsym code to improve the implementation.

This has to go through upstream, and it will change the semantic behaviour of dlsym and dlvsym, which might impact some applications. This needs testing on dlmopen also to test all the code paths.

This is not going to fit into a rhel-7.3 timeframe, so this will have to be rhel-7.4 or later.

Comment 10 Florian Weimer 2016-02-09 14:32:23 UTC
Patch posted upstream for review:

  https://sourceware.org/ml/libc-alpha/2016-02/msg00172.html

Comment 12 Florian Weimer 2016-05-10 11:52:35 UTC
This bug fix has the potential to break Address Sanitizer:

  https://llvm.org/bugs/show_bug.cgi?id=27310

I think it's not really defined what ASAN is doing (you need to have a working malloc when you call dlsym), but the question is if this kind of breakage is worth fixing this bug.

Comment 13 Florian Weimer 2016-05-10 11:55:25 UTC
Typical error message:

==10293==AddressSanitizer CHECK failed: ../../../../libsanitizer/asan/asan_rtl.cc:556 "((!asan_init_is_running && "ASan init calls itself!")) != (0)" (0x0, 0x0)
    <empty stack>

Comment 14 Florian Weimer 2016-05-12 14:14:22 UTC
Unfortunately, we cannot address this issue in Red Hat Enterprise Linux 7 because Address Sanitizer (ASAN) depends on dlsym (RTLD_NEXT) not providing an error message (see comment 12).  This affects both the Address Sanitizer version in GCC, and the version in LLVM/Clang.

There is also at least one more application which is confused by the more accurate dlerror reporting (fakeroot, often used for building software packages).

This means that the risk of introducing regressions is just too high to implement this change in Red Hat Enterprise Linux 7.

We have already addressed this issue in upstream glibc, so future versions of Red Hat Enterprise Linux will very likely include the fix.

