Bug 1748197

Summary: glibc: Reduce IFUNC resolver usage in libpthread and librt
Product: Red Hat Enterprise Linux 8 Reporter: Florian Weimer <fweimer>
Component: glibc Assignee: Florian Weimer <fweimer>
Status: CLOSED ERRATA QA Contact: qe-baseos-tools-bugs
Severity: unspecified Docs Contact: Zuzana Zoubkova <zzoubkov>
Priority: unspecified    
Version: 8.2 CC: amahdal, ashankar, codonell, dj, fweimer, iboukris, lmanasko, mnewsome, pfrankli, sipoyare, skolosov
Target Milestone: rc Keywords: Patch, Triaged
Target Release: 8.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glibc-2.28-123.el8 Doc Type: Bug Fix
Doc Text:
.`glibc` avoids certain failures caused by IFUNC resolver ordering

Previously, the implementation of the `librt` and `libpthread` libraries of the GNU C Library (`glibc`) contained indirect function (IFUNC) resolvers for the following functions: `clock_gettime`, `clock_getcpuclockid`, `clock_nanosleep`, `clock_settime`, `vfork`. In some cases, the IFUNC resolvers could execute before the `librt` and `libpthread` libraries were relocated. Consequently, applications would fail in the `glibc` dynamic loader during early program startup. With this release, the implementations of these functions have been moved into the `libc` component of `glibc`, which prevents the described problem from occurring.
Story Points: ---
Clone Of:
: 1754575 Environment:
Last Closed: 2020-11-04 01:32:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1473680, 1748194, 1764231    
Bug Blocks: 1738779, 1819986    

Description Florian Weimer 2019-09-03 07:13:46 UTC
librt and libpthread both contain IFUNC resolvers which have relocation dependencies, for clock_gettime and other clock_* functions and vfork.

The vfork forwarder was removed in this upstream commit:

commit 41d6f74e6cb6a92ab428c11ee1e408b2a16aa1b0
Author: Florian Weimer <fweimer>
Date:   Tue Jul 2 15:12:20 2019 +0200

    nptl: Remove vfork IFUNC-based forwarder from libpthread [BZ #20188]
    
    With commit f0b2132b35248c1f4a80f62a2c38cddcc802aa8c ("ld.so:
    Support moving versioned symbols between sonames [BZ #24741]"), the
    dynamic linker will find the definition of vfork in libc and binds
    a vfork reference to that symbol, even if the soname in the version
    reference says that the symbol should be located in libpthread.
    
    As a result, the forwarder (whether it's IFUNC-based or a duplicate
    of the libc implementation) is no longer necessary.
    
    On older architectures, a placeholder symbol is required, to make sure
    that the GLIBC_2.1.2 symbol version does not go away, or is turned in
    to a weak symbol definition by the link editor.  (The symbol version
    needs to preserved so that the symbol coverage check in
    elf/dl-version.c does not fail for old binaries.)
    
    mips32 is an outlier: It defined __vfork@@GLIBC_2.2, but the
    baseline is GLIBC_2.0.  Since there are other @@GLIBC_2.2 symbols,
    the placeholder symbol is not needed there.

Removal of the librt forwarders is still pending upstream review.

Comment 4 Florian Weimer 2019-09-10 15:28:20 UTC
The upstream patch was committed last week. It has been backported to Fedora 30; the Fedora 29 and Fedora 31 updates are pending.

Comment 7 Florian Weimer 2019-09-10 20:22:17 UTC
My working theory is this:

nss_winbind fails to load and leaves an unrelocated, mapped librt.so.1 behind because it is marked NODELETE.  The subsequent loads of libmount.so.1 and libnss_systemd.so.2 see the unrelocated librt.so.1 and issue the “Relink `/lib64/libmount.so.1' with `/lib64/librt.so.1' for IFUNC symbol `clock_gettime'” error.

I think this fits all the facts perfectly, including that libmount.so.1 and libnss_systemd.so.2 are in fact linked against librt.so.1. But it's a conjecture at this point.
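The link dependency mentioned above can be checked from the dynamic section of each library. The following is only a sketch: needed_sonames is a hypothetical helper name, and it assumes binutils readelf output in its usual "(NEEDED) ... Shared library: [soname]" form.

```shell
# Hypothetical helper (not part of glibc or binutils): reduce `readelf -d`
# output read from stdin to the sonames listed in DT_NEEDED entries.
needed_sonames() {
    # Keep only the text between the square brackets on (NEEDED) lines.
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Example use, assuming binutils is installed and the paths from this bug:
#   readelf -d /lib64/libmount.so.1 | needed_sonames | grep -Fx librt.so.1
#   readelf -d /lib64/libnss_systemd.so.2 | needed_sonames | grep -Fx librt.so.1
```

A match from grep means the library has a direct DT_NEEDED entry for librt.so.1; it says nothing about whether librt.so.1 has been relocated in a given process.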

If this is true, the crashes will likely go away only after the Samba dependencies are changed so that samba-winbind-modules and samba-winbind get updated sooner in the transaction, or we find a way to unload unrelocated NODELETE modules.

Comment 8 Florian Weimer 2019-09-11 05:49:54 UTC
Reproducer without Samba:

(1) Build a non-loadable NSS module which links against librt.so.1:

gcc -shared -o linkmod.so -Wl,--soname=doesnotexist-bz1748197.so
gcc -shared -o /lib64/libnss_faulty.so.2 -Wl,--no-as-needed -lrt ./linkmod.so

(2) Edit /etc/nsswitch.conf to include it:

passwd:      faulty sss files systemd                                          

(3) Trigger the bug:

# getent passwd does-not-exist                     
getent: Relink `/lib64/libmount.so.1' with `/lib64/librt.so.1' for IFUNC symbol `clock_gettime'
getent: Relink `/lib64/libnss_systemd.so.2' with `/lib64/librt.so.1' for IFUNC symbol `clock_gettime'
Segmentation fault (core dumped)
# 


This is obviously a synthetic test case, so it's not guaranteed it matches the Samba update scenario. If this is indeed the trigger, it may be possible to work around this in an in-place upgrade scenario by editing /etc/nsswitch.conf around the RPM update transaction.
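A rough sketch of what such an nsswitch.conf edit could look like follows. It has not been tested against a real upgrade. drop_nss_service and the .bz1748197.bak backup suffix are made-up names, and the sketch assumes GNU sed, a plain service name without regular expression metacharacters, and a FILE argument that is a regular file rather than a symbolic link.

```shell
# Hypothetical helper: remove one NSS service name from the passwd and group
# lines of an nsswitch.conf-style file, keeping a backup copy next to it.
drop_nss_service() {
    file=$1
    service=$2
    # Keep the original so that it can be restored after the transaction.
    cp -p "$file" "$file.bz1748197.bak" || return 1
    # Only touch passwd/group lines; drop the service together with the
    # blanks in front of it, keeping one separator if more entries follow.
    sed -i -E "/^(passwd|group):/ s/[[:blank:]]+${service}([[:blank:]]|\$)/\1/" "$file"
}

# Example use around an update (the service name is an assumption):
#   drop_nss_service /etc/nsswitch.conf winbind
#   ... run the RPM transaction ...
#   mv /etc/nsswitch.conf.bz1748197.bak /etc/nsswitch.conf
```

Only the first occurrence of the service on each line is removed, and action specifications such as [NOTFOUND=return] are left alone.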

Carlos suggested the possibility that if we remove the IFUNC resolver, librt.so.1 gets re-relocated once libmount.so.1 and libnss_systemd.so.2 are loaded and need it. In that case, the fix in this glibc bug would be sufficient. I will try to verify that next.

Comment 9 Florian Weimer 2019-09-11 07:04:10 UTC
I tested the synthetic reproducer on Fedora 30. The IFUNC error message is gone, as expected. However, we still crash in dlopen:

#0  0x0000000000002950 in ?? ()
#1  0x00007f166a0cfe8a in call_init (l=<optimized out>, argc=argc@entry=3, 
    argv=argv@entry=0x7ffe034c74c8, env=env@entry=0x7ffe034c74e8)
    at dl-init.c:72
#2  0x00007f166a0cff91 in call_init (env=0x7ffe034c74e8, argv=0x7ffe034c74c8, 
    argc=3, l=<optimized out>) at dl-init.c:30
#3  _dl_init (main_map=main_map@entry=0x55ba10d8ea10, argc=3, 
    argv=0x7ffe034c74c8, env=0x7ffe034c74e8) at dl-init.c:119
#4  0x00007f166a0d3eee in dl_open_worker (a=a@entry=0x7ffe034c6f10)
    at dl-open.c:506
#5  0x00007f166a01f1f9 in __GI__dl_catch_exception (
    exception=exception@entry=0x7ffe034c6ef0, 
    operate=operate@entry=0x7f166a0d3b00 <dl_open_worker>, 
    args=args@entry=0x7ffe034c6f10) at dl-error-skeleton.c:196
#6  0x00007f166a0d376e in _dl_open (file=0x7ffe034c7190 "libnss_systemd.so.2", 
    mode=-2147483646, caller_dlopen=0x7f166a005ab4 <nss_load_library+356>, 
    nsid=-2, argc=3, argv=<optimized out>, env=0x7ffe034c74e8) at dl-open.c:588

Unfortunately, as can be seen below, librt.so.1 has not been re-relocated, and we crash once we try to execute its ELF constructors:

(gdb) up
#3  _dl_init (main_map=main_map@entry=0x55ba10d8ea10, argc=3, 
    argv=0x7ffe034c74c8, env=0x7ffe034c74e8) at dl-init.c:119
119         call_init (main_map->l_initfini[i], argc, argv, env);
(gdb) print main_map->l_initfini[0]->l_name
$5 = 0x55ba10d8d210 "/lib64/libnss_systemd.so.2"
(gdb) print main_map->l_initfini[1]->l_name
$6 = 0x55ba10d8d240 "/lib64/librt.so.1"
(gdb) print main_map->l_initfini[1]->l_relocated
$7 = 0

This is another instance of bug 1500128.

This means that fixing this bug is not sufficient to fix the actual in-place upgrade failure.

Comment 11 Florian Weimer 2019-09-11 16:54:16 UTC
nss_systemd is unconditionally added to /etc/nsswitch.conf by a systemd-libs scriptlet:

function mod_nss() {
    if [ -f "$1" ] ; then
        # sed-fu to add myhostanme to hosts line
        grep -E -q '^hosts:.* myhostname' "$1" ||
        sed -i.bak -e '
                /^hosts:/ !b
                /\<myhostname\>/ b
                s/[[:blank:]]*$/ myhostname/
                ' "$1" &>/dev/null || :

        # Add nss-systemd to passwd and group
        grep -E -q '^(passwd|group):.* systemd' "$1" ||
        sed -i.bak -r -e '
                s/^(passwd|group):(.*)/\1: \2 systemd/
                ' "$1" &>/dev/null || :
    fi
}

FILE="$(readlink /etc/nsswitch.conf || echo /etc/nsswitch.conf)"
mod_nss "$FILE"

This is new in Red Hat Enterprise Linux 8; systemd-libs-219-67.el7.x86_64 does not do this.

Comment 14 Florian Weimer 2019-09-26 08:11:46 UTC
We should also backport this upstream commit:

commit b2b3b7598ae51c714b5fd0d0406d435e66f3624b
Author: Adhemerval Zanella <adhemerval.zanella>
Date:   Wed Sep 25 22:10:00 2019 +0000

    Set the expects flags to clock_nanosleep

    It moves the missing CFLAGS from rt/Makefile to time/Makefile missing
    from 7b5af2d8f2a2b (Finish move of clock_* functions to libc. [BZ #24959]).

    Checked on powerpc64le-linux-gnu.

        * rt/Makefile (CFLAGS-clock_nanosleep.c): Move to ...
        * time/Makefile (CFLAGS-clock_nanosleep.c): ... here.

Comment 19 Florian Weimer 2019-11-19 15:36:09 UTC
We should include this followup in the backport:

commit b2b3b7598ae51c714b5fd0d0406d435e66f3624b
Author: Adhemerval Zanella <adhemerval.zanella>
Date:   Wed Sep 25 22:10:00 2019 +0000

    Set the expects flags to clock_nanosleep
    
    It moves the missing CFLAGS from rt/Makefile to time/Makefile missing
    from 7b5af2d8f2a2b (Finish move of clock_* functions to libc. [BZ #24959]).
    
    Checked on powerpc64le-linux-gnu.
    
            * rt/Makefile (CFLAGS-clock_nanosleep.c): Move to ...
            * time/Makefile (CFLAGS-clock_nanosleep.c): ... here.

Comment 20 Florian Weimer 2020-04-08 11:16:41 UTC
This is the commit which removes the librt IFUNC redirectors:

commit 7b5af2d8f2a2b858319a792678b15a0db08764c7
Author: Zack Weinberg <zackw>
Date:   Wed Sep 4 08:18:57 2019 +0200

    Finish move of clock_* functions to libc. [BZ #24959]
    
    In glibc 2.17, the functions clock_getcpuclockid, clock_getres,
    clock_gettime, clock_nanosleep, and clock_settime were moved from
    librt.so to libc.so, leaving compatibility stubs behind.  Now that the
    dynamic linker no longer insists on finding versioned symbols in the
    same library that originally defined them, we do not need the stubs
    anymore, and this means we don't need GLIBC_PRIVATE __-prefix aliases
    for most of the functions anymore either.  (clock_gettime still needs
    one.)  For ports added before 2.17, libc.so needs to provide two
    symbol versions for each, the default at GLIBC_2.17 plus a compat
    version matching what librt had.
    
    While I'm at it, move the clock_*.c files and their tests from rt/ to
    time/.

This commit removes some of the GLIBC_PRIVATE internal aliases (which are not part of the external run-time ABI, so programs that use them are invalid).

Comment 25 Sergey Kolosov 2020-09-23 18:10:25 UTC
Verified, libpthread and librt don't contain indirect functions.
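One way such a check could be done is sketched below; this is not necessarily how the verification above was performed. ifunc_symbols is a hypothetical helper name, and the sketch assumes the usual column layout of binutils readelf symbol tables, with the symbol type in the fourth column and the name in the eighth.

```shell
# Hypothetical helper: reduce `readelf -W --dyn-syms` output read from stdin
# to the names of symbols whose type is IFUNC (STT_GNU_IFUNC).
ifunc_symbols() {
    # Columns: Num, Value, Size, Type, Bind, Vis, Ndx, Name.
    awk '$4 == "IFUNC" { print $8 }'
}

# Example use, assuming binutils is installed; empty output means that the
# dynamic symbol table contains no IFUNC symbols:
#   readelf -W --dyn-syms /lib64/librt.so.1 | ifunc_symbols
#   readelf -W --dyn-syms /lib64/libpthread.so.0 | ifunc_symbols
```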

Comment 28 errata-xmlrpc 2020-11-04 01:32:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: glibc security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4444