Description of problem:
As originally reported by Daniel Rusek, C++ applications using libcurl crash at startup on ppc64. The underlying cause is that PR_Init() crashes on ppc64 if it is invoked from the C++ run-time. The attached reproducer only crashes if it also uses std::string in its source code, even though the std::string instantiation is not reachable from main().

Version-Release number of selected component (if applicable):
nspr-4.21.0-1.el7.ppc64

Steps to Reproduce:
1. run the attached reproducer

Actual results:
The program terminates abnormally on SIGSEGV.

Expected results:
The program terminates successfully.

Additional info:
See the output of valgrind.
Created attachment 1651859 [details] rhbz1790475-reproducer.cc
# curl -JO 'https://bugzilla.redhat.com/attachment.cgi?id=1651859'
# bash -x rhbz1790475-reproducer.cc
++ pkg-config nspr --cflags --libs
+ g++ rhbz1790475-reproducer.cc -I/usr/include/nspr4 -lplds4 -lplc4 -lnspr4 -lpthread -ldl -O0 -ggdb
+ ./a.out
rhbz1790475-reproducer.cc: line 3: 22238 Segmentation fault ./a.out
+ exit 139
# valgrind ./a.out
==22239== Memcheck, a memory error detector
==22239== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22239== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==22239== Command: ./a.out
==22239==
==22239== Jump to the invalid address stated on the next line
==22239==    at 0x0: ???
==22239==    by 0x41B9E47: __pthread_once_slow (pthread_once.c:117)
==22239==    by 0x42021BB: _dlerror_run (dlerror.c:129)
==22239==    by 0x42018AF: dlopen@@GLIBC_2.3 (dlopen.c:87)
==22239==    by 0x415289B: pr_FindSymbolInProg (prmem.c:98)
==22239==    by 0x415289B: _PR_InitZones (prmem.c:154)
==22239==    by 0x41595A7: _PR_InitStuff (prinit.c:144)
==22239==    by 0x10000987: main (rhbz1790475-reproducer.cc:17)
==22239==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22239==
==22239==
==22239== Process terminating with default action of signal 11 (SIGSEGV)
==22239==  Bad permissions for mapped region at address 0x0
==22239==    at 0x0: ???
==22239==    by 0x41B9E47: __pthread_once_slow (pthread_once.c:117)
==22239==    by 0x42021BB: _dlerror_run (dlerror.c:129)
==22239==    by 0x42018AF: dlopen@@GLIBC_2.3 (dlopen.c:87)
==22239==    by 0x415289B: pr_FindSymbolInProg (prmem.c:98)
==22239==    by 0x415289B: _PR_InitZones (prmem.c:154)
==22239==    by 0x41595A7: _PR_InitStuff (prinit.c:144)
==22239==    by 0x10000987: main (rhbz1790475-reproducer.cc:17)
==22239==
==22239== HEAP SUMMARY:
==22239==     in use at exit: 0 bytes in 0 blocks
==22239==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==22239==
==22239== All heap blocks were freed -- no leaks are possible
==22239==
==22239== For lists of detected and suppressed errors, rerun with: -s
==22239== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
This seems to be caused by:

glibc-2.17-307.el7.ppc64

After a downgrade to:

glibc-2.17-292.el7.ppc64

... the reproducer does not crash any more. Even the binary compiled against -307 works fine if -292 is used at run time. I am switching the component to glibc.
The reproducer does not crash with -293 but it crashes with -294. It is likely caused by the fix for bug #1406732.
Created attachment 1652045 [details] Adjust-security-hardening-changes-for-64-bit-POWER-BE.patch
You shouldn't be getting .dynbss copies of function symbols. R_PPC64_COPY on __pthread_key_create is the reason for this crash. With BIND_NOW, PLT entries in shared libraries will be initialized from the .dynbss copy *before* the copy itself is initialized, i.e. you'll get a PLT entry of zeros.
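The ordering problem described above can be sketched with a toy model. This is only an illustration under stated assumptions, not how glibc or binutils implement relocation processing: the types and functions (FunctionDescriptor, ToyProcess, apply_copy_relocation, bind_plt_now, plt_target) and the addresses are hypothetical, and the model assumes that on 64-bit POWER BE a function symbol refers to a small descriptor holding the entry point, so copying it too early yields zeros.

```cpp
#include <cstdint>

// Toy model only: hypothetical types that do not mirror glibc or binutils
// data structures. Assumption: a function symbol refers to a descriptor.
struct FunctionDescriptor {
    std::uintptr_t entry;  // code address to jump to
    std::uintptr_t toc;    // TOC base of the defining module
    std::uintptr_t env;    // environment pointer (unused here)
};

struct ToyProcess {
    // Definition inside the library that really provides the function
    // (made-up addresses).
    FunctionDescriptor library_definition{0x1000, 0x2000, 0};
    // The executable's .dynbss slot: zero until the copy relocation runs.
    FunctionDescriptor dynbss_copy{};
    // PLT slot in another shared library whose symbol lookup resolves
    // to the executable's .dynbss copy.
    FunctionDescriptor plt_entry{};
};

// Models R_PPC64_COPY: fill the executable's copy from the real definition.
void apply_copy_relocation(ToyProcess& p) { p.dynbss_copy = p.library_definition; }

// Models eager (BIND_NOW) PLT setup: snapshot whatever the copy holds now.
void bind_plt_now(ToyProcess& p) { p.plt_entry = p.dynbss_copy; }

// Returns the address a call through the library's PLT slot would jump to.
std::uintptr_t plt_target(bool plt_bound_before_copy) {
    ToyProcess p;
    if (plt_bound_before_copy) {
        bind_plt_now(p);           // snapshots zeros
        apply_copy_relocation(p);  // too late for the PLT slot
    } else {
        apply_copy_relocation(p);
        bind_plt_now(p);
    }
    return p.plt_entry.entry;
}
```

In this model, plt_target(true) returns 0, which corresponds to the "Jump to the invalid address ... at 0x0" frame in the valgrind output, while plt_target(false) returns the made-up entry address 0x1000.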
Thank you for taking quick action on this!
Reproducer without NSPR dependency:

cat >shared.c <<EOF
#include <dlfcn.h>
#include <pthread.h>

static void *force_linking = pthread_create;

void
call_dlopen (void)
{
  dlopen ("", 0);
}
EOF

cat >main.cc <<EOF
#include <string>

void
use_string ()
{
  std::string unused;
}

extern "C" void call_dlopen ();

int
main ()
{
  call_dlopen ();
  return 0;
}
EOF

gcc -fPIC -shared -o libshared.so shared.c -ldl -lpthread
g++ -Wl,-rpath,. -L. -lshared -o main main.cc
./main

Expected outcome: No output.

Actual outcome: Segmentation fault.
I've filed a binutils bug with a reproducer which does not depend on how glibc was built: https://sourceware.org/bugzilla/show_bug.cgi?id=25384 (Note: This does not mean that we will fix this regression with a binutils update.)
The upstream binutils fix introduces a text relocation. This is required because GCC mistakenly puts the reference to __pthread_key_create into .rodata, and not .data.relro as it should. I'm trying to verify if this has already been fixed in GCC.
I filed bug 1791321 for the underlying GCC bug (which I think is the ultimate problem here, but fixing GCC is definitely not the way to address the regression).
The plan right now for glibc is to back out some of the 64-bit POWER BE hardening to remove the problematic mixture of copy relocation and BIND_NOW. Overall we are still moving in the right direction of enabling more hardening as time goes on; we just can't enable _all_ the hardening we wanted. This is not a problem for RHEL 8.
Verified with the reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1790475#c14
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0989