Bug 1790475 - glibc: BIND_NOW is incompatible with copy relocations on ppc64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: glibc
Version: 7.8
Hardware: ppc64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: glibc team
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1793853
Reported: 2020-01-13 12:58 UTC by Kamil Dudka
Modified: 2020-10-05 08:28 UTC (History)

Fixed In Version: glibc-2.17-307.el7.1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1793853
Environment:
Last Closed: 2020-03-31 19:08:32 UTC
Target Upstream Version:
Embargoed:


Attachments
rhbz1790475-reproducer.cc (246 bytes, text/plain)
2020-01-13 13:00 UTC, Kamil Dudka
Adjust-security-hardening-changes-for-64-bit-POWER-BE.patch (1.76 KB, patch)
2020-01-14 03:32 UTC, Carlos O'Donell


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1791321 0 unspecified CLOSED gcc: weakref attribute introduces the need for copy or text relocations on ppc64 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2020:0989 0 None None None 2020-03-31 19:08:51 UTC
Sourceware 25384 0 P2 RESOLVED Copy relocations and BIND_NOW on POWER ELFv1 results in crashes 2020-04-28 19:22:28 UTC

Internal Links: 1791321

Description Kamil Dudka 2020-01-13 12:58:20 UTC
Description of problem:
As originally reported by Daniel Rusek, C++ applications using libcurl crash at startup on ppc64.  The underlying cause is that PR_Init() crashes on ppc64 if it is invoked from the C++ run-time.  The attached reproducer crashes only if its source code also uses std::string, even though the std::string instantiation is not reachable from main().


Version-Release number of selected component (if applicable):
nspr-4.21.0-1.el7.ppc64


Steps to Reproduce:
1. run the attached reproducer


Actual results:
The program terminates abnormally on SIGSEGV.


Expected results:
The program terminates successfully.


Additional info:
See the output of valgrind.

Comment 3 Kamil Dudka 2020-01-13 13:00:47 UTC
Created attachment 1651859 [details]
rhbz1790475-reproducer.cc

Comment 4 Kamil Dudka 2020-01-13 13:02:06 UTC
# curl -JO 'https://bugzilla.redhat.com/attachment.cgi?id=1651859'
# bash -x rhbz1790475-reproducer.cc
++ pkg-config nspr --cflags --libs
+ g++ rhbz1790475-reproducer.cc -I/usr/include/nspr4 -lplds4 -lplc4 -lnspr4 -lpthread -ldl -O0 -ggdb
+ ./a.out
rhbz1790475-reproducer.cc: line 3: 22238 Segmentation fault      ./a.out
+ exit 139

# valgrind ./a.out 
==22239== Memcheck, a memory error detector
==22239== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22239== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==22239== Command: ./a.out
==22239== 
==22239== Jump to the invalid address stated on the next line
==22239==    at 0x0: ???
==22239==    by 0x41B9E47: __pthread_once_slow (pthread_once.c:117)
==22239==    by 0x42021BB: _dlerror_run (dlerror.c:129)
==22239==    by 0x42018AF: dlopen@@GLIBC_2.3 (dlopen.c:87)
==22239==    by 0x415289B: pr_FindSymbolInProg (prmem.c:98)
==22239==    by 0x415289B: _PR_InitZones (prmem.c:154)
==22239==    by 0x41595A7: _PR_InitStuff (prinit.c:144)
==22239==    by 0x10000987: main (rhbz1790475-reproducer.cc:17)
==22239==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==22239== 
==22239== 
==22239== Process terminating with default action of signal 11 (SIGSEGV)
==22239==  Bad permissions for mapped region at address 0x0
==22239==    at 0x0: ???
==22239==    by 0x41B9E47: __pthread_once_slow (pthread_once.c:117)
==22239==    by 0x42021BB: _dlerror_run (dlerror.c:129)
==22239==    by 0x42018AF: dlopen@@GLIBC_2.3 (dlopen.c:87)
==22239==    by 0x415289B: pr_FindSymbolInProg (prmem.c:98)
==22239==    by 0x415289B: _PR_InitZones (prmem.c:154)
==22239==    by 0x41595A7: _PR_InitStuff (prinit.c:144)
==22239==    by 0x10000987: main (rhbz1790475-reproducer.cc:17)
==22239== 
==22239== HEAP SUMMARY:
==22239==     in use at exit: 0 bytes in 0 blocks
==22239==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==22239== 
==22239== All heap blocks were freed -- no leaks are possible
==22239== 
==22239== For lists of detected and suppressed errors, rerun with: -s
==22239== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault

Comment 5 Kamil Dudka 2020-01-13 13:47:10 UTC
This seems to be caused by:

    glibc-2.17-307.el7.ppc64

After downgrade to:

    glibc-2.17-292.el7.ppc64

... the reproducer no longer crashes.  Even the binary compiled against -307 works fine if -292 is used at run time.  I am switching the component to glibc.

Comment 6 Kamil Dudka 2020-01-13 14:17:55 UTC
The reproducer does not crash with -293 but it crashes with -294.  It is likely caused by the fix for bug #1406732.

Comment 8 Carlos O'Donell 2020-01-14 03:32:04 UTC
Created attachment 1652045 [details]
Adjust-security-hardening-changes-for-64-bit-POWER-BE.patch

Comment 11 Alan Modra 2020-01-14 08:09:26 UTC
You shouldn't be getting .dynbss copies of function symbols.  R_PPC64_COPY on __pthread_key_create is the reason for this crash.  With BIND_NOW, PLT entries in shared libraries will be initialized from the .dynbss copy *before* the copy itself is initialized, i.e., you'll get a PLT entry of zeros.
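
One way to check a binary for the situation described above is to list its copy relocations and look for function symbols among them. The following is a minimal sketch, not part of the original report: the helper name copy_reloc_symbols is hypothetical, and it assumes the column layout printed by `readelf -rW` from binutils (relocation type in the third column, symbol name in the fifth).

```shell
# Hypothetical helper: print the names of symbols that are the target of
# R_PPC64_COPY relocations. Reads `readelf -rW` output on standard input.
copy_reloc_symbols() {
  # Column 3 is the relocation type, column 5 the symbol name; strip any
  # "@VERSION" suffix from the name before printing it.
  awk '$3 == "R_PPC64_COPY" { sub(/@.*/, "", $5); print $5 }'
}

# Example use on the reproducer binary (assumed here to be ./a.out):
#   readelf -rW ./a.out | copy_reloc_symbols
```

If a function such as __pthread_key_create shows up in the output, the binary carries the kind of copy relocation identified above as the cause of the crash.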

Comment 12 Kamil Dudka 2020-01-14 08:31:02 UTC
Thank you for taking quick action on this!

Comment 14 Florian Weimer 2020-01-14 12:58:51 UTC
Reproducer without NSPR dependency:

cat >shared.c <<EOF
#include <dlfcn.h>
#include <pthread.h>

static void *force_linking = pthread_create;

void
call_dlopen (void)
{
  dlopen ("", 0);
}
EOF
cat >main.cc <<EOF
#include <string>

void
use_string ()
{
  std::string unused;
}

extern "C" void call_dlopen ();

int
main ()
{
  call_dlopen ();
  return 0;
}
EOF

gcc -fPIC -shared -o libshared.so shared.c -ldl -lpthread
g++ -Wl,-rpath,. -L. -lshared -o main main.cc
./main

Expected outcome: No output.
Actual outcome: Segmentation fault.

Comment 15 Florian Weimer 2020-01-14 13:52:59 UTC
I've filed a binutils bug with a reproducer which does not depend on how glibc was built:

  https://sourceware.org/bugzilla/show_bug.cgi?id=25384

(Note: This does not mean that we will fix this regression with a binutils update.)

Comment 18 Florian Weimer 2020-01-15 13:56:44 UTC
The upstream binutils fix introduces a text relocation. This is required because GCC mistakenly puts the reference to __pthread_key_create into .rodata, and not .data.relro as it should.

I'm trying to verify whether this has already been fixed in GCC.
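
Whether a linked binary ended up with a text relocation can be checked from its dynamic section. This is a minimal sketch under stated assumptions, not something taken from the binutils fix: has_textrel is a hypothetical helper name, and it relies only on `readelf -dW` printing the string TEXTREL for either the DT_TEXTREL tag or the DF_TEXTREL flag.

```shell
# Hypothetical helper: succeed if the dynamic section (as printed by
# `readelf -dW` on standard input) marks the object as having text relocations.
has_textrel() {
  # Matches both the "(TEXTREL)" tag line and a "(FLAGS) ... TEXTREL" line.
  grep -q 'TEXTREL'
}

# Example use on the reproducer binary (assumed here to be ./main):
#   readelf -dW ./main | has_textrel && echo "text relocations present"
```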

Comment 19 Florian Weimer 2020-01-15 14:14:25 UTC
I filed bug 1791321 for the underlying GCC bug (which I think is the ultimate problem here, but fixing GCC is definitely not the way to address the regression).

Comment 20 Carlos O'Donell 2020-01-15 15:48:55 UTC
The plan right now for glibc is to back out some of the 64-bit POWER BE hardening to remove the problematic mixture of copy relocations and BIND_NOW. Overall we are still moving in the right direction of enabling more hardening as time goes on; we just can't enable _all_ the hardening we wanted. This is not a problem for RHEL 8.
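
To see whether a given shared object was linked with BIND_NOW, its dynamic section can be inspected. This is a minimal sketch and not taken from the glibc change itself: has_bind_now is a hypothetical helper name, and the library path in the usage comment is an assumption about a typical RHEL 7 ppc64 installation.

```shell
# Hypothetical helper: succeed if the dynamic section (as printed by
# `readelf -dW` on standard input) requests immediate binding.
has_bind_now() {
  # BIND_NOW appears either as its own tag or as a value of FLAGS;
  # FLAGS_1 reports the equivalent DF_1_NOW bit as "NOW".
  grep -Eq 'BIND_NOW|\(FLAGS_1\).*NOW'
}

# Example use (the library path is an assumption):
#   readelf -dW /lib64/libpthread.so.0 | has_bind_now && echo "BIND_NOW is set"
```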

Comment 23 Sergey Kolosov 2020-01-24 08:48:53 UTC
Verified with the reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=1790475#c14

Comment 26 errata-xmlrpc 2020-03-31 19:08:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0989

