Bug 1312462

Summary: tcmalloc does not follow ifunc constraints (leads to SIGSEGV on certain architectures)
Product: [Fedora] Fedora Reporter: Richard W.M. Jones <rjones>
Component: gperftoolsAssignee: Tom "spot" Callaway <tcallawa>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhideCC: alkondratenko, amit.shah, arjun.is, berrange, cfergeau, codonell, davejohansen, dj, dwmw2, fweimer, itamar, jakub, jwakely, law, mfabian, mpolacek, mskalick, pbonzini, pbrobinson, pfrankli, rjones, siddhesh, tcallawa, than, virt-maint
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: armv7hl   
OS: Unspecified   
Whiteboard:
Fixed In Version: gperftools-2.4.91-1.fc24 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-03-26 18:16:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 245418, 910269, 1314483    
Attachments:
Description Flags
ifuncstrlen.c none

Description Richard W.M. Jones 2016-02-26 18:25:19 UTC
Description of problem:

The command

  /usr/bin/qemu-system-arm -help

segfaults when run on armv7 host at the moment.  This happens
in both Rawhide and the f24 branch.

Version-Release number of selected component (if applicable):

qemu 2:2.5.0-4.fc24 armv7hl

How reproducible:

100%

Steps to Reproduce:
1. Run command as above.

Additional info:

https://kojipkgs.fedoraproject.org//work/tasks/8982/13138982/root.log
https://kojipkgs.fedoraproject.org//work/tasks/8982/13138982/build.log

(possibly more to follow if I can reproduce this locally)

Comment 1 Richard W.M. Jones 2016-02-26 20:16:28 UTC
Stack trace is a bit of a weird one, but here we go:

#0  0x0000f9cc in ?? ()
#1  0xb262031c in GetenvBeforeMain(char const*) () from /lib/libtcmalloc.so.4
#2  0xb2600858 in sized_delete_enabled() [clone .part.3] ()
   from /lib/libtcmalloc.so.4
#3  0xb2602a64 in operator delete(void*, unsigned int) ()
   from /lib/libtcmalloc.so.4
#4  0xb64e1054 in _dl_relocate_object () from /lib/ld-linux-armhf.so.3
#5  0xb64d6718 in dl_main () from /lib/ld-linux-armhf.so.3
#6  0xb64ed27c in _dl_sysdep_start () from /lib/ld-linux-armhf.so.3
#7  0xb64d8424 in _dl_start () from /lib/ld-linux-armhf.so.3
#8  0xb64d3ad0 in _start () from /lib/ld-linux-armhf.so.3

Comment 2 Richard W.M. Jones 2016-02-27 08:08:18 UTC
This bug affects any program linked to tcmalloc, even trivial ones.  Example:

$ cat test.c
#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char *argv[])
{
  printf ("hello\n");
  exit (0);
}

$ gcc -g -Wall test.c -ltcmalloc -o test
$ ./test 
Segmentation fault (core dumped)

(gdb) bt

#0  0x0000f9cc in ?? ()
#1  0xb6f9431c in GetenvBeforeMain (
    name=0xb6f995e8 "TCMALLOC_ENABLE_SIZED_DELETE") at src/base/sysinfo.cc:158
#2  0xb6f74858 in sized_delete_enabled ()
    at src/libc_override_gcc_and_weak.h:103
#3  0xb6f76a64 in sized_delete_enabled ()
    at src/libc_override_gcc_and_weak.h:114
#4  resolve_delete_sized () at src/libc_override_gcc_and_weak.h:110
#5  0xb6fda054 in elf_ifunc_invoke (addr=3069667864)
    at ../sysdeps/arm/dl-irel.h:33
#6  elf_machine_rel (skip_ifunc=<optimized out>, reloc_addr_arg=0xb6fb0b20, 
    version=<optimized out>, sym=0xb6f659c8, reloc=0xb6f71490, map=0xb6ffc658)
    at ../sysdeps/arm/dl-machine.h:389
#7  elf_dynamic_do_Rel (skip_ifunc=<optimized out>, lazy=<optimized out>, 
    nrelative=<optimized out>, relsize=<optimized out>, 
    reladdr=<optimized out>, map=0xb6ffc658) at do-rel.h:137
#8  _dl_relocate_object (scope=<optimized out>, reloc_mode=<optimized out>, 
    consider_profiling=<optimized out>, consider_profiling@entry=0)
    at dl-reloc.c:258
#9  0xb6fcf718 in dl_main (phdr=<optimized out>, phnum=<optimized out>, 
    user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2066
#10 0xb6fe627c in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0xbefff490, dl_main=0xb6fcd2dc <dl_main>)
    at ../elf/dl-sysdep.c:249
#11 0xb6fd1424 in _dl_start_final (arg=0xbefff490) at rtld.c:305
#12 _dl_start (arg=0xbefff490) at rtld.c:411
#13 0xb6fccad0 in _start () from /lib/ld-linux-armhf.so.3

Comment 3 Richard W.M. Jones 2016-02-29 12:28:31 UTC
I spent ages getting a reasonable armv7 environment up, but now I have
one running virtualized on top of aarch64, so it's nice and fast :-)

Anyway, the first clue is that upstream gperftools (which is at the
identical version number) does not fail when compiled out of the box.
It seems to be something to do with our Fedora build process.

Unfortunately I have not been able to reproduce this with the upstream
sources.  I used:

CFLAGS="-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -march=armv7-a -mfpu=vfpv3-d16  -mfloat-abi=hard -fno-strict-aliasing -Wno-unused-local-typedefs -DTCMALLOC_LARGE_PAGES" \
CXXFLAGS="-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -march=armv7-a -mfpu=vfpv3-d16  -mfloat-abi=hard -fno-strict-aliasing -Wno-unused-local-typedefs -DTCMALLOC_LARGE_PAGES" \
LDFLAGS="-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld" \
./configure --disable-static

but the upstream library built this way works fine, so I'm a bit
confused right now.

Comment 4 Richard W.M. Jones 2016-02-29 12:52:33 UTC
Our %{configure} macro edits ltmain.sh to add the
`-specs=/usr/lib/rpm/redhat/redhat-hardened-ld' flag.  Doing the
same thing to the upstream ltmain.sh reproduces the problem.

So .. it's hardened build which is the problem.

I added the following patch to my copy, and that fixes the problem
for me, but it's a bit of a big hammer so I didn't push it to
dist-git.

diff --git a/gperftools.spec b/gperftools.spec
index 6ae1652..2b2dee8 100644
--- a/gperftools.spec
+++ b/gperftools.spec
@@ -1,10 +1,15 @@
 # This package used to be called "google-perftools", but it was renamed on 2012-02-03.
 
+%ifarch %{arm}
+# https://bugzilla.redhat.com/show_bug.cgi?id=1312462
+%undefine _hardened_build
+%endif
+
 %{!?_pkgdocdir: %global _pkgdocdir %{_docdir}/%{name}-%{version}}
 
 Name:		gperftools
 Version:	2.4.90
-Release:	1%{?dist}
+Release:	1%{?dist}.bz1312462.1
 License:	BSD
 Group:		Development/Tools
 Summary:	Very fast malloc and performance analysis tools
-- 
2.5.0

Comment 5 Tom "spot" Callaway 2016-02-29 13:35:25 UTC
Yeah, go ahead and swing that hammer and commit that change and rebuild in f24/f25 (bump the release to 2). It would be nice to understand what in the hardened ld is causing the issue, but this is a bit more than I know how to debug.

Comment 6 Richard W.M. Jones 2016-02-29 17:39:26 UTC
git bisect speaks, we listen ...

6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7 is the first bad commit
commit 6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7
Author: Aliaksey Kandratsenka <alkondratenko>
Date:   Sat Oct 24 23:16:45 2015 -0700

    implemented enabling sized-delete support at runtime
    
    Under gcc 4.5 or greater we're using ifunc function attribute to resolve
    sized delete operator to either plain delete implementation (default) or
    to sized delete (if enabled via environment variable
    TCMALLOC_ENABLE_SIZED_DELETE).

:100644 100644 321a0f3e0f3e4e0c0fad20fb290c57fba463151f 6e25cf1c1e2122d7077c338de54165223d40dd4d M	configure.ac
:040000 040000 5503e1c33e0d97c44c3082e2d409998b05eb593f b3dd1c69d67d51cf2b709b6142c7429292a4d4fd M	src

https://github.com/gperftools/gperftools/commit/6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7

Comment 7 Aliaksei Kandratsenka 2016-03-01 03:53:17 UTC
Thanks for looping me in. If commit 6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7 is affected then I have a guess why -z now breaks it.

Try adding __attribute__((visibility("hidden")) to TCMallocGetenvSafe,
alternatively try building with ./configure --disable-dynamic-sized-delete-support

Comment 8 Aliaksei Kandratsenka 2016-03-01 04:24:51 UTC
Also does gcc on fedora/arm supports ifunc? On my debian/arm unstable it is not supported.

Comment 9 Richard W.M. Jones 2016-03-01 14:48:13 UTC
(In reply to Aliaksei Kandratsenka from comment #7)
> Thanks for looping me in. If commit 6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7
> is affected then I have a guess why -z now breaks it.
> 
> Try adding __attribute__((visibility("hidden")) to TCMallocGetenvSafe,

I made the change below but it still segfaults.

--- a/src/getenv_safe.h
+++ b/src/getenv_safe.h
@@ -54,7 +54,7 @@ extern "C" {
  * NOTE: this is version of GetenvBeforeMain that's usable from
  * C. Implementation is in sysinfo.cc
  */
-const char* TCMallocGetenvSafe(const char* name);
+const char* TCMallocGetenvSafe(const char* name) __attribute__((visibility("hidden")));
 
 #ifdef __cplusplus
 }

> alternatively try building with ./configure
> --disable-dynamic-sized-delete-support

Yes this works.  This appears to be a fairly obscure, non-standard feature
of C++ if I'm reading this page right?
https://isocpp.org/files/papers/n3778.html

(In reply to Aliaksei Kandratsenka from comment #8)
> Also does gcc on fedora/arm supports ifunc? On my debian/arm unstable it is
> not supported.

Yes it is.  I verified it exists and is working with a small
demo library+program.  Quite a nice feature which I wasn't aware
of before.

We are using:

gcc-c++-6.0.0-0.13.fc25.armv7hl
binutils-2.26-13.fc25.armv7hl

Comment 10 Aliaksei Kandratsenka 2016-03-01 19:06:11 UTC
Thanks for update.

I actually suspected ifunc & -z now interaction to be a cause. I could be wrong.

Regarding sized-delete, it is actually a standard feature as of C++14. It's not without challenges (breaks some apps there were legal n C++11 and before), but for some apps it is significant help on free/delete performance.

In order to see if that is due to sized-delete or due to ifunc, may I ask you to post backtraces of the crash ? Also does it also crash when CXXFLAGS and CFLAGS is set to something like '-O0 -fno-inline -ggdb3' ? If so please include backtrace from such crash as well since it'll have more precise line numbers.

Comment 11 Richard W.M. Jones 2016-03-01 19:08:46 UTC
See comment 2 above.  I'll see if compiling with less
optimization produces better backtraces shortly.

Comment 12 Aliaksei Kandratsenka 2016-03-01 19:33:42 UTC
No need to get better backtraces.

It indeed looks like ifunc & -z now interaction.

There seem to be a bit of chicken & egg problem. We've asked linker to resolve all symbols eagerly. But for dynamic sized delete it means calling ifunc which calls other things (in this case strlen) which might be not ready yet. Not sure about arm, but on x86 strlen is ifunc-ed as well.

I'm not linking expert, but it seems possible for linker to support resolving ifunc-func strlen while ifunc-ful sized delete is resolved. So perhaps something to raise to glibc people ?

You can disable dynamic sized delete for now.

Comment 13 Richard W.M. Jones 2016-03-01 20:48:08 UTC
Here's the enhanced stack trace anyway, with all optimization
turned off and full dwarf.

(gdb) bt
#0  0x0003bf48 in ?? ()
#1  0xb6f912e0 in GetenvBeforeMain (
    name=0xb6f94954 "TCMALLOC_ENABLE_SIZED_DELETE") at src/base/sysinfo.cc:158
#2  0xb6f913e4 in TCMallocGetenvSafe (
    name=0xb6f94954 "TCMALLOC_ENABLE_SIZED_DELETE") at src/base/sysinfo.cc:174
#3  0xb6f4e114 in sized_delete_enabled ()
    at src/libc_override_gcc_and_weak.h:103
#4  0xb6f4e158 in resolve_delete_sized ()
    at src/libc_override_gcc_and_weak.h:110
#5  0xb6fda054 in elf_ifunc_invoke (addr=3069501764)
    at ../sysdeps/arm/dl-irel.h:33
#6  elf_machine_rel (skip_ifunc=<optimized out>, reloc_addr_arg=0xb6fb030c, 
    version=<optimized out>, sym=0xb6f15880, reloc=0xb6f47a44, map=0xb6ffc718)
    at ../sysdeps/arm/dl-machine.h:389
#7  elf_dynamic_do_Rel (skip_ifunc=<optimized out>, lazy=<optimized out>, 
    nrelative=<optimized out>, relsize=<optimized out>, 
    reladdr=<optimized out>, map=0xb6ffc718) at do-rel.h:137
#8  _dl_relocate_object (scope=<optimized out>, reloc_mode=<optimized out>, 
    consider_profiling=<optimized out>, consider_profiling@entry=0)
    at dl-reloc.c:258
#9  0xb6fcf718 in dl_main (phdr=<optimized out>, phnum=<optimized out>, 
    user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2066
#10 0xb6fe6284 in _dl_sysdep_start (
    start_argptr=start_argptr@entry=0xbefff490, dl_main=0xb6fcd2dc <dl_main>)
    at ../elf/dl-sysdep.c:249
#11 0xb6fd1424 in _dl_start_final (arg=0xbefff490) at rtld.c:305
#12 _dl_start (arg=0xbefff490) at rtld.c:411
#13 0xb6fccad0 in _start () from /lib/ld-linux-armhf.so.3
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) frame 1
#1  0xb6f912e0 in GetenvBeforeMain (
    name=0xb6f94954 "TCMALLOC_ENABLE_SIZED_DELETE") at src/base/sysinfo.cc:158
158	  const int namelen = strlen(name);
(gdb) print name
$1 = 0xb6f94954 "TCMALLOC_ENABLE_SIZED_DELETE"

Comment 14 Richard W.M. Jones 2016-03-01 20:54:21 UTC
(In reply to Aliaksei Kandratsenka from comment #12)
> There seem to be a bit of chicken & egg problem. We've asked linker to
> resolve all symbols eagerly. But for dynamic sized delete it means calling
> ifunc which calls other things (in this case strlen) which might be not
> ready yet. Not sure about arm, but on x86 strlen is ifunc-ed as well.

On the Fedora armv7 machine with this problem:

$ objdump -t /lib/libc.so.6 | grep strlen
00000000 l    df *ABS*	00000000              strlen.os
00082d80 l     F .text	000000dc              __GI_strlen
00082d80 g     F .text	000000dc              strlen

Apparently 'I' or 'i' would mean ifunc.

For comparison on my x86-64 laptop:

$ objdump -t /lib/libc.so.6 | grep strlen
00082e30 l     F .text	000000bb              __GI_strlen
00082e30 l     F .text	000000bb              __strlen_ia32
001681f0 l     F .text	0000031c              __strlen_sse2
0008d4a0 l     F .text	000000c8              __strlen_sse2_bsf
0008c4b0 g     F .text	00000015              __strlen_g
00082df0 g   i   .text	00000039              strlen

Comment 15 Richard W.M. Jones 2016-03-01 23:04:27 UTC
Hmmm, spot the mistake.  On my x86-64 laptop, this time
selecting the 64 bit library:

$ objdump -t /lib64/libc.so.6 | grep strlen
00000000000904e0 l     F .text	000000000000019c              __GI_strlen
00000000000904e0 g     F .text	000000000000019c              strlen

Comment 16 Florian Weimer 2016-03-01 23:06:27 UTC
(In reply to Richard W.M. Jones from comment #6)

> https://github.com/gperftools/gperftools/commit/
> 6fdfc5a7f40ebcff3fdaada1a2994ff54be2f9c7

This does far too much work in an IFUNC handler.  In particular, you can't assume that strlen and memcmp have already been relocated.

To some degree, this is a missing glibc feature.  The IFUNC facility is exposed externally, but there is no good documentation on how to use it, and it is difficult to find scenarios which are outside glibc itself and well-defined with the current implementation.

Comment 17 Florian Weimer 2016-03-01 23:14:34 UTC
To expand a bit on comment 16, the core issue here is the execution order of IFUNC resolvers.  This is something that the current design does not address at all, and it makes it difficult to write useful IFUNC resolvers that work reliable (not just in the presence of BIND_NOW, but it obviously exacerbates the ordering issue).

Comment 18 Aliaksei Kandratsenka 2016-03-01 23:22:09 UTC
I see. BTW ifunc is documented in gcc's manual. It was also advertised in gcc release notes when gcc added this support.

I'm curious what is exact reason why calling strlen from ifunc resolver is bad. I can see how mutual recursion is a problem (i.e. if foo's ifunc resolver calls bar and bar's ifunc resolver calls foo). But it isn't the case here.

gperftools' sized delete ifunc handler calls strlen@plt which, when strlen isn't resolved yet, calls to linker, which should just work. I'm assuming that linker's plt resolver code is able to deal with recusions. But I see no reason why it can't.

Can somebody elaborate exactly why it is not just fixable glibc bug, but a problem in gpertfools expectations?

Comment 19 Paolo Bonzini 2016-03-02 16:41:33 UTC
GetenvBeforeMain is already special, why not open code it completely?

Something like this:

diff --git a/src/base/sysinfo.cc b/src/base/sysinfo.cc
index cad751b..b0afe50 100644
--- a/src/base/sysinfo.cc
+++ b/src/base/sysinfo.cc
@@ -122,13 +122,17 @@
 const char* GetenvBeforeMain(const char* name) {
 #if defined(HAVE___ENVIRON)   // if we have it, it's declared in unistd.h
   if (__environ) {            // can exist but be NULL, if statically linked
-    const int namelen = strlen(name);
     for (char** p = __environ; *p; p++) {
-      if (strlen(*p) < namelen) {
-        continue;
+      for (int i = 0; ; i++) {
+	if ((*p)[i] == name[i]) {
+          if (name[i] == 0)
+            break;
+	} else {
+          if (name[i] == 0 && (*p)[i] == '=') // it's a match
+            return *p + i+1;                  // point after =
+          break;
+	}
       }
-      if (!memcmp(*p, name, namelen) && (*p)[namelen] == '=')  // it's a match
-        return *p + namelen+1;                                 // point after =
     }
     return NULL;
   }

Comment 20 Richard W.M. Jones 2016-03-02 18:53:44 UTC
The previous patch doesn't work because we don't HAVE__ENVIRON.

Unfortunately a fix in the code which is actually run is a lot
harder because we'd need replacements for open(2), read(2), close(2),
since presumably those calls could be randomly affected by the
ifunc/linker ordering problem (although only strlen is affected
right now, so maybe we can get away with just patching strlen).

Comment 21 Aliaksei Kandratsenka 2016-03-02 19:18:21 UTC
Hi Paolo & Richard.

I'm certainly open to workarounds. But before we go down that path can you have somebody from dynamic linker side take a quick look at this and figure out exactly why call to strlen causes bogus jump?

If this is some hard to fix limitation in glibc, then we'll work around. Or even change away from ifunc approach. But if that is simply some fixable glibc bug, then we won't have to do any workarounds. My naive and largely ignorant understanding of how dynamic linker works makes me think that this is not hopeless.

Comment 22 Richard W.M. Jones 2016-03-03 13:43:52 UTC
Created attachment 1132790 [details]
ifuncstrlen.c

I thought it would be useful to have a simple reproducer of
the "bug" (if this is indeed a bug).

If you compile and run the attached program on armv7 with:

  $ gcc -Wall ifuncstrlen.c -o ifuncstrlen -Wl,-z,now
  $ ./ifuncstrlen
  Segmentation fault (core dumped)

as you can see it fails.

If you remove the -z now parameter, then it doesn't fail.

If you modify the program to remove the call to strlen, then
it also doesn't fail (even with -z now).

If you compile the program on x86-64, then it doesn't fail
(the bug is only triggered on armv7).

Comment 23 Richard W.M. Jones 2016-03-03 13:46:16 UTC
The version of gcc I'm using is:

gcc-6.0.0-0.13.fc25.armv7hl

Additional data point: does NOT fail on aarch64.

Comment 24 Jakub Jelinek 2016-03-03 13:52:38 UTC
Why have you reassigned this to gcc?  It has nothing to do with gcc.
The fact that on some architectures you are significantly limited in what you actually can (if anything) call from ifunc handlers is a glibc limitation, related to the ordering of relocation processing and making sure that what you call is actually already relocated.  gcc has nothing to say here.

Comment 25 Richard W.M. Jones 2016-03-03 14:03:29 UTC
Let's try glibc then.

Comment 26 Carlos O'Donell 2016-03-03 17:49:29 UTC
Same issue but for ppc64 in F24.
https://bugzilla.redhat.com/show_bug.cgi?id=1314483

Comment 27 Carlos O'Donell 2016-03-03 21:20:30 UTC
This is an unsupported use of GNU IFUNC.

See: https://sourceware.org/glibc/wiki/GNU_IFUNC

If gperftools wishes to use IFUNC and support -Wl,z,now, they must not call any library functions in the ifunc resolver. That is a strict limitation of the process startup. Such a limitation can be lifted, but it is a cross-platform project to standardize the startup ordering to facilitate this. It happens to work on x86_64 given the way the relocations were handled, but it will not work on ARM and POWER.

Moving to gperftools to remove the invalid use of IFUNCs coupled with -Wl,z,now. Alternatively someone should file an RFE to have the Red Hat tools team do the cross-platform work to make IFUNCs usable in this use case for applications.

Comment 28 Aliaksei Kandratsenka 2016-03-04 01:20:21 UTC
Thanks for update Carlos. Is there any way I can detect bind-now condition from inside ifunc handler ?

Comment 29 Carlos O'Donell 2016-03-04 16:43:10 UTC
(In reply to Aliaksei Kandratsenka from comment #28)
> Thanks for update Carlos. Is there any way I can detect bind-now condition
> from inside ifunc handler ?

There is presently no way for the application to determine if it was started with immediate symbol resolution (LD_BIND_NOW=1) or lazy symbol resolution (default). Even if you had an API, it is not clear that you could prevent an optimizing compiler from emitting a library call where you didn't want one e.g. optimizing a store loop into a call to memcpy because you asked for space optimizations via -Os. Therefore, in cases like this, where you are implementing a foreign function to be called by the dynamic loader at early startup, you really really have to write simple code or it won't work. We don't presently have any compiler enforcement of the rules, because the rules were not clear.

I've filed an upstream gcc bug to put limits on functions marked with the ifunc attribute:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70082

I've filed an upstream glibc bug to enhance x86_64 to support having AT_HWCAPS passed into the resolver so it can do something useful:
https://sourceware.org/bugzilla/show_bug.cgi?id=19766

I've raised the issue of passing the environment to the ifunc resolvers upstream:
https://www.sourceware.org/ml/libc-alpha/2016-03/msg00102.html

In the meantime you need to keep your IFUNC resolvers simple, without disabling hardening :-)

Comment 30 Aliaksei Kandratsenka 2016-03-05 18:37:46 UTC
Thanks Carlos. I think you forgot most important thing: to file glibc bug for making ifunc work :) From your link I understand that it's not inherent issue in ifunc design, but rather glibc limitation (and e.g. other libc implementations might support it properly already).

Regarding gperftools. fedora version should certainly compile it with --disable-dynamic-sized-delete-support. As for upstream, I'm not yet sure, but we'll likely disable it by default as well.

Comment 31 Jakub Jelinek 2016-03-05 18:40:55 UTC
The problem is inherent issue to the ifunc design.  You can perhaps improve some selected cases, but generally, ifunc resolver often has to run before all needed relocation processing has been done.

Comment 32 Aliaksei Kandratsenka 2016-03-05 18:42:58 UTC
Interesting. I'm perhaps misunderstanding then.

Can you please elaborate? My understanding is that if ifunc resolver for bar needs to call foo and foo isn't yet relocated, then runtime linker will simply receive call for dl_fixup and can relocate foo as needed. No?

Comment 33 Jakub Jelinek 2016-03-05 18:53:19 UTC
That is the case only during lazy binding.  With LD_BIND_NOW=1, or -Wl,-z,now, etc., the relocations need to be resolved far earlier.  The data relocations might not be applied yet, either for the current library, or e.g. if you attempt to call a function in some other library, data relocations in that other library might not be relocated yet.  Similarly, for PLT relocations, even in the current library, most architectures don't have PLT slots set up for lazy binding without any extra action.  If it depends only on relative relocations, you might be lucky and those might be already relocated, but you might not be as lucky.  Usually lazy binding is also not set up if the library is being resolved with LD_BIND_NOW=1.  Then you can have 2 PLT slots, one say for a call to IFUNC function, another for a call that the IFUNC resolver needs to call.  If you are lucky, the other PLT slot is resolved first and IFUNC resolver can call it, but again, it might be the other way around, which really depends on the linker ordering of things.  The lazy binding case is of course the easy one, as long as it isn't called too early (e.g. from other IFUNC resolver that is not bound lazily).

Comment 34 Aliaksei Kandratsenka 2016-03-05 19:07:40 UTC
Thanks, Jakub. I'm still unable to see "inherent".

My model of LD_BIND_NOW is (and possibly too native; feel free to correct me) is that: everything works same as lazy relocations case; but dynamic linker simply forces relocations for all functions early.

I.e. I see no inherent reason why eager binding has to prevent linker from supporting lazy relocations during eager binding phase.

Comment 35 Aliaksei Kandratsenka 2016-03-05 19:07:53 UTC
Thanks, Jakub. I'm still unable to see "inherent".

My model of LD_BIND_NOW is (and possibly too naive; feel free to correct me) is that: everything works same as lazy relocations case; but dynamic linker simply forces relocations for all functions early.

I.e. I see no inherent reason why eager binding has to prevent linker from supporting lazy relocations during eager binding phase.

Comment 36 Jakub Jelinek 2016-03-05 19:34:22 UTC
The relocation processing needs to be fast.  So, relocating e.g. every PLT slot twice, once to set it up for lazy binding and then once again, is too costly.
IFUNCs were added for picking best implementation of important functions for the particular HW, so just check hwcaps or cpuid, return some function pointer based on that, not running significant amount of code in there.

Comment 37 Aliaksei Kandratsenka 2016-03-05 20:25:37 UTC
Thanks.

That is not implemented and not intended to support more general case is clear.

I think libc can still support fully general ifunc even with eager binding. I.e. by relocating non-ifunc symbols eagerly (and efficiently), then setting up ifunc symbols for lazy relocation and forcing them one by one.

With all that said, I can see how and why glibc can simply decide to narrow down ifunc case to "you cannot call anything at all".

So we can close this ticket as "won't fix" or something along this lines. I'll leave this code in gperftools, at least for now, but I'll disable it by default.

For gperftools it would be really great if glibc could support more general case. Here is full reasoning behind gperftools attempt to use ifunc.

So gcc 5 added support for sized deallocation. Which is, for delete statements when deleted object's size is known, compiler will call sized delete operator. Which in addition to object pointer also receives size. For tcmalloc it enable slightly faster deletion since it doesn't have to look up size class from pointer. This feature however has few ugly corner cases. Most serious is classes with flexible length (i.e. struct Vector {size_t len; char data[1]};), where previously valid code (delete vecp;) has to be changed to deal with sized delete (i.e. explicitly calling non-sized global operator delete).

Because of this issues, it would be unwise if tcmalloc simply unconditionally supported sized delete. Since it would likely crash many apps built with gcc 5 and -std=c++14.

So I've added --enable-sized-delete, which is off by default. And which enable unconditional sized delete. It is mostly for folks who link tcmalloc statically and know that their code is safe. Programs that want all performance from malloc, will link it statically anyways to save on plt indirections. This is how tcmalloc is used in google.

And I've added ability to enable sized delete support at runtime. So that distros can ship tcmalloc.so with functional, but disabled by default sized-delete support. And both programs (via weak function) and users (via environment variable) can opt-in at runtime for sized-delete. There seem to be no way to support zero-cost runtime switch for sized delete, other than ifunc. I.e. even simple load-enabledness-flag, test & conditionally branch is IMHO too costly, since size class lookup savings aren't huge.

BTW I've found some relevant post on musl libc. And they want ifunc's scope to be reduced. Doesn't mean their voice has to count (and I find some of their arguments incorrect), but that's another opinion.

-- start quote --

One feature musl intentionally does not yet support is "IFUNC"
resolvers in the dynamic linker. These are a feature by which a
program can arrange for a symbol to resolve to different versions of a
function at runtime depending on cpu capabilities or similar. What's
blocking IFUNC support in musl is a specification for the constraints
on the resolver function. Since it runs in a context that happens
prior to execution of global ctors for the containing module, in a
context where relocations have not yet completed, and with locks held
in the dynamic linker, there are A LOT of implementation-imposed
constraints on what such a function can do. And sweeping those under
the rug and just saying "you can do whatever seems to work" is not a
reasonable approach for a high-quality implementation; from my
perspective, it's better not to have the feature at all than to have a
version where internal changes might break something that seems
legitimate.

-- end quote --

Comment 38 Marek Skalický 2016-03-08 14:05:28 UTC
Could this bug be somehow related to bug 1315745 ?

Comment 39 Carlos O'Donell 2016-03-08 14:20:10 UTC
*** Bug 1314483 has been marked as a duplicate of this bug. ***

Comment 40 Carlos O'Donell 2016-03-08 14:20:18 UTC
*** Bug 1315745 has been marked as a duplicate of this bug. ***

Comment 41 Carlos O'Donell 2016-03-08 14:20:52 UTC
Fix needed on F24 and Rawhide.

Comment 42 Tom "spot" Callaway 2016-03-08 23:48:08 UTC
gperftools-2.4.91-1.fc24 and gperftools-2.4.91-1.fc25 are now built. Upstream has disabled sized delete by default, so I have re-enabled hardening builds for all arches. Please test and reopen this bug if things are not resolved.

Comment 43 Marek Skalický 2016-03-09 08:14:46 UTC
Probably build for F24 does not finished before activation of Bodhi for F24 - today. Because it is not in Koji for other builds...

Could you check it please?

Comment 44 Fedora Update System 2016-03-09 14:47:25 UTC
gperftools-2.4.91-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-d2dc9a4039

Comment 45 Fedora Update System 2016-03-10 01:55:33 UTC
gperftools-2.4.91-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-d2dc9a4039

Comment 46 Fedora Update System 2016-03-26 18:16:02 UTC
gperftools-2.4.91-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 47 Richard W.M. Jones 2017-05-22 09:27:41 UTC
It appears this bug (or something like it) has reappeared.
See: https://bugzilla.redhat.com/show_bug.cgi?id=1452813