Bug 1470115 - gcc: Invalid IFUNC resolver in libgcc calls getauxval, leading to ppc64le relocation crash
Summary: gcc: Invalid IFUNC resolver in libgcc calls getauxval, leading to ppc64le relocation crash
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Developer Toolset
Classification: Red Hat
Component: gcc
Version: DTS 7.0 RHEL 7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: alpha
Target Release: 7.0
Assignee: Marek Polacek
QA Contact: Martin Cermak
URL:
Whiteboard:
Depends On: 1467526
Blocks:
Reported: 2017-07-12 12:14 UTC by Florian Weimer
Modified: 2018-04-30 16:46 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1467526
Environment:
Last Closed: 2018-04-30 16:46:05 UTC
Target Upstream Version:
Embargoed:


Links
System            ID       Private  Priority  Status  Summary  Last Updated
Red Hat Bugzilla  1427670  1        None      None    None     2021-01-20 06:05:38 UTC
Sourceware        21707    0        None      None    None     2019-04-28 23:57:48 UTC

Internal Links: 1427670

Description Florian Weimer 2017-07-12 12:14:46 UTC
A fix in DTS would be nice because it enables us to use DTS for glibc development on POWER.

+++ This bug was initially created as a clone of Bug #1467526 +++

The invalid IFUNC resolver is in libgcc, and probably needs to be fixed there.  Peter Bergner already suggested a patch:

https://sourceware.org/ml/libc-alpha/2017-06/msg01383.html

Afterwards, we need to rebuild glibc with the fixed gcc package.

+++ This bug was initially created as a clone of Bug #1467518 +++

Upstream glibc master started linking in have_ieee_hw_p from libgcc on ppc64le.  This leads to a crash, because getauxval uses data which has not been initialized yet at this point.  The crash is at the last line of the disassembly:

00000000001c3380 <have_ieee_hw_p>:
  1c3380:       08 00 4c 3c     addis   r2,r12,8
  1c3384:       80 3d 42 38     addi    r2,r2,15744
  1c3388:       f8 ff e1 fb     std     r31,-8(r1)
  1c338c:       a0 8c e2 eb     ld      r31,-29536(r2)
  1c3390:       d1 ff 21 f8     stdu    r1,-48(r1)
  1c3394:       02 00 3f e9     lwa     r9,0(r31)
  1c3398:       00 00 89 2f     cmpwi   cr7,r9,0
  1c339c:       14 00 9c 41     blt     cr7,1c33b0 <have_ieee_hw_p+0x30>
  1c33a0:       30 00 21 38     addi    r1,r1,48
  1c33a4:       78 4b 23 7d     mr      r3,r9
  1c33a8:       f8 ff e1 eb     ld      r31,-8(r1)
  1c33ac:       20 00 80 4e     blr
  1c33b0:       a6 02 08 7c     mflr    r0
  1c33b4:       0f 00 60 38     li      r3,15
  1c33b8:       40 00 01 f8     std     r0,64(r1)
  1c33bc:       15 fc e5 4b     bl      22fd0 <00000036.plt_call.__getauxval>
  1c33c0:       18 00 41 e8     ld      r2,24(r1)

So far, this happens only with --enable-bind-now builds.  I'll disable that on ppc64le as an immediate workaround, but we'll need an upstream fix for this (in glibc or GCC).
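
For readers following the disassembly: it looks like a cached flag that, on first use, is filled in by calling getauxval with argument 15 (AT_PLATFORM on Linux). The C below is a reconstruction under that assumption, not the libgcc source. The names platform_has_ieee_hw and have_ieee_hw_p_sketch and the "powerN" parsing are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_PLATFORM (Linux) */

/* Hypothetical helper: returns 1 if the AT_PLATFORM string names POWER9 or
   later ("power9", "power10", ...), 0 otherwise.  The parsing is done by
   hand so that no further libc calls are needed.  */
static int platform_has_ieee_hw(const char *p)
{
    long n = 0;

    if (p == NULL || p[0] != 'p' || p[1] != 'o' || p[2] != 'w'
        || p[3] != 'e' || p[4] != 'r')
        return 0;
    for (p += 5; *p >= '0' && *p <= '9'; p++)
        n = n * 10 + (*p - '0');
    return n >= 9;
}

/* Sketch of the pattern in the disassembly: a cached flag, determined on
   first use by asking the auxiliary vector for the platform string.  */
int have_ieee_hw_p_sketch(void)
{
    static int cached = -1;          /* -1 means "not determined yet" */

    if (cached < 0) {
        /* This is the problematic step when the function is reached from an
           IFUNC resolver: with immediate binding the call can run before
           the data getauxval relies on has been set up.  */
        const char *platform = (const char *) getauxval(AT_PLATFORM);
        cached = platform_has_ieee_hw(platform);
    }
    assert(cached == 0 || cached == 1);
    return cached;
}
```

Called from ordinary code after startup, have_ieee_hw_p_sketch() is harmless. The problem described here only arises when the same logic runs during relocation processing.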

--- Additional comment from Florian Weimer on 2017-07-07 12:13:10 CEST ---

Upstream patch submission:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00348.html

--- Additional comment from Carlos O'Donell on 2017-07-07 22:26:09 CEST ---

I've reached out to Jakub/Marek to see what we can do between gcc and glibc to fix this quickly. It looks like the s390x import and the Go 1.9 dependent rebuilds need the mass rebuild, so we have to get this fixed.

--- Additional comment from Alexander Bokovoy on 2017-07-10 16:27:43 CEST ---

This blocks building FreeIPA in rawhide because java crashes when run as part of the FreeIPA build process on ppc64le. I reproduced this in a mock chroot on ppc64le-test.fedorainfracloud.org when investigating the ppc64le build failure for https://koji.fedoraproject.org/koji/taskinfo?taskID=20438824

(gdb) set args -Xss512k -classpath /usr/share/java/js.jar org.mozilla.javascript.tools.shell.Main /builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build/build.js baseUrl=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build load=build profile=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/../src/webui.profile.js
(gdb) run
Starting program: /usr/bin/java -Xss512k -classpath /usr/share/java/js.jar org.mozilla.javascript.tools.shell.Main /builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build/build.js baseUrl=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build load=build profile=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/../src/webui.profile.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: dnf debuginfo-install zlib-1.2.11-2.fc26.ppc64le
(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x00003fffb6cb2380 in ?? ()
No symbol table info available.
#2  0x00003fffb6cb2838 in ?? ()
No symbol table info available.
#3  0x00003fffb7fba73c in resolve_ifunc (sym_map=<optimized out>, map=<optimized out>, value=70367515977760) at ../sysdeps/powerpc/powerpc64/dl-machine.h:674
No locals.
#4  elf_machine_rela (skip_ifunc=<optimized out>, reloc_addr_arg=0x3fffb6d40098, version=<optimized out>, sym=<optimized out>, reloc=0x3fffb6bf8c48, map=0x20030c10)
    at ../sysdeps/powerpc/powerpc64/dl-machine.h:729
        refsym = 0x3fffb6bf1d00
        value = 70367515977760
        reloc_addr = 0x3fffb6d40098
        r_type = 248
        sym_map = <optimized out>
#5  elf_dynamic_do_Rela (skip_ifunc=<optimized out>, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=<optimized out>) at do-rel.h:137
        ndx = <optimized out>
        version = 0x3fffb6bf6d2a
        symtab = 0x3fffb6bf1d00
        relative = <optimized out>
        r = 0x3fffb6bf8c48
#6  _dl_relocate_object (scope=0x20030f88, reloc_mode=<optimized out>, consider_profiling=<optimized out>) at dl-reloc.c:259
        ranges = {{start = 7022344884575826688, size = 4044295413358932590, nrelative = 2321676217711866176, lazy = 959594552}, {start = 279172874248, size = 8097881642258923523, 
            nrelative = 162659009062003, lazy = 0}}
        textrels = <optimized out>
        errstring = 0x0
        lazy = <optimized out>
        skip_ifunc = <optimized out>
#7  0x0000003c00000008 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x3140382039403810
(gdb)

Comment 1 Carlos O'Donell 2017-07-12 12:47:18 UTC
I think a fix in DTS7 is absolutely required.

In order for us to stress test RHEL7 + DTS7 we build upstream glibc and report build status upstream using these tools.

On top of that we need to be ready at a moment's notice to use DTS7 internally on all of our architectures in the event we need a newer compiler to solve a customer issue.

Comment 2 Marek Polacek 2017-07-12 12:54:17 UTC
Is this patch all that needs to be done in GCC7?
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00348.html

Comment 3 Carlos O'Donell 2017-07-12 12:58:54 UTC
(In reply to Marek Polacek from comment #2)
> Is this patch all that needs to be done in GCC7?
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00348.html

AFAIK, Yes, that's the patch.

Comment 4 Florian Weimer 2017-07-12 13:02:39 UTC
glibc 2.17 did not have the HWCAP fields in the TCB for __builtin_cpu_supports, though.  That came in glibc 2.23 only.  We could backport that, but then we run again into difficulties with RPM dependency management.

Comment 5 Carlos O'Donell 2017-07-13 16:38:15 UTC
(In reply to Florian Weimer from comment #4)
> glibc 2.17 did not have the HWCAP fields in the TCB for
> __builtin_cpu_supports, though.  That came in glibc 2.23 only.  We could
> backport that, but then we run again into difficulties with RPM dependency
> management.

You are absolutely right.

We would need to provide __parse_hwcap_and_convert_at_platform.

The new symbol is used by gcc to ensure that when compiling you get a reference to the new feature symbol, and the application won't run if you move it to a system with an older libc.

I spoke with Jakub about this and the conclusion is as follows:

(a) Low priority

Without proper glibc support for float128, users will not be interested in using float128 with DTS7 on ppc64le. Therefore this has to be low priority.

(b) Ported to stub libgcc.a

DTS7 uses the system libgcc_s.so, but provides its own libgcc.a, so that will have an impact in supporting this configuration.

(c) Forces binaries to require a newer version of glibc.

We can't require that binaries use a newer version of glibc, because rpm doesn't understand how the new symbol creates a new dependency.

However, users would get an error trying to start the application and we would have to document that this means you need a new glibc.

We could force DTS7 gcc to require that new glibc on ppc64le in order to get this working on developer workstations, but it goes against the idea of DTS7.

You could argue that this is "the first release for ppc64le" and so can require the latest glibc, but that's a fragile requirement.

In summary:
==========
Given (a), (b) and (c), we probably have to not enable support for hardware float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come naturally in later RHEL as the core components are updated.

Thoughts?

Comment 6 Carlos O'Donell 2017-07-13 17:21:28 UTC
(In reply to Carlos O'Donell from comment #5)
> In summary:
> ==========
> Given (a), (b) and (c), we probably have to not enable support for hardware
> float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> naturally in later RHEL as the core components are updated.
> 
> Thoughts?

Some alternatives exist:

(1) Don't use IFUNC in gcc.

Change all of the DTS7 HW/SW redirects for float128 to do the redirection at runtime and verify that none of those calls are earlier than when getauxval() data is set up.

(2) Use a new POWER9 multilib.

Create a POWER9 multilib for gcc which assumes HW float128, and is selected by ld.so based on AT_PLATFORM, and then have the POWER8 multilib assume SW float128.

This seems like a lot of work at this point for a partial feature we can't fully support in glibc.
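
To make alternative (1) concrete, here is a minimal sketch of call-time dispatch through a function pointer instead of an IFUNC. It is not what libgcc does. The names op, op_sw, op_hw and cpu_has_hw_float128 are hypothetical, the two implementations are integer stand-ins rather than real float128 routines, and the AT_HWCAP2 bit value is an assumption taken to match PPC_FEATURE2_HAS_IEEE128.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 (Linux) */

/* Assumption: value of the IEEE128 feature bit in AT_HWCAP2 on POWER.  */
#define HWCAP2_IEEE128_SKETCH 0x00400000UL

typedef long (*op_fn)(long, long);

static long op_sw(long a, long b) { return a * b; }  /* software fallback */
static long op_hw(long a, long b) { return a * b; }  /* stand-in for a HW path */

static int cpu_has_hw_float128(void)
{
#if defined(__powerpc64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_IEEE128_SKETCH) != 0;
#else
    return 0;   /* not POWER: always take the software path */
#endif
}

static long op_first_call(long a, long b);

/* Starts out pointing at the selector; no resolver runs at relocation time. */
static op_fn op_impl = op_first_call;

/* Runs on the first real call, i.e. after the dynamic loader has finished,
   so getauxval is safe to use here.  Not synchronized: concurrent first
   calls may both store the pointer, but they store the same value.  */
static long op_first_call(long a, long b)
{
    op_impl = cpu_has_hw_float128() ? op_hw : op_sw;
    return op_impl(a, b);
}

long op(long a, long b)
{
    assert(op_impl != NULL);
    return op_impl(a, b);
}
```

The cost is an indirect call on every use, and it still requires checking that no caller runs before the auxiliary vector data is available, as noted in (1).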

Comment 7 Florian Weimer 2017-07-13 17:44:02 UTC
(In reply to Carlos O'Donell from comment #5)
> Given (a), (b) and (c), we probably have to not enable support for hardware
> float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> naturally in later RHEL as the core components are updated.

Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the compiler, without calling getauxval or considering the HWCAP bits?  I expect that this would work for glibc.

Comment 8 Carlos O'Donell 2017-07-13 17:45:21 UTC
(In reply to Florian Weimer from comment #7)
> (In reply to Carlos O'Donell from comment #5)
> > Given (a), (b) and (c), we probably have to not enable support for hardware
> > float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> > naturally in later RHEL as the core components are updated.
> 
> Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the
> compiler, without calling getauxval or considering the HWCAP bits?  I expect
> that this would work for glibc.

Yes, but you need (2), a new multilib for POWER9, which allows you to make that assumption.

Comment 9 Carlos O'Donell 2017-07-13 17:46:12 UTC
(In reply to Carlos O'Donell from comment #8)
> (In reply to Florian Weimer from comment #7)
> > (In reply to Carlos O'Donell from comment #5)
> > > Given (a), (b) and (c), we probably have to not enable support for hardware
> > > float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> > > naturally in later RHEL as the core components are updated.
> > 
> > Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the
> > compiler, without calling getauxval or considering the HWCAP bits?  I expect
> > that this would work for glibc.
> 
> Yes, but you need (2), a new multilib for POWER9, which allows you to make
> that assumption.

Oh, I see, yes, you could always set it to 0 and get softfp support, but the point of POWER9 is to get the hardware support :-)

Comment 10 Carlos O'Donell 2017-07-13 18:02:53 UTC
(In reply to Carlos O'Donell from comment #9)
> (In reply to Carlos O'Donell from comment #8)
> > (In reply to Florian Weimer from comment #7)
> > > (In reply to Carlos O'Donell from comment #5)
> > > > Given (a), (b) and (c), we probably have to not enable support for hardware
> > > > float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> > > > naturally in later RHEL as the core components are updated.
> > > 
> > > Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the
> > > compiler, without calling getauxval or considering the HWCAP bits?  I expect
> > > that this would work for glibc.
> > 
> > Yes, but you need (2), a new multilib for POWER9, which allows you to make
> > that assumption.
> 
> Oh, I see, yes, you could always set it to 0 and get softfp support, but the
> point of POWER9 is to get the hardware support :-)

OK, by default this will happen:

(a) configure.ac will detect glibc < 2.23
(b) rs6000.c will configure out the builtins that are used by the IFUNC and have them return 0.
(c) Because of (b) all the ifuncs will default to SW support.

So there isn't anything we need to do.

I think this ticket can be CLOSED/WONTFIX, we'll just have softfp float128 support in ppc64le, even on POWER9 hardware which could in theory support it.

If the request ever materializes it would be for a fully supported implementation of float128 with glibc functions that accompany it and then we'd have to recommend something else.
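
As a source-level analogy for steps (a) to (c), here is a rough sketch. The real decision is made when GCC itself is configured, not in user code. The macro HAVE_HWCAP_IN_TCB_SKETCH and the function names are made up for illustration. The version check looks at the glibc headers seen at compile time, which is an assumption standing in for the configure test.

```c
#include <assert.h>
#include <features.h>   /* defines __GLIBC_PREREQ on glibc */

/* (a) Detect glibc >= 2.23, the release that put the HWCAP fields in the TCB.
   The nested #if avoids expanding __GLIBC_PREREQ where it is not defined.  */
#if defined(__powerpc64__) && defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 23)
#  define HAVE_HWCAP_IN_TCB_SKETCH 1
# endif
#endif
#ifndef HAVE_HWCAP_IN_TCB_SKETCH
# define HAVE_HWCAP_IN_TCB_SKETCH 0
#endif

/* (b) Without that support the feature test is configured out and is 0.  */
int hw_float128_usable(void)
{
#if HAVE_HWCAP_IN_TCB_SKETCH
    return __builtin_cpu_supports("ieee128") != 0;
#else
    return 0;
#endif
}

/* (c) Everything that selects on the test then falls back to software.  */
const char *float128_backend(void)
{
    return hw_float128_usable() ? "hardware" : "software";
}
```

On a RHEL 7 style system with glibc 2.17, this reduces to float128_backend() always reporting "software", which matches the conclusion in this comment.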

Comment 11 Marek Polacek 2018-04-30 16:46:05 UTC
Closing as per Comment 10.

