A fix in DTS would be nice because it enables us to use DTS for glibc development on POWER.

+++ This bug was initially created as a clone of Bug #1467526 +++

The invalid IFUNC resolver is in libgcc, and probably needs to be fixed there. Peter Bergner already suggested a patch:

https://sourceware.org/ml/libc-alpha/2017-06/msg01383.html

Afterwards, we need to rebuild glibc with the fixed gcc package.

+++ This bug was initially created as a clone of Bug #1467518 +++

Upstream glibc master started linking in have_ieee_hw_p from libgcc on ppc64le. This leads to a crash on the last line because getauxval uses data which has not been initialized yet at this point. The crash is at the last line of the disassembly.

00000000001c3380 <have_ieee_hw_p>:
  1c3380:  08 00 4c 3c   addis   r2,r12,8
  1c3384:  80 3d 42 38   addi    r2,r2,15744
  1c3388:  f8 ff e1 fb   std     r31,-8(r1)
  1c338c:  a0 8c e2 eb   ld      r31,-29536(r2)
  1c3390:  d1 ff 21 f8   stdu    r1,-48(r1)
  1c3394:  02 00 3f e9   lwa     r9,0(r31)
  1c3398:  00 00 89 2f   cmpwi   cr7,r9,0
  1c339c:  14 00 9c 41   blt     cr7,1c33b0 <have_ieee_hw_p+0x30>
  1c33a0:  30 00 21 38   addi    r1,r1,48
  1c33a4:  78 4b 23 7d   mr      r3,r9
  1c33a8:  f8 ff e1 eb   ld      r31,-8(r1)
  1c33ac:  20 00 80 4e   blr
  1c33b0:  a6 02 08 7c   mflr    r0
  1c33b4:  0f 00 60 38   li      r3,15
  1c33b8:  40 00 01 f8   std     r0,64(r1)
  1c33bc:  15 fc e5 4b   bl      22fd0 <00000036.plt_call.__getauxval>
  1c33c0:  18 00 41 e8   ld      r2,24(r1)

So far, this happens only with --enable-bind-now builds. I'll disable that on ppc64le as an immediate workaround, but we'll need an upstream fix for this (in glibc or GCC).

--- Additional comment from Florian Weimer on 2017-07-07 12:13:10 CEST ---

Upstream patch submission:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00348.html

--- Additional comment from Carlos O'Donell on 2017-07-07 22:26:09 CEST ---

I've reached out to Jakub/Marek to see what we can do between gcc/glibc to fix this quickly because it looks like the s390x import and the Go 1.9 dependent rebuilds need the mass rebuild so we have to get this fixed.
--- Additional comment from Alexander Bokovoy on 2017-07-10 16:27:43 CEST ---

This blocks building FreeIPA in rawhide because java crashes when run as part of freeipa build process on ppc64le.

I reproduced this in a mock chroot on ppc64le-test.fedorainfracloud.org when investigating ppc64le build failure for https://koji.fedoraproject.org/koji/taskinfo?taskID=20438824

(gdb) set args -Xss512k -classpath /usr/share/java/js.jar org.mozilla.javascript.tools.shell.Main /builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build/build.js baseUrl=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build load=build profile=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/../src/webui.profile.js
(gdb) run
Starting program: /usr/bin/java -Xss512k -classpath /usr/share/java/js.jar org.mozilla.javascript.tools.shell.Main /builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build/build.js baseUrl=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/build load=build profile=/builddir/build/BUILD/freeipa-4.5.2/install/ui/util/../src/webui.profile.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: dnf debuginfo-install zlib-1.2.11-2.fc26.ppc64le
(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x00003fffb6cb2380 in ?? ()
No symbol table info available.
#2  0x00003fffb6cb2838 in ?? ()
No symbol table info available.
#3  0x00003fffb7fba73c in resolve_ifunc (sym_map=<optimized out>, map=<optimized out>, value=70367515977760) at ../sysdeps/powerpc/powerpc64/dl-machine.h:674
No locals.
#4  elf_machine_rela (skip_ifunc=<optimized out>, reloc_addr_arg=0x3fffb6d40098, version=<optimized out>, sym=<optimized out>, reloc=0x3fffb6bf8c48, map=0x20030c10) at ../sysdeps/powerpc/powerpc64/dl-machine.h:729
        refsym = 0x3fffb6bf1d00
        value = 70367515977760
        reloc_addr = 0x3fffb6d40098
        r_type = 248
        sym_map = <optimized out>
#5  elf_dynamic_do_Rela (skip_ifunc=<optimized out>, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, map=<optimized out>) at do-rel.h:137
        ndx = <optimized out>
        version = 0x3fffb6bf6d2a
        symtab = 0x3fffb6bf1d00
        relative = <optimized out>
        r = 0x3fffb6bf8c48
#6  _dl_relocate_object (scope=0x20030f88, reloc_mode=<optimized out>, consider_profiling=<optimized out>) at dl-reloc.c:259
        ranges = {{start = 7022344884575826688, size = 4044295413358932590, nrelative = 2321676217711866176, lazy = 959594552}, {start = 279172874248, size = 8097881642258923523, nrelative = 162659009062003, lazy = 0}}
        textrels = <optimized out>
        errstring = 0x0
        lazy = <optimized out>
        skip_ifunc = <optimized out>
#7  0x0000003c00000008 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x3140382039403810
(gdb)
I think a fix in DTS7 is absolutely required. In order for us to stress test RHEL7 + DTS7, we build upstream glibc and report build status upstream using these tools. On top of that, we need to be ready at a moment's notice to use DTS7 internally on all of our architectures in the event we need a newer compiler to solve a customer issue.
Is this patch all that needs to be done in GCC7? https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00348.html
(In reply to Marek Polacek from comment #2)
> Is this patch all that needs to be done in GCC7?
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00348.html

AFAIK, yes, that's the patch.
glibc 2.17 did not have the HWCAP fields in the TCB for __builtin_cpu_supports, though. That came in glibc 2.23 only. We could backport that, but then we run again into difficulties with RPM dependency management.
(In reply to Florian Weimer from comment #4)
> glibc 2.17 did not have the HWCAP fields in the TCB for
> __builtin_cpu_supports, though. That came in glibc 2.23 only. We could
> backport that, but then we run again into difficulties with RPM dependency
> management.

You are absolutely right. We would need to provide __parse_hwcap_and_convert_at_platform. The new symbol is used by gcc to ensure that when compiling you get a reference to the new feature symbol, and the application won't run if you move it to a system with an older libc.

I spoke with Jakub about this and the conclusion is as follows:

(a) Low priority

Without proper glibc support for float128, users will not be interested in using float128 with DTS7 on ppc64le. Therefore this has to be low priority.

(b) Ported to stub libgcc.a

DTS7 uses the system libgcc_s.so, but provides its own libgcc.a, so that will have an impact on supporting this configuration.

(c) Forces binaries to require a newer version of glibc.

We can't require that binaries use a newer version of glibc, because rpm doesn't understand how the new symbol creates a new dependency. However, users would get an error trying to start the application and we would have to document that this means you need a new glibc. We could force DTS7 gcc to require that new glibc on ppc64le in order to get this working on developer workstations, but it goes against the idea of DTS7. You could argue that this is "the first release for ppc64le" and so can require the latest glibc, but that's a fragile requirement.

In summary:
==========

Given (a), (b) and (c), we probably have to not enable support for hardware float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come naturally in later RHEL as the core components are updated.

Thoughts?
(In reply to Carlos O'Donell from comment #5)
> In summary:
> ==========
> Given (a), (b) and (c), we probably have to not enable support for hardware
> float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> naturally in later RHEL as the core components are updated.
>
> Thoughts?

Some alternatives exist:

(1) Don't use IFUNC in gcc.

Change all of the DTS7 HW/SW redirects for float128 to do the redirection at runtime and verify that none of those calls are earlier than when getauxval() data is setup.

(2) Use a new POWER9 multilib.

Create a POWER9 multilib for gcc which assumes HW float128, and is selected by ld.so based on AT_PLATFORM, and then have the POWER8 multilib assume SW float128.

This seems like a lot of work at this point for a partial feature we can't fully support in glibc.
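For illustration only, alternative (1) might look roughly like the sketch below. This is not the libgcc code. The names f128_add, f128_add_hw, f128_add_sw, select_f128_add and the HYPOTHETICAL_HWCAP2_IEEE128 bit are made up for this example, and double stands in for __float128 so that the sketch compiles on any glibc target. Only getauxval() and AT_HWCAP2 are real glibc interfaces.

```c
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP2 */

/* Assumption: placeholder feature bit. On ppc64le the real bit would
   come from the platform's hwcap headers. */
#define HYPOTHETICAL_HWCAP2_IEEE128 0x00400000UL

typedef double (*f128_add_fn)(double, double);

/* Stand-ins for the software and hardware implementations. */
static double f128_add_sw(double a, double b) { return a + b; }
static double f128_add_hw(double a, double b) { return a + b; }

/* NULL until the first call; no IFUNC relocation is involved. */
static f128_add_fn f128_add_impl;

/* Pure selection logic, kept separate so it can be tested. */
static f128_add_fn select_f128_add(unsigned long hwcap2)
{
    return (hwcap2 & HYPOTHETICAL_HWCAP2_IEEE128) ? f128_add_hw
                                                  : f128_add_sw;
}

/* The redirect happens on the first ordinary call, which runs after
   the dynamic loader has finished relocating, so getauxval() is
   called from normal code rather than from an IFUNC resolver.
   Not thread-safe as written; a real version would need an atomic
   store for the cached pointer. */
double f128_add(double a, double b)
{
    f128_add_fn fn = f128_add_impl;
    if (fn == NULL) {
        fn = select_f128_add(getauxval(AT_HWCAP2));
        f128_add_impl = fn;
    }
    return fn(a, b);
}
```

The cost compared to IFUNC is one extra load and branch per call, and the "verify that none of those calls are earlier" part of (1) still has to be checked by hand.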
(In reply to Carlos O'Donell from comment #5)
> Given (a), (b) and (c), we probably have to not enable support for hardware
> float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> naturally in later RHEL as the core components are updated.

Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the compiler, without calling getauxval or considering the HWCAP bits? I expect that this would work for glibc.
(In reply to Florian Weimer from comment #7)
> (In reply to Carlos O'Donell from comment #5)
> > Given (a), (b) and (c), we probably have to not enable support for hardware
> > float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> > naturally in later RHEL as the core components are updated.
>
> Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the
> compiler, without calling getauxval or considering the HWCAP bits? I expect
> that this would work for glibc.

Yes, but you need (2), a new multilib for POWER9, which allows you to make that assumption.
(In reply to Carlos O'Donell from comment #8)
> (In reply to Florian Weimer from comment #7)
> > (In reply to Carlos O'Donell from comment #5)
> > > Given (a), (b) and (c), we probably have to not enable support for hardware
> > > float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> > > naturally in later RHEL as the core components are updated.
> >
> > Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the
> > compiler, without calling getauxval or considering the HWCAP bits? I expect
> > that this would work for glibc.
>
> Yes, but you need (2), a new multilib for POWER9, which allows you to make
> that assumption.

Oh, I see, yes, you could always set it to 0 and get softfp support, but the point of POWER9 is to get the hardware support :-)
(In reply to Carlos O'Donell from comment #9)
> (In reply to Carlos O'Donell from comment #8)
> > (In reply to Florian Weimer from comment #7)
> > > (In reply to Carlos O'Donell from comment #5)
> > > > Given (a), (b) and (c), we probably have to not enable support for hardware
> > > > float128 for POWER9 from DTS7 (or RHEL7 at all). This support will come
> > > > naturally in later RHEL as the core components are updated.
> > >
> > > Do you propose to hardwire __builtin_cpu_supports ("ieee128") to 0 in the
> > > compiler, without calling getauxval or considering the HWCAP bits? I expect
> > > that this would work for glibc.
> >
> > Yes, but you need (2), a new multilib for POWER9, which allows you to make
> > that assumption.
>
> Oh, I see, yes, you could always set it to 0 and get softfp support, but the
> point of POWER9 is to get the hardware support :-)

OK, by default this will happen:

(a) configure.ac will detect glibc < 2.23

(b) rs6000.c will configure out the builtins that are used by the IFUNC and have them return 0.

(c) Because of (b) all the ifuncs will default to SW support.

So there isn't anything we need to do.

I think this ticket can be CLOSED/WONTFIX; we'll just have softfp float128 support in ppc64le, even on POWER9 hardware which could in theory support it in hardware.

If the request ever materializes, it would be for a fully supported implementation of float128 with glibc functions that accompany it, and then we'd have to recommend something else.
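As a rough model of steps (a) to (c), and not the actual configure.ac or rs6000.c logic, the fallback could be pictured as below. HAVE_HWCAP_IN_TCB, cpu_supports_ieee128, pick_f128_impl and default_f128_impl are hypothetical names for this sketch. __GLIBC_PREREQ is the real glibc macro from <features.h>. Note that GCC makes its decision against the target glibc when GCC itself is configured, whereas this sketch checks the headers at compile time.

```c
#include <features.h>   /* glibc: provides __GLIBC_PREREQ */

/* (a) Detect whether the C library is at least glibc 2.23.
   The nested #ifdef keeps non-glibc preprocessors from choking on
   the function-like macro. */
#ifdef __GLIBC_PREREQ
# if __GLIBC_PREREQ(2, 23)
#  define HAVE_HWCAP_IN_TCB 1
# endif
#endif
#ifndef HAVE_HWCAP_IN_TCB
# define HAVE_HWCAP_IN_TCB 0
#endif

enum f128_impl { F128_SOFTWARE = 0, F128_HARDWARE = 1 };

/* (b) Hypothetical stand-in for the builtin: when the library lacks
   the HWCAP fields, the probe is configured out and always yields 0,
   whatever the hardware reports. */
static int cpu_supports_ieee128(int hwcap_in_tcb, int hw_bit_set)
{
    if (!hwcap_in_tcb)
        return 0;
    return hw_bit_set != 0;
}

/* (c) The selector then falls back to software automatically. */
static enum f128_impl pick_f128_impl(int hwcap_in_tcb, int hw_bit_set)
{
    return cpu_supports_ieee128(hwcap_in_tcb, hw_bit_set)
               ? F128_HARDWARE
               : F128_SOFTWARE;
}

/* Convenience wrapper using the compile-time detection from (a). */
static enum f128_impl default_f128_impl(int hw_bit_set)
{
    return pick_f128_impl(HAVE_HWCAP_IN_TCB, hw_bit_set);
}
```

With hwcap_in_tcb equal to 0, as on a glibc 2.17 system, pick_f128_impl returns F128_SOFTWARE even when the hardware bit is set, which matches the POWER9 outcome described above.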
Closing as per Comment 10.