If a shared library uses fixed size buffers as thread-local storage in the global-dynamic or local-dynamic model, and its compiled object files have been linked using ld.bfd with the -fno-plt flag for the ppc64le architecture, accessing those buffers will result in a SIGSEV. It makes no difference if the buffers are defined in global scope or function scope; even in function scope access from within the same function will trigger SIGSEGV. This does not happen when compiling / linking with gcc / ld.gold, nor with clang / lld. Please see the attached testcase for further information. Reproducible: Always, with any version of binutils in any active release of Fedora. Steps to Reproduce: 1. Try attached testcase on ppc64le. 2. SIGSEGV incoming. Actual Results: Program is terminated with SIGSEGV. Expected Results: Program should be running as expected.
Created attachment 2056369 [details] Minimal testcase
How confident are you in attributing this to the linker? GCC generates code like this: addis 12,2,0 .reloc .-4,R_PPC64_TLSLD,buf.1 .reloc .-4,R_PPC64_PLT16_HA,__tls_get_addr addi 31,31,.LC0@toc@l addi 3,3,buf.1@got@tlsld@l std 0,16(1) stdu 1,-48(1) .cfi_def_cfa_offset 48 .cfi_offset 65, 16 std 2,24(1) ld 12,0(12) .reloc .-4,R_PPC64_TLSLD,buf.1 .reloc .-4,R_PPC64_PLT16_LO_DS,__tls_get_addr mtctr 12 .reloc .-4,R_PPC64_TLSLD,buf.1 .reloc .-4,R_PPC64_PLTSEQ,__tls_get_addr .reloc .,R_PPC64_TLSLD,buf.1 .reloc .,R_PPC64_PLTCALL,__tls_get_addr bctrl And the ABI only mentions R_PPC64_REL24 for the __tls_get_addr relocation.
(In reply to Florian Weimer from comment #2) > How confident are you in attributing this to the linker? Pretty much certain. When linking with ld.bfd *and* explicit "-Wl,--no-tls-get-addr-optimize" flag there is no segfault and the "__tls_get_addr" function is used; so it seems the problems arises from the optimization performed by ld.bfd, as ld.gold implictly defaults - like ld.bdf does - to "Wl,--tls-get-addr-optimize" resulting in "__tls_get_addr_opt" being used in the resulting DSO.
CC'ing amodra, as they authored ld.bfd and gcc commits related to this topic.
Created attachment 2056900 [details] Updated testcase Updated the minimal testcase to show ld.gold or linking with ld.bfd using the '-Wl,--no-tls-get-addr-optimize' flags works. @amodra, can you please have a look, if you can find the reason behind the segfault with '__tls_get_addr_opt' and ld.bfd when compiling a DSO with '-fno-plt' using gcc?
Do you have any further ideas or solutions about this? Shall I forward my testcase to the Sourceware Bugzilla instance?
(In reply to Björn 'besser82' Esser from comment #6) > Do you have any further ideas or solutions about this? Shall I forward my > testcase to the Sourceware Bugzilla instance? Someone who knows the ABI needs to look at the GCC-generated code and decide if it's correct. The linker should not produce crashing binaries, but it's not clear to me at this point if the primary fix will be in GCC or binutils. I'll raise this with IBM.
Thanks for pointing me to this Florian. I do know we SEGV in strncpy from the local-dynamic version of the function. The first two calls look fine: Breakpoint 2, 0x00002000002d9a50 in __strncpy_power9 () from /lib64/glibc-hwcaps/power10/libc.so.6 (gdb) info registers r3 r4 r5 r3 0x2000000648f8 35184372500728 r4 0x2000000f0b70 35184373074800 r5 0x64 100 (gdb) c Continuing. _Thread_local __attribute ((tls_model("local-exec"))) works Breakpoint 2, 0x00002000002d9a50 in __strncpy_power9 () from /lib64/glibc-hwcaps/power10/libc.so.6 (gdb) info registers r3 r4 r5 r3 0x200000064890 35184372500624 r4 0x2000000f0bb0 35184373074864 r5 0x64 100 (gdb) c Continuing. _Thread_local __attribute ((tls_model("initial-exec"))) works The call from the local-dynamic version passes a bogus buf pointer which will cause us to SEGV: Breakpoint 2, 0x00002000002d9a50 in __strncpy_power9 () from /lib64/glibc-hwcaps/power10/libc.so.6 (gdb) info registers r3 r4 r5 r3 0x1069 4201 r4 0x2000000f0bf0 35184373074928 r5 0x64 100 I'll see if I can track down why that is happening.
I opened upstream bug: https://sourceware.org/PR32387 The basic problem is the inline PLT call branches directly to __tls_get_addr rather than going through the PLT call stub and __tls_get_addr's PLT call stub is "special" in that is contains extra code to handle the module id == 0 case (ie, local-dynamic optimized) and return directly to the caller rather than branching to __tls_get_addr. The __tls_get_addr routine is unequipped to handle the module id == 0 special case and that leads to our bug. Using --no-tls-get-addr-optimize just disables the local-dynamic optimization, so when we branch directly to __tls_get-addr, we have a normal module id != 0 value which it is equipped to handle. I've suggested a few "solutions" in the upstream bug, but I would like Alan's thoughts given he designed and implemented the entirety of PowerPC's TLS support across all of the the toolchain components.
Yes, the inline plt code emitted by gcc is incompatible with the linker/ld.so --tls-get-addr-optimize scheme. (This is the runtime optimisation where the first call to __tls_get_addr results in __tls_get_addr updating the tls_index pair, then the special linker stub using that to short-circuit second and subsequent calls for a given tls symbol. Enabled by default when the linker sees __tls_get_addr_opt is preseent, and enabled in ld.so when DT_PPC64_OPT has PPC64_OPT_TLS set. Note that this is distinct from link-time tls optimisation.) Clearly, the linker should not be setting PPC64_OPT_TLS when detecting inline plt code calling __tls_get_addr. This can be done by adding some code to ppc64_elf_check_relocs. Peter's other solutions should work too.
I've created pull requests [1,2,3] targeting all maintained Fedora releases to add the fix as commited upstream. It would be nice if Nick or Florian can have a look into them. Many thanks to Peter and Alan for dealing that quickly with this issue! [1] https://src.fedoraproject.org/rpms/binutils/pull-request/59 [2] https://src.fedoraproject.org/rpms/binutils/pull-request/60 [3] https://src.fedoraproject.org/rpms/binutils/pull-request/61
Hi Björn The pull requests all appear to be showing build failures or test failures. I will take a look into them and try to resolve the problems. I may just close pull-request 59 however as I am going to rebase the rawhide binutils to a snapshot of today's upstream sources, which will bring in the patch. (I have been doing this on a bi-weekly basis in order to help detect problems with the new sources as early as possible). Cheers Nick
Update: Rawhide: I have closed the merge request and instead rebased the sources to a snapshot of the upstream development sources. The new linker should be available in the binutils-2.43.50-9 build. The build was stuck in gating due to some kind of test harness problem, so I have waived the result and the new build should be reaching the buildroot soon. F41: I have accepted the merge request, but the binutils package is not completing its build due to an unrelated linker testsuite problem. (The problem is related to an update to gcc in F41). I have a patch to fix the linker tests involved and a new build is currently underway. F40: Just like F41 I have accepted the merge request but the same linker testsuite problem was stopping the build from completing. I have committed the update that resolves this issue and the new build - binutils-2.41-38.fc40 - is complete. I have filed a Bodhi update request for this build, and it is currently undergoing testing.
@nickc, the fixing updates to f40 and f41 for binutils are currently hanging in Bodhi on thier way to stable, as some required test have failed (for likely unrelated reasons), and need some intervention / investigation.
Gating has completed. The fixed builds are now in the buildroots for F40, F41 and F42.
Thank you! The update to F41 will go stable with next compose; the other update have hit stable already. Closing here, as the bug is fixed.