2324491 – {global,local}-dynamic TLS causes SEGFAULTS in DSO with -fno-plt when ld.bfd is used on ppc64le

Bug 2324491 - {global,local}-dynamic TLS causes SEGFAULTS in DSO with -fno-plt when ld.bfd is used on ppc64le

Summary: {global,local}-dynamic TLS causes SEGFAULTS in DSO with -fno-plt when ld.bfd ...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	binutils
Sub Component:
Version:	rawhide
Hardware:	ppc64le
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Nick Clifton
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2024-11-07 21:47 UTC by Björn Esser (besser82)
Modified:	2024-12-03 08:41 UTC (History)
CC List:	14 users (show)
Fixed In Version:	binutils-2.43.50-9.fc42, binutils-2.43.1-4.fc41, binutils-2.41-38.fc40
Clone Of:
Environment:
Last Closed:	2024-12-03 08:41:50 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Minimal testcase (1.62 KB, application/x-shellscript) 2024-11-07 21:49 UTC, Björn Esser (besser82)	no flags	Details
Updated testcase (3.39 KB, application/x-shellscript) 2024-11-10 18:32 UTC, Björn Esser (besser82)	no flags	Details
Show Obsolete (1) View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Sourceware	32387	0	P2	RESOLVED	ppc64 TLS optimization bug when built with -fno-plt	2024-11-26 11:30:40 UTC

Description Björn Esser (besser82) 2024-11-07 21:47:32 UTC

If a shared library uses fixed size buffers as thread-local storage in the global-dynamic or local-dynamic model, and its compiled object files have been linked using ld.bfd with the -fno-plt flag for the ppc64le architecture, accessing those buffers will result in a SIGSEV.

It makes no difference if the buffers are defined in global scope or function scope; even in function scope access from within the same function will trigger SIGSEGV.

This does not happen when compiling / linking with gcc / ld.gold, nor with clang / lld.

Please see the attached testcase for further information.

Reproducible: Always, with any version of binutils in any active release of Fedora.

Steps to Reproduce:
1. Try attached testcase on ppc64le.
2. SIGSEGV incoming.

Actual Results:  
Program is terminated with SIGSEGV.

Expected Results:  
Program should be running as expected.

Comment 1 Björn Esser (besser82) 2024-11-07 21:49:29 UTC

Created attachment 2056369 [details]
Minimal testcase

Comment 2 Florian Weimer 2024-11-08 11:11:38 UTC

How confident are you in attributing this to the linker? GCC generates code like this:

        addis 12,2,0
        .reloc .-4,R_PPC64_TLSLD,buf.1
        .reloc .-4,R_PPC64_PLT16_HA,__tls_get_addr
        addi 31,31,.LC0@toc@l
        addi 3,3,buf.1@got@tlsld@l
        std 0,16(1)
        stdu 1,-48(1)
        .cfi_def_cfa_offset 48
        .cfi_offset 65, 16
        std 2,24(1)
        ld 12,0(12)
        .reloc .-4,R_PPC64_TLSLD,buf.1
        .reloc .-4,R_PPC64_PLT16_LO_DS,__tls_get_addr
        mtctr 12
        .reloc .-4,R_PPC64_TLSLD,buf.1
        .reloc .-4,R_PPC64_PLTSEQ,__tls_get_addr
        .reloc .,R_PPC64_TLSLD,buf.1
        .reloc .,R_PPC64_PLTCALL,__tls_get_addr
        bctrl

And the ABI only mentions R_PPC64_REL24 for the __tls_get_addr relocation.

Comment 3 Björn Esser (besser82) 2024-11-08 12:47:13 UTC

(In reply to Florian Weimer from comment #2)
> How confident are you in attributing this to the linker?

Pretty much certain.  When linking with ld.bfd *and* explicit "-Wl,--no-tls-get-addr-optimize" flag there is no segfault and the "__tls_get_addr" function is used; so it seems the problems arises from the optimization performed by ld.bfd, as ld.gold implictly defaults - like ld.bdf does - to "Wl,--tls-get-addr-optimize" resulting in "__tls_get_addr_opt" being used in the resulting DSO.

Comment 4 Björn Esser (besser82) 2024-11-08 16:20:42 UTC

CC'ing amodra, as they authored ld.bfd and gcc commits related to this topic.

Comment 5 Björn Esser (besser82) 2024-11-10 18:32:04 UTC

Created attachment 2056900 [details]
Updated testcase

Updated the minimal testcase to show ld.gold or linking with ld.bfd using the '-Wl,--no-tls-get-addr-optimize' flags works.

@amodra, can you please have a look, if you can find the reason behind the segfault with '__tls_get_addr_opt' and ld.bfd when compiling a DSO with '-fno-plt' using gcc?

Comment 6 Björn Esser (besser82) 2024-11-21 12:27:01 UTC

Do you have any further ideas or solutions about this?  Shall I forward my testcase to the Sourceware Bugzilla instance?

Comment 7 Florian Weimer 2024-11-21 12:31:03 UTC

(In reply to Björn 'besser82' Esser from comment #6)
> Do you have any further ideas or solutions about this?  Shall I forward my
> testcase to the Sourceware Bugzilla instance?

Someone who knows the ABI needs to look at the GCC-generated code and decide if it's correct. The linker should not produce crashing binaries, but it's not clear to me at this point if the primary fix will be in GCC or binutils. I'll raise this with IBM.

Comment 8 Peter Bergner 2024-11-21 20:10:55 UTC

Thanks for pointing me to this Florian.

I do know we SEGV in strncpy from the local-dynamic version of the function.
The first two calls look fine:

Breakpoint 2, 0x00002000002d9a50 in __strncpy_power9 () from /lib64/glibc-hwcaps/power10/libc.so.6
(gdb) info registers r3 r4 r5
r3             0x2000000648f8      35184372500728
r4             0x2000000f0b70      35184373074800
r5             0x64                100
(gdb) c
Continuing.
_Thread_local __attribute ((tls_model("local-exec"))) works

Breakpoint 2, 0x00002000002d9a50 in __strncpy_power9 () from /lib64/glibc-hwcaps/power10/libc.so.6
(gdb) info registers r3 r4 r5
r3             0x200000064890      35184372500624
r4             0x2000000f0bb0      35184373074864
r5             0x64                100
(gdb) c
Continuing.
_Thread_local __attribute ((tls_model("initial-exec"))) works

The call from the local-dynamic version passes a bogus buf pointer which will cause us to SEGV:

Breakpoint 2, 0x00002000002d9a50 in __strncpy_power9 () from /lib64/glibc-hwcaps/power10/libc.so.6
(gdb) info registers r3 r4 r5
r3             0x1069              4201
r4             0x2000000f0bf0      35184373074928
r5             0x64                100


I'll see if I can track down why that is happening.

Comment 9 Peter Bergner 2024-11-23 15:37:53 UTC

I opened upstream bug: https://sourceware.org/PR32387

The basic problem is the inline PLT call branches directly to __tls_get_addr rather than going through the PLT call stub and __tls_get_addr's PLT call stub is "special" in that is contains extra code to handle the module id == 0 case (ie, local-dynamic optimized) and return directly to the caller rather than branching to __tls_get_addr.  The __tls_get_addr routine is unequipped to handle the module id == 0 special case and that leads to our bug.

Using --no-tls-get-addr-optimize just disables the local-dynamic optimization, so when we branch directly to __tls_get-addr, we have a normal module id != 0 value which it is equipped to handle.

I've suggested a few "solutions" in the upstream bug, but I would like Alan's thoughts given he designed and implemented the entirety of PowerPC's TLS support across all of the the toolchain components.

Comment 10 Alan Modra 2024-11-25 21:21:02 UTC

Yes, the inline plt code emitted by gcc is incompatible with the linker/ld.so --tls-get-addr-optimize scheme.  (This is the runtime optimisation where the first call to __tls_get_addr results in __tls_get_addr updating the tls_index pair, then the special linker stub using that to short-circuit second and subsequent calls for a given tls symbol.  Enabled by default when the linker sees __tls_get_addr_opt is preseent, and enabled in ld.so when DT_PPC64_OPT has PPC64_OPT_TLS set.  Note that this is distinct from link-time tls optimisation.)

Clearly, the linker should not be setting PPC64_OPT_TLS when detecting inline plt code calling __tls_get_addr.  This can be done by adding some code to ppc64_elf_check_relocs.  Peter's other solutions should work too.

Comment 11 Björn Esser (besser82) 2024-11-26 11:27:39 UTC

I've created pull requests [1,2,3] targeting all maintained Fedora releases to add the fix as commited upstream.  It would be nice if Nick or Florian can have a look into them.

Many thanks to Peter and Alan for dealing that quickly with this issue!


[1]  https://src.fedoraproject.org/rpms/binutils/pull-request/59
[2]  https://src.fedoraproject.org/rpms/binutils/pull-request/60
[3]  https://src.fedoraproject.org/rpms/binutils/pull-request/61

Comment 12 Nick Clifton 2024-11-26 15:04:14 UTC

Hi Björn

  The pull requests all appear to be showing build failures or test failures.  I will take a look into them and try to resolve the problems.

  I may just close pull-request 59 however as I am going to rebase the rawhide binutils to a snapshot of today's upstream sources, which will bring in the patch.  (I have been doing this on a bi-weekly basis in order to help detect problems with the new sources as early as possible).

Cheers
  Nick

Comment 13 Nick Clifton 2024-11-27 09:54:28 UTC

Update:

  Rawhide: I have closed the merge request and instead rebased the sources to a snapshot of the upstream development sources.  The new linker should be available in the binutils-2.43.50-9 build.  The build was stuck in gating due to some kind of test harness problem, so I have waived the result and the new build should be reaching the buildroot soon.

  F41: I have accepted the merge request, but the binutils package is not completing its build due to an unrelated linker testsuite problem.  (The problem is related to an update to gcc in F41).  I have a patch to fix the linker tests involved and a new build is currently underway.

  F40: Just like F41 I have accepted the merge request but the same linker testsuite problem was stopping the build from completing.  I have committed the update that resolves this issue and the new build - binutils-2.41-38.fc40 - is complete.  I have filed a Bodhi update request for this build, and it is currently undergoing testing.

Comment 14 Björn Esser (besser82) 2024-12-01 16:20:23 UTC

@nickc, the fixing updates to f40 and f41 for binutils are currently hanging in Bodhi on thier way to stable, as some required test have failed (for likely unrelated reasons), and need some intervention / investigation.

Comment 15 Nick Clifton 2024-12-03 07:53:24 UTC

Gating has completed.  The fixed builds are now in the buildroots for F40, F41 and F42.

Comment 16 Björn Esser (besser82) 2024-12-03 08:41:50 UTC

Thank you!  The update to F41 will go stable with next compose; the other update have hit stable already.  Closing here, as the bug is fixed.

Note You need to log in before you can comment on or make changes to this bug.