Bug 2241902 - binutils: Broken AArch64 BTI veneers.
Summary: binutils: Broken AArch64 BTI veneers.
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: 40
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-10-03 10:08 UTC by Julian Sikorski
Modified: 2024-04-16 13:55 UTC
16 users

Fixed In Version: binutils-2.41-12.fc40
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-16 13:55:26 UTC
Type: Bug
Embargoed:


Links
System: Sourceware  ID: 30930  Private: 0  Priority: P2  Status: NEW
Summary: Broken BTI veneers: ld-2.41 links mame in a way which gets stuck on aarch64
Last Updated: 2023-11-07 14:03:25 UTC

Description Julian Sikorski 2023-10-03 10:08:38 UTC
Description of problem:
Starting with binutils-2.41, the mame executable linked on aarch64 gets stuck when running -validate (and probably in other cases too). As a result, the %check stage of the RPM build fails. To reproduce:

1. fedpkg clone --anonymous mame
2. cd mame
3. fedpkg switch-branch f39 (this is because f40 uses lld as a workaround)
4. fedpkg srpm
5. mock -r fedora-rawhide-aarch64 mame-0.259-1.fc39.src.rpm
6. wait

gdb reveals the following backtrace for the stuck executable:

#0  0x0000aaaab5bddb08 in ___ZN4bgfx12VertexLayoutC1Ev_bti_veneer ()
#1  0x0000fffff5870b2c in call_init (env=<optimized out>, argv=0xfffffffff388, argc=1) at ../csu/libc-start.c:145
#2  __libc_start_main_impl (main=0xaaaaaeedadc0 <main()>, argc=1, argv=0xfffffffff388, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:347
#3  0x0000aaaaaef01570 in _start ()

I reported this to binutils upstream and was advised to seek help here first. Please feel free to reassign as appropriate.

Comment 1 Nick Clifton 2023-10-03 10:30:03 UTC
Note - this issue has also been reported on the GNU Binutils bugzilla system here:

  https://sourceware.org/bugzilla/show_bug.cgi?id=30930

Whilst the linker might be to blame, it is unclear at the moment precisely what is causing the problem.  Since the issue appears to be related to the program's init sequence, and possibly BTI enablement, I recommended that a glibc ticket be filed so that you guys could have a look at the problem too.

Comment 2 Julian Sikorski 2023-10-03 12:00:35 UTC
With a non-mock fedpkg compile build on Fedora Rawhide aarch64 running on OCI, the backtrace is slightly different:

#0  0x0000aaaab5bd4fb0 in ___ZN3emu6detail16device_registrar15register_deviceERNS0_21device_type_impl_baseE_bti_veneer ()
#1  0x0000aaaaaec52368 in device_type_impl_base<z88_impexp_device, &(anonymous namespace)::Z88_IMPEXP_device_traits::shortname, &(anonymous namespace)::Z88_IMPEXP_device_traits::fullname, &(anonymous namespace)::Z88_IMPEXP_device_traits::source> ()
    at ../../../../../src/emu/device.h:240
#2  device_type_impl<z88_impexp_device, &(anonymous namespace)::Z88_IMPEXP_device_traits::shortname, &(anonymous namespace)::Z88_IMPEXP_device_traits::fullname, &(anonymous namespace)::Z88_IMPEXP_device_traits::source> () at ../../../../../src/emu/device.h:283
#3  __static_initialization_and_destruction_0 () at ../../../../../src/mame/acorn/z88_impexp.cpp:34
#4  _GLOBAL__sub_I_Z88_IMPEXP () at ../../../../../src/mame/acorn/z88_impexp.cpp:278
#5  0x0000fffff5870b2c in call_init (env=<optimized out>, argv=0xfffffffff258, argc=2) at ../csu/libc-start.c:145
#6  __libc_start_main_impl (main=0xaaaaaeedadc0 <main()>, argc=2, argv=0xfffffffff258, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:347
#7  0x0000aaaaaef01570 in _start

Comment 3 Carlos O'Donell 2023-10-03 14:16:14 UTC
The BTI veneers are created by the static linker for stubs with indirect jumps that could break with BTI enabled.

What would be interesting to see is the list of relocations in the application when built with 2.40 and then with 2.41.
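One way to act on this suggestion is a small script that counts relocation types in two dumps and prints the per-type differences. This is a sketch under stated assumptions: each build is dumped with readelf -rW (wide output, so long type names are not truncated), and, because the branch relocations that trigger long-branch stubs (R_AARCH64_CALL26 and R_AARCH64_JUMP26) are normally resolved away in a final executable, both builds are assumed to be linked with -Wl,--emit-relocs so those relocations remain visible. The script and dump file names are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches AArch64 relocation type names such as R_AARCH64_CALL26 or R_AARCH64_RELATIVE.
RELOC_RE = re.compile(r"\bR_AARCH64_[A-Z0-9_]+\b")


def reloc_histogram(readelf_text: str) -> Counter:
    """Count how often each relocation type appears in 'readelf -rW' output."""
    return Counter(RELOC_RE.findall(readelf_text))


def diff_histograms(old: Counter, new: Counter) -> dict:
    """Return {type: new_count - old_count} for every type whose count changed."""
    return {
        rtype: new[rtype] - old[rtype]
        for rtype in sorted(set(old) | set(new))
        if new[rtype] != old[rtype]
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   readelf -rW mame-ld-2.40 > relocs-2.40.txt
    #   readelf -rW mame-ld-2.41 > relocs-2.41.txt
    #   python3 reloc_diff.py relocs-2.40.txt relocs-2.41.txt
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        old = reloc_histogram(f_old.read())
        new = reloc_histogram(f_new.read())
    for rtype, delta in diff_histograms(old, new).items():
        print(f"{rtype}: {delta:+d}")
```

Counting types rather than diffing raw lines keeps the comparison stable when addresses shift between the two links.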

The code in question was added in January 2023 here:

commit 15b4f66b0a9a3be6caf1898d22a13c39e662006f
Author: Szabolcs Nagy <szabolcs.nagy>
Date:   Wed Jan 18 12:56:46 2023 +0000

    bfd: aarch64: Fix stubs that may break BTI PR30076
    
    Insert two stubs in a BTI enabled binary when fixing long calls: The
    first is near the call site and uses an indirect jump like before,
    but it targets the second stub that is near the call target site and
    uses a direct jump.
    
    This is needed when a single stub breaks BTI compatibility.
    
    The stub layout is kept fixed between sizing and building the stubs,
    so the location of the second stub is known at build time, this may
    introduce padding between stubs when those are relaxed.  Stub layout
    with BTI disabled is unchanged.
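The two-stub layout described in this commit message can be modelled as a small decision function. This is a hypothetical sketch, not the actual bfd implementation: the function and parameter names are invented, the only range check modelled is the ±128 MiB reach of a direct B/BL, and target_has_landing_pad stands in for whatever test the linker really applies to decide that an indirect branch to the target would break BTI.

```python
# Hypothetical model of the stub layout described in commit 15b4f66b0a9a.
# Names and the landing-pad flag are illustrative, not taken from bfd.

# A direct B/BL encodes a signed 26-bit word offset: -128 MiB .. +128 MiB - 4.
B_MIN = -(1 << 27)
B_MAX = (1 << 27) - 4


def in_branch_range(src: int, dst: int) -> bool:
    """True if a direct B/BL at address src can reach dst."""
    offset = dst - src
    return offset % 4 == 0 and B_MIN <= offset <= B_MAX


def plan_call(call_site: int, target: int, bti_enabled: bool,
              target_has_landing_pad: bool) -> list:
    """Return the chain of branches needed for one call (assumed layout)."""
    if in_branch_range(call_site, target):
        return ["BL target"]
    if not bti_enabled or target_has_landing_pad:
        # A single long-branch stub ending in an indirect BR is safe here.
        return ["BL stub", "stub: indirect BR -> target"]
    # BTI is on and the target cannot accept an indirect branch: the first
    # stub (near the call site) jumps indirectly to a second stub placed near
    # the target (assumed to be within direct range of it), which begins
    # with a BTI landing pad and then branches directly to the target.
    return ["BL stub",
            "stub: indirect BR -> bti_veneer",
            "bti_veneer: BTI landing pad, direct B -> target"]
```

For example, plan_call(0, 1 << 28, True, False) yields the three-step chain, while the same call with bti_enabled=False yields the unchanged single-stub layout, matching the commit's note that the non-BTI layout is unaffected.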

These are probably the first uses of this code at a large scale.

I don't think there is anything wrong here in glibc that I can tell.

Comment 4 Carlos O'Donell 2023-10-10 13:23:47 UTC
Upstream has confirmed this is an issue with the binutils support for BTI veneers.

I'm moving this to binutils.

Comment 5 Nick Clifton 2023-11-07 13:43:51 UTC
Fixed in binutils-2.41-12.fc40.

Comment 6 Aoife Moloney 2024-02-15 22:58:44 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 40 development cycle.
Changing version to 40.

