Description of problem:
Starting with binutils-2.41, the mame executable linked on aarch64 gets stuck when running -validate (and probably in other cases too). As a result, the RPM build %check stage fails.

In order to reproduce:
1. fedpkg clone --anonymous mame
2. cd mame
3. fedpkg switch-branch f39 (this is because f40 uses lld as a workaround)
4. fedpkg srpm
5. mock -r fedora-rawhide-aarch64 mame-0.259-1.fc39.src.rpm
6. wait

gdb reveals the following backtrace for the stuck executable:

#0  0x0000aaaab5bddb08 in ___ZN4bgfx12VertexLayoutC1Ev_bti_veneer ()
#1  0x0000fffff5870b2c in call_init (env=<optimized out>, argv=0xfffffffff388, argc=1) at ../csu/libc-start.c:145
#2  __libc_start_main_impl (main=0xaaaaaeedadc0 <main()>, argc=1, argv=0xfffffffff388, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:347
#3  0x0000aaaaaef01570 in _start ()

I reported this to binutils upstream and was advised to seek help here first. Please feel free to reassign as appropriate.
Note - this issue has also been reported on the GNU Binutils bugzilla system here: https://sourceware.org/bugzilla/show_bug.cgi?id=30930 Whilst the linker might be to blame, it is unclear at the moment precisely what is causing the problem. Since the issue appears to be related to the program's init sequence, and possibly BTI enablement, I recommended that a glibc ticket be filed so that you guys could have a look at the problem too.
With a non-mock, fedpkg compile build on Fedora rawhide aarch64 running on OCI, the backtrace is slightly different:

#0  0x0000aaaab5bd4fb0 in ___ZN3emu6detail16device_registrar15register_deviceERNS0_21device_type_impl_baseE_bti_veneer ()
#1  0x0000aaaaaec52368 in device_type_impl_base<z88_impexp_device, &(anonymous namespace)::Z88_IMPEXP_device_traits::shortname, &(anonymous namespace)::Z88_IMPEXP_device_traits::fullname, &(anonymous namespace)::Z88_IMPEXP_device_traits::source> () at ../../../../../src/emu/device.h:240
#2  device_type_impl<z88_impexp_device, &(anonymous namespace)::Z88_IMPEXP_device_traits::shortname, &(anonymous namespace)::Z88_IMPEXP_device_traits::fullname, &(anonymous namespace)::Z88_IMPEXP_device_traits::source> () at ../../../../../src/emu/device.h:283
#3  __static_initialization_and_destruction_0 () at ../../../../../src/mame/acorn/z88_impexp.cpp:34
#4  _GLOBAL__sub_I_Z88_IMPEXP () at ../../../../../src/mame/acorn/z88_impexp.cpp:278
#5  0x0000fffff5870b2c in call_init (env=<optimized out>, argv=0xfffffffff258, argc=2) at ../csu/libc-start.c:145
#6  __libc_start_main_impl (main=0xaaaaaeedadc0 <main()>, argc=2, argv=0xfffffffff258, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:347
#7  0x0000aaaaaef01570 in _start
The BTI veneers are created by the static linker for stubs with indirect jumps that could break with BTI enabled.

What would be interesting to see is the list of relocations in the application when built with 2.40 and then with 2.41.

The code in question was added early in January 2023 here:

commit 15b4f66b0a9a3be6caf1898d22a13c39e662006f
Author: Szabolcs Nagy <szabolcs.nagy>
Date:   Wed Jan 18 12:56:46 2023 +0000

    bfd: aarch64: Fix stubs that may break BTI

    PR30076

    Insert two stubs in a BTI enabled binary when fixing long calls: The
    first is near the call site and uses an indirect jump like before,
    but it targets the second stub that is near the call target site and
    uses a direct jump.

    This is needed when a single stub breaks BTI compatibility.

    The stub layout is kept fixed between sizing and building the stubs,
    so the location of the second stub is known at build time, this may
    introduce padding between stubs when those are relaxed.

    Stub layout with BTI disabled is unchanged.

These are probably the first uses of this code at a large scale.

I don't think there is anything wrong here in glibc that I can tell.
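For anyone wanting to do the relocation comparison suggested above, here is a rough sketch of one way to go about it. It assumes the relocation listing of each build has been saved to a text file with something like `readelf -r -W <binary> > relocs-2.40.txt`, and it only counts how often each relocation type appears in the two listings. The function names and file names are made up for illustration, and a fully linked executable will normally show only its dynamic relocations, so this may well need to be run on object files or on a binary linked with relocations retained.

```python
import sys
from collections import Counter


def count_reloc_types(readelf_output: str) -> Counter:
    """Count relocation types in the text printed by `readelf -r -W`.

    Relocation rows are assumed to look like:
        OFFSET INFO TYPE [SYMBOL-VALUE SYMBOL-NAME + ADDEND]
    Section banners and column headers are skipped because their third
    field does not start with "R_".
    """
    counts = Counter()
    for line in readelf_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("R_"):
            counts[fields[2]] += 1
    return counts


def diff_reloc_counts(old: Counter, new: Counter) -> dict:
    """Return {type: (old_count, new_count)} for types whose count changed."""
    changed = {}
    for rtype in sorted(set(old) | set(new)):
        if old[rtype] != new[rtype]:
            changed[rtype] = (old[rtype], new[rtype])
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python3 relocdiff.py relocs-2.40.txt relocs-2.41.txt
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        old_counts = count_reloc_types(f_old.read())
        new_counts = count_reloc_types(f_new.read())
    for rtype, (before, after) in diff_reloc_counts(old_counts, new_counts).items():
        print(f"{rtype}: {before} -> {after}")
```

This only gives a coarse summary by type. If a type stands out, diffing the two listings directly would show which symbols are involved.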
Upstream has confirmed this is an issue with the binutils support for BTI veneers. I'm moving this to binutils.
Fixed in binutils-2.41-12.fc40.
This bug appears to have been reported against 'rawhide' during the Fedora Linux 40 development cycle. Changing version to 40.