Description of problem:
elfutils, when built with -mbranch-protection=standard, is experiencing a unit test failure on aarch64. Furthermore, it appears that it's also causing debuginfo extraction problems in other packages.

When built on aarch64:

============================================================================
Testsuite summary for elfutils 0.180
============================================================================
# TOTAL: 219
# PASS:  213
# SKIP:  5
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
Please report to https://sourceware.org/bugzilla
============================================================================

FAIL: run-backtrace-native-core.sh
==================================

/usr/bin/coredumpctl
PID: 7477 (backtrace-child)
UID: 0 (root)
GID: 0 (root)
Signal: 6 (ABRT)
Timestamp: Tue 2020-07-28 11:01:25 EDT (2s ago)
Command Line: /root/t/elfutils/elfutils-0.180/tests/backtrace-child --gencore
Executable: /root/t/elfutils/elfutils-0.180/tests/backtrace-child
Control Group: /user.slice/user-0.slice/session-3.scope
Unit: session-3.scope
Slice: user-0.slice
Session: 3
Owner UID: 0 (root)
Boot ID: e42abccd30874f80a5904ce3a8e2c9f1
Machine ID: e4e16166188344d5acacabe5d9d3dd3c
Hostname: localhost.localdomain
Storage: /var/lib/systemd/coredump/core.backtrace-child.0.e42abccd30874f80a5904ce3a8e2c9f1.7477.1595948485000000000000.zst
Message: Process 7477 (backtrace-child) of user 0 dumped core.
Stack trace of thread 7482:
#0  0x0000ffffa733aaf8 raise (libpthread.so.0 + 0x13af8)
#1  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
#2  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
#3  0x0000aaaaafa2df2c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf2c)
#4  0x0000aaaaafa2df44 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf44)
#5  0x0000aaaaafa2df54 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf54)
#6  0x0000ffffa732ef74 start_thread (libpthread.so.0 + 0x7f74)

Stack trace of thread 7477:
#0  0x0000ffffa73303c0 __pthread_clockjoin_ex (libpthread.so.0 + 0x93c0)
#1  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
#2  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
#3  0x0000ffffa71c5878 __libc_start_main (libc.so.6 + 0x24878)

backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.
./test-subr.sh: line 84:  8904 Aborted (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" $VALGRIND_CMD "$@"
backtrace-child-core.7477: no main
rmdir: failed to remove 'test-7404': Directory not empty
FAIL run-backtrace-native-core.sh (exit status: 1)

Version-Release number of selected component (if applicable):
0.180

How reproducible:
At the moment, 100%

Steps to Reproduce:
1. Acquire rawhide/f33 with gcc 10.2.1 + recent binutils
2. Build elfutils on that machine with `fedpkg local`

Actual results:
As seen above:
backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.
(glibc failure caused by elfutils)
++ /usr/lib/rpm/debugedit -b /root/t/glibc -d /usr/src/debug -i -l ./debugsources.list /root/rpmbuild/BUILDROOT/glibc-2.31.9000-21.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size

Expected results:

Additional info:
Created attachment 1702690: exe and corefile from unit test failure.
Created attachment 1702691: gencat from debuginfo extraction failure.
So this is really 2 bugs:
1) elfutils backtrace failing when built with -mbranch-protection=standard
2) rpm debugedit (which uses elfutils libelf) not being able to update a file because of "invalid section entry size".

I can replicate 1) by building upstream elfutils with CFLAGS="-g -O2 -mbranch-protection=standard" CXXFLAGS="$CFLAGS". In that case both run-backtrace-native.sh and run-backtrace-native-core.sh fail. They succeed without -mbranch-protection=standard.

Issue 2) can be shown with the gencat ELF file attachment:

# eu-elflint --gnu ./gencat
section [14] '.plt': size not multiple of entry size
section [23] '.dynamic': entry 22: unknown tag

And indeed, the .plt section is bad:

  [14] .plt  PROGBITS  0000000000401140 00001140 00000410 24 AX 0 0 16

0x410 = 1040 is not divisible by the entry size 24 (it looks like there are 43 entries and then 8 extra bytes).

I'll try to figure out issue 1. But issue 2 must be somewhere else, probably in binutils ld, which generated the .plt section.
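The condition eu-elflint reports, and that libelf rejects when debugedit rewrites the file, is plain arithmetic on the section header. A minimal sketch of that check, assuming only the sh_size and sh_entsize values shown by readelf; `check_entsize` and `struct entsize_check` are hypothetical names, not elfutils code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of checking one section header against its declared entry size. */
struct entsize_check {
    uint64_t entries;   /* number of whole entries that fit in sh_size */
    uint64_t leftover;  /* bytes left over after the last whole entry */
    bool ok;            /* true when sh_size is an exact multiple of sh_entsize */
};

/* Hypothetical helper mirroring the "size not multiple of entry size"
   condition.  An sh_entsize of 0 means the section has no fixed-size
   entries, so there is nothing to check. */
static struct entsize_check check_entsize(uint64_t sh_size, uint64_t sh_entsize)
{
    struct entsize_check r = { 0, 0, true };
    if (sh_entsize == 0)
        return r;
    r.entries = sh_size / sh_entsize;
    r.leftover = sh_size % sh_entsize;
    r.ok = (r.leftover == 0);
    /* Sanity check: whole entries plus leftover reconstruct the size. */
    assert(r.entries * sh_entsize + r.leftover == sh_size);
    return r;
}
```

With the gencat values, check_entsize(0x410, 24) reports 43 whole entries, 8 leftover bytes and ok == false.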
> section [23] '.dynamic': entry 22: unknown tag

BTW, this is:
  <unknown>: 0x70000001  000000000000000000

If someone knows what d_tag type 0x70000001 (DT_LOPROC + 1) is, that would be appreciated. It isn't listed in glibc /usr/include/elf.h (which is what elfutils uses). The only entry for aarch64 is:
#define DT_AARCH64_VARIANT_PCS (DT_LOPROC + 5)
Note that this does NOT seem to impact the mass rebuild going on. As far as I can see, builds on aarch64 are fine; elfutils itself got rebuilt without showing any failures:
https://kojipkgs.fedoraproject.org//packages/elfutils/0.180/6.fc33/data/logs/aarch64/build.log
It does look like it is using -mbranch-protection=standard, but I also see:
SKIP: run-backtrace-native-core.sh
which means no core file was generated on the koji builder. Same for glibc: I don't see any debugedit failures in the aarch64 build.log:
https://kojipkgs.fedoraproject.org//work/tasks/5655/47975655/build.log
This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is enabled:

extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/lib64/libutil-2.31.9000.so
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size
error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
    Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

My guess: We do not see it more widely because glibc in the buildroot is built without PAC+BTI. The link editor does not produce the problematic output as a result, masking any elfutils problems that may exist.
/* Processor specific dynamic array tags.  */
#define DT_AARCH64_BTI_PLT      (DT_LOPROC + 1)
#define DT_AARCH64_PAC_PLT      (DT_LOPROC + 3)
#define DT_AARCH64_VARIANT_PCS  (DT_LOPROC + 5)

is what binutils sources have.
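Once the values are known, teaching a dumper to print these tags instead of "<unknown>" is a small table lookup. A minimal sketch, assuming only the three values quoted above (DT_LOPROC is the gABI start of the processor-specific range, 0x70000000); `aarch64_tag_name` and `aarch64_tag_value` are hypothetical helpers, not the elfutils backend hooks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Start of the processor-specific d_tag range (ELF gABI). */
#define DT_LOPROC 0x70000000

/* AArch64 values as quoted from the binutils sources. */
#define DT_AARCH64_BTI_PLT     (DT_LOPROC + 1)
#define DT_AARCH64_PAC_PLT     (DT_LOPROC + 3)
#define DT_AARCH64_VARIANT_PCS (DT_LOPROC + 5)

struct tag_name {
    int64_t tag;
    const char *name;
};

static const struct tag_name aarch64_tags[] = {
    { DT_AARCH64_BTI_PLT,     "AARCH64_BTI_PLT" },
    { DT_AARCH64_PAC_PLT,     "AARCH64_PAC_PLT" },
    { DT_AARCH64_VARIANT_PCS, "AARCH64_VARIANT_PCS" },
};

#define N_AARCH64_TAGS (sizeof aarch64_tags / sizeof aarch64_tags[0])

/* Hypothetical: printable name for an aarch64 d_tag, or NULL if unknown
   (a dumper would then fall back to printing "<unknown>"). */
static const char *aarch64_tag_name(int64_t tag)
{
    for (size_t i = 0; i < N_AARCH64_TAGS; i++)
        if (aarch64_tags[i].tag == tag)
            return aarch64_tags[i].name;
    return NULL;
}

/* Hypothetical reverse lookup by name; returns -1 if not found. */
static int64_t aarch64_tag_value(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_AARCH64_TAGS; i++)
        if (strcmp(aarch64_tags[i].name, name) == 0)
            return aarch64_tags[i].tag;
    return -1;
}
```

The 0x70000001 entry that eu-elflint flagged in gencat then resolves to AARCH64_BTI_PLT, while DT_LOPROC + 2 stays unknown.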
(In reply to Jakub Jelinek from comment #7)
> /* Processor specific dynamic array tags.  */
> #define DT_AARCH64_BTI_PLT (DT_LOPROC + 1)
> #define DT_AARCH64_PAC_PLT (DT_LOPROC + 3)
> #define DT_AARCH64_VARIANT_PCS (DT_LOPROC + 5)
> is what binutils sources have.

Ah, great, so this does seem to confirm that something is up with the .plt section. Is there any documentation on what it means to have those tags in the dynamic array? I looked at the change request at https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication and asked around, but nobody seems to know anything about any ELF, DWARF or gABI changes. But I guess there must be some, given the issues with the dynamic tags, the .plt section, and the fact that unwinding seems broken. Can we merge them into glibc elf.h to expose them to other tools?
(In reply to Florian Weimer from comment #6)
> This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is
> enabled:
>
> extracting debug info from
> /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
> Failed to update file: invalid section entry size
> error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
> Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

This issue is analyzed a bit in comment #3. You can also see this by running eu-elflint on gencat:

section [14] '.plt': size not multiple of entry size

Given some of the other observations, might it be that the linker somehow creates .plt entries of different sizes when creating gencat? That would cause sh_size % sh_entsize != 0, which makes debugedit/libelf throw an error when it encounters such a .plt section.
GDB does seem able to unwind through the core file, but eu-stack doesn't:

# gdb --core tests/test-187673/core.187694 tests/backtrace-child
(gdb) thread apply all bt

Thread 2 (Thread 0xffff9777e010 (LWP 187694)):
#0  0x0000ffff97726610 in __pthread_clockjoin_ex () from /lib64/libpthread.so.0
#1  0x0000aaaad1523b3c in main (argc=<optimized out>, argv=<optimized out>) at backtrace-child.c:241

Thread 1 (Thread 0xffff975a6110 (LWP 187695)):
#0  0x0000ffff97730d48 in raise () from /lib64/libpthread.so.0
#1  0x0000aaaad1523d4c in sigusr2 (signo=<optimized out>) at backtrace-child.c:132
#2  0x0000aaaad1523e2c in stdarg (f=<optimized out>) at backtrace-child.c:176
#3  0x0000aaaad1523e44 in backtracegen () at backtrace-child.c:190
#4  0x0000aaaad1523e54 in start (arg=<optimized out>) at backtrace-child.c:205
#5  0x0000ffff97725294 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff9767d27c in thread_start () from /lib64/libc.so.6

# eu-stack -v --core tests/test-187673/core.187694 --exec tests/backtrace-child
PID 187694 - core
TID 187695:
#0  0x0000ffff97730d48     raise - libpthread.so.0
#1  0x0000aaaad1523d4c - 1 sigusr2 - backtrace-child /root/elfutils/tests/backtrace-child.c:132:3
#2  0x0000aaaad1523e2c - 1 stdarg - backtrace-child /root/elfutils/tests/backtrace-child.c:176:3
#3  0x0000ffff9774c000 - 1 - libpthread.so.0
eu-stack: dwfl_thread_getframes tid 187695 at 0xffff9774bfff in libpthread.so.0: No DWARF information found
TID 187694:
#0  0x0000ffff97726610     __pthread_clockjoin_ex - libpthread.so.0
#1  0x0000aaaad1523b3c - 1 main - backtrace-child /root/elfutils/tests/backtrace-child.c:241:5
#2  0x0000ffff975cb838 - 1 __libc_start_main - libc.so.6
#3  0xf00000f4a90153f3 - 1
#4  0xf00000f4a90153f3 - 1
eu-stack: dwfl_thread_getframes tid 187694 at 0xf00000f4a90153f2 in <unknown>: No DWARF information found
Note that most backtraces actually work, unless they go through a signal frame. Is there anything about PAC that changes how one unwinds through a signal frame?
Regarding the gencat problem, the PLT0 entry for gencat has a different size than the other PLT entries:

Disassembly of section .plt:

0000000000401140 <.plt>:
  401140:  d503245f   bti   c
  401144:  a9bf7bf0   stp   x16, x30, [sp, #-16]!
  401148:  d00000f0   adrp  x16, 41f000 <__FRAME_END__+0x1abd4>
  40114c:  f9474a11   ldr   x17, [x16, #3728]
  401150:  913a4210   add   x16, x16, #0xe90
  401154:  d61f0220   br    x17
  401158:  d503201f   nop
  40115c:  d503201f   nop

0000000000401160 <memcpy@plt>:
  401160:  d503245f   bti   c
  401164:  d00000f0   adrp  x16, 41f000 <__FRAME_END__+0x1abd4>
  401168:  f9474e11   ldr   x17, [x16, #3736]
  40116c:  913a6210   add   x16, x16, #0xe98
  401170:  d61f0220   br    x17
  401174:  d503201f   nop

0000000000401178 <strlen@plt>:
  401178:  d503245f   bti   c
  40117c:  d00000f0   adrp  x16, 41f000 <__FRAME_END__+0x1abd4>
  401180:  f9475211   ldr   x17, [x16, #3744]
  401184:  913a8210   add   x16, x16, #0xea0
  401188:  d61f0220   br    x17
  40118c:  d503201f   nop

I don't think that's valid ELF.

Another oddity is that the binary has just an AARCH64_BTI_PLT entry:

Dynamic section at offset 0xfc60 contains 29 entries:
  Tag                Type               Name/Value
 0x0000000000000001 (NEEDED)            Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)            Shared library: [ld-linux-aarch64.so.1]
 0x000000000000000c (INIT)              0x401120
 0x000000000000000d (FINI)              0x403868
 0x0000000000000019 (INIT_ARRAY)        0x41fc40
 0x000000000000001b (INIT_ARRAYSZ)      8 (bytes)
 0x000000000000001a (FINI_ARRAY)        0x41fc48
 0x000000000000001c (FINI_ARRAYSZ)      8 (bytes)
 0x0000000000000004 (HASH)              0x400330
 0x000000006ffffef5 (GNU_HASH)          0x400498
 0x0000000000000005 (STRTAB)            0x400990
 0x0000000000000006 (SYMTAB)            0x4004e0
 0x000000000000000a (STRSZ)             575 (bytes)
 0x000000000000000b (SYMENT)            24 (bytes)
 0x0000000000000015 (DEBUG)             0x0
 0x0000000000000003 (PLTGOT)            0x41fe80
 0x0000000000000002 (PLTRELSZ)          1008 (bytes)
 0x0000000000000014 (PLTREL)            RELA
 0x0000000000000017 (JMPREL)            0x400d30
 0x0000000000000007 (RELA)              0x400c88
 0x0000000000000008 (RELASZ)            168 (bytes)
 0x0000000000000009 (RELAENT)           24 (bytes)
 0x0000000070000001 (AARCH64_BTI_PLT)
 0x0000000000000018 (BIND_NOW)
 0x000000006ffffffb (FLAGS_1)           Flags: NOW
 0x000000006ffffffe (VERNEED)           0x400c38
 0x000000006fffffff (VERNEEDNUM)        2
 0x000000006ffffff0 (VERSYM)            0x400bd0
 0x0000000000000000 (NULL)              0x0

But it enables both BTI *and* PAC:

Displaying notes found in: .note.gnu.property
  Owner    Data size    Description
  GNU      0x00000010   NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI, PAC

Maybe ld got confused in some way? I'm going to file a binutils bug once I have a few more details.
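The numbers in the dumps above are enough to account for the whole 8-byte discrepancy. A sketch of the arithmetic, assuming the usual layout of one PLT stub per JMPREL relocation; `expected_plt_size` is a hypothetical helper and the stub sizes are read off the disassembly (PLT0 spans 0x401140..0x401160, each later stub 24 bytes), not taken from ld. PLTRELSZ 1008 / RELAENT 24 gives 42 stubs, and 32 + 42 * 24 = 1040 = 0x410, exactly the .plt size:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: expected .plt size when PLT0 and the per-symbol
   stubs have different sizes.  Assumes one stub per JMPREL relocation. */
static uint64_t expected_plt_size(uint64_t pltrelsz, uint64_t relaent,
                                  uint64_t plt0_size, uint64_t stub_size)
{
    assert(relaent != 0);
    uint64_t nstubs = pltrelsz / relaent;   /* number of PLT relocations */
    return plt0_size + nstubs * stub_size;
}
```

So the 8 "extra" bytes are simply PLT0 being 32 bytes while sh_entsize claims every entry is 24.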
So, the arm-elf document https://developer.arm.com/documentation/ihi0056/g/ describes the ELF-related changes. In reference to #11, I remember there was a tweak around general exception handling which affected libc (and that patch landed a year or so ago, IIRC), but I need to dig up the details.
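One of those ELF-level changes is visible in the gencat dump as "AArch64 feature: BTI, PAC" in the NT_GNU_PROPERTY_TYPE_0 note. A minimal sketch of decoding that property, assuming the property type and data word have already been extracted from the note; the constant values are the ones glibc's elf.h uses (redefined here so the sketch compiles on its own), and `decode_aarch64_feature` is a hypothetical helper. The _AND suffix means the linker keeps a bit only if every input object sets it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as in glibc's <elf.h>, redefined so this compiles without it. */
#define GNU_PROPERTY_AARCH64_FEATURE_1_AND 0xc0000000u
#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI (1u << 0)
#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC (1u << 1)

struct aarch64_features {
    bool bti;   /* branch target identification */
    bool pac;   /* pointer authentication of return addresses */
};

/* Hypothetical: decode one property (pr_type, pr_data) from the note.
   Returns false if the property is not the AArch64 feature property. */
static bool decode_aarch64_feature(uint32_t pr_type, uint32_t pr_data,
                                   struct aarch64_features *out)
{
    assert(out != NULL);
    if (pr_type != GNU_PROPERTY_AARCH64_FEATURE_1_AND)
        return false;
    out->bti = (pr_data & GNU_PROPERTY_AARCH64_FEATURE_1_BTI) != 0;
    out->pac = (pr_data & GNU_PROPERTY_AARCH64_FEATURE_1_PAC) != 0;
    return true;
}
```

A data word of 3 decodes to both BTI and PAC, which is what the gencat note reports even though its dynamic section carries only the AARCH64_BTI_PLT tag.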
(In reply to Jeremy Linton (ARM) from comment #13)
> So, the arm-elf document https://developer.arm.com/documentation/ihi0056/g/
> describes the elf related changes.
>
> In reference to #11 i remember there was a tweak around general exception
> handling, which affected libc (and that patch landed a year or so again
> IIRC), but I need to dig up the details.

Thanks. That is very useful. It looks like there were actually various ELF changes to support this. Let's keep this bug for updating elfutils for PAC/BTI. I opened a separate bug for the binutils/ld issue (rpm debugedit being unable to process some files) as https://bugzilla.redhat.com/show_bug.cgi?id=1862110
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle. Changing version to 33.
The new way to unwind aarch64 pac with signed (mangled) return addresses is described in this gdb patch: https://sourceware.org/legacy-ml/gdb-patches/2017-08/msg00171.html
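As I understand the approach in that patch, the unwinder tracks whether the return address is currently signed (toggled by CFI) and, if it is, clears the authentication-code bits before using the saved LR. A minimal sketch of that last step, assuming the signed/unsigned state has already been computed and the mask of PAC bits is known; `unmangle_return_address` is a hypothetical name and the mask used in any example is a made-up value, not something elfutils or the kernel is claimed to provide in this form:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Recover a usable return address from a saved LR value.
   ra_signed: whether the CFI says the return address is signed at this
              point in the function.
   pac_mask:  bits that hold the authentication code (assumed known). */
static uint64_t unmangle_return_address(uint64_t saved_lr, bool ra_signed,
                                        uint64_t pac_mask)
{
    /* A signed address can only be stripped if the mask is known. */
    assert(pac_mask != 0 || !ra_signed);
    if (!ra_signed)
        return saved_lr;          /* nothing to strip */
    return saved_lr & ~pac_mask;  /* clear the PAC bits */
}
```

For a user-space address, clearing the masked high bits restores the original pointer; when the state says the address is unsigned the value is used unchanged.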
(In reply to Mark Wielaard from comment #16)
> The new way to unwind aarch64 pac with signed (mangled) return addresses is
> described in this gdb patch:
> https://sourceware.org/legacy-ml/gdb-patches/2017-08/msg00171.html

Note that it took 1.5 years for the patch to actually show up:
https://sourceware.org/legacy-ml/gdb-patches/2019-03/msg00084.html
I posted a couple of patches upstream:
https://sourceware.org/pipermail/elfutils-devel/2020q3/date.html

  libelf: Sync elf.h from glibc
  backends: Implement aarch64_dynamic_tag_name and aarch64_dynamic_tag_check
  libebl: Handle aarch64 bti, pac bits in gnu property note
  libdw,readelf: Recognize DW_CFA_AARCH64_negate_ra_state

Together they recognize the various bits (and resolve the confusion wrt DW_CFA_GNU_window_save), which allows unwinding again when the hardware doesn't actually do any pointer authentication. To actually handle mangled return addresses we need a bit more code. But for now this should give us a fully green testsuite again on most aarch64 (< ARMv8.3) hardware.
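The confusion arises because DW_CFA_GNU_window_save (SPARC) and DW_CFA_AARCH64_negate_ra_state share the same CFA opcode, 0x2d, so a CFI interpreter has to choose the meaning by target architecture. A minimal sketch of that dispatch; `enum arch`, `struct cfa_state` and `apply_cfa_0x2d` are hypothetical, and the SPARC branch is simplified to a flag rather than real register-window rules:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Both names share the same DWARF CFA opcode value. */
#define DW_CFA_GNU_window_save         0x2d
#define DW_CFA_AARCH64_negate_ra_state 0x2d

enum arch { ARCH_SPARC, ARCH_AARCH64 };   /* hypothetical, for illustration */

/* Minimal per-row unwind state for this sketch. */
struct cfa_state {
    bool ra_signed;      /* AArch64: return address currently signed */
    bool window_saved;   /* SPARC: simplified "register window saved" flag */
};

/* Apply opcode 0x2d according to the target architecture.
   Returns false for any other opcode (not handled in this sketch). */
static bool apply_cfa_0x2d(enum arch arch, uint8_t op, struct cfa_state *st)
{
    assert(st != NULL);
    if (op != DW_CFA_AARCH64_negate_ra_state)
        return false;
    switch (arch) {
    case ARCH_AARCH64:
        st->ra_signed = !st->ra_signed;   /* negate: toggle, not set */
        return true;
    case ARCH_SPARC:
        st->window_saved = true;
        return true;
    }
    return false;
}
```

Because the AArch64 meaning is a toggle, two occurrences in one function (for example after signing in the prologue and again around an epilogue) cancel out, which a SPARC-style interpretation would get wrong.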
FEDORA-2020-d63f2a2d61 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-d63f2a2d61
FEDORA-2020-820ac199ba has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-820ac199ba
FEDORA-2020-820ac199ba has been pushed to the Fedora 32 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-820ac199ba` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-820ac199ba See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-d63f2a2d61 has been pushed to the Fedora 33 testing repository. In short time you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-d63f2a2d61` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-d63f2a2d61 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2020-820ac199ba has been pushed to the Fedora 32 stable repository. If problem still persists, please make note of it in this bug report.
FEDORA-2020-d63f2a2d61 has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.