This clone of the bug is against binutils, and is for:

Part 2) rpm debugedit (which uses elfutils libelf) not being able to update a file because of "invalid section entry size".

+++ This bug was initially created as a clone of Bug #1861423 +++

Description of problem:

elfutils, when built with -mbranch-protection=standard, is experiencing a unit test failure on aarch64. Further, it appears that it's also causing debuginfo extraction problems in other packages.

When built on aarch64:

============================================================================
Testsuite summary for elfutils 0.180
============================================================================
# TOTAL: 219
# PASS:  213
# SKIP:  5
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
Please report to https://sourceware.org/bugzilla
============================================================================

FAIL: run-backtrace-native-core.sh
==================================

/usr/bin/coredumpctl
           PID: 7477 (backtrace-child)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2020-07-28 11:01:25 EDT (2s ago)
  Command Line: /root/t/elfutils/elfutils-0.180/tests/backtrace-child --gencore
    Executable: /root/t/elfutils/elfutils-0.180/tests/backtrace-child
 Control Group: /user.slice/user-0.slice/session-3.scope
          Unit: session-3.scope
         Slice: user-0.slice
       Session: 3
     Owner UID: 0 (root)
       Boot ID: e42abccd30874f80a5904ce3a8e2c9f1
    Machine ID: e4e16166188344d5acacabe5d9d3dd3c
      Hostname: localhost.localdomain
       Storage: /var/lib/systemd/coredump/core.backtrace-child.0.e42abccd30874f80a5904ce3a8e2c9f1.7477.1595948485000000000000.zst
       Message: Process 7477 (backtrace-child) of user 0 dumped core.

                Stack trace of thread 7482:
                #0  0x0000ffffa733aaf8 raise (libpthread.so.0 + 0x13af8)
                #1  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
                #2  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
                #3  0x0000aaaaafa2df2c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf2c)
                #4  0x0000aaaaafa2df44 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf44)
                #5  0x0000aaaaafa2df54 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf54)
                #6  0x0000ffffa732ef74 start_thread (libpthread.so.0 + 0x7f74)

                Stack trace of thread 7477:
                #0  0x0000ffffa73303c0 __pthread_clockjoin_ex (libpthread.so.0 + 0x93c0)
                #1  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
                #2  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
                #3  0x0000ffffa71c5878 __libc_start_main (libc.so.6 + 0x24878)

backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.
./test-subr.sh: line 84: 8904 Aborted (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" $VALGRIND_CMD "$@"
backtrace-child-core.7477: no main
rmdir: failed to remove 'test-7404': Directory not empty
FAIL run-backtrace-native-core.sh (exit status: 1)

Version-Release number of selected component (if applicable):
0.180

How reproducible:
at the moment 100%

Steps to Reproduce:
1. Acquire rawhide/f33 with gcc 10.2.1 + recent binutils
2. Build elfutils on that machine with `fedpkg local`

Actual results:

As seen above:
backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.
(glibc failure caused by elfutils)
++ /usr/lib/rpm/debugedit -b /root/t/glibc -d /usr/src/debug -i -l ./debugsources.list /root/rpmbuild/BUILDROOT/glibc-2.31.9000-21.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size

Expected results:

Additional info:

--- Additional comment from Jeremy Linton on 2020-07-28 16:43:24 UTC ---

--- Additional comment from Jeremy Linton on 2020-07-28 16:45:19 UTC ---

--- Additional comment from Mark Wielaard on 2020-07-28 21:38:44 UTC ---

So this is really 2 bugs.

1) elfutils backtrace failing when building with -mbranch-protection=standard
2) rpm debugedit (which uses elfutils libelf) not being able to update a file because of "invalid section entry size".

I can replicate 1) by building upstream elfutils with
CFLAGS="-g -O2 -mbranch-protection=standard" CXXFLAGS="$CFLAGS"
In that case both run-backtrace-native.sh and run-backtrace-native-core.sh fail.
They succeed without -mbranch-protection=standard.

Issue 2) can be shown with the gencat ELF file attachment:

# eu-elflint --gnu ./gencat
section [14] '.plt': size not multiple of entry size
section [23] '.dynamic': entry 22: unknown tag

And indeed, the .plt section is bad:

[14] .plt PROGBITS 0000000000401140 00001140 00000410 24 AX 0 0 16

410 hex = 1040 is not divisible by the entry size 24 (it looks like there are 43 entries and then 8 extra bytes).

I'll try to figure out issue 1. But issue 2 must be somewhere else, probably binutils ld, which generated the .plt section.

--- Additional comment from Mark Wielaard on 2020-07-28 21:45:33 UTC ---

> section [23] '.dynamic': entry 22: unknown tag

BTW. This is
<unknown>: 0x70000001 000000000000000000

If someone knows what d_tag type 0x70000001 (DT_LOPROC + 1) is, that would be appreciated. It isn't listed in glibc /usr/include/elf.h (which is what elfutils uses). The only entry for aarch64 is
#define DT_AARCH64_VARIANT_PCS (DT_LOPROC + 5)

--- Additional comment from Mark Wielaard on 2020-07-29 10:16:07 UTC ---

Note that this does NOT seem to impact the mass rebuild going on.
As far as I can see builds on aarch64 are fine; elfutils itself got rebuilt without showing any failures:
https://kojipkgs.fedoraproject.org//packages/elfutils/0.180/6.fc33/data/logs/aarch64/build.log
It does look like it is using -mbranch-protection=standard.
But I also see SKIP: run-backtrace-native-core.sh, which means no core file was generated on the koji builder.

Same for glibc, I don't see any debugedit failures in the aarch64 build.log:
https://kojipkgs.fedoraproject.org//work/tasks/5655/47975655/build.log

--- Additional comment from Florian Weimer on 2020-07-29 10:28:08 UTC ---

This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is enabled:

extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/lib64/libutil-2.31.9000.so
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size
error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
    Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

My guess: We do not see it more widely because glibc in the buildroot is built without PAC+BTI. The link editor does not produce the problematic output as a result, masking any elfutils problems that may exist.

--- Additional comment from Jakub Jelinek on 2020-07-29 10:37:17 UTC ---

/* Processor specific dynamic array tags. */
#define DT_AARCH64_BTI_PLT (DT_LOPROC + 1)
#define DT_AARCH64_PAC_PLT (DT_LOPROC + 3)
#define DT_AARCH64_VARIANT_PCS (DT_LOPROC + 5)
is what binutils sources have.

--- Additional comment from Mark Wielaard on 2020-07-29 10:44:43 UTC ---

(In reply to Jakub Jelinek from comment #7)
> /* Processor specific dynamic array tags. */
> #define DT_AARCH64_BTI_PLT (DT_LOPROC + 1)
> #define DT_AARCH64_PAC_PLT (DT_LOPROC + 3)
> #define DT_AARCH64_VARIANT_PCS (DT_LOPROC + 5)
> is what binutils sources have.

Ah, great, so this does seem to confirm that something is up with the .plt section.

Is there any documentation on what it means to have those tags in the dynamic array? I looked at the change request at https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication and asked around, but nobody seems to know anything about any ELF, DWARF or gabi changes. But I guess there must be some, seeing the issues with the dynamic tags, the .plt section and the fact that unwinding seems broken.

Can we merge them into glibc elf.h to expose them to other tools?

--- Additional comment from Mark Wielaard on 2020-07-29 10:58:08 UTC ---

(In reply to Florian Weimer from comment #6)
> This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is
> enabled:
>
> extracting debug info from
> /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
> Failed to update file: invalid section entry size
> error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
> Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

This issue is analyzed a bit in comment #3. You can also see this by running eu-elflint on gencat:

section [14] '.plt': size not multiple of entry size

Given some of the other observations, might it be that the linker somehow creates .plt entries of different sizes when creating gencat? That would cause sh_size % sh_entsize != 0, which makes debugedit/libelf throw an error when it encounters such a .plt section.
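
The arithmetic behind the `.plt` complaint is easy to check by hand. The following is a minimal Python sketch, not the actual libelf or eu-elflint check: the helper name `split_entries` is hypothetical, and the two constants are simply the sh_size and sh_entsize values quoted from the gencat section header in this bug.

```python
# Values copied from the gencat section header quoted in this bug:
# [14] .plt PROGBITS 0000000000401140 00001140 00000410 24 AX 0 0 16
PLT_SH_SIZE = 0x410    # section size in bytes (1040 decimal)
PLT_SH_ENTSIZE = 24    # declared size of one entry in bytes


def split_entries(sh_size, sh_entsize):
    """Return (whole_entries, leftover_bytes) for a section.

    Hypothetical helper for illustration. A section with sh_entsize == 0
    has no fixed-size entries, so it is reported as (0, 0) here.
    """
    if sh_entsize == 0:
        return 0, 0
    return divmod(sh_size, sh_entsize)


entries, leftover = split_entries(PLT_SH_SIZE, PLT_SH_ENTSIZE)
# A well-formed table of fixed-size entries leaves no bytes over.
print(f"{entries} whole entries, {leftover} leftover bytes")
```

A non-zero remainder is the sh_size % sh_entsize != 0 condition described above; for gencat it gives 43 whole entries and 8 leftover bytes, matching the numbers in comment #3.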
--- Additional comment from Mark Wielaard on 2020-07-29 11:13:46 UTC ---

GDB does seem able to unwind through the core file, but eu-stack doesn't:

# gdb --core tests/test-187673/core.187694 tests/backtrace-child
(gdb) thread apply all bt

Thread 2 (Thread 0xffff9777e010 (LWP 187694)):
#0  0x0000ffff97726610 in __pthread_clockjoin_ex () from /lib64/libpthread.so.0
#1  0x0000aaaad1523b3c in main (argc=<optimized out>, argv=<optimized out>) at backtrace-child.c:241

Thread 1 (Thread 0xffff975a6110 (LWP 187695)):
#0  0x0000ffff97730d48 in raise () from /lib64/libpthread.so.0
#1  0x0000aaaad1523d4c in sigusr2 (signo=<optimized out>) at backtrace-child.c:132
#2  0x0000aaaad1523e2c in stdarg (f=<optimized out>) at backtrace-child.c:176
#3  0x0000aaaad1523e44 in backtracegen () at backtrace-child.c:190
#4  0x0000aaaad1523e54 in start (arg=<optimized out>) at backtrace-child.c:205
#5  0x0000ffff97725294 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff9767d27c in thread_start () from /lib64/libc.so.6

# eu-stack -v --core tests/test-187673/core.187694 --exec tests/backtrace-child
PID 187694 - core
TID 187695:
#0  0x0000ffff97730d48 raise - libpthread.so.0
#1  0x0000aaaad1523d4c - 1 sigusr2 - backtrace-child
    /root/elfutils/tests/backtrace-child.c:132:3
#2  0x0000aaaad1523e2c - 1 stdarg - backtrace-child
    /root/elfutils/tests/backtrace-child.c:176:3
#3  0x0000ffff9774c000 - 1 - libpthread.so.0
eu-stack: dwfl_thread_getframes tid 187695 at 0xffff9774bfff in libpthread.so.0: No DWARF information found
TID 187694:
#0  0x0000ffff97726610 __pthread_clockjoin_ex - libpthread.so.0
#1  0x0000aaaad1523b3c - 1 main - backtrace-child
    /root/elfutils/tests/backtrace-child.c:241:5
#2  0x0000ffff975cb838 - 1 __libc_start_main - libc.so.6
#3  0xf00000f4a90153f3 - 1
#4  0xf00000f4a90153f3 - 1
eu-stack: dwfl_thread_getframes tid 187694 at 0xf00000f4a90153f2 in <unknown>: No DWARF information found

--- Additional comment from Mark Wielaard on 2020-07-29 12:01:49 UTC ---

Note that most backtraces actually work, unless they go through a signal frame.
Is there anything about PAC that changes how one unwinds through a signal frame?

--- Additional comment from Florian Weimer on 2020-07-29 12:04:01 UTC ---

Regarding the gencat problem, the PLT0 entry for gencat has a different size than the other PLT entries:

Disassembly of section .plt:

0000000000401140 <.plt>:
  401140: d503245f  bti   c
  401144: a9bf7bf0  stp   x16, x30, [sp, #-16]!
  401148: d00000f0  adrp  x16, 41f000 <__FRAME_END__+0x1abd4>
  40114c: f9474a11  ldr   x17, [x16, #3728]
  401150: 913a4210  add   x16, x16, #0xe90
  401154: d61f0220  br    x17
  401158: d503201f  nop
  40115c: d503201f  nop

0000000000401160 <memcpy@plt>:
  401160: d503245f  bti   c
  401164: d00000f0  adrp  x16, 41f000 <__FRAME_END__+0x1abd4>
  401168: f9474e11  ldr   x17, [x16, #3736]
  40116c: 913a6210  add   x16, x16, #0xe98
  401170: d61f0220  br    x17
  401174: d503201f  nop

0000000000401178 <strlen@plt>:
  401178: d503245f  bti   c
  40117c: d00000f0  adrp  x16, 41f000 <__FRAME_END__+0x1abd4>
  401180: f9475211  ldr   x17, [x16, #3744]
  401184: 913a8210  add   x16, x16, #0xea0
  401188: d61f0220  br    x17
  40118c: d503201f  nop

I don't think that's valid ELF.
Another oddity is that the binary has just an AARCH64_BTI_PLT entry:

Dynamic section at offset 0xfc60 contains 29 entries:
  Tag                Type               Name/Value
 0x0000000000000001 (NEEDED)           Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)           Shared library: [ld-linux-aarch64.so.1]
 0x000000000000000c (INIT)             0x401120
 0x000000000000000d (FINI)             0x403868
 0x0000000000000019 (INIT_ARRAY)       0x41fc40
 0x000000000000001b (INIT_ARRAYSZ)     8 (bytes)
 0x000000000000001a (FINI_ARRAY)       0x41fc48
 0x000000000000001c (FINI_ARRAYSZ)     8 (bytes)
 0x0000000000000004 (HASH)             0x400330
 0x000000006ffffef5 (GNU_HASH)         0x400498
 0x0000000000000005 (STRTAB)           0x400990
 0x0000000000000006 (SYMTAB)           0x4004e0
 0x000000000000000a (STRSZ)            575 (bytes)
 0x000000000000000b (SYMENT)           24 (bytes)
 0x0000000000000015 (DEBUG)            0x0
 0x0000000000000003 (PLTGOT)           0x41fe80
 0x0000000000000002 (PLTRELSZ)         1008 (bytes)
 0x0000000000000014 (PLTREL)           RELA
 0x0000000000000017 (JMPREL)           0x400d30
 0x0000000000000007 (RELA)             0x400c88
 0x0000000000000008 (RELASZ)           168 (bytes)
 0x0000000000000009 (RELAENT)          24 (bytes)
 0x0000000070000001 (AARCH64_BTI_PLT)
 0x0000000000000018 (BIND_NOW)
 0x000000006ffffffb (FLAGS_1)          Flags: NOW
 0x000000006ffffffe (VERNEED)          0x400c38
 0x000000006fffffff (VERNEEDNUM)       2
 0x000000006ffffff0 (VERSYM)           0x400bd0
 0x0000000000000000 (NULL)             0x0

But it enables both BTI *and* PAC:

Displaying notes found in: .note.gnu.property
  Owner                Data size    Description
  GNU                  0x00000010   NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI, PAC

Maybe ld got confused in some way?

I'm going to file a binutils bug once I have a few more details.

--- Additional comment from Jeremy Linton (ARM) on 2020-07-29 15:11:35 UTC ---

So, the arm-elf document https://developer.arm.com/documentation/ihi0056/g/ describes the ELF-related changes.

In reference to #11, I remember there was a tweak around general exception handling, which affected libc (and that patch landed a year or so ago IIRC), but I need to dig up the details.
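
The disassembly and dynamic section quoted above are enough to account for every byte of the `.plt` section. The Python sketch below is only a worked calculation over numbers already in this report, under two labeled assumptions: that each PLT relocation (PLTRELSZ / RELAENT) corresponds to exactly one PLT stub, and that all stubs after PLT0 have the same size as the two shown (memcpy@plt and strlen@plt). The variable names are illustrative.

```python
# Addresses taken from the objdump output quoted in this bug.
PLT_START = 0x401140     # start of .plt, i.e. the PLT0 header
MEMCPY_PLT = 0x401160    # first per-symbol stub
STRLEN_PLT = 0x401178    # second per-symbol stub

# Values taken from the dynamic section and section header quoted in this bug.
PLTRELSZ = 1008          # total size of the PLT relocations, in bytes
RELAENT = 24             # size of one RELA entry, in bytes
PLT_SH_SIZE = 0x410      # sh_size of .plt
PLT_SH_ENTSIZE = 24      # sh_entsize of .plt

plt0_size = MEMCPY_PLT - PLT_START    # size of the PLT0 header
stub_size = STRLEN_PLT - MEMCPY_PLT   # size of one per-symbol stub

# Assumption: one stub per PLT relocation, all stubs the same size.
stub_count = PLTRELSZ // RELAENT
expected_size = plt0_size + stub_count * stub_size

print(f"PLT0 = {plt0_size} bytes, stub = {stub_size} bytes, stubs = {stub_count}")
print(f"expected .plt size = {expected_size} (0x{expected_size:x})")
print(f"PLT0 is a multiple of sh_entsize: {plt0_size % PLT_SH_ENTSIZE == 0}")
```

Under those assumptions the layout is a 32-byte PLT0 followed by 42 stubs of 24 bytes, which adds up to exactly 0x410. The 8 bytes that eu-elflint cannot account for are therefore consistent with PLT0 being 8 bytes larger than the declared entry size, rather than with trailing garbage at the end of the section.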
The upstream bug is https://sourceware.org/bugzilla/show_bug.cgi?id=26312 which has a proposed patch at https://sourceware.org/pipermail/binutils/2020-July/112643.html
Fixed in: binutils-2.35-7.fc33
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle. Changing version to 33.
This message is a reminder that Fedora 33 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 33 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
This is AFAIK working now; the elfutils test in question is passing. The only failures are in debuginfod-find, which fails in the same way on x86 with a local build.
(In reply to Jeremy Linton from comment #5)
> This is AFAIK working now, the elfutils test in question is passing. The
> only failures are in debuginfod-find which fails in the same way on x86 with
> a local build.

Ears perking up: what are you observing with debuginfod-find?
(In reply to Jeremy Linton from comment #5)
> This is AFAIK working now, the elfutils test in question is passing. The
> only failures are in debuginfod-find which fails in the same way on x86 with
> a local build.

Yes, https://sourceware.org/bugzilla/show_bug.cgi?id=26312 has been resolved.
But debuginfod-find shouldn't fail (on any arch). What failure are you seeing exactly?
Well, whatever it was, it's gone now:

Testsuite summary for elfutils 0.187
============================================================================
# TOTAL: 257
# PASS:  253
# SKIP:  4
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0