Bug 1861423 - F33 has PAC/BTI enabled for rawhide but this is causing an elfutils build failure during 'make check' on aarch64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: elfutils
Version: 33
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Mark Wielaard
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: ARMTracker 1847148 1862110
Reported: 2020-07-28 15:06 UTC by Jeremy Linton
Modified: 2020-09-25 16:51 UTC

Fixed In Version: elfutils-0.181-1.fc32 elfutils-0.181-1.fc33
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1862110 (view as bug list)
Environment:
Last Closed: 2020-09-10 17:31:32 UTC
Type: Bug
Embargoed:


Attachments
exe and corefile from unit test failure. (36.24 KB, application/gzip)
2020-07-28 16:43 UTC, Jeremy Linton
gencat from debuginfo extraction failure (39.13 KB, application/gzip)
2020-07-28 16:45 UTC, Jeremy Linton

Description Jeremy Linton 2020-07-28 15:06:26 UTC
Description of problem: When built with -mbranch-protection=standard, elfutils fails a unit test on aarch64. It also appears to be causing debuginfo extraction problems in other packages.

When built on aarch64:

============================================================================
Testsuite summary for elfutils 0.180
============================================================================
# TOTAL: 219
# PASS:  213
# SKIP:  5
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See tests/test-suite.log
Please report to https://sourceware.org/bugzilla
============================================================================
FAIL: run-backtrace-native-core.sh
==================================

/usr/bin/coredumpctl
           PID: 7477 (backtrace-child)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2020-07-28 11:01:25 EDT (2s ago)
  Command Line: /root/t/elfutils/elfutils-0.180/tests/backtrace-child --gencore
    Executable: /root/t/elfutils/elfutils-0.180/tests/backtrace-child
 Control Group: /user.slice/user-0.slice/session-3.scope
          Unit: session-3.scope
         Slice: user-0.slice
       Session: 3
     Owner UID: 0 (root)
       Boot ID: e42abccd30874f80a5904ce3a8e2c9f1
    Machine ID: e4e16166188344d5acacabe5d9d3dd3c
      Hostname: localhost.localdomain
       Storage: /var/lib/systemd/coredump/core.backtrace-child.0.e42abccd30874f80a5904ce3a8e2c9f1.7477.1595948485000000000000.zst
       Message: Process 7477 (backtrace-child) of user 0 dumped core.
                
                Stack trace of thread 7482:
                #0  0x0000ffffa733aaf8 raise (libpthread.so.0 + 0x13af8)
                #1  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
                #2  0x0000aaaaafa2de4c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xe4c)
                #3  0x0000aaaaafa2df2c n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf2c)
                #4  0x0000aaaaafa2df44 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf44)
                #5  0x0000aaaaafa2df54 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xf54)
                #6  0x0000ffffa732ef74 start_thread (libpthread.so.0 + 0x7f74)
                
                Stack trace of thread 7477:
                #0  0x0000ffffa73303c0 __pthread_clockjoin_ex (libpthread.so.0 + 0x93c0)
                #1  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
                #2  0x0000aaaaafa2dc34 n/a (/root/t/elfutils/elfutils-0.180/tests/backtrace-child + 0xc34)
                #3  0x0000ffffa71c5878 __libc_start_main (libc.so.6 + 0x24878)
backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.
./test-subr.sh: line 84:  8904 Aborted                 (core dumped) LD_LIBRARY_PATH="${built_library_path}${LD_LIBRARY_PATH:+:}$LD_LIBRARY_PATH" $VALGRIND_CMD "$@"
backtrace-child-core.7477: no main
rmdir: failed to remove 'test-7404': Directory not empty
FAIL run-backtrace-native-core.sh (exit status: 1)


Version-Release number of selected component (if applicable):
0.180

How reproducible: at the moment 100% 


Steps to Reproduce:
1. Acquire rawhide/f33 with gcc 10.2.1+recent binutils
2. build elfutils on that machine with `fedpkg local`


Actual results:
As seen above

backtrace: backtrace.c:144: callback_verify: Assertion `symname != NULL && strcmp (symname, "backtracegen") == 0' failed.


(glibc failure caused by elfutils)

++ /usr/lib/rpm/debugedit -b /root/t/glibc -d /usr/src/debug -i -l ./debugsources.list /root/rpmbuild/BUILDROOT/glibc-2.31.9000-21.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size


Expected results:

Additional info:

Comment 1 Jeremy Linton 2020-07-28 16:43:24 UTC
Created attachment 1702690
exe and corefile from unit test failure.

Comment 2 Jeremy Linton 2020-07-28 16:45:19 UTC
Created attachment 1702691
gencat from debuginfo extraction failure

Comment 3 Mark Wielaard 2020-07-28 21:38:44 UTC
So this is really 2 bugs.

1) elfutils backtrace failing when building with -mbranch-protection=standard

2) rpm debugedit (which uses elfutils libelf) not being able to update a file because of "invalid section entry size".

I can replicate 1) by building upstream elfutils with CFLAGS="-g -O2 -mbranch-protection=standard" CXXFLAGS="$CFLAGS"
In that case both run-backtrace-native.sh and run-backtrace-native-core.sh fail. They succeed without -mbranch-protection=standard

Issue 2) can be shown with the gencat ELF file attachment:
# eu-elflint --gnu ./gencat 
section [14] '.plt': size not multiple of entry size
section [23] '.dynamic': entry 22: unknown tag

And indeed, the .plt section is bad:
[14] .plt                 PROGBITS     0000000000401140 00001140 00000410 24 AX     0   0 16

0x410 = 1040 is not divisible by the entry size 24
(it looks like there are 43 entries and then 8 extra bytes)
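
The arithmetic above can be checked with a short sketch. The size and entry size are taken from the section header line shown for '.plt'; the helper name `entsize_remainder` is hypothetical and is not part of elfutils.

```python
# Check whether a section's size is a whole multiple of its entry size,
# using the .plt values reported for gencat above.

def entsize_remainder(sh_size, sh_entsize):
    """Return (whole_entries, leftover_bytes) for a section."""
    if sh_entsize == 0:
        # Sections without fixed-size entries have nothing to check.
        return 0, 0
    return divmod(sh_size, sh_entsize)

plt_size = 0x410     # sh_size of section [14] '.plt'
plt_entsize = 24     # sh_entsize of the same section

entries, leftover = entsize_remainder(plt_size, plt_entsize)
print(f"{plt_size} bytes = {entries} entries of {plt_entsize} bytes "
      f"+ {leftover} extra bytes")
```

A non-zero leftover is the condition that eu-elflint reports as "size not multiple of entry size".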

I'll try to figure out issue 1. But issue 2 must be somewhere else, probably binutils ld which generated the .plt section.

Comment 4 Mark Wielaard 2020-07-28 21:45:33 UTC
> section [23] '.dynamic': entry 22: unknown tag

BTW. This is   <unknown>: 0x70000001 000000000000000000
If someone knows what d_tag type 0x70000001 (DT_LOPROC + 1) is, that would be appreciated.
It isn't listed in glibc /usr/include/elf.h (which is what elfutils uses).
The only entry for aarch64 is #define DT_AARCH64_VARIANT_PCS  (DT_LOPROC + 5)

Comment 5 Mark Wielaard 2020-07-29 10:16:07 UTC
Note that this does NOT seem to impact the mass rebuild going on.
As far as I can see builds on aarch64 are fine, elfutils itself got rebuilt without showing any failures:
https://kojipkgs.fedoraproject.org//packages/elfutils/0.180/6.fc33/data/logs/aarch64/build.log

It does look like it is using -mbranch-protection=standard
But I also see SKIP: run-backtrace-native-core.sh which means no core file was generated on the koji builder.

Same for glibc, I don't see any debugedit failures in the aarch64 build.log:
https://kojipkgs.fedoraproject.org//work/tasks/5655/47975655/build.log

Comment 6 Florian Weimer 2020-07-29 10:28:08 UTC
This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is enabled:

extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/lib64/libutil-2.31.9000.so
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/sbin/ldconfig
explicitly decompress any DWARF compressed ELF sections in /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
extracting debug info from /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
Failed to update file: invalid section entry size
error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
    Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

My guess: We do not see it more widely because glibc in the buildroot is built without PAC+BTI. The link editor does not produce the problematic output as a result, masking any elfutils problems that may exist.

Comment 7 Jakub Jelinek 2020-07-29 10:37:17 UTC
/* Processor specific dynamic array tags.  */
#define DT_AARCH64_BTI_PLT      (DT_LOPROC + 1)
#define DT_AARCH64_PAC_PLT      (DT_LOPROC + 3)
#define DT_AARCH64_VARIANT_PCS  (DT_LOPROC + 5)
is what binutils sources have.
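
For illustration, the three definitions above are enough to name the unknown tag from comment 4. This is only a sketch: the table and the helper `aarch64_tag_name` are hypothetical Python stand-ins, not elfutils or binutils code. `DT_LOPROC` is 0x70000000, the start of the processor-specific tag range.

```python
# Map AArch64 processor-specific dynamic tags to names, using the
# definitions quoted from the binutils sources above.

DT_LOPROC = 0x70000000  # start of the processor-specific d_tag range

AARCH64_DYNAMIC_TAGS = {
    DT_LOPROC + 1: "DT_AARCH64_BTI_PLT",
    DT_LOPROC + 3: "DT_AARCH64_PAC_PLT",
    DT_LOPROC + 5: "DT_AARCH64_VARIANT_PCS",
}

def aarch64_tag_name(d_tag):
    """Return the tag name, or None if the tag is not in the table."""
    return AARCH64_DYNAMIC_TAGS.get(d_tag)

# The tag eu-elflint reported as unknown in gencat's .dynamic section.
print(aarch64_tag_name(0x70000001))
```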

Comment 8 Mark Wielaard 2020-07-29 10:44:43 UTC
(In reply to Jakub Jelinek from comment #7)
> /* Processor specific dynamic array tags.  */
> #define DT_AARCH64_BTI_PLT      (DT_LOPROC + 1)
> #define DT_AARCH64_PAC_PLT      (DT_LOPROC + 3)
> #define DT_AARCH64_VARIANT_PCS  (DT_LOPROC + 5)
> is what binutils sources have.

Ah, great, so this does seem to confirm that something is up with the .plt section.
Is there any documentation on what it means to have those tags in the dynamic array?

I looked to the change request at https://fedoraproject.org/wiki/Changes/Aarch64_PointerAuthentication
and asked around, but nobody seems to know anything about any ELF, DWARF or gabi changes.
But I guess there must be some, given the issues with the dynamic tags, the .plt section and the fact that unwinding seems broken.

Can we merge them into glibc elf.h to expose them to other tools?

Comment 9 Mark Wielaard 2020-07-29 10:58:08 UTC
(In reply to Florian Weimer from comment #6)
> This issue may also trigger during an aarch64 rebuild of glibc if PAC+BTI is
> enabled:
> 
> extracting debug info from
> /builddir/build/BUILDROOT/glibc-2.31.9000-23.fc33.aarch64/usr/bin/gencat
> Failed to update file: invalid section entry size
> error: Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)
>     Bad exit status from /var/tmp/rpm-tmp.lOeYPV (%install)

This issue is analyzed a bit in comment #3.
You can also see this running eu-elflint on gencat:
section [14] '.plt': size not multiple of entry size

Given some of the other observations, might it be that the linker somehow creates .plt entries of different sizes when creating gencat?
That would cause sh_size % sh_entsize != 0, which makes debugedit/libelf throw an error when it encounters such a .plt section.

Comment 10 Mark Wielaard 2020-07-29 11:13:46 UTC
GDB does seem able to unwind through the core file, but eu-stack doesn't:

# gdb --core tests/test-187673/core.187694 tests/backtrace-child

(gdb) thread apply all bt

Thread 2 (Thread 0xffff9777e010 (LWP 187694)):
#0  0x0000ffff97726610 in __pthread_clockjoin_ex () from /lib64/libpthread.so.0
#1  0x0000aaaad1523b3c in main (argc=<optimized out>, argv=<optimized out>) at backtrace-child.c:241

Thread 1 (Thread 0xffff975a6110 (LWP 187695)):
#0  0x0000ffff97730d48 in raise () from /lib64/libpthread.so.0
#1  0x0000aaaad1523d4c in sigusr2 (signo=<optimized out>) at backtrace-child.c:132
#2  0x0000aaaad1523e2c in stdarg (f=<optimized out>) at backtrace-child.c:176
#3  0x0000aaaad1523e44 in backtracegen () at backtrace-child.c:190
#4  0x0000aaaad1523e54 in start (arg=<optimized out>) at backtrace-child.c:205
#5  0x0000ffff97725294 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff9767d27c in thread_start () from /lib64/libc.so.6

# eu-stack -v --core tests/test-187673/core.187694 --exec tests/backtrace-child
PID 187694 - core
TID 187695:
#0  0x0000ffff97730d48     raise - libpthread.so.0
#1  0x0000aaaad1523d4c - 1 sigusr2 - backtrace-child
    /root/elfutils/tests/backtrace-child.c:132:3
#2  0x0000aaaad1523e2c - 1 stdarg - backtrace-child
    /root/elfutils/tests/backtrace-child.c:176:3
#3  0x0000ffff9774c000 - 1 - libpthread.so.0
eu-stack: dwfl_thread_getframes tid 187695 at 0xffff9774bfff in libpthread.so.0: No DWARF information found
TID 187694:
#0  0x0000ffff97726610     __pthread_clockjoin_ex - libpthread.so.0
#1  0x0000aaaad1523b3c - 1 main - backtrace-child
    /root/elfutils/tests/backtrace-child.c:241:5
#2  0x0000ffff975cb838 - 1 __libc_start_main - libc.so.6
#3  0xf00000f4a90153f3 - 1
#4  0xf00000f4a90153f3 - 1
eu-stack: dwfl_thread_getframes tid 187694 at 0xf00000f4a90153f2 in <unknown>: No DWARF information found

Comment 11 Mark Wielaard 2020-07-29 12:01:49 UTC
Note that most backtraces actually work. Unless it goes through a signal frame.
Is there anything about PAC that changes how one unwinds through a signal frame?

Comment 12 Florian Weimer 2020-07-29 12:04:01 UTC
Regarding the gencat problem, the PLT0 entry for gencat has a different size than the other PLT entries:

Disassembly of section .plt:

0000000000401140 <.plt>:
  401140:       d503245f        bti     c
  401144:       a9bf7bf0        stp     x16, x30, [sp, #-16]!
  401148:       d00000f0        adrp    x16, 41f000 <__FRAME_END__+0x1abd4>
  40114c:       f9474a11        ldr     x17, [x16, #3728]
  401150:       913a4210        add     x16, x16, #0xe90
  401154:       d61f0220        br      x17
  401158:       d503201f        nop
  40115c:       d503201f        nop

0000000000401160 <memcpy@plt>:
  401160:       d503245f        bti     c
  401164:       d00000f0        adrp    x16, 41f000 <__FRAME_END__+0x1abd4>
  401168:       f9474e11        ldr     x17, [x16, #3736]
  40116c:       913a6210        add     x16, x16, #0xe98
  401170:       d61f0220        br      x17
  401174:       d503201f        nop

0000000000401178 <strlen@plt>:
  401178:       d503245f        bti     c
  40117c:       d00000f0        adrp    x16, 41f000 <__FRAME_END__+0x1abd4>
  401180:       f9475211        ldr     x17, [x16, #3744]
  401184:       913a8210        add     x16, x16, #0xea0
  401188:       d61f0220        br      x17
  40118c:       d503201f        nop

I don't think that's valid ELF. Another oddity is that the binary has just an AARCH64_BTI_PLT entry:

Dynamic section at offset 0xfc60 contains 29 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-aarch64.so.1]
 0x000000000000000c (INIT)               0x401120
 0x000000000000000d (FINI)               0x403868
 0x0000000000000019 (INIT_ARRAY)         0x41fc40
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x41fc48
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000004 (HASH)               0x400330
 0x000000006ffffef5 (GNU_HASH)           0x400498
 0x0000000000000005 (STRTAB)             0x400990
 0x0000000000000006 (SYMTAB)             0x4004e0
 0x000000000000000a (STRSZ)              575 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x41fe80
 0x0000000000000002 (PLTRELSZ)           1008 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x400d30
 0x0000000000000007 (RELA)               0x400c88
 0x0000000000000008 (RELASZ)             168 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x0000000070000001 (AARCH64_BTI_PLT)    
 0x0000000000000018 (BIND_NOW)           
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
 0x000000006ffffffe (VERNEED)            0x400c38
 0x000000006fffffff (VERNEEDNUM)         2
 0x000000006ffffff0 (VERSYM)             0x400bd0
 0x0000000000000000 (NULL)               0x0

But it enables both BTI *and* PAC:

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU                  0x00000010       NT_GNU_PROPERTY_TYPE_0
      Properties: AArch64 feature: BTI, PAC

Maybe ld got confused in some way? I'm going to file a binutils bug once I have a few more details.
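
As a cross-check, the entry sizes in the disassembly and the relocation count from the dynamic section add up to the 0x410 section size reported in comment 3. The sketch below assumes one PLT entry per PLT relocation and that each relocation has the RELAENT size (PLTREL is RELA); the variable names are only illustrative.

```python
# Reconstruct the .plt size of gencat from the disassembly and the
# dynamic section shown above.

PLT0_START = 0x401140     # address of <.plt> (the PLT0 header entry)
FIRST_ENTRY = 0x401160    # address of <memcpy@plt>
SECOND_ENTRY = 0x401178   # address of <strlen@plt>
PLTRELSZ = 1008           # total bytes of PLT relocations
RELAENT = 24              # bytes per RELA relocation

plt0_size = FIRST_ENTRY - PLT0_START      # size of the PLT0 header entry
entry_size = SECOND_ENTRY - FIRST_ENTRY   # size of a regular PLT entry
num_entries = PLTRELSZ // RELAENT         # assumed: one entry per relocation

total = plt0_size + num_entries * entry_size
print(f"PLT0 = {plt0_size} bytes, {num_entries} entries of {entry_size} bytes")
print(f"total = {total:#x}, remainder modulo entry size = {total % entry_size}")
```

Under these assumptions the 8 extra bytes come from PLT0 being 32 bytes while sh_entsize is 24.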

Comment 13 Jeremy Linton (ARM) 2020-07-29 15:11:35 UTC
So, the arm-elf document https://developer.arm.com/documentation/ihi0056/g/ describes the elf related changes. 


In reference to #11, I remember there was a tweak around general exception handling, which affected libc (and that patch landed a year or so ago IIRC), but I need to dig up the details.

Comment 14 Mark Wielaard 2020-07-30 13:37:09 UTC
(In reply to Jeremy Linton (ARM) from comment #13)
> So, the arm-elf document https://developer.arm.com/documentation/ihi0056/g/
> describes the elf related changes. 
> 
> In reference to #11, I remember there was a tweak around general exception
> handling, which affected libc (and that patch landed a year or so ago
> IIRC), but I need to dig up the details.

Thanks. That is very useful. Looks like there were actually various ELF changes to support this.
Let's keep this bug to update elfutils for PAC/BTI.

I opened a separate bug for the binutils/ld issue (rpm debugedit being unable to process some files) as https://bugzilla.redhat.com/show_bug.cgi?id=1862110

Comment 15 Ben Cotton 2020-08-11 13:50:21 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 33 development cycle.
Changing version to 33.

Comment 16 Mark Wielaard 2020-09-02 21:31:21 UTC
The new way to unwind aarch64 pac with signed (mangled) return addresses is described in this gdb patch:
https://sourceware.org/legacy-ml/gdb-patches/2017-08/msg00171.html
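
As a rough illustration of what handling a signed return address involves: an unwinder has to clear the pointer authentication code from the saved return address before using it as a program counter. The sketch below is not taken from the gdb patch. The mask value, the sample address and the helper name `strip_pac` are all made up for illustration; on Linux the real mask depends on the process and has to be obtained from the kernel.

```python
# Strip a pointer authentication code (PAC) from a signed return address.
# HYPOTHETICAL_PAC_MASK is an assumed example value, not a real constant:
# for this illustration it marks bits 48..54 as holding the PAC.

HYPOTHETICAL_PAC_MASK = 0x007F_0000_0000_0000

def strip_pac(signed_addr, pac_mask=HYPOTHETICAL_PAC_MASK):
    """Clear the PAC bits of a user-space address (upper bits become zero)."""
    return signed_addr & ~pac_mask & 0xFFFF_FFFF_FFFF_FFFF

# Made-up signed form of a return address in backtrace-child.
signed = 0x0035_AAAA_D152_3E54
print(hex(strip_pac(signed)))
```

An address that carries no PAC passes through unchanged, so the same code path also works when the hardware does no pointer authentication.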

Comment 17 Mark Wielaard 2020-09-03 13:39:50 UTC
(In reply to Mark Wielaard from comment #16)
> The new way to unwind aarch64 pac with signed (mangled) return addresses is
> described in this gdb patch:
> https://sourceware.org/legacy-ml/gdb-patches/2017-08/msg00171.html

Note that it took 1.5 years for the patch to actually show up:
https://sourceware.org/legacy-ml/gdb-patches/2019-03/msg00084.html

Comment 18 Mark Wielaard 2020-09-03 16:30:01 UTC
I posted a couple of patches upstream:
https://sourceware.org/pipermail/elfutils-devel/2020q3/date.html

libelf: Sync elf.h from glibc
backends: Implement aarch64_dynamic_tag_name and aarch64_dynamic_tag_check
libebl: Handle aarch64 bti, pac bits in gnu property note
libdw,readelf: Recognize DW_CFA_AARCH64_negate_ra_state

It recognizes the various bits (and resolves the confusion wrt DW_CFA_GNU_window_save) which allows unwinding again when the hardware doesn't actually do any pointer authentication. To actually handle mangled return addresses we need a bit more code. But for now this should give us a fully green testsuite again on most aarch64 (< ARMv8.3) hardware.
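
The confusion mentioned above comes from the two call frame instructions sharing one opcode value, 0x2d, so a decoder has to choose the name based on the ELF machine. A minimal sketch of that choice follows; it is not the elfutils implementation, and the helper name `cfa_0x2d_name` is hypothetical.

```python
# DW_CFA_GNU_window_save and DW_CFA_AARCH64_negate_ra_state share one
# opcode value, so the name has to be picked based on the ELF machine.

DW_CFA_SHARED_OPCODE = 0x2D
EM_SPARC = 2        # e_machine value for SPARC
EM_AARCH64 = 183    # e_machine value for AArch64

def cfa_0x2d_name(e_machine):
    """Name the 0x2d call frame instruction for a given ELF machine."""
    if e_machine == EM_AARCH64:
        return "DW_CFA_AARCH64_negate_ra_state"
    return "DW_CFA_GNU_window_save"

print(hex(DW_CFA_SHARED_OPCODE), cfa_0x2d_name(EM_AARCH64))
```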

Comment 19 Fedora Update System 2020-09-08 13:43:43 UTC
FEDORA-2020-d63f2a2d61 has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2020-d63f2a2d61

Comment 20 Fedora Update System 2020-09-08 13:47:03 UTC
FEDORA-2020-820ac199ba has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-820ac199ba

Comment 21 Fedora Update System 2020-09-08 15:23:20 UTC
FEDORA-2020-820ac199ba has been pushed to the Fedora 32 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-820ac199ba`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-820ac199ba

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 22 Fedora Update System 2020-09-08 20:57:19 UTC
FEDORA-2020-d63f2a2d61 has been pushed to the Fedora 33 testing repository.
In a short time you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-d63f2a2d61`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-d63f2a2d61

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 23 Fedora Update System 2020-09-10 17:31:32 UTC
FEDORA-2020-820ac199ba has been pushed to the Fedora 32 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 24 Fedora Update System 2020-09-25 16:51:52 UTC
FEDORA-2020-d63f2a2d61 has been pushed to the Fedora 33 stable repository.
If the problem still persists, please make a note of it in this bug report.

