Bug 2044483 - kernel: vDSO lacks unwind information on aarch64
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 35
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2044811
Reported: 2022-01-24 17:04 UTC by Florian Weimer
Modified: 2022-12-13 16:25 UTC

Fixed In Version:
Clone Of:
: 2044811 (view as bug list)
Environment:
Last Closed: 2022-12-13 16:25:54 UTC
Type: Bug
Embargoed:



Links
System: Red Hat Bugzilla
ID: 2044265
Private: 1
Priority: unspecified
Status: CLOSED
Summary: gcc: Crash in _Unwind_Find_FDE if object lacks unwind information
Last Updated: 2022-01-27 18:33:49 UTC

Description Florian Weimer 2022-01-24 17:04:02 UTC
As seen with kernel-core-5.15.16-200.fc35.aarch64:

# eu-readelf -l /lib/modules/5.15.16-200.fc35.aarch64/vdso/vdso.so 
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000938 0x000938 R E 0x10
  DYNAMIC        0x000808 0x0000000000000808 0x0000000000000808 0x000110 0x000110 R   0x8
  NOTE           0x000268 0x0000000000000268 0x0000000000000268 0x00006c 0x00006c R   0x4

 Section to Segment mapping:
  Segment Sections...
   00      [RO: .hash .dynsym .dynstr .gnu.version .gnu.version_d .note .text .dynamic .got .got.plt]
   01      [RO: .dynamic]
   02      [RO: .note]

There is no GNU_EH_FRAME segment, and no .eh_frame_hdr or .eh_frame section. This may cause issues when unwinding through signal frames, or through vDSO code in general.

On other architectures, the GNU_EH_FRAME segment is present, for example:

# eu-readelf -l /lib/modules/5.15.15-200.fc35.x86_64/vdso/vdso64.so 
Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x000000 0x0000000000000000 0x0000000000000000 0x000d84 0x000d84 R E 0x1000
  DYNAMIC        0x0003e0 0x00000000000003e0 0x00000000000003e0 0x000120 0x000120 R   0x8
  NOTE           0x000500 0x0000000000000500 0x0000000000000500 0x000068 0x000068 R   0x4
  GNU_EH_FRAME   0x000568 0x0000000000000568 0x0000000000000568 0x000044 0x000044 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00      [RO: .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_d .dynamic .note .eh_frame_hdr .eh_frame .text .altinstructions .altinstr_replacement __ex_table]
   01      [RO: .dynamic]
   02      [RO: .note]
   03      [RO: .eh_frame_hdr]
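
The comparison above can be automated. The following is only a minimal sketch, not part of the original report: it assumes a 64-bit little-endian ELF image (true for both vDSO files shown) and checks whether any program header has type PT_GNU_EH_FRAME. The function name has_gnu_eh_frame is hypothetical.

```python
import struct

PT_GNU_EH_FRAME = 0x6474E550  # p_type value of the GNU_EH_FRAME segment


def has_gnu_eh_frame(data: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image has a PT_GNU_EH_FRAME header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if data[4] != 2 or data[5] != 1:
        # Assumption of this sketch: ELFCLASS64 and ELFDATA2LSB only.
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    # Elf64_Ehdr: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        # p_type is the first 32-bit word of each Elf64_Phdr.
        (p_type,) = struct.unpack_from("<I", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_EH_FRAME:
            return True
    return False
```

Reading one of the vdso.so files above in binary mode and passing its contents to has_gnu_eh_frame would be expected to give False for the aarch64 file and True for the x86_64 file, matching the eu-readelf output.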

Comment 1 Mark Salter 2022-01-26 14:59:14 UTC
Looks like this was done intentionally for v5.11:

commit e2bba5f92354488c331b7821d873db7c388e31aa
Author: Peter Collingbourne <pcc>
Date:   Wed Dec 30 14:19:54 2020 -0800

    arm64: vdso: disable .eh_frame_hdr via /DISCARD/ instead of --no-eh-frame-hdr
    
    Currently with ld.lld we emit an empty .eh_frame_hdr section (and a
    corresponding program header) into the vDSO. With ld.bfd the section
    is not emitted but the program header is, with p_vaddr set to 0. This
    can lead to unwinders attempting to interpret the data at whichever
    location the program header happens to point to as an unwind info
    header. This happens to be mostly harmless as long as the byte at
    that location (interpreted as a version number) has a value other
    than 1, causing both libgcc and LLVM libunwind to ignore the section
    (in libunwind's case, after printing an error message to stderr),
    but it could lead to worse problems if the byte happened to be 1 or
    the program header points to non-readable memory (e.g. if the empty
    section was placed at a page boundary).
    
    Instead of disabling .eh_frame_hdr via --no-eh-frame-hdr (which
    also has the downside of being unsupported by older versions of GNU
    binutils), disable it by discarding the section, and stop emitting
    the program header that points to it.
    
    I understand that we intend to emit valid unwind info for the vDSO
    at some point. Once that happens this patch can be reverted.
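
The version check that the commit message relies on can be illustrated with a short sketch. This is not the libgcc or libunwind code, only a model of the behaviour described above: the first byte at the location named by the program header is read as a version number, and the section is ignored unless that byte is 1. The function name parse_eh_frame_hdr and the returned dictionary are hypothetical.

```python
import struct


def parse_eh_frame_hdr(buf: bytes):
    """Parse the 4-byte fixed prefix of .eh_frame_hdr.

    Returns the three encoding bytes, or None when the data should be
    ignored (too short, or version byte other than 1).
    """
    if len(buf) < 4:
        return None
    version, eh_frame_ptr_enc, fde_count_enc, table_enc = struct.unpack_from("4B", buf)
    if version != 1:
        # Mirrors the "mostly harmless" case: unwinders skip the section.
        return None
    return {
        "eh_frame_ptr_enc": eh_frame_ptr_enc,
        "fde_count_enc": fde_count_enc,
        "table_enc": table_enc,
    }
```

For example, parse_eh_frame_hdr(b"\x7fELF") returns None, which is what would happen if a program header with p_vaddr 0 resolved to the start of the ELF image (an assumption about the ld.bfd case, not something stated in the commit). Arbitrary bytes that happen to start with 0x01 would be accepted and misinterpreted, which is the worse case the commit message warns about.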

Comment 2 Florian Weimer 2022-01-26 16:41:26 UTC
Huh, this is odd.  We seem to have valid unwind data downstream:

# eu-readelf -a /lib/modules/4.18.0-326.el8.kpq1.aarch64/vdso/vdso.so
[…]
Call frame search table section [ 8] '.eh_frame_hdr':
 version:          1
 eh_frame_ptr_enc: 0x1b (sdata4 pcrel)
 fde_count_enc:    0x3 (udata4)
 table_enc:        0x3b (sdata4 datarel)
 eh_frame_ptr:     0x34 (offset: 0x7e0)
 fde_count:        5
 Table:
  0xfffffb58 (offset:  0x300) -> 0x4c fde=[    14]
  0xfffffc30 (offset:  0x3d8) -> 0x60 fde=[    28]
  0xfffffe40 (offset:  0x5e8) -> 0xb8 fde=[    80]
  0xffffff88 (offset:  0x730) -> 0xec fde=[    b4]
  0xfffffff8 (offset:  0x7a0) -> 0x120 fde=[    e8]
[…]

Just like on the other architectures.
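
As a worked example of how an unwinder would use this search table, the sketch below decodes the raw values from the dump. The header address 0x7a8 is an assumption: eu-readelf does not print it, but it is the only value consistent with every "offset:" column above (for example 0x7a8 - 0x4a8 = 0x300). All names are hypothetical, and addresses are relative to the start of the vDSO image.

```python
import bisect

HDR_ADDR = 0x7A8  # assumption: inferred from the dump, not printed by eu-readelf

# (initial location, FDE address) pairs as stored: sdata4, relative to HDR_ADDR.
RAW_TABLE = [
    (0xFFFFFB58, 0x4C),
    (0xFFFFFC30, 0x60),
    (0xFFFFFE40, 0xB8),
    (0xFFFFFF88, 0xEC),
    (0xFFFFFFF8, 0x120),
]


def s32(value):
    """Reinterpret an unsigned 32-bit value as signed (sdata4)."""
    return value - (1 << 32) if value & 0x80000000 else value


def decode_table(hdr_addr, raw_table):
    """Apply the datarel encoding: each value is an offset from the header start."""
    return [(hdr_addr + s32(loc), hdr_addr + s32(fde)) for loc, fde in raw_table]


def eh_frame_addr(hdr_addr, raw_ptr):
    """Apply the pcrel encoding: relative to the field itself, 4 bytes into the header."""
    return hdr_addr + 4 + s32(raw_ptr)


def find_fde(table, pc):
    """Return the FDE address of the last entry starting at or below pc, else None.

    The table holds no end addresses, so a real unwinder must still check
    the range recorded in the FDE itself.
    """
    starts = [start for start, _ in table]
    i = bisect.bisect_right(starts, pc) - 1
    return table[i][1] if i >= 0 else None


TABLE = decode_table(HDR_ADDR, RAW_TABLE)
```

Under that assumption, eh_frame_addr(HDR_ADDR, 0x34) gives 0x7e0, and subtracting 0x7e0 from each decoded FDE address reproduces the fde=[...] column (0x14, 0x28, 0x80, 0xb4, 0xe8), which suggests the table is internally consistent.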

Is it possible that the upstream change should have been restricted to LLD? I mean, the LLD bug is very real, but the upstream workaround for it introduces this regression.

Comment 3 Szabolcs Nagy 2022-06-27 16:30:33 UTC
the real issue is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=87676cfca14171fc4c99d96ae2f3e87780488ac4

commit 87676cfca14171fc4c99d96ae2f3e87780488ac4
Author:     Will Deacon <will>
AuthorDate: 2020-06-22 20:24:22 +0100
Commit:     Will Deacon <will>
CommitDate: 2020-06-23 14:47:03 +0100

    arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline

    Commit 7e9f5e6629f6 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results
    in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc
    unwinder to unwind out of signal handlers using the .eh_frame information
    populated by our .cfi directives. In conjunction with a4eb355a3fda
    ("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has
    been shown to cause segmentation faults originating from within the
    unwinder during thread cancellation:
[...]

the aarch64 DWARF ABI was updated, so better cfi is now possible:
https://github.com/ARM-software/abi-aa/commit/6100b37a3a3010d185b7d5a8c7cba2ed714b72e3

there may be libgcc bugs here too, it would be useful to know
what exactly worked before that fails now.

Comment 4 Ben Cotton 2022-11-29 17:43:52 UTC
This message is a reminder that Fedora Linux 35 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 35 on 2022-12-13.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '35'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 35 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 5 Ben Cotton 2022-12-13 16:25:54 UTC
Fedora Linux 35 entered end-of-life (EOL) status on 2022-12-13.

Fedora Linux 35 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora Linux
please feel free to reopen this bug against that version. Note that the version
field may be hidden. Click the "Show advanced fields" button if you do not see
the version field.

If you are unable to reopen this bug, please file a new report against an
active release.

Thank you for reporting this bug and we are sorry it could not be fixed.

