Bug 2044265 - gcc: Crash in _Unwind_Find_FDE if object lacks unwind information
Summary: gcc: Crash in _Unwind_Find_FDE if object lacks unwind information
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: aarch64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 2045144 2045441 2045442
Reported: 2022-01-24 10:44 UTC by Mattias Ellert
Modified: 2022-01-27 18:33 UTC
CC List: 10 users

Fixed In Version: gcc-12.0.1-0.3.fc36
Clone Of:
Environment:
Last Closed: 2022-01-27 18:33:50 UTC
Type: Bug
Embargoed:


Attachments
Backtrace for xrootd (10.70 KB, text/plain)
2022-01-24 10:44 UTC, Mattias Ellert
Backtrace for globus-common (4.67 KB, text/plain)
2022-01-24 10:45 UTC, Mattias Ellert
Backtrace for globus-gssapi-gsi (12.38 KB, text/plain)
2022-01-24 10:46 UTC, Mattias Ellert
Backtrace for globus-common with debuginfo installed (5.51 KB, text/plain)
2022-01-24 13:28 UTC, Mattias Ellert


Links
System: GNU Compiler Collection
ID: 104207
Priority: P3
Status: UNCONFIRMED
Summary: [12 Regression] Crash in _Unwind_Find_FDE if object lacks unwind information
Last Updated: 2022-01-24 17:10:05 UTC

Internal Links: 2044483

Description Mattias Ellert 2022-01-24 10:44:36 UTC
Created attachment 1853051
Backtrace for xrootd

Description of problem:

Three of the packages I maintain started failing to build after gcc 12 was introduced in Rawhide, apparently for the same reason.

The three packages are
 - globus-common
 - globus-gssapi-gsi
 - xrootd

The problem does not affect compilation itself; it affects binaries that are run during the build. For the two globus packages, one of the tests in the test suite fails, while for xrootd the documentation generation (using sphinx-build) fails.

I have reproduced the failures in an aarch64 Fedora Rawhide mock buildroot on aarch64-test01.fedorainfracloud.org. For all three packages the backtrace ends in:

(gdb) bt
#0  0x0000fffff47f0944 in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#1  0x0000fffff47ed508 in uw_frame_state_for () from /lib64/libgcc_s.so.1
#2  0x0000fffff47eea14 in _Unwind_ForcedUnwind_Phase2 () from /lib64/libgcc_s.so.1
#3  0x0000fffff47eedcc in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#4  0x0000fffff7b3bedc in __pthread_unwind () from /lib64/libc.so.6
#5  0x0000fffff7b31138 in sigcancel_handler () from /lib64/libc.so.6
#6  <signal handler called>

The full backtraces for all three are attached.

The failures are specific to aarch64; other architectures are not affected.

Version-Release number of selected component (if applicable):

libgcc-12.0.1-0.2.fc36.aarch64

How reproducible:

Always

Steps to Reproduce:
1. Build one of the affected packages for aarch64.

Actual results:

Segmentation fault

Expected results:

Successful build

Additional info:

The backtraces for globus-common and globus-gssapi-gsi are very similar. However, the test in globus-common uses the libglobus_common from the build in progress (built with gcc 12), while the test in globus-gssapi-gsi uses the libglobus_common from the globus-common package installed in the buildroot, which was built with an earlier gcc version. The failure does not seem to depend on which of them is used.

Comment 1 Mattias Ellert 2022-01-24 10:45:28 UTC
Created attachment 1853052
Backtrace for globus-common

Comment 2 Mattias Ellert 2022-01-24 10:46:37 UTC
Created attachment 1853053
Backtrace for globus-gssapi-gsi

Comment 3 Florian Weimer 2022-01-24 11:37:54 UTC
Could you install gcc debugging information and check where exactly the crash happens?

Do these programs use dlclose? If yes, I expect this happens during PT_GNU_EH_FRAME parsing of just-closed objects. The program probably calls pthread_cancel and then immediately calls dlclose for a shared object that is part of the call stack of the canceled thread. This was always racy in principle, but with the fully concurrent unwinding in GCC 12, the race condition is more visible now.

(Note that you need both glibc 2.35 and a GCC 12 built against glibc 2.35 to reproduce this.)

Comment 4 Mattias Ellert 2022-01-24 13:28:04 UTC
Created attachment 1853068
Backtrace for globus-common with debuginfo installed

Here is the end of the backtrace for the globus-common case with debuginfo installed. The full backtrace is attached.

Thread 2 "thread_test_pth" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xfffff7d3f120 (LWP 34)]
_Unwind_Find_FDE (pc=0xfffff7ffb7eb, bases=bases@entry=0xfffff7d3d1b8) at ../../../libgcc/unwind-dw2-fde-dip.c:513
513	      return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
(gdb) bt
#0  _Unwind_Find_FDE (pc=0xfffff7ffb7eb, bases=bases@entry=0xfffff7d3d1b8)
    at ../../../libgcc/unwind-dw2-fde-dip.c:513
#1  0x0000fffff64dd508 in uw_frame_state_for (
    context=context@entry=0xfffff7d3ce90, fs=fs@entry=0xfffff7d3c380)
    at ../../../libgcc/unwind-dw2.c:1263
#2  0x0000fffff64dea14 in _Unwind_ForcedUnwind_Phase2 (
    exc=exc@entry=0xfffff7d3f590, context=context@entry=0xfffff7d3ce90, 
    frames_p=frames_p@entry=0xfffff7d3cac8) at ../../../libgcc/unwind.inc:162
#3  0x0000fffff64dedcc in _Unwind_ForcedUnwind (exc=0xfffff7d3f590, 
    stop=stop@entry=0xfffff7dfbd40 <unwind_stop>, stop_argument=0xfffff7d3e720)
    at ../../../libgcc/unwind.inc:218
#4  0x0000fffff7dfbedc in __GI___pthread_unwind (buf=<optimized out>)
    at unwind.c:130
#5  0x0000fffff7df1138 in __do_cancel () at ../sysdeps/nptl/pthreadP.h:280
#6  sigcancel_handler (sig=32, si=<optimized out>, ctx=<optimized out>)
    at pthread_cancel.c:56
#7  sigcancel_handler (sig=<optimized out>, si=<optimized out>, 
    ctx=<optimized out>) at pthread_cancel.c:32
#8  <signal handler called>

Is it expected that this failure would happen only for aarch64?

Comment 5 Florian Weimer 2022-01-24 16:49:12 UTC
Okay, this is my bug. I forgot to add a check that the object actually has a PT_GNU_EH_FRAME segment.

There is also a kernel aspect to this: the vDSO is not built with asynchronous unwinding information, which exposes this bug. I think we want to fix that, too.

Comment 6 Florian Weimer 2022-01-24 17:12:00 UTC
Upstream patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589177.html

Comment 7 Jakub Jelinek 2022-01-27 10:31:37 UTC
Please retry against gcc-12.0.1-0.3.fc36 and annobin-10.51-2.fc36 now in rawhide.

