Bug 2044265

Summary: gcc: Crash in _Unwind_Find_FDE if object lacks unwind information
Product: [Fedora] Fedora Reporter: Mattias Ellert <mattias.ellert>
Component: gcc Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: rawhide CC: aoliva, dmalcolm, fweimer, jakub, jwakely, law, mpolacek, msebor, nickc, sipoyare
Target Milestone: ---   
Target Release: ---   
Hardware: aarch64   
OS: Unspecified   
Whiteboard:
Fixed In Version: gcc-12.0.1-0.3.fc36 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-27 18:33:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2045144, 2045441, 2045442    
Attachments:
Description Flags
Backtrace for xrootd none
Backtrace for globus-common none
Backtrace for globus-gssapi-gsi none
Backtrace for globus-common with debuginfo installed none

Description Mattias Ellert 2022-01-24 10:44:36 UTC
Created attachment 1853051 [details]
Backtrace for xrootd

Description of problem:

Three of the packages I maintain started failing to build after the introduction of gcc 12 in rawhide, with what appears to be the same issue in each case.

The three packages are
 - globus-common
 - globus-gssapi-gsi
 - xrootd

The problem does not affect the compilation itself, but affects the running of binaries during the build. For the two globus packages, one of the tests in the test suite fails, while for xrootd the documentation generation (using sphinx-build) fails.

I have reproduced the failures in an aarch64 Fedora Rawhide mock buildroot on aarch64-test01.fedorainfracloud.org. For all three packages the backtrace ends in:

(gdb) bt
#0  0x0000fffff47f0944 in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#1  0x0000fffff47ed508 in uw_frame_state_for () from /lib64/libgcc_s.so.1
#2  0x0000fffff47eea14 in _Unwind_ForcedUnwind_Phase2 () from /lib64/libgcc_s.so.1
#3  0x0000fffff47eedcc in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#4  0x0000fffff7b3bedc in __pthread_unwind () from /lib64/libc.so.6
#5  0x0000fffff7b31138 in sigcancel_handler () from /lib64/libc.so.6
#6  <signal handler called>

The full backtraces for all three are attached.

The failures are specific to aarch64, and do not affect the other architectures.

Version-Release number of selected component (if applicable):

libgcc-12.0.1-0.2.fc36.aarch64

How reproducible:

Always

Steps to Reproduce:
1. Build one of the affected packages for aarch64.

Actual results:

Segmentation fault

Expected results:

Successful build

Additional info:

The backtraces for globus-common and globus-gssapi-gsi are very similar. However, the test in globus-common uses libglobus_common from the build being done (built with gcc 12), while the test in globus-gssapi-gsi uses libglobus_common from the globus-common package installed in the build root (built with an earlier gcc version). The failure does not seem to depend on which of them is used.

Comment 1 Mattias Ellert 2022-01-24 10:45:28 UTC
Created attachment 1853052 [details]
Backtrace for globus-common

Comment 2 Mattias Ellert 2022-01-24 10:46:37 UTC
Created attachment 1853053 [details]
Backtrace for globus-gssapi-gsi

Comment 3 Florian Weimer 2022-01-24 11:37:54 UTC
Could you install gcc debugging information and check where exactly the crash happens?

Do these programs use dlclose? If yes, I expect this happens during PT_GNU_EH_FRAME parsing of just-closed objects. The program probably calls pthread_cancel and then immediately calls dlclose for a shared object that is part of the call stack of the canceled thread. This was always racy in principle, but with the fully concurrent unwinding in GCC 12, the race condition is more visible now.

(Note that you need both glibc 2.35 and a GCC 12 built against glibc 2.35 to reproduce this.)
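
For reference, a minimal C sketch of the ordering that avoids the pthread_cancel/dlclose race described above: cancel the thread, join it so that forced unwinding has finished, and only then unload the object. The names idle_worker and cancel_join_then_unload are hypothetical and do not come from any of the affected packages. The sketch also assumes the handle, if non-NULL, was obtained from dlopen.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical worker standing in for code that lives in a plugin. */
static void *idle_worker(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);   /* sleep() is a cancellation point */
    return NULL;
}

/*
 * Cancel the thread, wait until its unwinding has completed, and only
 * then unload the shared object (handle may be NULL if there is nothing
 * to unload). Returns 0 on success, -1 on any failure.
 */
static int cancel_join_then_unload(pthread_t thread, void *handle)
{
    void *result = NULL;

    if (pthread_cancel(thread) != 0)
        return -1;
    /* After pthread_join returns, no frame of the thread is left to unwind. */
    if (pthread_join(thread, &result) != 0)
        return -1;
    if (result != PTHREAD_CANCELED)
        return -1;
    if (handle != NULL && dlclose(handle) != 0)
        return -1;
    return 0;
}
```

Calling dlclose before pthread_join would leave a window in which the unwinder may still be walking frames that belong to the object being unmapped.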

Comment 4 Mattias Ellert 2022-01-24 13:28:04 UTC
Created attachment 1853068 [details]
Backtrace for globus-common with debuginfo installed

Here is the end of the backtrace for the globus-common case with debuginfo installed. The full backtrace is attached.

Thread 2 "thread_test_pth" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xfffff7d3f120 (LWP 34)]
_Unwind_Find_FDE (pc=0xfffff7ffb7eb, bases=bases@entry=0xfffff7d3d1b8) at ../../../libgcc/unwind-dw2-fde-dip.c:513
513	      return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
(gdb) bt
#0  _Unwind_Find_FDE (pc=0xfffff7ffb7eb, bases=bases@entry=0xfffff7d3d1b8)
    at ../../../libgcc/unwind-dw2-fde-dip.c:513
#1  0x0000fffff64dd508 in uw_frame_state_for (
    context=context@entry=0xfffff7d3ce90, fs=fs@entry=0xfffff7d3c380)
    at ../../../libgcc/unwind-dw2.c:1263
#2  0x0000fffff64dea14 in _Unwind_ForcedUnwind_Phase2 (
    exc=exc@entry=0xfffff7d3f590, context=context@entry=0xfffff7d3ce90, 
    frames_p=frames_p@entry=0xfffff7d3cac8) at ../../../libgcc/unwind.inc:162
#3  0x0000fffff64dedcc in _Unwind_ForcedUnwind (exc=0xfffff7d3f590, 
    stop=stop@entry=0xfffff7dfbd40 <unwind_stop>, stop_argument=0xfffff7d3e720)
    at ../../../libgcc/unwind.inc:218
#4  0x0000fffff7dfbedc in __GI___pthread_unwind (buf=<optimized out>)
    at unwind.c:130
#5  0x0000fffff7df1138 in __do_cancel () at ../sysdeps/nptl/pthreadP.h:280
#6  sigcancel_handler (sig=32, si=<optimized out>, ctx=<optimized out>)
    at pthread_cancel.c:56
#7  sigcancel_handler (sig=<optimized out>, si=<optimized out>, 
    ctx=<optimized out>) at pthread_cancel.c:32
#8  <signal handler called>

Is it expected that this failure would happen only for aarch64?

Comment 5 Florian Weimer 2022-01-24 16:49:12 UTC
Okay, this is my bug. I forgot to add a check that the object actually has a PT_GNU_EH_FRAME segment.

There is also a kernel aspect to this: the vDSO is not built with asynchronous unwinding information, which exposes this bug. I think we want to fix that, too.
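
To illustrate the kind of check meant here, the following C sketch tests whether a set of program headers contains the segment in question (spelled PT_GNU_EH_FRAME in <elf.h>) and applies that test to the vDSO of the running process. It is not the libgcc patch. The helper names phdrs_have_eh_frame_hdr and vdso_has_eh_frame_hdr are hypothetical, and the sketch assumes a Linux/glibc system where the vDSO ELF header is advertised through AT_SYSINFO_EHDR.

```c
#include <elf.h>
#include <link.h>      /* ElfW() */
#include <stddef.h>
#include <sys/auxv.h>  /* getauxval() */

/* Return 1 if any of the phnum program headers is PT_GNU_EH_FRAME, else 0. */
static int phdrs_have_eh_frame_hdr(const ElfW(Phdr) *phdr, size_t phnum)
{
    for (size_t i = 0; i < phnum; i++) {
        if (phdr[i].p_type == PT_GNU_EH_FRAME)
            return 1;
    }
    return 0;
}

/*
 * Inspect the vDSO mapped into this process.
 * Returns 1 or 0 as above, or -1 if the kernel did not advertise a vDSO.
 */
static int vdso_has_eh_frame_hdr(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return -1;

    /* The vDSO is mapped as a complete ELF image, so e_phoff is an
       offset from the start of the mapping. */
    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *)base;
    const ElfW(Phdr) *phdr =
        (const ElfW(Phdr) *)((const char *)ehdr + ehdr->e_phoff);
    return phdrs_have_eh_frame_hdr(phdr, ehdr->e_phnum);
}
```

An unwinder that gets 0 for an object should treat it as having no FDE lookup table and report that no FDE was found, rather than parsing a segment that is not there.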

Comment 6 Florian Weimer 2022-01-24 17:12:00 UTC
Upstream patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589177.html

Comment 7 Jakub Jelinek 2022-01-27 10:31:37 UTC
Please retry against gcc-12.0.1-0.3.fc36 and annobin-10.51-2.fc36 now in rawhide.