Created attachment 1853051 [details]
Backtrace for xrootd

Description of problem:

Three of the packages I maintain started failing to build after the introduction of gcc 12 in rawhide, for what appears to be the same issue. The three packages are:

- globus-common
- globus-gssapi-gsi
- xrootd

The problem does not affect the compilation itself, but it does affect binaries that are run during the build. For the two globus packages, one of the tests in the test suite fails, while for xrootd the documentation generation (using sphinx-build) fails.

I have reproduced the failures in an aarch64 Fedora Rawhide mock buildroot on aarch64-test01.fedorainfracloud.org. For all three packages the backtrace ends in:

(gdb) bt
#0  0x0000fffff47f0944 in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#1  0x0000fffff47ed508 in uw_frame_state_for () from /lib64/libgcc_s.so.1
#2  0x0000fffff47eea14 in _Unwind_ForcedUnwind_Phase2 () from /lib64/libgcc_s.so.1
#3  0x0000fffff47eedcc in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#4  0x0000fffff7b3bedc in __pthread_unwind () from /lib64/libc.so.6
#5  0x0000fffff7b31138 in sigcancel_handler () from /lib64/libc.so.6
#6  <signal handler called>

The full backtraces for all three are attached.

The failures are specific to aarch64 and do not affect the other architectures.

Version-Release number of selected component (if applicable):
libgcc-12.0.1-0.2.fc36.aarch64

How reproducible:
Always

Steps to Reproduce:
1. Build one of the affected packages for aarch64.

Actual results:
Segmentation fault

Expected results:
Successful build

Additional info:
The backtraces for globus-common and globus-gssapi-gsi are very similar. However, the test in globus-common uses libglobus_common from the build in progress (built with gcc 12), while the test in globus-gssapi-gsi uses libglobus_common from the globus-common package installed in the build root, which was built with an earlier gcc version. The failure does not seem to depend on which of them is used.
Created attachment 1853052 [details] Backtrace for globus-common
Created attachment 1853053 [details] Backtrace for globus-gssapi-gsi
Could you install gcc debugging information and check where exactly the crash happens?

Do these programs use dlclose? If yes, I expect this happens during PT_GNU_EH_FRAME parsing of just-closed objects. The program probably calls pthread_cancel and then immediately calls dlclose for a shared object that is part of the call stack of the canceled thread. This was always racy in principle, but with the fully concurrent unwinding in GCC 12, the race condition is more visible now. (Note that you need both glibc 2.35 and a GCC 12 built against glibc 2.35 to reproduce this.)
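To make the suspected ordering problem concrete, here is a minimal sketch of the safe sequence: cancel the thread, wait for its forced unwind to finish with pthread_join, and only then call dlclose. It is not taken from any of the affected packages; `worker` and `cancel_join_unload` are hypothetical names, and in the scenario described above the worker code would live in the shared object being unloaded.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical worker: blocks in pause(), which is a cancellation point. */
void *worker(void *arg)
{
    (void) arg;
    for (;;)
        pause();
    return NULL;
}

/* Cancel the thread, wait until its unwind has completed, and only then
   unload the object given by handle (which may be NULL to skip unloading).
   Returns 0 if the thread ended by cancellation, 1 if it exited some other
   way, and -1 on error. */
int cancel_join_unload(pthread_t thread, void *handle)
{
    void *result = NULL;

    /* A thread must not try to join itself. */
    assert(!pthread_equal(thread, pthread_self()));

    if (pthread_cancel(thread) != 0)
        return -1;
    /* The unwinder may still be walking frames that belong to the object
       until pthread_join returns, so dlclose must not run before this. */
    if (pthread_join(thread, &result) != 0)
        return -1;
    if (handle != NULL && dlclose(handle) != 0)
        return -1;
    return result == PTHREAD_CANCELED ? 0 : 1;
}
```

Calling dlclose between pthread_cancel and pthread_join would instead recreate the race: the unwinder could be asked to look up unwind data for code that is no longer mapped.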
Created attachment 1853068 [details]
Backtrace for globus-common with debuginfo installed

Here is the end of the backtrace for the globus-common case with debuginfo installed. The full backtrace is attached.

Thread 2 "thread_test_pth" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xfffff7d3f120 (LWP 34)]
_Unwind_Find_FDE (pc=0xfffff7ffb7eb, bases=bases@entry=0xfffff7d3d1b8)
    at ../../../libgcc/unwind-dw2-fde-dip.c:513
513       return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
(gdb) bt
#0  _Unwind_Find_FDE (pc=0xfffff7ffb7eb, bases=bases@entry=0xfffff7d3d1b8)
    at ../../../libgcc/unwind-dw2-fde-dip.c:513
#1  0x0000fffff64dd508 in uw_frame_state_for (
    context=context@entry=0xfffff7d3ce90, fs=fs@entry=0xfffff7d3c380)
    at ../../../libgcc/unwind-dw2.c:1263
#2  0x0000fffff64dea14 in _Unwind_ForcedUnwind_Phase2 (
    exc=exc@entry=0xfffff7d3f590, context=context@entry=0xfffff7d3ce90,
    frames_p=frames_p@entry=0xfffff7d3cac8) at ../../../libgcc/unwind.inc:162
#3  0x0000fffff64dedcc in _Unwind_ForcedUnwind (exc=0xfffff7d3f590,
    stop=stop@entry=0xfffff7dfbd40 <unwind_stop>,
    stop_argument=0xfffff7d3e720) at ../../../libgcc/unwind.inc:218
#4  0x0000fffff7dfbedc in __GI___pthread_unwind (buf=<optimized out>)
    at unwind.c:130
#5  0x0000fffff7df1138 in __do_cancel () at ../sysdeps/nptl/pthreadP.h:280
#6  sigcancel_handler (sig=32, si=<optimized out>, ctx=<optimized out>)
    at pthread_cancel.c:56
#7  sigcancel_handler (sig=<optimized out>, si=<optimized out>,
    ctx=<optimized out>) at pthread_cancel.c:32
#8  <signal handler called>

Is it expected that this failure would happen only for aarch64?
Okay, this is my bug. I forgot to add a check that the object actually has a PT_GNU_EH_FRAME segment.

There is also a kernel aspect to this: the vDSO is not built with asynchronous unwinding information, which exposes this bug. I think we want to fix that, too.
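For readers unfamiliar with the segment in question, the following is a rough sketch of how one can ask whether the loaded object covering a given address has a program header of that type (spelled `PT_GNU_EH_FRAME` in `<elf.h>`). It is not the upstream patch and does not use the libgcc code path shown in the backtrace; it uses glibc's `dl_iterate_phdr` instead, and `struct eh_query`, `eh_query_cb` and `object_eh_frame_status` are hypothetical names introduced only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query record (not a libgcc or glibc type). */
struct eh_query {
    uintptr_t pc;       /* address to look up */
    int found_object;   /* set when some PT_LOAD segment covers pc */
    int has_eh_frame;   /* set when that object has PT_GNU_EH_FRAME */
};

/* Callback for dl_iterate_phdr: inspect one loaded object. */
int eh_query_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct eh_query *q = data;
    int covers = 0, eh = 0;
    (void) size;

    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;

        if (ph->p_type == PT_LOAD && q->pc >= start
            && q->pc - start < ph->p_memsz)
            covers = 1;
        else if (ph->p_type == PT_GNU_EH_FRAME)
            eh = 1;
    }
    if (!covers)
        return 0;           /* not this object, keep iterating */
    q->found_object = 1;
    q->has_eh_frame = eh;
    return 1;               /* stop iterating */
}

/* Returns 1 if the object containing pc has a PT_GNU_EH_FRAME segment,
   0 if it has none, and -1 if no loaded object covers pc. */
int object_eh_frame_status(const void *pc)
{
    struct eh_query q = { (uintptr_t) pc, 0, 0 };

    assert(pc != NULL);
    dl_iterate_phdr(eh_query_cb, &q);
    if (!q.found_object)
        return -1;
    return q.has_eh_frame;
}
```

A caller that gets 0 back has found an object without the segment and needs to handle that case rather than go on to parse unwind data for it.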
Upstream patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589177.html
Please retry against gcc-12.0.1-0.3.fc36 and annobin-10.51-2.fc36 now in rawhide.