Bug 2027789
| Summary: | glibc: backtrace function crashes without vdso on ppc64le | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Mark Wielaard <mjw> |
| Component: | glibc | Assignee: | Florian Weimer <fweimer> |
| Status: | CLOSED ERRATA | QA Contact: | Sergey Kolosov <skolosov> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 9.0 | CC: | ashankar, bugproxy, codonell, dj, fweimer, jchecahi, mnewsome, pfrankli, sipoyare, tulioqm |
| Target Milestone: | rc | Keywords: | Bugfix, Patch, Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | glibc-2.34-11.el9 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-17 15:48:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I'd like to hear an opinion from the IBM team on this matter. Thanks! Backporting the glibc change without backchain support in the libgcc unwinder seems problematic.

Florian, I noticed this bug has been reported against RHEL 9, which has a new GCC version.
Programs built with GCC 8 or newer won't be affected by the removal of the powerpc64-specific backtrace() from glibc.

The output from backtrace() is incomplete when a binary built with GCC 7 or older is executed on a glibc without the powerpc64-specific backtrace [1].

If this is a supported scenario for RHEL 9, then we have 2 options:

1. Remove the powerpc64-specific backtrace() anyway and get an incomplete output from backtrace().
2. Test if nip is NULL, generate an incomplete output from backtrace(), but it should have more details than option 1.

Unfortunately, I'm not aware of an alternative solution when both the VDSO and asynchronous unwind tables are unavailable.
Notice that according to [2], other architectures do not support backtracing when asynchronous unwind tables are unavailable.

[1] https://patchwork.sourceware.org/project/glibc/patch/20210212170941.1786380-1-adhemerval.zanella@linaro.org/#106601
[2] https://patchwork.sourceware.org/project/glibc/patch/20210212170941.1786380-1-adhemerval.zanella@linaro.org/#106615

(In reply to Tulio Magno Quites Machado Filho from comment #2)
> Florian, I noticed this bug has been reported against RHEL 9, which has a
> new GCC version.
> Programs built with GCC 8 or newer won't be affected by the removal of the
> powerpc64-specific backtrace() from glibc.
>
> The output from backtrace() is incomplete when a binary built with GCC 7 or
> older is executed on a glibc without the powerpc64-specific backtrace [1].

RHEL 7 binaries are still supposed to run on RHEL 9.

For RHEL 9, we only used -fexceptions -funwind-tables, not -fexceptions -fasynchronous-unwind-tables. Shouldn't backtrace still work in this configuration?
I'll try to get a RHEL 7 ppc64le box and check how sticky the unwind table options are in practice. I think we have a downstream patch to enable them by default, so it's quite likely that non-distribution binaries built on RHEL have unwind tables. That would leave a gap for non-RHEL-built binaries only.

If backtrace doesn't crash and just produces less data, I think we are still good. So we can remove the backchain-based unwinder from RHEL 9 glibc. Would you recommend we do that? The no-vDSO use case is quite extreme, but the DWARF-based unwinder might be more accurate in other situations, too.

Nope, gcc-4.8.5-44.el7.ppc64le does not have the -funwind-tables default baked in. So binaries built on RHEL 7 only have unwinding data if explicitly requested by the programmer. I think that makes it a more difficult call to remove the backchain-based unwinder.

> Would you recommend we do that? The no-vDSO use case is quite extreme, but the DWARF-based unwinder might be more accurate in other situations, too.

Yes, that's what I'd prefer. However...

> I think that makes it a more difficult call to remove the backchain-based unwinder.

... I agree with you here.
I guess the only way to provide similar backtrace() output for RHEL 7 binaries without crashing under valgrind is to keep the powerpc-specific backtrace() and add a check for nip == NULL.

Notice that in my previous comment I was too harsh when I said:

>> 2. Test if nip is NULL, generate an incomplete output from backtrace(), but it should have more details than option 1.

This would only affect binaries executing when the VDSO is not available, e.g. on top of valgrind.
In the most common scenario, the output would be identical.

(In reply to Florian Weimer from comment #4)
> Nope, gcc-4.8.5-44.el7.ppc64le does not have the -funwind-tables default
> baked in. So binaries built on RHEL 7 only have unwinding data if explicitly
> requested by the programmer. I think that makes it a more difficult call to
> remove the backchain-based unwinder.

But weren't they injected through the redhat-rpm-config?
So at least system libraries and programs should have them.

(In reply to Mark Wielaard from comment #6)
> (In reply to Florian Weimer from comment #4)
> > Nope, gcc-4.8.5-44.el7.ppc64le does not have the -funwind-tables default
> > baked in. So binaries built on RHEL 7 only have unwinding data if explicitly
> > requested by the programmer. I think that makes it a more difficult call to
> > remove the backchain-based unwinder.
>
> But weren't they injected through the redhat-rpm-config?
> So at least system libraries and programs should have them.

Unfortunately, those aren't relevant here because we have rebuilt them anyway.

Let's apply the patch from comment 0 as a downstream-only solution if QE agrees.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (new packages: glibc), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3917
Description of problem:

When the vdso isn't available (like when running under valgrind) the backtrace () function crashes on ppc64le.

Version-Release number of selected component (if applicable):

glibc-2.34-8.el9.ppc64le

How reproducible:

Always without vdso (under valgrind), never with vdso.

Steps to Reproduce:

    $ cat main.c
    #include <stdio.h>
    #include <execinfo.h>

    void call_backtrace(){
      void * callstack[128];
      backtrace(callstack, 128);
    }

    int main(){
      call_backtrace();
      return 0;
    }
    $ gcc -g -o main main.c
    $ valgrind ./main

Actual results:

    ==26805== Memcheck, a memory error detector
    ==26805== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==26805== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
    ==26805== Command: ./main
    ==26805==
    ==26805== Invalid read of size 8
    ==26805==    at 0x4263B00: backtrace (backtrace.c:94)
    ==26805==    by 0x10000677: call_backtrace (main.c:6)
    ==26805==    by 0x100006BF: main (main.c:10)
    ==26805==  Address 0x525f52454b414552 is not stack'd, malloc'd or (recently) free'd
    ==26805==
    ==26805==
    ==26805== Process terminating with default action of signal 11 (SIGSEGV)
    ==26805==    at 0x4263B00: backtrace (backtrace.c:94)
    ==26805==    by 0x10000677: call_backtrace (main.c:6)
    ==26805==    by 0x100006BF: main (main.c:10)
    ==26805==
    ==26805== HEAP SUMMARY:
    ==26805==     in use at exit: 0 bytes in 0 blocks
    ==26805==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
    ==26805==
    ==26805== All heap blocks were freed -- no leaks are possible
    ==26805==
    ==26805== For lists of detected and suppressed errors, rerun with: -s
    ==26805== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
    Segmentation fault (core dumped)

Expected results:

No crash

Additional info:

The issue is in sysdeps/powerpc/powerpc64/backtrace.c:

    static inline bool
    is_sigtramp_address (void *nip)
    {
    #ifdef HAVE_SIGTRAMP_RT64
      if (nip == GLRO (dl_vdso_sigtramp_rt64) ||
          nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
        return true;
    #endif
      return false;
    }

When there is no vdso, dl_vdso_sigtramp_rt64 is NULL and so matches a NULL nip, which indicates the end of the stack. The function will return true and the code in __backtrace will try to interpret the current frame as a signal frame (which it isn't) and crash.

One solution is to backport:

    commit 82fd7314c7df8c5555dce027df6f2c98ca5a927f
    Author: Adhemerval Zanella <adhemerval.zanella>
    Date:   Fri Feb 12 19:20:27 2021 +0300

        powerpc: Remove backtrace implementation

        The powerpc optimization to provide a fast stacktrace requires some
        ad-hoc code to handle Linux signal frames and the change is fragile
        once the kernel decides to slight change its execution sequence [1].

        The generic implementation work as-is and it should be future proof
        since the kernel provides the expected CFI directives in vDSO shared
        page.

        Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and
        powerpc64-linux-gnu.

        [1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html

This seems to work fine on fedora rawhide with glibc-2.34.9000-21.fc36.ppc64le

Another solution is to add a simple NULL check to is_sigtramp_address:

    diff --git a/sysdeps/powerpc/powerpc64/backtrace.c b/sysdeps/powerpc/powerpc64/backtrace.c
    index 37de9b5bdd..18ea9e48e4 100644
    --- a/sysdeps/powerpc/powerpc64/backtrace.c
    +++ b/sysdeps/powerpc/powerpc64/backtrace.c
    @@ -68,8 +68,9 @@ static inline bool
     is_sigtramp_address (void *nip)
     {
     #ifdef HAVE_SIGTRAMP_RT64
    -  if (nip == GLRO (dl_vdso_sigtramp_rt64) ||
    -      nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
    +  if ((nip == GLRO (dl_vdso_sigtramp_rt64) ||
    +       nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
    +      && nip)
         return true;
     #endif
       return false;