Bug 2027789

Summary:	glibc: backtrace function crashes without vdso on ppc64le
Product:	Red Hat Enterprise Linux 9	Reporter:	Mark Wielaard <mjw>
Component:	glibc	Assignee:	Florian Weimer <fweimer>
Status:	CLOSED ERRATA	QA Contact:	Sergey Kolosov <skolosov>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	9.0	CC:	ashankar, bugproxy, codonell, dj, fweimer, jchecahi, mnewsome, pfrankli, sipoyare, tulioqm
Target Milestone:	rc	Keywords:	Bugfix, Patch, Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	glibc-2.34-11.el9	Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-05-17 15:48:51 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Mark Wielaard 2021-11-30 16:18:16 UTC

Description of problem:

When the vdso isn't available (for example, when running under valgrind), the backtrace() function crashes on ppc64le.

Version-Release number of selected component (if applicable):

glibc-2.34-8.el9.ppc64le

How reproducible:

Always without vdso (under valgrind), never with vdso.
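
Whether a given run has a vdso can be checked from inside the process. The following is only a minimal sketch, assuming a Linux/glibc system where getauxval() reports the vdso base address through the AT_SYSINFO_EHDR entry of the auxiliary vector. The helper name has_vdso is made up for illustration, and the idea that valgrind hides this entry on ppc64le is an assumption that was not verified in this report.

```c
#include <elf.h>
#include <stdbool.h>
#include <sys/auxv.h>

/* Hypothetical helper (not part of glibc): true if the auxiliary
   vector advertises a vdso to this process.  getauxval() returns 0
   when the AT_SYSINFO_EHDR entry is absent.  */
static bool
has_vdso (void)
{
  return getauxval (AT_SYSINFO_EHDR) != 0;
}
```

Printing the result of has_vdso() at the start of the reproducer would show whether a particular run is in the crashing configuration.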

Steps to Reproduce:

$ cat main.c

#include <stdio.h>
#include <execinfo.h>

void call_backtrace(){
        void * callstack[128];
        backtrace(callstack, 128);
}

int main(){
    call_backtrace();
    return 0;
}

$ gcc -g -o main main.c

$ valgrind ./main

Actual results:

==26805== Memcheck, a memory error detector
==26805== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==26805== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==26805== Command: ./main
==26805==
==26805== Invalid read of size 8
==26805==    at 0x4263B00: backtrace (backtrace.c:94)
==26805==    by 0x10000677: call_backtrace (main.c:6)
==26805==    by 0x100006BF: main (main.c:10)
==26805==  Address 0x525f52454b414552 is not stack'd, malloc'd or (recently) free'd
==26805==
==26805==
==26805== Process terminating with default action of signal 11 (SIGSEGV)
==26805==    at 0x4263B00: backtrace (backtrace.c:94)
==26805==    by 0x10000677: call_backtrace (main.c:6)
==26805==    by 0x100006BF: main (main.c:10)
==26805==
==26805== HEAP SUMMARY:
==26805==     in use at exit: 0 bytes in 0 blocks
==26805==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==26805==
==26805== All heap blocks were freed -- no leaks are possible
==26805==
==26805== For lists of detected and suppressed errors, rerun with: -s
==26805== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

Expected results:

No crash
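
A slightly extended variant of the reproducer can make the expected result observable instead of discarding it. This sketch uses only backtrace() and backtrace_symbols_fd() from <execinfo.h>; the helper name dump_backtrace and the MAX_FRAMES constant are assumptions for illustration and are not part of the original test case.

```c
#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 128

/* Hypothetical helper: capture the current call stack, write the
   symbolized frames to stderr, and return the number of frames that
   backtrace() reported.  */
static int
dump_backtrace (void)
{
  void *callstack[MAX_FRAMES];
  int nframes = backtrace (callstack, MAX_FRAMES);

  if (nframes > 0)
    backtrace_symbols_fd (callstack, nframes, STDERR_FILENO);
  return nframes;
}
```

With a working backtrace(), a call to dump_backtrace() should return a frame count no larger than MAX_FRAMES rather than terminate the process with SIGSEGV.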

Additional info:

The issue is in sysdeps/powerpc/powerpc64/backtrace.c:

static inline bool
is_sigtramp_address (void *nip)
{
#ifdef HAVE_SIGTRAMP_RT64
  if (nip == GLRO (dl_vdso_sigtramp_rt64) ||
      nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
    return true;
#endif
  return false;
}

When there is no vdso, dl_vdso_sigtramp_rt64 is NULL and so matches a NULL nip, which indicates the end of the stack. The function then returns true, and the code in __backtrace tries to interpret the current frame as a signal frame (which it isn't) and crashes.
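
The false positive can be modelled outside glibc. This is a sketch under stated assumptions, not the glibc code: model_sigtramp_rt64 is a hypothetical stand-in for GLRO (dl_vdso_sigtramp_rt64), and the comparison is done on integers so that the "+ 4" stays well defined for a null pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GLRO (dl_vdso_sigtramp_rt64); it stays
   NULL when no vdso is mapped.  */
static void *model_sigtramp_rt64 = NULL;

/* Same comparison as the unpatched check, without any NULL guard.  */
static bool
model_is_sigtramp_address (void *nip)
{
  uintptr_t tramp = (uintptr_t) model_sigtramp_rt64;
  uintptr_t addr = (uintptr_t) nip;

  return addr == tramp || addr == tramp + 4;
}
```

With model_sigtramp_rt64 left at NULL, model_is_sigtramp_address (NULL) returns true, which is the end-of-stack case described above. Once the stand-in points at a real address, a NULL nip no longer matches.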

One solution is to backport:

commit 82fd7314c7df8c5555dce027df6f2c98ca5a927f
Author: Adhemerval Zanella <adhemerval.zanella>
Date:   Fri Feb 12 19:20:27 2021 +0300

    powerpc: Remove backtrace implementation
    
    The powerpc optimization to provide a fast stacktrace requires some
    ad-hoc code to handle Linux signal frames and the change is fragile
    once the kernel decides to slight change its execution sequence [1].
    
    The generic implementation work as-is and it should be future proof
    since the kernel provides the expected CFI directives in vDSO shared
    page.
    
    Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and
    powerpc64-linux-gnu.
    
    [1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html

This seems to work fine on Fedora Rawhide with glibc-2.34.9000-21.fc36.ppc64le.

Another solution is to add a simple NULL check to is_sigtramp_address:

diff --git a/sysdeps/powerpc/powerpc64/backtrace.c b/sysdeps/powerpc/powerpc64/backtrace.c
index 37de9b5bdd..18ea9e48e4 100644
--- a/sysdeps/powerpc/powerpc64/backtrace.c
+++ b/sysdeps/powerpc/powerpc64/backtrace.c
@@ -68,8 +68,9 @@ static inline bool
 is_sigtramp_address (void *nip)
 {
 #ifdef HAVE_SIGTRAMP_RT64
-  if (nip == GLRO (dl_vdso_sigtramp_rt64) ||
-      nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
+  if ((nip == GLRO (dl_vdso_sigtramp_rt64) ||
+       nip == GLRO (dl_vdso_sigtramp_rt64) + 4)
+      && nip)
     return true;
 #endif
   return false;
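
The effect of the added "&& nip" can be illustrated with a simplified back-chain walk. Everything here is a sketch for illustration: struct sketch_frame is not the real powerpc64 stack layout, sketch_sigtramp_rt64 is a hypothetical stand-in for GLRO (dl_vdso_sigtramp_rt64), and sketch_walk only counts frames that the check would classify as signal frames instead of decoding them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GLRO (dl_vdso_sigtramp_rt64).  */
static void *sketch_sigtramp_rt64 = NULL;

/* Simplified back-chain frame; not the real powerpc64 layout.  */
struct sketch_frame
{
  struct sketch_frame *next;
  void *return_address;
};

/* The patched condition: a null nip never counts as a trampoline.  */
static bool
patched_is_sigtramp_address (void *nip)
{
  uintptr_t tramp = (uintptr_t) sketch_sigtramp_rt64;
  uintptr_t addr = (uintptr_t) nip;

  return (addr == tramp || addr == tramp + 4) && addr != 0;
}

/* Walk the chain, record each return address, and count how many
   frames the check classifies as signal frames.  */
static int
sketch_walk (const struct sketch_frame *frame, void **array, int size,
             int *signal_frames)
{
  int count = 0;

  *signal_frames = 0;
  for (; frame != NULL && count < size; frame = frame->next)
    {
      array[count++] = frame->return_address;
      if (patched_is_sigtramp_address (frame->return_address))
        ++*signal_frames;
    }
  return count;
}
```

For a two-frame chain whose outermost frame has a NULL return address, the walk records both entries and reports zero signal frames while sketch_sigtramp_rt64 is NULL, so the last frame is no longer treated as a signal frame.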

Comment 1 Florian Weimer 2021-11-30 16:40:46 UTC

I'd like to hear an opinion from the IBM team on this matter. Thanks!

Backporting the glibc change without backchain support in the libgcc unwinder seems problematic.

Comment 2 Tulio Magno Quites Machado Filho 2021-11-30 18:49:58 UTC

Florian, I noticed this bug has been reported against RHEL 9, which has a new GCC version.
Programs built with GCC 8 or newer won't be affected by the removal of the powerpc64-specific backtrace() from glibc.

The output from backtrace() is incomplete when a binary built with GCC 7 or older is executed on a glibc without the powerpc64-specific backtrace [1].
If this is a supported scenario for RHEL 9, then we have 2 options:

1. Remove the powerpc64-specific backtrace() anyway and get an incomplete output from backtrace().
2. Test if nip is NULL, generate an incomplete output from backtrace(), but it should have more details than option 1.

Unfortunately, I'm not aware of an alternative solution when both the VDSO and asynchronous unwind tables are unavailable.
Notice that according to [2], other architectures do not support backtracing when asynchronous unwind tables are unavailable.

[1] https://patchwork.sourceware.org/project/glibc/patch/20210212170941.1786380-1-adhemerval.zanella@linaro.org/#106601
[2] https://patchwork.sourceware.org/project/glibc/patch/20210212170941.1786380-1-adhemerval.zanella@linaro.org/#106615

Comment 3 Florian Weimer 2021-12-03 14:03:38 UTC

(In reply to Tulio Magno Quites Machado Filho from comment #2)
> Florian, I noticed this bug has been reported against RHEL 9, which has a
> new GCC version.
> Programs built with GCC 8 or newer won't be affected by the removal of the
> powerpc64-specific backtrace() from glibc.
> 
> The output from backtrace() is incomplete when a binary built with GCC 7 or
> older is executed on a glibc without the powerpc64-specific backtrace [1].

RHEL 7 binaries are still supposed to run on RHEL 9. For RHEL 9, we only used -fexceptions -funwind-tables, not -fexceptions -fasynchronous-unwind-tables. Shouldn't backtrace still work in this configuration?

I'll try to get a RHEL 7 ppc64le box and check how sticky the unwind table options are in practice. I think we have a downstream patch to enable them by default, so it's quite likely that non-distribution binaries built on RHEL have unwind tables.

That would leave a gap for non-RHEL-built binaries only. If backtrace doesn't crash and just produces less data, I think we are still good.

So we can remove the backchain-based unwinder from RHEL 9 glibc. Would you recommend we do that? The no-vDSO use case is quite extreme, but the DWARF-based unwinder might be more accurate in other situations, too.

Comment 4 Florian Weimer 2021-12-03 14:43:45 UTC

Nope, gcc-4.8.5-44.el7.ppc64le does not have the -funwind-tables default baked in. So binaries built on RHEL 7 only have unwinding data if explicitly requested by the programmer. I think that makes it a more difficult call to remove the backchain-based unwinder.

Comment 5 Tulio Magno Quites Machado Filho 2021-12-03 16:07:26 UTC

> Would you recommend we do that? The no-vDSO use case is quite extreme, but the DWARF-based unwinder might be more accurate in other situations, too.

Yes, that's what I'd prefer.  However...

> I think that makes it a more difficult call to remove the backchain-based unwinder.

... I agree with you here.  I guess the only available solution that provides similar backtrace() output for RHEL 7 binaries and does not crash when executing under valgrind is to keep the powerpc-specific backtrace() and add the check for nip == NULL.

Notice that in my previous comment I was too harsh when I said:

>> 2. Test if nip is NULL, generate an incomplete output from backtrace(), but it should have more details than option 1.

This would only affect binaries executing when the VDSO is not available, e.g. on top of valgrind.
In the most common scenario, the output would be identical.

Comment 6 Mark Wielaard 2021-12-06 10:06:19 UTC

(In reply to Florian Weimer from comment #4)
> Nope, gcc-4.8.5-44.el7.ppc64le does not have the -funwind-tables default
> baked in. So binaries built on RHEL 7 only have unwinding data if explicitly
> requested by the programmer. I think that makes it a more difficult call to
> remove the backchain-based unwinder.

But weren't they injected through the redhat-rpm-config?
So at least system libraries and programs should have them.

Comment 7 Florian Weimer 2021-12-06 10:24:19 UTC

(In reply to Mark Wielaard from comment #6)
> (In reply to Florian Weimer from comment #4)
> > Nope, gcc-4.8.5-44.el7.ppc64le does not have the -funwind-tables default
> > baked in. So binaries built on RHEL 7 only have unwinding data if explicitly
> > requested by the programmer. I think that makes it a more difficult call to
> > remove the backchain-based unwinder.
> 
> But weren't they injected through the redhat-rpm-config?
> So at least system libraries and programs should have them.

Unfortunately, those aren't relevant here because we have rebuilt them anyway.

Comment 8 Florian Weimer 2021-12-07 12:22:03 UTC

Let's apply the patch from comment 0 as a downstream-only solution if QE agrees.

Comment 13 errata-xmlrpc 2022-05-17 15:48:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: glibc), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:3917