Bug 1658163

Summary: kernel backtrace issue, aarch64 specific
Product: Red Hat Enterprise Linux 8 Reporter: Martin Cermak <mcermak>
Component: systemtapAssignee: Frank Ch. Eigler <fche>
systemtap sub component: system-version QA Contact: Martin Cermak <mcermak>
Status: CLOSED ERRATA Docs Contact: Vladimír Slávik <vslavik>
Severity: low    
Priority: low CC: bgollahe, bhsharma, dsmith, lberk, mcermak, mjw, mnewsome, wcohen
Version: 8.0   
Target Milestone: rc   
Target Release: 8.0   
Hardware: aarch64   
OS: Linux   
Whiteboard:
Fixed In Version: systemtap-4.1-1.el8 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-05 20:55:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1682585    
Bug Blocks: 1683831, 1701002    
Attachments:
Description Flags
stap -DDEBUG_UNWIND=1 -DDEBUG_SYMBOLS -DDEBUG_TRANS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' >& stap_results.out
none
run with linux kernel compiled with gcc option -
none
Patch to make use of kernel's save_stack_trace_regs function for backtraces
none
If available, use kernel's save_stack_trace_regs() for fallback stack unwind none

Description Martin Cermak 2018-12-11 12:25:57 UTC
After the KASLR was introduced in rhel7.5 as the default, problem bz1572501 with kernel backtraces showed up.  That time it was fixed on all the supported arches except for s390x.  Now I see similar problem on rhel8 aarch64.  It is sort of a regression from the user POV because it works on rhel7 aarch64:

=======
 8.0 Server aarch64 # stap --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }'
Pass 1: parsed user script and 457 library scripts using 105728virt/86464res/9408shr/77632data kb, in 310usr/20sys/327real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 154624virt/139456res/10368shr/126528data kb, in 970usr/140sys/1108real ms.
Pass 3: using cached /root/.systemtap/cache/4c/stap_4cba6977c29c496ae360951f465419ca_2403.c
Pass 4: using cached /root/.systemtap/cache/4c/stap_4cba6977c29c496ae360951f465419ca_2403.ko
Pass 5: starting run.
 0xffff0000082a6e40 : kmem_cache_alloc+0x0/0x210 [kernel]
 0x0 (inexact)
Pass 5: run completed in 10usr/50sys/1053real ms.
 8.0 Server aarch64 # 
 8.0 Server aarch64 # uname -r
4.18.0-50.el8.aarch64
 8.0 Server aarch64 # stap -V
Systemtap translator/driver (version 4.0/0.174, rpm 4.0-6.el8)
Copyright (C) 2005-2018 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 2.6.18 ... 4.19-rc7
enabled features: AVAHI BPF JAVA PYTHON3 LIBRPM LIBSQLITE3 NLS NSS READLINE
 8.0 Server aarch64 #
=======

Comment 2 Martin Cermak 2018-12-11 12:54:08 UTC
*** Bug 1658161 has been marked as a duplicate of this bug. ***

Comment 3 William Cohen 2019-02-22 16:50:33 UTC
Get similar results on fedora 29 with the 4.20.10-200.fc29.aarch64 kernel and a systemtap build from git commit 25b181ae44ce. Turning on some additional debugging information with -DDEBUG_UNWIND -DDEBUG_SYMS

# stap  -DDEBUG_UNWIND -DDEBUG_SYMS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }'
Pass 1: parsed user script and 463 library scripts using 136884virt/117560res/8112shr/109260data kb, in 740usr/170sys/917real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 199088virt/180672res/9020shr/171464data kb, in 2160usr/270sys/2440real ms.
Pass 3: using cached /root/.systemtap/cache/71/stap_71aefa8662cc9fe5c7a812fd99363eec_2509.c
Pass 4: using cached /root/.systemtap/cache/71/stap_71aefa8662cc9fe5c7a812fd99363eec_2509.ko
Pass 5: starting run.
_stp_stack_unwind_one_kernel:296: STARTING kernel unwind
 0xffff0000082de7e8 : kmem_cache_alloc+0x0/0x230 [kernel]
_stp_stack_unwind_one_kernel:331: CONTINUING kernel unwind to depth 1
unwind:1490: pc=ffff0000082de7e8, ffff0000082de7e8
unwind:1533: trying debug_frame
unwind_frame:1190: Module /usr/lib/debug/usr/lib/modules/4.20.10-200.fc29.aarch64/vmlinux: no unwind frame data
unwind:1537: debug_frame failed: -5, trying eh_frame
unwind_frame:1190: Module /usr/lib/debug/usr/lib/modules/4.20.10-200.fc29.aarch64/vmlinux: no unwind frame data
_stp_stack_unwind_one_kernel:349: ret=-5 PC=ffff0000082de7e8 SP=ffff00000d733d80
_stp_stack_print_fallback:188: no fallback kernel stacktrace (giving up)
 0x0 (inexact)
Pass 5: run completed in 30usr/40sys/790real ms.

Comment 4 William Cohen 2019-02-22 17:08:53 UTC
Created attachment 1537613 [details]
stap  -DDEBUG_UNWIND=1 -DDEBUG_SYMBOLS -DDEBUG_TRANS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' >& stap_results.out

Set more debug variables for systemtap and another run with:

stap  -DDEBUG_UNWIND=1 -DDEBUG_SYMBOLS -DDEBUG_TRANS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' >& stap_results.out

Comment 5 William Cohen 2019-02-22 20:05:03 UTC
The aarch64 has framepointers (CONFIG_FRAME_POINTER=y) so shouldn't it be falling back to using that?

Used a "-k" option to see what is going into the systemtap module.  The following explains why trying debug_frame and trying eh_frame fail. stap_symbols.h has:

static struct _stp_module _stp_module_0 = {
.name = "kernel",
.path = "/usr/lib/debug/usr/lib/modules/4.20.10-200.fc29.aarch64/vmlinux",
.eh_frame_addr = 0x0,
.unwind_hdr_addr = 0x0,
.debug_frame = NULL,
.debug_frame_len = 0,
.eh_frame = NULL,
.eh_frame_len = 0,
.unwind_hdr = NULL,
.unwind_hdr_len = 0,
.debug_line = NULL,
.debug_line_len = 0,
.sections = _stp_module_0_sections,
.num_sections = sizeof(_stp_module_0_sections)/sizeof(struct _stp_section),
.build_id_bits = (unsigned char *)"\x5a\x14\xce\x68\x9b\xe4\xa4\xe3\xb5\x9c\x86\xcf\x8f\xec\x79\x2c\xe5\x3f\x6c\x6c",
.build_id_len = 20,
.build_id_offset = 0xed0378,
.notes_sect = 0,
};

Looking at other _stp_module see some conditional setting of non-zero values for .eh_frame and .eh_frame_len:

static struct _stp_module _stp_module_20 = {
.name = "fat",
.path = "/usr/lib/modules/4.20.10-200.fc29.aarch64/kernel/fs/fat/fat.ko.xz",
.eh_frame_addr = 0xea88,
.unwind_hdr_addr = 0x0,
.debug_frame = NULL,
.debug_frame_len = 0,
#if defined(STP_USE_DWARF_UNWINDER) && defined(STP_NEED_UNWIND_DATA)
.eh_frame = _stp_module_20_eh_frame,
.eh_frame_len = 7684,
.unwind_hdr = NULL,
.unwind_hdr_len = 0,
#else
.eh_frame = NULL,
.eh_frame_len = 0,
.unwind_hdr = NULL,
.unwind_hdr_len = 0,
#endif /* STP_USE_DWARF_UNWINDER && STP_NEED_UNWIND_DATA*/
.debug_line = NULL,
.debug_line_len = 0,
.sections = _stp_module_20_sections,
.num_sections = sizeof(_stp_module_20_sections)/sizeof(struct _stp_section),
.build_id_bits = (unsigned char *)"\x5f\x89\x0\x2c\x9\x9a\x8e\xe0\x72\xae\xcc\xe7\x6e\x41\x42\x19\xfa\xa0\x56\x9",
.build_id_len = 20,
.build_id_offset = 0x10,
.notes_sect = 0,
};

Comment 6 William Cohen 2019-02-25 22:03:39 UTC
Created attachment 1538612 [details]
run with linux kernel compiled with gcc option -

One problem identified is the kernel has empty .debug_frame sections.  A new kernel was compiled with the gcc option -fno-dwarf2-cfi-asm.  When running that kernel following command was run to see if the backtraces were properly generated:

# stap -DDEBUG_UNWIND=1 -DDEBUG_SYMBOLS -DDEBUG_TRANS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' >& stap_results_frame.out


The unwinding doesn't fully unwind the the stack:

#  stap --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }'
Pass 1: parsed user script and 464 library scripts using 136256virt/116908res/8088shr/108632data kb, in 840usr/100sys/943real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 197720virt/179184res/8896shr/170096data kb, in 2070usr/80sys/2145real ms.
Pass 3: using cached /root/.systemtap/cache/73/stap_73653534b91c0100600ac027e3226915_2327.c
Pass 4: using cached /root/.systemtap/cache/73/stap_73653534b91c0100600ac027e3226915_2327.ko
Pass 5: starting run.
 0xffff0000082db430 : kmem_cache_alloc+0x0/0x230 [kernel]
 0xffff000008313240 : getname+0x28/0x38 [kernel]
 0x23c3ebe7
 0x0 (inexact)
Pass 5: run completed in 10usr/70sys/774real ms.

Comment 7 William Cohen 2019-02-26 21:11:02 UTC
There are three related issues involved with this lack of backtrace on aarch64:
1) aarch64 kernels are not built with .dwarf_frame (or .eh_frame) sections
2) dwarf unwinder doesn't seem to be working correctly for aarch64
3) no aarch64 simple-minded frame pointer based unwinder when no debuginfo around

At one time the aarch64 kernels were built with .eh_frame sections.  However, there is an unfortunate side effect that .eh_frame sections are loaded into memory when the code.  For the kernel that means that a significant amount of memory is taken up by these sections at are never used by the kernel.  This was disabled by the following kernel commit:

commit 728dabd6d1751cf5e0f8e0535891393da62396e9
Author: William Cohen <wcohen>
Date:   Thu Jan 21 22:56:26 2016 -0500

    Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules
    
    By default the aarch64 gcc generates .eh_frame sections.  Unlike
    .debug_frame sections, the .eh_frame sections are loaded into memory
    when the associated code is loaded.  On an example kernel being built
    with this default the .eh_frame section in vmlinux used an extra 1.7MB
    of memory.  The x86 disables the creation of the .eh_frame section.
    The aarch64 should probably do the same to save some memory.
    
    Signed-off-by: William Cohen <wcohen>
    Signed-off-by: Will Deacon <will.deacon>

Adding a "-fno-dwarf2-cfi-asm" to the KBUILD_CFLAGS for aarch64 for the kernel build appears triggers the generation of .debug_frame sections (at least on fedora29)  However, the unwinding does not work as expected as seen in comment#6 and its attachment.

Lastly, even without the debuginfo it should be possible to unwind the stack for aarch64 linux kernels that are built with CONFIG_FRAME_POINTER=y, but systemtap does not have a fallback runtime/stack-arm64.c that has a frame pointer based unwinder.

Comment 8 William Cohen 2019-03-01 01:01:44 UTC
Created attachment 1539677 [details]
Patch to make use of kernel's save_stack_trace_regs function for backtraces

This is a WIP that addresses the third point in the previous comment.  It does allow getting stack backtraces without the proper dwarf information in the kernel.  It needs some proper checking to verify that save_stack_trace_regs is really available and generates sane information.  Another requirements for this to work is that save_stack_trace_regs function is exported in the kernel.  The aarch64 kernels do not currently do that. It is a one line patch to the kernel to add EXPORT_SYMBOL_GPL(save_stack_trace_regs) to the kernel.  The openrisc, s390, and powerpc do export the function.

$  sudo ../install/bin/stap --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }'
[sudo] password for wcohen: 
Pass 1: parsed user script and 458 library scripts using 105744virt/86120res/7752shr/78288data kb, in 600usr/70sys/664real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 164624virt/146012res/8712shr/137168data kb, in 1980usr/140sys/2123real ms.
Pass 3: translated to C into "/tmp/stap1Npe4z/stap_3da6d2b2f655bb82c73c11a2d70b794b_2521_src.c" using 164624virt/146208res/8904shr/137168data kb, in 1820usr/1570sys/3402real ms.
Pass 4: compiled C into "stap_3da6d2b2f655bb82c73c11a2d70b794b_2521.ko" in 12980usr/1180sys/14097real ms.
Pass 5: starting run.
 0xffff0000082db430 : kmem_cache_alloc+0x0/0x230 [kernel]
pc : [<ffff0000082db430>] lr : [<ffff0000083130b8>] pstate: 60400005
sp : ffff000010b03d80
x29: ffff000010b03d80 x28: ffff8003910c9d40 
x27: 0000000000000000 x26: 0000000000000000 
x25: 0000000056000000 x24: 0000000000000015 
x23: 0000000000000000 x22: 0000000000000000 
x21: ffff00000947d000 x20: 0000aaab1c7d2680 
x19: 0000000000000000 x18: 0000000000000000 
x17: 0000000000000000 x16: 0000000000000000 
x15: 0000000000000000 x14: 0000000000000000 
x13: 0000000000000000 x12: 0000000000000000 
x11: 0000000000000000 x10: 0000000040000028 
x9 : 0000000000000000 x8 : 0000000000000000 
x7 : 000000007fff0000 x6 : 0000000000080802 
x5 : 0000000000000000 x4 : 0000000000000000 
x3 : ffff8003e2edde70 x2 : 0000000000000000 
x1 : 00000000006000c0 x0 : ffff8003fe170180 

 0xffff0000082db430 : kmem_cache_alloc+0x0/0x230 [kernel]
 0xffff000008313240 : getname+0x28/0x38 [kernel]
 0xffff0000082fd608 : do_sys_open+0xf8/0x1e8 [kernel]
 0xffff0000082fd75c : __arm64_sys_openat+0x2c/0x38 [kernel]
 0xffff000008098634 : el0_svc_common+0x94/0x108 [kernel]
 0xffff0000080986e0 : el0_svc_handler+0x38/0x78 [kernel]
 0xffff000008084488 : el0_svc+0x8/0xc [kernel]
Pass 5: run completed in 10usr/60sys/824real ms.

Comment 9 Frank Ch. Eigler 2019-03-01 18:20:05 UTC
OK, the gist of that is to delegate to save_stack_trace_regs() if it's available.  A stapconf check for that would be good.  It could be an export-sensitive one, if you think that the EXPORT_SYMBOL_* patch can make it into the kernel ASAP.  Otherwise it could be a symbol-presence based test, and this code could use runtime symbol resolution to link up with an unexported symbol.  We have some prior art on that too (see _EXPORTED macros in stp_utrace.c).

Comment 10 William Cohen 2019-03-04 16:32:34 UTC
Created attachment 1540673 [details]
If available, use kernel's save_stack_trace_regs() for  fallback stack unwind

This has been tested in powerpc and aarch64 machines to verify fallback mechanism unwinds the stack with the save_stack_trace_reg() function when available when there is no debuginfo around.

A patch has been submitted upstream for the aarch64 kernel to export save_stack_trace_reg(): https://lore.kernel.org/patchwork/patch/1047246/

Comment 11 William Cohen 2019-03-18 21:43:43 UTC
The following git commits in upstream systemtap are designed to address this problem:

90ff34ac0506f0d5440393a4c78565f6aaf9b726
8c1dde79e6a7166509e1bbaefb4fd0fc03f7f80c
f7fb9c9182dc874e5d1f1b0c4adc7398b763216d

The upstream systemtap will give a traceback for the reproducer:

$ sudo stap --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }'
Pass 1: parsed user script and 464 library scripts using 136892virt/117460res/8004shr/109272data kb, in 820usr/100sys/919real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 199100virt/180516res/8848shr/171480data kb, in 2050usr/510sys/3838real ms.
Pass 3: translated to C into "/tmp/stapmqAIZd/stap_71354e652ec67cdbf05e707d09569373_2405_src.c" using 199100virt/180836res/9116shr/171480data kb, in 1540usr/410sys/1955real ms.
Pass 4: compiled C into "stap_71354e652ec67cdbf05e707d09569373_2405.ko" in 38040usr/4300sys/42943real ms.
Pass 5: starting run.
 0xffff0000082de748 : kmem_cache_alloc+0x0/0x230 [kernel]
 0xffff0000082de748 : kmem_cache_alloc+0x0/0x230 [kernel]
 0xffff000008317240 : getname+0x28/0x38 [kernel]
 0xffff0000083015e0 : do_sys_open+0xf8/0x1e8 [kernel]
 0xffff000008301734 : __arm64_sys_openat+0x2c/0x38 [kernel]
 0xffff0000080985b4 : el0_svc_common+0x94/0x108 [kernel]
 0xffff000008098660 : el0_svc_handler+0x38/0x78 [kernel]
 0xffff000008084488 : el0_svc+0x8/0xc [kernel]
Pass 5: run completed in 20usr/90sys/793real ms.

Comment 14 Martin Cermak 2019-08-27 13:40:13 UTC
Verified with systemtap-4.1-6.el8, all supp arches.

Comment 16 errata-xmlrpc 2019-11-05 20:55:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3366