Bug 1658163
Summary: | kernel backtrace issue, aarch64 specific | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Martin Cermak <mcermak> |
Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> |
systemtap sub component: | system-version | QA Contact: | Martin Cermak <mcermak> |
Status: | CLOSED ERRATA | Docs Contact: | Vladimír Slávik <vslavik> |
Severity: | low | ||
Priority: | low | CC: | bgollahe, bhsharma, dsmith, lberk, mcermak, mjw, mnewsome, wcohen |
Version: | 8.0 | ||
Target Milestone: | rc | ||
Target Release: | 8.0 | ||
Hardware: | aarch64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | systemtap-4.1-1.el8 | Doc Type: | No Doc Update |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-11-05 20:55:09 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1682585 | ||
Bug Blocks: | 1683831, 1701002 | ||
Attachments: |
Description
Martin Cermak
2018-12-11 12:25:57 UTC
*** Bug 1658161 has been marked as a duplicate of this bug. *** Get similar results on fedora 29 with the 4.20.10-200.fc29.aarch64 kernel and a systemtap build from git commit 25b181ae44ce. Turning on some additional debugging information with -DDEBUG_UNWIND -DDEBUG_SYMS # stap -DDEBUG_UNWIND -DDEBUG_SYMS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' Pass 1: parsed user script and 463 library scripts using 136884virt/117560res/8112shr/109260data kb, in 740usr/170sys/917real ms. Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 199088virt/180672res/9020shr/171464data kb, in 2160usr/270sys/2440real ms. Pass 3: using cached /root/.systemtap/cache/71/stap_71aefa8662cc9fe5c7a812fd99363eec_2509.c Pass 4: using cached /root/.systemtap/cache/71/stap_71aefa8662cc9fe5c7a812fd99363eec_2509.ko Pass 5: starting run. _stp_stack_unwind_one_kernel:296: STARTING kernel unwind 0xffff0000082de7e8 : kmem_cache_alloc+0x0/0x230 [kernel] _stp_stack_unwind_one_kernel:331: CONTINUING kernel unwind to depth 1 unwind:1490: pc=ffff0000082de7e8, ffff0000082de7e8 unwind:1533: trying debug_frame unwind_frame:1190: Module /usr/lib/debug/usr/lib/modules/4.20.10-200.fc29.aarch64/vmlinux: no unwind frame data unwind:1537: debug_frame failed: -5, trying eh_frame unwind_frame:1190: Module /usr/lib/debug/usr/lib/modules/4.20.10-200.fc29.aarch64/vmlinux: no unwind frame data _stp_stack_unwind_one_kernel:349: ret=-5 PC=ffff0000082de7e8 SP=ffff00000d733d80 _stp_stack_print_fallback:188: no fallback kernel stacktrace (giving up) 0x0 (inexact) Pass 5: run completed in 30usr/40sys/790real ms. Created attachment 1537613 [details]
stap -DDEBUG_UNWIND=1 -DDEBUG_SYMBOLS -DDEBUG_TRANS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' >& stap_results.out
Set more debug variables for systemtap and another run with:
stap -DDEBUG_UNWIND=1 -DDEBUG_SYMBOLS -DDEBUG_TRANS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' >& stap_results.out
The aarch64 has framepointers (CONFIG_FRAME_POINTER=y) so shouldn't it be falling back to using that? Used a "-k" option to see what is going into the systemtap module. The following explains why trying debug_frame and trying eh_frame fail. stap_symbols.h has: static struct _stp_module _stp_module_0 = { .name = "kernel", .path = "/usr/lib/debug/usr/lib/modules/4.20.10-200.fc29.aarch64/vmlinux", .eh_frame_addr = 0x0, .unwind_hdr_addr = 0x0, .debug_frame = NULL, .debug_frame_len = 0, .eh_frame = NULL, .eh_frame_len = 0, .unwind_hdr = NULL, .unwind_hdr_len = 0, .debug_line = NULL, .debug_line_len = 0, .sections = _stp_module_0_sections, .num_sections = sizeof(_stp_module_0_sections)/sizeof(struct _stp_section), .build_id_bits = (unsigned char *)"\x5a\x14\xce\x68\x9b\xe4\xa4\xe3\xb5\x9c\x86\xcf\x8f\xec\x79\x2c\xe5\x3f\x6c\x6c", .build_id_len = 20, .build_id_offset = 0xed0378, .notes_sect = 0, }; Looking at other _stp_module see some conditional setting of non-zero values for .eh_frame and .eh_frame_len: static struct _stp_module _stp_module_20 = { .name = "fat", .path = "/usr/lib/modules/4.20.10-200.fc29.aarch64/kernel/fs/fat/fat.ko.xz", .eh_frame_addr = 0xea88, .unwind_hdr_addr = 0x0, .debug_frame = NULL, .debug_frame_len = 0, #if defined(STP_USE_DWARF_UNWINDER) && defined(STP_NEED_UNWIND_DATA) .eh_frame = _stp_module_20_eh_frame, .eh_frame_len = 7684, .unwind_hdr = NULL, .unwind_hdr_len = 0, #else .eh_frame = NULL, .eh_frame_len = 0, .unwind_hdr = NULL, .unwind_hdr_len = 0, #endif /* STP_USE_DWARF_UNWINDER && STP_NEED_UNWIND_DATA*/ .debug_line = NULL, .debug_line_len = 0, .sections = _stp_module_20_sections, .num_sections = sizeof(_stp_module_20_sections)/sizeof(struct _stp_section), .build_id_bits = (unsigned char *)"\x5f\x89\x0\x2c\x9\x9a\x8e\xe0\x72\xae\xcc\xe7\x6e\x41\x42\x19\xfa\xa0\x56\x9", .build_id_len = 20, .build_id_offset = 0x10, .notes_sect = 0, }; Created attachment 1538612 [details]
run with linux kernel compiled with gcc option -
One problem identified is the kernel has empty .debug_frame sections. A new kernel was compiled with the gcc option -fno-dwarf2-cfi-asm. When running that kernel following command was run to see if the backtraces were properly generated:
# stap -DDEBUG_UNWIND=1 -DDEBUG_SYMBOLS -DDEBUG_TRANS --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' >& stap_results_frame.out
The unwinding doesn't fully unwind the the stack:
# stap --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }'
Pass 1: parsed user script and 464 library scripts using 136256virt/116908res/8088shr/108632data kb, in 840usr/100sys/943real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 197720virt/179184res/8896shr/170096data kb, in 2070usr/80sys/2145real ms.
Pass 3: using cached /root/.systemtap/cache/73/stap_73653534b91c0100600ac027e3226915_2327.c
Pass 4: using cached /root/.systemtap/cache/73/stap_73653534b91c0100600ac027e3226915_2327.ko
Pass 5: starting run.
0xffff0000082db430 : kmem_cache_alloc+0x0/0x230 [kernel]
0xffff000008313240 : getname+0x28/0x38 [kernel]
0x23c3ebe7
0x0 (inexact)
Pass 5: run completed in 10usr/70sys/774real ms.
There are three related issues involved with this lack of backtrace on aarch64: 1) aarch64 kernels are not built with .dwarf_frame (or .eh_frame) sections 2) dwarf unwinder doesn't seem to be working correctly for aarch64 3) no aarch64 simple-minded frame pointer based unwinder when no debuginfo around At one time the aarch64 kernels were built with .eh_frame sections. However, there is an unfortunate side effect that .eh_frame sections are loaded into memory when the code. For the kernel that means that a significant amount of memory is taken up by these sections at are never used by the kernel. This was disabled by the following kernel commit: commit 728dabd6d1751cf5e0f8e0535891393da62396e9 Author: William Cohen <wcohen> Date: Thu Jan 21 22:56:26 2016 -0500 Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules By default the aarch64 gcc generates .eh_frame sections. Unlike .debug_frame sections, the .eh_frame sections are loaded into memory when the associated code is loaded. On an example kernel being built with this default the .eh_frame section in vmlinux used an extra 1.7MB of memory. The x86 disables the creation of the .eh_frame section. The aarch64 should probably do the same to save some memory. Signed-off-by: William Cohen <wcohen> Signed-off-by: Will Deacon <will.deacon> Adding a "-fno-dwarf2-cfi-asm" to the KBUILD_CFLAGS for aarch64 for the kernel build appears triggers the generation of .debug_frame sections (at least on fedora29) However, the unwinding does not work as expected as seen in comment#6 and its attachment. Lastly, even without the debuginfo it should be possible to unwind the stack for aarch64 linux kernels that are built with CONFIG_FRAME_POINTER=y, but systemtap does not have a fallback runtime/stack-arm64.c that has a frame pointer based unwinder. Created attachment 1539677 [details]
Patch to make use of kernel's save_stack_trace_regs function for backtraces
This is a WIP that addresses the third point in the previous comment. It does allow getting stack backtraces without the proper dwarf information in the kernel. It needs some proper checking to verify that save_stack_trace_regs is really available and generates sane information. Another requirements for this to work is that save_stack_trace_regs function is exported in the kernel. The aarch64 kernels do not currently do that. It is a one line patch to the kernel to add EXPORT_SYMBOL_GPL(save_stack_trace_regs) to the kernel. The openrisc, s390, and powerpc do export the function.
$ sudo ../install/bin/stap --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }'
[sudo] password for wcohen:
Pass 1: parsed user script and 458 library scripts using 105744virt/86120res/7752shr/78288data kb, in 600usr/70sys/664real ms.
Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 164624virt/146012res/8712shr/137168data kb, in 1980usr/140sys/2123real ms.
Pass 3: translated to C into "/tmp/stap1Npe4z/stap_3da6d2b2f655bb82c73c11a2d70b794b_2521_src.c" using 164624virt/146208res/8904shr/137168data kb, in 1820usr/1570sys/3402real ms.
Pass 4: compiled C into "stap_3da6d2b2f655bb82c73c11a2d70b794b_2521.ko" in 12980usr/1180sys/14097real ms.
Pass 5: starting run.
0xffff0000082db430 : kmem_cache_alloc+0x0/0x230 [kernel]
pc : [<ffff0000082db430>] lr : [<ffff0000083130b8>] pstate: 60400005
sp : ffff000010b03d80
x29: ffff000010b03d80 x28: ffff8003910c9d40
x27: 0000000000000000 x26: 0000000000000000
x25: 0000000056000000 x24: 0000000000000015
x23: 0000000000000000 x22: 0000000000000000
x21: ffff00000947d000 x20: 0000aaab1c7d2680
x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000040000028
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 000000007fff0000 x6 : 0000000000080802
x5 : 0000000000000000 x4 : 0000000000000000
x3 : ffff8003e2edde70 x2 : 0000000000000000
x1 : 00000000006000c0 x0 : ffff8003fe170180
0xffff0000082db430 : kmem_cache_alloc+0x0/0x230 [kernel]
0xffff000008313240 : getname+0x28/0x38 [kernel]
0xffff0000082fd608 : do_sys_open+0xf8/0x1e8 [kernel]
0xffff0000082fd75c : __arm64_sys_openat+0x2c/0x38 [kernel]
0xffff000008098634 : el0_svc_common+0x94/0x108 [kernel]
0xffff0000080986e0 : el0_svc_handler+0x38/0x78 [kernel]
0xffff000008084488 : el0_svc+0x8/0xc [kernel]
Pass 5: run completed in 10usr/60sys/824real ms.
OK, the gist of that is to delegate to save_stack_trace_regs() if it's available. A stapconf check for that would be good. It could be an export-sensitive one, if you think that the EXPORT_SYMBOL_* patch can make it into the kernel ASAP. Otherwise it could be a symbol-presence based test, and this code could use runtime symbol resolution to link up with an unexported symbol. We have some prior art on that too (see _EXPORTED macros in stp_utrace.c). Created attachment 1540673 [details] If available, use kernel's save_stack_trace_regs() for fallback stack unwind This has been tested in powerpc and aarch64 machines to verify fallback mechanism unwinds the stack with the save_stack_trace_reg() function when available when there is no debuginfo around. A patch has been submitted upstream for the aarch64 kernel to export save_stack_trace_reg(): https://lore.kernel.org/patchwork/patch/1047246/ The following git commits in upstream systemtap are designed to address this problem: 90ff34ac0506f0d5440393a4c78565f6aaf9b726 8c1dde79e6a7166509e1bbaefb4fd0fc03f7f80c f7fb9c9182dc874e5d1f1b0c4adc7398b763216d The upstream systemtap will give a traceback for the reproducer: $ sudo stap --all-modules -vwe 'probe kernel.function("kmem_cache_alloc") { print_backtrace(); exit() }' Pass 1: parsed user script and 464 library scripts using 136892virt/117460res/8004shr/109272data kb, in 820usr/100sys/919real ms. Pass 2: analyzed script: 1 probe, 2 functions, 0 embeds, 0 globals using 199100virt/180516res/8848shr/171480data kb, in 2050usr/510sys/3838real ms. Pass 3: translated to C into "/tmp/stapmqAIZd/stap_71354e652ec67cdbf05e707d09569373_2405_src.c" using 199100virt/180836res/9116shr/171480data kb, in 1540usr/410sys/1955real ms. Pass 4: compiled C into "stap_71354e652ec67cdbf05e707d09569373_2405.ko" in 38040usr/4300sys/42943real ms. Pass 5: starting run. 0xffff0000082de748 : kmem_cache_alloc+0x0/0x230 [kernel] 0xffff0000082de748 : kmem_cache_alloc+0x0/0x230 [kernel] 0xffff000008317240 : getname+0x28/0x38 [kernel] 0xffff0000083015e0 : do_sys_open+0xf8/0x1e8 [kernel] 0xffff000008301734 : __arm64_sys_openat+0x2c/0x38 [kernel] 0xffff0000080985b4 : el0_svc_common+0x94/0x108 [kernel] 0xffff000008098660 : el0_svc_handler+0x38/0x78 [kernel] 0xffff000008084488 : el0_svc+0x8/0xc [kernel] Pass 5: run completed in 20usr/90sys/793real ms. Verified with systemtap-4.1-6.el8, all supp arches. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3366 |