Bug 2172568
| Summary: | Perf has difficulty producing accurate function backtraces for nbdkit |
|---|---|
| Product: | [Fedora] Fedora |
| Reporter: | William Cohen <wcohen> |
| Component: | kernel-tools |
| Assignee: | Justin M. Forbes <jforbes> |
| Status: | NEW --- |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | low |
| Priority: | unspecified |
| Version: | 39 |
| CC: | acaringi, fche, jforbes, jwboyer, lgoncalv, mjw, pbrobinson, ptalbert |
| Target Milestone: | --- |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Linux |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
Description
William Cohen
2023-02-22 15:05:17 UTC
Most of the unknown entries were caused by missing debuginfo on
the machine. I reviewed which binaries had missing debuginfo,
installed the debuginfo packages, and reran the experiments. That
eliminated many of the [unknown] entries in the data. However, there
are still some areas where the data is lacking. The updated
flamegraphs are in 20230301.tar.gz, added to this bug.
Summary of the places where perf fp had issues:
- PLT entries
- libc syscalls and vdso code (also saw this issue on dwarf unwinding)
- libc exit() function not listed in the backtrace
- qemu coroutine code
PLT entries
When I started to look up which addresses were problematic, I did the
following to group things by address:
grep unknown qemu_fp_perf_script.out |sort | uniq -c |sort -k 2|more > qemu_fp_perf_unknowns.out
When looking through the data, I would see sections like the following in the output:
93 12070 [unknown] (/usr/lib64/libcurl.so.4.8.0)
51 120a0 [unknown] (/usr/lib64/libcurl.so.4.8.0)
5 120a4 [unknown] (/usr/lib64/libcurl.so.4.8.0)
7 120d0 [unknown] (/usr/lib64/libcurl.so.4.8.0)
3 120d4 [unknown] (/usr/lib64/libcurl.so.4.8.0)
1 12100 [unknown] (/usr/lib64/libcurl.so.4.8.0)
After verifying that the libcurl debuginfo was installed, I found that these addresses map back to PLT entries like:
0000000000012070 <SSL_get_error@plt>:
12070: f3 0f 1e fa endbr64
12074: ff 25 26 6f 09 00 jmp *0x96f26(%rip) # a8fa0 <SSL_get_error.0>
1207a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
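For reference, a disassembly like the one above can be regenerated from the installed library with objdump (a sketch; this is not necessarily the exact invocation used for this report):

objdump -d /usr/lib64/libcurl.so.4.8.0 | grep -A 3 'SSL_get_error@plt>:'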
Looking at the original "perf script" output, I would see things like:
nbdkit 2274 10327.200181: 264642 cycles:u:
12070 [unknown] (/usr/lib64/libcurl.so.4.8.0)
75c1e ssl_cf_recv.lto_priv.0+0x2e (/usr/lib64/libcurl.so.4.8.0)
5b8ae Curl_read+0x5e (/usr/lib64/libcurl.so.4.8.0)
6e99a Curl_readwrite+0x7ea (/usr/lib64/libcurl.so.4.8.0)
52ceb multi_runsingle+0xa2b (/usr/lib64/libcurl.so.4.8.0)
5572e curl_multi_perform+0xde (/usr/lib64/libcurl.so.4.8.0)
290bb curl_easy_perform+0x19b (/usr/lib64/libcurl.so.4.8.0)
4b71 curl_pread+0xc1 (/usr/lib64/nbdkit/plugins/nbdkit-curl-plugin.so)
9e46 plugin_pread+0x66 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
d566 filter_pread+0x46 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
11fe7 protocol_recv_request_send_reply+0x987 (/usr/sbin/nbdkit)
12fc3 handle_single_connection+0x443 (/usr/sbin/nbdkit)
14275 start_thread+0x65 (/usr/sbin/nbdkit)
8cef7 start_thread+0x2f7 (/usr/lib64/libc.so.6)
11307c clone3+0x2c (/usr/lib64/libc.so.6)
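A possible reason these PLT addresses are left as [unknown] (an assumption on my part, not verified for this report) is that the PLT stubs themselves have no symbols in libcurl; the only SSL_get_error symbol in the library is the undefined dynamic reference:

nm -D /usr/lib64/libcurl.so.4.8.0 | grep SSL_get_error
# expected to show an undefined ("U") symbol rather than a local symbol at offset 0x12070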
libc syscalls and vdso code
There look to be cases where the samples were taken right after
entering the kernel (maybe an effect of sampling skid). In the
sorted output of qemu_fp_perf_unknowns.out I see the following entries that
look like kernel addresses but do not seem to map to actual kernel
functions:
760 ffffffffb0000000 [unknown] ([unknown])
6 ffffffffb0000290 [unknown] ([unknown])
1848 ffffffffb00002c0 [unknown] ([unknown])
8799 ffffffffb00002c8 [unknown] ([unknown])
14 ffffffffb0000b60 [unknown] ([unknown])
101 ffffffffb0000e70 [unknown] ([unknown])
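A rough way to check whether addresses like these fall inside a known kernel function (a sketch; it assumes root access, that it is run on the same boot that produced the samples, and that kernel.kptr_restrict allows real addresses to be shown) is to find the nearest kallsyms entry at or below the address:

# /proc/kallsyms is sorted by address, so the last entry <= the sample address
# is the symbol the address would fall in, if any
sudo awk '$1 <= "ffffffffb00002c8"' /proc/kallsyms | tail -1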
Searching through the "perf script" output, I see ffffffffb0000000 in entries like:
nbdkit 2274 8729.808538: 99918 cycles:u:
ffffffffb0000000 [unknown] ([unknown])
a49 __vdso_clock_gettime+0xc9 ([vdso])
7f576c001280 [unknown] ([unknown])
7f5771330750 [unknown] ([unknown])
and
nbdkit 2274 8781.579889: 48946 cycles:u:
ffffffffb0000000 [unknown] ([unknown])
105a9d __poll+0x4d (/usr/lib64/libc.so.6)
58ee8 Curl_poll+0xa8 (/usr/lib64/libcurl.so.4.8.0)
4dc2a multi_wait.part.0+0x4aa (/usr/lib64/libcurl.so.4.8.0)
7f5771330668 [unknown] ([unknown])
7f576c001280 [unknown] ([unknown])
There are cases with __vdso_clock_gettime where the FP unwinder was able to
unwind successfully from the same user-space address:
nbdkit 2274 8881.165087: 421448 cycles:u:
ffffffffb0000000 [unknown] ([unknown])
a49 __vdso_clock_gettime+0xc9 ([vdso])
d6b0d clock_gettime@@GLIBC_2.17+0x1d (/usr/lib64/libc.so.6)
69ff9 allocate_conn.lto_priv.0+0xc9 (/usr/lib64/libcurl.so.4.8.0)
5295d multi_runsingle+0x69d (/usr/lib64/libcurl.so.4.8.0)
5572e curl_multi_perform+0xde (/usr/lib64/libcurl.so.4.8.0)
290bb curl_easy_perform+0x19b (/usr/lib64/libcurl.so.4.8.0)
4b71 curl_pread+0xc1 (/usr/lib64/nbdkit/plugins/nbdkit-curl-plugin.so)
9e46 plugin_pread+0x66 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
d566 filter_pread+0x46 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
11fe7 protocol_recv_request_send_reply+0x987 (/usr/sbin/nbdkit)
12fc3 handle_single_connection+0x443 (/usr/sbin/nbdkit)
14275 start_thread+0x65 (/usr/sbin/nbdkit)
8cef7 start_thread+0x2f7 (/usr/lib64/libc.so.6)
11307c clone3+0x2c (/usr/lib64/libc.so.6)
nbdkit 2274 8725.256890: 102341 cycles:u:
ffffffffb00002c0 [unknown] ([unknown])
105a9d __poll+0x4d (/usr/lib64/libc.so.6)
58ee8 Curl_poll+0xa8 (/usr/lib64/libcurl.so.4.8.0)
58fe7 Curl_socket_check+0x97 (/usr/lib64/libcurl.so.4.8.0)
6e99a Curl_readwrite+0x7ea (/usr/lib64/libcurl.so.4.8.0)
52ceb multi_runsingle+0xa2b (/usr/lib64/libcurl.so.4.8.0)
5572e curl_multi_perform+0xde (/usr/lib64/libcurl.so.4.8.0)
290bb curl_easy_perform+0x19b (/usr/lib64/libcurl.so.4.8.0)
4b71 curl_pread+0xc1 (/usr/lib64/nbdkit/plugins/nbdkit-curl-plugin.so)
9e46 plugin_pread+0x66 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
d566 filter_pread+0x46 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
11fe7 protocol_recv_request_send_reply+0x987 (/usr/sbin/nbdkit)
12fc3 handle_single_connection+0x443 (/usr/sbin/nbdkit)
14275 start_thread+0x65 (/usr/sbin/nbdkit)
8cef7 start_thread+0x2f7 (/usr/lib64/libc.so.6)
11307c clone3+0x2c (/usr/lib64/libc.so.6)
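To double-check where the vdso and the shared libraries were mapped for these samples, the mmap records stored in the perf data file can be dumped with perf script (a sketch; qemu_fp_perf.data stands in for whichever perf.data file the fp run produced, and the output is not reproduced here):

perf script --input=qemu_fp_perf.data --show-mmap-events | grep PERF_RECORD_MMAP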
libc exit() function not listed in backtrace
See samples like the following:
nbdkit 2270 11971.254333: 436407 cycles:u:
9d735 cfree.5+0x25 (/usr/lib64/libc.so.6)
15cf18 OSSL_DECODER_free+0x48 (/usr/lib64/libcrypto.so.3.0.8)
1dd8bc impl_free+0x1c (/usr/lib64/libcrypto.so.3.0.8)
205b09 OPENSSL_sk_pop_free+0x39 (/usr/lib64/libcrypto.so.3.0.8)
1e6c3f alg_cleanup+0x2f (/usr/lib64/libcrypto.so.3.0.8)
1c0f54 sa_doall.lto_priv.0+0x134 (/usr/lib64/libcrypto.so.3.0.8)
1e6d21 ossl_method_store_free+0x31 (/usr/lib64/libcrypto.so.3.0.8)
1b9f0b CRYPTO_free_ex_data+0x15b (/usr/lib64/libcrypto.so.3.0.8)
1ba0a6 context_deinit.part.0+0x66 (/usr/lib64/libcrypto.so.3.0.8)
1bcc52 OPENSSL_cleanup+0x212 (/usr/lib64/libcrypto.so.3.0.8)
40246 __run_exit_handlers+0x166 (/usr/lib64/libc.so.6)
4038e [unknown] (/usr/lib64/libc.so.6)
27b51 __libc_start_call_main+0x81 (/usr/lib64/libc.so.6)
27c0b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
94d5 _start+0x25 (/usr/sbin/nbdkit)
This looks to be more an issue with the libc exit function having the
call to __run_exit_handlers as the last instruction in the region
marked as belonging to exit. The (unused) return address on the stack
therefore points just past the end of exit, outside the function, so it
is listed as [unknown].
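One way to confirm this (a sketch, assuming gdb and the glibc debuginfo are installed; the exact disassembly is not reproduced here) is to disassemble exit in the installed libc and check that the call to __run_exit_handlers is the final instruction before the next symbol:

gdb -batch -ex 'disassemble exit' /usr/lib64/libc.so.6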
qemu coroutine code
The qemu code uses setjmp and longjmp operations to implement its
coroutines. In some cases this results in backtraces like the
following, where FP unwinding gets confused:
qemu-img 2271 8724.707399: 121182 cycles:u:
a9cff nbd_reply_chunk_iter_receive+0xbf (/usr/bin/qemu-img)
ab182 nbd_client_co_preadv+0x232 (/usr/bin/qemu-img)
9f466 bdrv_driver_preadv+0x106 (/usr/bin/qemu-img)
a2828 bdrv_aligned_preadv+0x348 (/usr/bin/qemu-img)
a2f5c bdrv_co_preadv_part+0x17c (/usr/bin/qemu-img)
c2f87 qcow2_co_preadv_task_entry+0x177 (/usr/bin/qemu-img)
c3ab8 qcow2_co_preadv_part+0x278 (/usr/bin/qemu-img)
a2828 bdrv_aligned_preadv+0x348 (/usr/bin/qemu-img)
a2f5c bdrv_co_preadv_part+0x17c (/usr/bin/qemu-img)
93313 blk_co_do_preadv_part+0x83 (/usr/bin/qemu-img)
9349c blk_co_pread+0x5c (/usr/bin/qemu-img)
59a4e convert_co_do_copy+0x47e (/usr/bin/qemu-img)
15da82 coroutine_trampoline+0x52 (/usr/bin/qemu-img)
55090 __start_context+0x0 (/usr/lib64/libc.so.6)
560a5ee22337 [unknown] ([unknown])
0 [unknown] ([unknown])
This looks to be one thing that the dwarf unwinding handles better: it
does not attempt to unwind past that point, and I do not see any
[unknown] entries in qemu_dwarf.svg under coroutine_trampoline.
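For reference, the dwarf comparison data mentioned above can be gathered by swapping the unwinder on the record step (a sketch mirroring the fp invocation shown later in this report; qemu_dwarf_perf.data and out.qemu_dwarf-folded are placeholder names, not necessarily the file names actually used):

perf record --call-graph=dwarf --output=qemu_dwarf_perf.data nbdkit -r curl https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img --filter=multi-conn multi-conn-mode=unsafe timeout=2000 --run ' qemu-img convert -W -m 16 -p -f qcow2 $uri -O raw jammy-server-cloudimg-amd64.img.raw '
perf script --input=qemu_dwarf_perf.data | stackcollapse-perf.pl > out.qemu_dwarf-folded
flamegraph.pl out.qemu_dwarf-folded > qemu_dwarf.svg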
Created attachment 1947526 [details]
New flamegraphs on rawhide machine.
Looking at the entry_SYSCALL_64 code (https://elixir.bootlin.com/linux/v6.3-rc3/source/arch/x86/entry/entry_64.S#L87), there appears to be a window where the pt_regs might not hold the current values of the registers until after the PUSH_AND_CLEAR_REGS. Is it possible that the perf sampling occurs in the early part of entry_SYSCALL_64, and the perf sampling code observes that the sample is currently in the kernel and assumes that the pt_regs are valid when they are not actually? There is a comment about "IRQs are off" in entry_SYSCALL_64. Is perf using NMI?

I have thought about other possible causes of the problem and considered whether this might be a difference between perf collecting data in a virtualized environment vs. a non-virtualized one. The initial experiments were on a qemu guest vm and a Fedora 37 host. Running the experiments on the same hardware (a Lenovo P53 laptop with an Intel(R) Xeon(R) E-2276M CPU @ 2.80GHz processor) with updated RPMs:

On the Fedora 37 host:

$ rpm -q kernel qemu
kernel-6.2.7-200.fc37.x86_64
qemu-7.0.0-14.fc37.x86_64

On the rawhide guest (kernel-6.3.0-0.rc3.20230322gita1effab7a3a3.31.fc39.x86_64):

perf record --call-graph=fp --output=guest_qemu_fp_perf.data nbdkit -r curl https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img --filter=multi-conn multi-conn-mode=unsafe timeout=2000 --run ' qemu-img convert -W -m 16 -p -f qcow2 $uri -O raw jammy-server-cloudimg-amd64.img.raw '
perf script --input=guest_qemu_fp_perf.data | stackcollapse-perf.pl > out.guest_qemu_fp-folded
flamegraph.pl out.guest_qemu_fp-folded > guest_qemu_fp.svg
perf report --input guest_qemu_fp_perf.data > guest_qemu_fp_perf.out

I tried the same experiment on a physical machine, an Intel(R) Xeon(R) CPU E7540, running the rawhide kernel kernel-6.3.0-0.rc3.20230322gita1effab7a3a3.31.fc39.x86_64:

perf record --call-graph=fp --output=raw_qemu_fp_perf.data nbdkit -r curl https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img --filter=multi-conn multi-conn-mode=unsafe timeout=2000 --run ' qemu-img convert -W -m 16 -p -f qcow2 $uri -O raw jammy-server-cloudimg-amd64.img.raw '
perf script --input=raw_qemu_fp_perf.data | stackcollapse-perf.pl > out.raw_qemu_fp-folded
flamegraph.pl out.raw_qemu_fp-folded > raw_qemu_fp.svg
perf report --input raw_qemu_fp_perf.data > raw_qemu_fp_perf.out

The data taken on the non-virtualized machine has saner callback stacks where __libc_recv and __poll are involved, as shown in the svg files in 20230323_data.tar.gz.

This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.