Bug 2172568
| Summary: | Perf has difficulty producing accurate function backtraces for nbdkit |
|---|---|
| Product: | [Fedora] Fedora |
| Reporter: | William Cohen <wcohen> |
| Component: | kernel-tools |
| Assignee: | Justin M. Forbes <jforbes> |
| Status: | NEW --- |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | low |
| Priority: | unspecified |
| Version: | 39 |
| CC: | acaringi, fche, jforbes, jwboyer, lgoncalv, mjw, pbrobinson, ptalbert |
| Target Milestone: | --- |
| Target Release: | --- |
| Hardware: | x86_64 |
| OS: | Linux |
| Doc Type: | If docs needed, set a value |
| Story Points: | --- |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| Category: | --- |
| oVirt Team: | --- |
| Cloudforms Team: | --- |
Description
William Cohen
2023-02-22 15:05:17 UTC
Most of the unknown entries were caused by missing debuginfo on
the machine. I reviewed which binaries had missing debuginfo,
installed the debuginfo packages, and reran the experiments. That
eliminated many of the [unknown] entries in the data. However, there
are still some areas where the data is lacking. The updated
flamegraphs are in 20230301.tar.gz, added to this bug.
Summary of the places where perf fp had issues:
- PLT entries
- libc syscalls and vdso code (also saw this issue on dwarf unwinding)
- libc exit() function not listed in the backtrace
- qemu coroutine code
PLT entries
When I started to look up which addresses were problematic, I did the
following to group things by address:
grep unknown qemu_fp_perf_script.out |sort | uniq -c |sort -k 2|more > qemu_fp_perf_unknowns.out
When looking through the data, I would see sections like the following in the output:
93 12070 [unknown] (/usr/lib64/libcurl.so.4.8.0)
51 120a0 [unknown] (/usr/lib64/libcurl.so.4.8.0)
5 120a4 [unknown] (/usr/lib64/libcurl.so.4.8.0)
7 120d0 [unknown] (/usr/lib64/libcurl.so.4.8.0)
3 120d4 [unknown] (/usr/lib64/libcurl.so.4.8.0)
1 12100 [unknown] (/usr/lib64/libcurl.so.4.8.0)
After verifying that the libcurl debuginfo was installed, I found that these addresses map back to PLT entries like:
0000000000012070 <SSL_get_error@plt>:
12070: f3 0f 1e fa endbr64
12074: ff 25 26 6f 09 00 jmp *0x96f26(%rip) # a8fa0 <SSL_get_error.0>
1207a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
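For reference, a disassembly like the one above can be regenerated from the installed library with objdump (a sketch; this is not necessarily the exact invocation used for this report):

objdump -d /usr/lib64/libcurl.so.4.8.0 | grep -A 3 'SSL_get_error@plt>:'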
Looking at the original "perf script" output, I would see things like:
nbdkit 2274 10327.200181: 264642 cycles:u:
12070 [unknown] (/usr/lib64/libcurl.so.4.8.0)
75c1e ssl_cf_recv.lto_priv.0+0x2e (/usr/lib64/libcurl.so.4.8.0)
5b8ae Curl_read+0x5e (/usr/lib64/libcurl.so.4.8.0)
6e99a Curl_readwrite+0x7ea (/usr/lib64/libcurl.so.4.8.0)
52ceb multi_runsingle+0xa2b (/usr/lib64/libcurl.so.4.8.0)
5572e curl_multi_perform+0xde (/usr/lib64/libcurl.so.4.8.0)
290bb curl_easy_perform+0x19b (/usr/lib64/libcurl.so.4.8.0)
4b71 curl_pread+0xc1 (/usr/lib64/nbdkit/plugins/nbdkit-curl-plugin.so)
9e46 plugin_pread+0x66 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
d566 filter_pread+0x46 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
11fe7 protocol_recv_request_send_reply+0x987 (/usr/sbin/nbdkit)
12fc3 handle_single_connection+0x443 (/usr/sbin/nbdkit)
14275 start_thread+0x65 (/usr/sbin/nbdkit)
8cef7 start_thread+0x2f7 (/usr/lib64/libc.so.6)
11307c clone3+0x2c (/usr/lib64/libc.so.6)
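A possible reason these PLT addresses are left as [unknown] (an assumption on my part, not verified for this report) is that the PLT stubs themselves have no symbols in libcurl; the only SSL_get_error symbol in the library is the undefined dynamic reference:

nm -D /usr/lib64/libcurl.so.4.8.0 | grep SSL_get_error
# expected to show an undefined ("U") symbol rather than a local symbol at offset 0x12070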
libc syscalls and vdso code
There look to be cases where the samples were taken right after
entering the kernel (maybe an effect of sampling skid). In the
sorted output of qemu_fp_perf_unknowns.out I see the following entries that
look like kernel addresses but do not seem to map to actual kernel
functions:
760 ffffffffb0000000 [unknown] ([unknown])
6 ffffffffb0000290 [unknown] ([unknown])
1848 ffffffffb00002c0 [unknown] ([unknown])
8799 ffffffffb00002c8 [unknown] ([unknown])
14 ffffffffb0000b60 [unknown] ([unknown])
101 ffffffffb0000e70 [unknown] ([unknown])
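A rough way to check whether addresses like these fall inside a known kernel function (a sketch; it assumes root access, that it is run on the same boot that produced the samples, and that kernel.kptr_restrict allows real addresses to be shown) is to find the nearest kallsyms entry at or below the address:

# /proc/kallsyms is sorted by address, so the last entry <= the sample address
# is the symbol the address would fall in, if any
sudo awk '$1 <= "ffffffffb00002c8"' /proc/kallsyms | tail -1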
Searching through the "perf script" output, I see ffffffffb0000000 in entries like:
nbdkit 2274 8729.808538: 99918 cycles:u:
ffffffffb0000000 [unknown] ([unknown])
a49 __vdso_clock_gettime+0xc9 ([vdso])
7f576c001280 [unknown] ([unknown])
7f5771330750 [unknown] ([unknown])
and
nbdkit 2274 8781.579889: 48946 cycles:u:
ffffffffb0000000 [unknown] ([unknown])
105a9d __poll+0x4d (/usr/lib64/libc.so.6)
58ee8 Curl_poll+0xa8 (/usr/lib64/libcurl.so.4.8.0)
4dc2a multi_wait.part.0+0x4aa (/usr/lib64/libcurl.so.4.8.0)
7f5771330668 [unknown] ([unknown])
7f576c001280 [unknown] ([unknown])
There are cases with __vdso_clock_gettime where the FP unwinder was able to
unwind successfully from the same user-space address:
nbdkit 2274 8881.165087: 421448 cycles:u:
ffffffffb0000000 [unknown] ([unknown])
a49 __vdso_clock_gettime+0xc9 ([vdso])
d6b0d clock_gettime@@GLIBC_2.17+0x1d (/usr/lib64/libc.so.6)
69ff9 allocate_conn.lto_priv.0+0xc9 (/usr/lib64/libcurl.so.4.8.0)
5295d multi_runsingle+0x69d (/usr/lib64/libcurl.so.4.8.0)
5572e curl_multi_perform+0xde (/usr/lib64/libcurl.so.4.8.0)
290bb curl_easy_perform+0x19b (/usr/lib64/libcurl.so.4.8.0)
4b71 curl_pread+0xc1 (/usr/lib64/nbdkit/plugins/nbdkit-curl-plugin.so)
9e46 plugin_pread+0x66 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
d566 filter_pread+0x46 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
11fe7 protocol_recv_request_send_reply+0x987 (/usr/sbin/nbdkit)
12fc3 handle_single_connection+0x443 (/usr/sbin/nbdkit)
14275 start_thread+0x65 (/usr/sbin/nbdkit)
8cef7 start_thread+0x2f7 (/usr/lib64/libc.so.6)
11307c clone3+0x2c (/usr/lib64/libc.so.6)
nbdkit 2274 8725.256890: 102341 cycles:u:
ffffffffb00002c0 [unknown] ([unknown])
105a9d __poll+0x4d (/usr/lib64/libc.so.6)
58ee8 Curl_poll+0xa8 (/usr/lib64/libcurl.so.4.8.0)
58fe7 Curl_socket_check+0x97 (/usr/lib64/libcurl.so.4.8.0)
6e99a Curl_readwrite+0x7ea (/usr/lib64/libcurl.so.4.8.0)
52ceb multi_runsingle+0xa2b (/usr/lib64/libcurl.so.4.8.0)
5572e curl_multi_perform+0xde (/usr/lib64/libcurl.so.4.8.0)
290bb curl_easy_perform+0x19b (/usr/lib64/libcurl.so.4.8.0)
4b71 curl_pread+0xc1 (/usr/lib64/nbdkit/plugins/nbdkit-curl-plugin.so)
9e46 plugin_pread+0x66 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
d566 filter_pread+0x46 (/usr/sbin/nbdkit)
d40b backend_pread+0xfb (/usr/sbin/nbdkit)
11fe7 protocol_recv_request_send_reply+0x987 (/usr/sbin/nbdkit)
12fc3 handle_single_connection+0x443 (/usr/sbin/nbdkit)
14275 start_thread+0x65 (/usr/sbin/nbdkit)
8cef7 start_thread+0x2f7 (/usr/lib64/libc.so.6)
11307c clone3+0x2c (/usr/lib64/libc.so.6)
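To double-check where the vdso and the shared libraries were mapped for these samples, the mmap records stored in the perf data file can be dumped with perf script (a sketch; qemu_fp_perf.data stands in for whichever perf.data file the fp run produced, and the output is not reproduced here):

perf script --input=qemu_fp_perf.data --show-mmap-events | grep PERF_RECORD_MMAP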
libc exit() function not listed in backtrace
See samples like the following:
nbdkit 2270 11971.254333: 436407 cycles:u:
9d735 cfree.5+0x25 (/usr/lib64/libc.so.6)
15cf18 OSSL_DECODER_free+0x48 (/usr/lib64/libcrypto.so.3.0.8)
1dd8bc impl_free+0x1c (/usr/lib64/libcrypto.so.3.0.8)
205b09 OPENSSL_sk_pop_free+0x39 (/usr/lib64/libcrypto.so.3.0.8)
1e6c3f alg_cleanup+0x2f (/usr/lib64/libcrypto.so.3.0.8)
1c0f54 sa_doall.lto_priv.0+0x134 (/usr/lib64/libcrypto.so.3.0.8)
1e6d21 ossl_method_store_free+0x31 (/usr/lib64/libcrypto.so.3.0.8)
1b9f0b CRYPTO_free_ex_data+0x15b (/usr/lib64/libcrypto.so.3.0.8)
1ba0a6 context_deinit.part.0+0x66 (/usr/lib64/libcrypto.so.3.0.8)
1bcc52 OPENSSL_cleanup+0x212 (/usr/lib64/libcrypto.so.3.0.8)
40246 __run_exit_handlers+0x166 (/usr/lib64/libc.so.6)
4038e [unknown] (/usr/lib64/libc.so.6)
27b51 __libc_start_call_main+0x81 (/usr/lib64/libc.so.6)
27c0b __libc_start_main@@GLIBC_2.34+0x8b (/usr/lib64/libc.so.6)
94d5 _start+0x25 (/usr/sbin/nbdkit)
This looks to be more an issue with the libc exit function having the
call to __run_exit_handlers as the last instruction in the region
marked as belonging to exit. The (unused) return address on the stack
therefore points just past the end of exit, outside the function, so it
is listed as [unknown].
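One way to confirm this (a sketch, assuming gdb and the glibc debuginfo are installed; the exact disassembly is not reproduced here) is to disassemble exit in the installed libc and check that the call to __run_exit_handlers is the final instruction before the next symbol:

gdb -batch -ex 'disassemble exit' /usr/lib64/libc.so.6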
qemu coroutine code
The qemu code uses setjmp and longjmp operations to implement its
coroutines. In some cases this results in backtraces like the
following, where FP unwinding gets confused:
qemu-img 2271 8724.707399: 121182 cycles:u:
a9cff nbd_reply_chunk_iter_receive+0xbf (/usr/bin/qemu-img)
ab182 nbd_client_co_preadv+0x232 (/usr/bin/qemu-img)
9f466 bdrv_driver_preadv+0x106 (/usr/bin/qemu-img)
a2828 bdrv_aligned_preadv+0x348 (/usr/bin/qemu-img)
a2f5c bdrv_co_preadv_part+0x17c (/usr/bin/qemu-img)
c2f87 qcow2_co_preadv_task_entry+0x177 (/usr/bin/qemu-img)
c3ab8 qcow2_co_preadv_part+0x278 (/usr/bin/qemu-img)
a2828 bdrv_aligned_preadv+0x348 (/usr/bin/qemu-img)
a2f5c bdrv_co_preadv_part+0x17c (/usr/bin/qemu-img)
93313 blk_co_do_preadv_part+0x83 (/usr/bin/qemu-img)
9349c blk_co_pread+0x5c (/usr/bin/qemu-img)
59a4e convert_co_do_copy+0x47e (/usr/bin/qemu-img)
15da82 coroutine_trampoline+0x52 (/usr/bin/qemu-img)
55090 __start_context+0x0 (/usr/lib64/libc.so.6)
560a5ee22337 [unknown] ([unknown])
0 [unknown] ([unknown])
This looks to be one thing that the dwarf unwinding handles better: it
does not attempt to unwind past that point, and I do not see any
[unknown] entries in qemu_dwarf.svg under coroutine_trampoline.
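For reference, the dwarf comparison data mentioned above can be gathered by swapping the unwinder on the record step (a sketch mirroring the fp invocation shown later in this report; qemu_dwarf_perf.data and out.qemu_dwarf-folded are placeholder names, not necessarily the file names actually used):

perf record --call-graph=dwarf --output=qemu_dwarf_perf.data nbdkit -r curl https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img --filter=multi-conn multi-conn-mode=unsafe timeout=2000 --run ' qemu-img convert -W -m 16 -p -f qcow2 $uri -O raw jammy-server-cloudimg-amd64.img.raw '
perf script --input=qemu_dwarf_perf.data | stackcollapse-perf.pl > out.qemu_dwarf-folded
flamegraph.pl out.qemu_dwarf-folded > qemu_dwarf.svg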
Created attachment 1947526 [details]
New flamegraphs on rawhide machine.
Looking at the entry_SYSCALL_64 code (https://elixir.bootlin.com/linux/v6.3-rc3/source/arch/x86/entry/entry_64.S#L87), there appears to be a window where the pt_regs might not hold the current values of the registers until after the PUSH_AND_CLEAR_REGS. Is it possible that the perf sampling occurs in the early part of entry_SYSCALL_64, and the perf sampling code observes that the sample is currently in the kernel and assumes that the pt_regs are valid when they are not actually? There is a comment about "IRQs are off" in entry_SYSCALL_64. Is perf using NMI?

I have thought about other possible causes of the problem and considered whether this might be a difference between perf collecting data in a virtualized environment vs. a non-virtualized one. The initial experiments were on a qemu guest vm and a Fedora 37 host. Running the experiments on the same hardware (a Lenovo P53 laptop with an Intel(R) Xeon(R) E-2276M CPU @ 2.80GHz processor) with updated RPMs:

On the Fedora 37 host:

$ rpm -q kernel qemu
kernel-6.2.7-200.fc37.x86_64
qemu-7.0.0-14.fc37.x86_64

On the rawhide guest (kernel-6.3.0-0.rc3.20230322gita1effab7a3a3.31.fc39.x86_64):

perf record --call-graph=fp --output=guest_qemu_fp_perf.data nbdkit -r curl https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img --filter=multi-conn multi-conn-mode=unsafe timeout=2000 --run ' qemu-img convert -W -m 16 -p -f qcow2 $uri -O raw jammy-server-cloudimg-amd64.img.raw '
perf script --input=guest_qemu_fp_perf.data | stackcollapse-perf.pl > out.guest_qemu_fp-folded
flamegraph.pl out.guest_qemu_fp-folded > guest_qemu_fp.svg
perf report --input guest_qemu_fp_perf.data > guest_qemu_fp_perf.out

I tried the same experiment on a physical machine, an Intel(R) Xeon(R) CPU E7540, running the rawhide kernel kernel-6.3.0-0.rc3.20230322gita1effab7a3a3.31.fc39.x86_64:

perf record --call-graph=fp --output=raw_qemu_fp_perf.data nbdkit -r curl https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img --filter=multi-conn multi-conn-mode=unsafe timeout=2000 --run ' qemu-img convert -W -m 16 -p -f qcow2 $uri -O raw jammy-server-cloudimg-amd64.img.raw '
perf script --input=raw_qemu_fp_perf.data | stackcollapse-perf.pl > out.raw_qemu_fp-folded
flamegraph.pl out.raw_qemu_fp-folded > raw_qemu_fp.svg
perf report --input raw_qemu_fp_perf.data > raw_qemu_fp_perf.out

The data taken on the non-virtualized machine has saner callback stacks where __libc_recv and __poll are involved, as shown in the svg files in 20230323_data.tar.gz.

This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle. Changing version to 39.