Bug 2076978

Summary: 'perf top' segfaults
Product: Fedora
Reporter: Dr. David Alan Gilbert <dgilbert>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: 36
CC: acaringi, adscvr, airlied, alciregi, bskeggs, geraldo.simiao.kutz, hdegoede, hpa, jarodwilson, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, omosnace, ptalbert, rjones, rob, steved, ykaul
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-5.17.11-300.fc36 kernel-5.17.11-200.fc35 kernel-5.17.11-100.fc34
Last Closed: 2022-05-28 01:15:05 UTC
Type: Bug

Description Dr. David Alan Gilbert 2022-04-20 10:56:03 UTC
1. Please describe the problem:
Running 'sudo perf top' segfaults in what looks like endless recursion.

2. What is the Version-Release number of the kernel:
perf-5.17.0-300.fc36.x86_64
kernel-5.17.2-300.fc36.x86_64
libbpf-0.5.0-2.fc36.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
I'm pretty sure it was OK in F35.

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
  sudo perf top



5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
not sure

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
no

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Here's some output from debugging the core dump:
#0  0x00007f2d64893d7c in btf__get_from_id (id=51, btf=btf@entry=0x7ffd4b90b020) at btf.c:1410
        res = <optimized out>
        err = <optimized out>

1410		*btf = NULL;
1411		res = btf__load_from_kernel_by_id(id);
1412		err = libbpf_get_error(res);

#1  0x00005587fadda8a4 in btf__load_from_kernel_by_id (id=<optimized out>) at util/bpf-event.c:30
        btf = 0x0
        err = <optimized out>

(gdb) list
25	struct btf * __weak btf__load_from_kernel_by_id(__u32 id)
26	{
27	       struct btf *btf;
28	#pragma GCC diagnostic push
29	#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
30	       int err = btf__get_from_id(id, &btf);
31	#pragma GCC diagnostic pop
32	
33	       return err ? ERR_PTR(err) : btf;
34	}

#2  0x00007f2d64893d8d in btf__get_from_id (id=<optimized out>, btf=btf@entry=0x7ffd4b90b070) at btf.c:1411
        res = <optimized out>
        err = <optimized out>
#3  0x00005587fadda8a4 in btf__load_from_kernel_by_id (id=<optimized out>) at util/bpf-event.c:30
        btf = 0x0
        err = <optimized out>

It looks like it's stuck in infinite recursion.
I suspect this has something to do with the '__weak' in the declaration of btf__load_from_kernel_by_id, because when I looked at the source I couldn't find that version of the function.
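The backtrace alternates between libbpf's btf__get_from_id() (btf.c:1411, 0x7f... addresses in the shared library) and perf's __weak btf__load_from_kernel_by_id() (util/bpf-event.c:30, 0x55... addresses in the perf executable). A plausible reading is that libbpf's internal call to btf__load_from_kernel_by_id() ends up bound to perf's weak fallback rather than libbpf's own definition, so each function keeps calling the other. The toy model below sketches that cycle under stated assumptions: a function pointer stands in for the dynamic linker's symbol binding, every name is a hypothetical stand-in rather than real perf or libbpf code, and MAX_DEPTH is an artificial guard that replaces the stack overflow.

```c
/*
 * Toy model of the call cycle seen in the backtrace. All names are
 * hypothetical stand-ins; this is not the real perf or libbpf code.
 */

/* Artificial guard standing in for the stack limit (the real binary has none). */
#define MAX_DEPTH 1000

static int depth;                            /* nesting level reached */
static int (*bound_load_by_id)(unsigned id); /* models the symbol binding */

/* Stand-in for libbpf's own btf__load_from_kernel_by_id(): succeeds. */
static int lib_load_by_id(unsigned id)
{
    (void)id;
    return 0;
}

/* Stand-in for libbpf's deprecated btf__get_from_id() (btf.c:1411), which
 * delegates to whatever btf__load_from_kernel_by_id is bound to. */
static int lib_get_from_id(unsigned id)
{
    if (++depth > MAX_DEPTH)
        return -1; /* the real process overflows its stack instead */
    return bound_load_by_id(id);
}

/* Stand-in for perf's __weak fallback (util/bpf-event.c:30), which
 * delegates to the deprecated API. */
static int perf_weak_load_by_id(unsigned id)
{
    return lib_get_from_id(id);
}

/* Performs one lookup. If perf_wins is nonzero, every reference to
 * btf__load_from_kernel_by_id (perf's and libbpf's) binds to perf's weak
 * fallback; otherwise it binds to libbpf's own definition. */
static int lookup(unsigned id, int perf_wins)
{
    depth = 0;
    bound_load_by_id = perf_wins ? perf_weak_load_by_id : lib_load_by_id;
    return bound_load_by_id(id);
}
```

With perf_wins set to 0, lookup(51, 0) returns 0 and never enters the deprecated path. With perf_wins set to 1, the two stand-ins call each other until the guard trips at depth MAX_DEPTH + 1, mirroring the alternating frames in the backtrace above. Why the binding would go that way in the real binary is not established in this report.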

Comment 1 Richard W.M. Jones 2022-04-20 11:03:39 UTC
I am able to reproduce this with the unusual combination of:
perf-5.17.0-300.fc36.x86_64
running on kernel 5.14.0-0.rc4.20210804gitd5ad8ec3cfb5.36.fc35.x86_64
(I cannot reboot this machine because it's running large fuzzing jobs)

Stack trace:

#0  0x000055e11bb43895 in btf__load_from_kernel_by_id (id=37)
    at util/bpf-event.c:26
#1  0x00007fb25400bd8d in btf__get_from_id (id=<optimized out>, 
    btf=btf@entry=0x7ffccb43c040) at btf.c:1411
#2  0x000055e11bb438a4 in btf__load_from_kernel_by_id (id=<optimized out>)
    at util/bpf-event.c:30
#3  0x00007fb25400bd8d in btf__get_from_id (id=<optimized out>, 
    btf=btf@entry=0x7ffccb43c090) at btf.c:1411
#4  0x000055e11bb438a4 in btf__load_from_kernel_by_id (id=<optimized out>)
    at util/bpf-event.c:30
#5  0x00007fb25400bd8d in btf__get_from_id (id=<optimized out>, 
    btf=btf@entry=0x7ffccb43c0e0) at btf.c:1411
#6  0x000055e11bb438a4 in btf__load_from_kernel_by_id (id=<optimized out>)
    at util/bpf-event.c:30

This repeated for at least 200,000(!) frames before I killed gdb.

Comment 2 Rob Bradford 2022-04-25 14:04:06 UTC
I was able to reproduce the issue (same backtrace) with the Fedora 36 versions of perf and the kernel. Upgrading to the Rawhide versions resolves the issue:

kernel-5.18.0-0.rc3.20220422gitd569e86915b7f2f.31.fc37.x86_64
perf-5.18.0-0.rc3.git0.1.fc37.x86_64

On F36 the issue is not limited to perf top; perf record is affected too.

Comment 3 Justin M. Forbes 2022-05-12 15:01:04 UTC
*** Bug 2078409 has been marked as a duplicate of this bug. ***

Comment 4 Justin M. Forbes 2022-05-25 20:42:08 UTC
*** Bug 2086870 has been marked as a duplicate of this bug. ***

Comment 5 Fedora Update System 2022-05-25 20:42:35 UTC
FEDORA-2022-b2cde267d9 has been submitted as an update to Fedora 35. https://bodhi.fedoraproject.org/updates/FEDORA-2022-b2cde267d9

Comment 6 Fedora Update System 2022-05-25 20:42:39 UTC
FEDORA-2022-014c3a24d9 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2022-014c3a24d9

Comment 7 Fedora Update System 2022-05-25 20:42:43 UTC
FEDORA-2022-8095b23575 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-8095b23575

Comment 8 Fedora Update System 2022-05-26 02:33:05 UTC
FEDORA-2022-8095b23575 has been pushed to the Fedora 36 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-8095b23575`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-8095b23575

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2022-05-26 02:33:53 UTC
FEDORA-2022-b2cde267d9 has been pushed to the Fedora 35 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-b2cde267d9`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-b2cde267d9

Comment 10 Fedora Update System 2022-05-26 03:21:33 UTC
FEDORA-2022-014c3a24d9 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-014c3a24d9`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-014c3a24d9

Comment 11 Geraldo Simião 2022-05-26 13:02:31 UTC
This update fixed the bug; no more perf crashes.
perf-5.17.11-300.fc36.x86_64

Comment 12 Fedora Update System 2022-05-28 01:15:05 UTC
FEDORA-2022-8095b23575 has been pushed to the Fedora 36 stable repository.
If the problem still persists, please make a note of it in this bug report.

Comment 13 Fedora Update System 2022-05-28 01:21:54 UTC
FEDORA-2022-b2cde267d9 has been pushed to the Fedora 35 stable repository.

Comment 14 Fedora Update System 2022-05-28 01:32:27 UTC
FEDORA-2022-014c3a24d9 has been pushed to the Fedora 34 stable repository.