Bug 1886599

Summary: "perf report" dies in 1GB RAM
Product: [Fedora] Fedora Reporter: John Reiser <jreiser>
Component: kernel-toolsAssignee: Justin M. Forbes <jforbes>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 33CC: acaringi, jeremy, jforbes, jwboyer, lgoncalv, pbrobinson
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-30 18:49:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
oom-killer info from syslog none

Description John Reiser 2020-10-08 21:05:09 UTC
Description of problem: "perf report" suffers oom-killer while running under Fedora-Minimal-33_Beta-1.3.aarch64.raw.xz on RaspberryPi 3 model B+ in 64-bit mode (aarch64) with 1GB RAM. /usr/bin/top reports 935MiB Mem, 467MiB Swap.
The input perf.data is 160MB from "perf record -g" lasting 160 seconds.

Inspection using
   strace --e trace=brk,open,openat,close,mmap,munmap,mprotect perf report
shows several related problems:

1) No feature to limit memory allocation, and adapt processing accordingly.  There should be a command-line option such as "--max-memory=N" where N is a number in bytes with suffix K, M, G indicating (N<<10), (N<<20), or (N<<30).  The default value should be (0.75 * /proc/meminfo["MemTotal"]).  Most of the brk() allocation seems to be decoded tracebacks that are awaiting a final sort(); but external sorting using temporary files has been well-researched for decades, see vol.3 of Knuth.

2. File descriptor leak: The various modules (main programs and shared libraries) that are reported by "perf buildid-list" are openat() and mmap() in order, up to a file descriptor bound of 33 or 34, but "perf report" closes only 1 or 2 of them despite not needing the fd after mmap().

3. Broken mmap() cache.  All slots in the cache are occupied by only one mmap(), ever, except for the last slot.  The last slot is completely cycled on every access, from empty to full then back to empty, and not even peep-hole optimized (retain the fd+mmap for consecutive accesses to the same module; apply munmap() and close() only upon a request for a different module).  This is wasteful by a factor of 2.  For example, from the strace output:
=====
openat(AT_FDCWD, "/usr/lib64/libmodulemd.so.2.9.4", O_RDONLY) = 33
mmap(NULL, 530400, PROT_READ, MAP_PRIVATE, 33, 0) = 0xffffa40c5000
munmap(0xffffa40c5000, 530400)          = 0
close(33)                               = 0
openat(AT_FDCWD, "/usr/lib64/libmodulemd.so.2.9.4", O_RDONLY) = 33
mmap(NULL, 530400, PROT_READ, MAP_PRIVATE, 33, 0) = 0xffffa40c5000
munmap(0xffffa40c5000, 530400)          = 0
close(33)                               = 0
openat(AT_FDCWD, "/usr/lib64/libyaml-0.so.2.0.9", O_RDONLY) = 33
mmap(NULL, 135808, PROT_READ, MAP_PRIVATE, 33, 0) = 0xffffa4125000
munmap(0xffffa4125000, 135808)          = 0
close(33)                               = 0
openat(AT_FDCWD, "/usr/lib64/libyaml-0.so.2.0.9", O_RDONLY) = 33
mmap(NULL, 135808, PROT_READ, MAP_PRIVATE, 33, 0) = 0xffffa4125000
munmap(0xffffa4125000, 135808)          = 0
close(33)                               = 0
=====

4. In many cases the data cannot be collected and transported for analysis to a system with more RAM, because "perf archive" ignores the second column of "perf buildid-list".  If the that is file named with a given .buildid is not found in the .debug directories, then "perf archive" complains and omits that file entirely.  Instead "perf archive" should check that the path given in the second column does have the buildid that is listed im the first column, then use the path from the second column as the source for that module in the archive.  That path might not have as many symbols as a .debug file, but usually there are some symbols; and the motivated user can disassemble to gain some knowledge even from just a numeric address.


Version-Release number of selected component (if applicable):
perf-5.8.11-300.fc33.aarch64


How reproducible: every time


Steps to Reproduce:
1. Run a system-wide "perf record -g" until 160MB of data has been collected in file perf.data
2. Attempt to generated output using "perf report" on a system with only 1GB RAM.
3.

Actual results: Death by oom-killer, an no useful output.


Expected results: Useful and complete output from "perf report".


Additional info:
$ perf buildid-list | wc
     81     162    5972

$ perf buildid-list | sed 10q
5ab8bd57a0dfe5043b533fa0f14bb2858f8f7104 [kernel.kallsyms]
63653a0ac1d1cd057523b982bfc379a0b761fa98 /usr/lib/systemd/systemd
addd1d0fa6a711e0dde947e3eb48a9713dc336c7 /usr/lib64/libnl-3.so.200.26.0
0582ebd57538a155cfca5c1dd78412461000b886 /usr/lib64/libnl-route-3.so.200.26.0
01b3afe3607d7578f0843b97f7a75393bb797d52 /usr/lib64/libudev.so.1.6.18
d1db9e285cb7c68f98d2becbcf4a74cf4dc3505a /usr/lib64/libffi.so.6.0.2
8891095747b5ef5946c6bbc0d48d3e69b4d5701f /usr/lib64/libz.so.1.2.11
a2fe285121b1202b34262e7e16c9c164fd7f9637 /usr/lib64/libpthread-2.32.so
45cfdce4c600e7547868615334f49443c5d77fa8 /usr/lib64/libzstd.so.1.4.5
311449c5479e707d4346c1a03b70d67d00e0c9ed /usr/lib64/libcrypto.so.1.1.1g

$ perf report --header-only
# ========
# captured on    : Sat Oct  3 16:30:46 2020
# header version : 1
# data offset    : 440
# data size      : 165914112
# feat offset    : 165914552
# hostname : localhost.localdomain
# os release : 5.8.12-300.fc33.aarch64
# perf version : 5.8.11
# arch : aarch64
# nrcpus online : 4
# nrcpus avail : 4
# cpuid : 0x00000000410fd030
# total memory : 958016 kB
# cmdline : /usr/bin/perf record -g 
# event : name = cycles, , id = { 4, 5, 6, 7 }, size = 120, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CALLCHAIN|ID|CPU|PERIOD, read_format = ID, disa>
# event : name = dummy:HG, , id = { 8, 9, 10, 11 }, type = 1, size = 120, config = 0x9, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CALLCHAIN|ID|CPU|PE>
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: software = 1, uprobe = 7, breakpoint = 5, tracepoint = 2, kprobe = 6, armv8_cortex_a53 = 8
# CACHE info available, use -I to display
# time of first sample : 739.524839
# time of last sample : 899.941060
# sample duration : 160416.221 ms
# MEM_TOPOLOGY info available, use -I to display
# bpf_prog_info 61: bpf_prog_a42d275341448247 addr 0xffff800000008018 size 472
# bpf_prog_info 62: bpf_prog_6deef7357e7b4530 addr 0xffff80000000ae7c size 104
# bpf_prog_info 63: bpf_prog_6deef7357e7b4530 addr 0xffff80000002ca74 size 104
# bpf_prog_info 64: bpf_prog_438c5618576e5b0c addr 0xffff80000002e32c size 524
# bpf_prog_info 65: bpf_prog_6deef7357e7b4530 addr 0xffff800000030c88 size 104
# bpf_prog_info 66: bpf_prog_6deef7357e7b4530 addr 0xffff800000054c44 size 104
# bpf_prog_info 67: bpf_prog_6ceaf4d6095cca93 addr 0xffff80000000c6f0 size 652
# bpf_prog_info 68: bpf_prog_6deef7357e7b4530 addr 0xffff8000000622b0 size 104
# bpf_prog_info 69: bpf_prog_6deef7357e7b4530 addr 0xffff8000000649e8 size 104
# bpf_prog_info 70: bpf_prog_2705a24f44b96941 addr 0xffff800000066824 size 444
# bpf_prog_info 73: bpf_prog_ee0e253c78993a24 addr 0xffff8000000bc4e8 size 408
# bpf_prog_info 74: bpf_prog_1ab2530658b0b2d1 addr 0xffff8000000beda8 size 468
# bpf_prog_info 75: bpf_prog_6deef7357e7b4530 addr 0xffff8000000c0334 size 104
# bpf_prog_info 76: bpf_prog_6deef7357e7b4530 addr 0xffff8000000fad50 size 104
# bpf_prog_info 80: bpf_prog_ee0e253c78993a24 addr 0xffff8000000b0290 size 408
# bpf_prog_info 81: bpf_prog_6deef7357e7b4530 addr 0xffff80000006cdd8 size 104
# bpf_prog_info 82: bpf_prog_6deef7357e7b4530 addr 0xffff8000000fc234 size 104
# missing features: TRACING_DATA CPUDESC BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS 
# ========

Comment 1 John Reiser 2020-10-08 21:15:09 UTC
Created attachment 1720068 [details]
oom-killer info from syslog

Comment 2 Jarod Wilson 2020-10-09 13:41:54 UTC
kernel-utils isn't a package that has existed since circa RHEL 4.

Comment 3 Fedora Admin user for bugzilla script actions 2021-05-19 00:03:50 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 4 Fedora Admin user for bugzilla script actions 2021-05-20 00:04:11 UTC
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Comment 5 Ben Cotton 2021-11-04 13:59:14 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Ben Cotton 2021-11-04 14:28:32 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Ben Cotton 2021-11-04 15:26:14 UTC
This message is a reminder that Fedora 33 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 33 on 2021-11-30.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '33'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 33 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Ben Cotton 2021-11-30 18:49:06 UTC
Fedora 33 changed to end-of-life (EOL) status on 2021-11-30. Fedora 33 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.