Bug 1677858 - powertop crash
Summary: powertop crash
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: powertop
Version: 29
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Jaroslav Škarvada
QA Contact: Fedora Extras Quality Assurance
Reported: 2019-02-16 12:54 UTC by Maciej Żenczykowski
Modified: 2019-11-02 03:24 UTC (History)
Last Closed: 2019-11-02 03:24:07 UTC
Type: Bug

Description Maciej Żenczykowski 2019-02-16 12:54:13 UTC
powertop segfaults

# strace -ff powertop
...
openat(AT_FDCWD, "/sys/bus/platform/devices/pcspkr/power/runtime_active_time", O_RDONLY) = 123
read(123, "0\n", 8191)                  = 2
close(123)                              = 0
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-3/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-1/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-8/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-6/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-4/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-2/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-0/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-7/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/bus/i2c/devices/0-0008/power/runtime_suspended_time", O_RDONLY) = 123
read(123, "0\n", 8191)                  = 2
close(123)                              = 0
openat(AT_FDCWD, "/sys/bus/i2c/devices/0-0008/power/runtime_active_time", O_RDONLY) = 123
read(123, "0\n", 8191)                  = 2
close(123)                              = 0
openat(AT_FDCWD, "/sys/bus/i2c/devices/i2c-5/power/runtime_suspended_time", O_RDONLY) = -1 ENOENT (No such file or directory)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
# cat /sys/bus/i2c/devices/i2c-5/power/runtime_suspended_time
cat: /sys/bus/i2c/devices/i2c-5/power/runtime_suspended_time: No such file or directory

# ls /sys/bus/i2c/devices/i2c-5/power/
# ls /sys/bus/i2c/devices/i2c-5/
delete_device  device/        i2c-dev/       name           new_device     power/         subsystem/     uevent         

# cat /sys/bus/i2c/devices/i2c-5/name 
NVIDIA i2c adapter 5 at 1:00.0

# lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1b.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 (rev f1)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.2 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 (rev f1)
00:1c.4 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Z170 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V (rev 31)
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 UCSI Controller (rev a1)
04:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
05:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
06:00.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
06:01.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
06:02.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
06:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
09:00.0 USB controller: Intel Corporation DSL6540 USB 3.1 Controller [Alpine Ridge]

So this is possibly the GeForce RTX 2080 Ti card with the NVIDIA drivers triggering some sort of bug in powertop.
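
A minimal sketch of the manual sysfs check above, in Python: it lists which devices under a bus directory lack a given runtime PM attribute. The function name `devices_missing_attr` and its parameters are hypothetical, chosen only for illustration; the default directory and attribute name are taken from the strace output. A missing file only makes openat() return ENOENT, which is an ordinary error return and not in itself evidence of the crash cause.

```python
from pathlib import Path


def devices_missing_attr(bus_dir="/sys/bus/i2c/devices",
                         attr="power/runtime_suspended_time"):
    """Return sorted names of devices under bus_dir that lack attr.

    Hypothetical helper: both defaults come from the strace output above.
    """
    base = Path(bus_dir)
    if not base.is_dir():
        return []
    # sysfs device entries are symlinks to directories; "/" follows them.
    return sorted(dev.name for dev in base.iterdir()
                  if not (dev / attr).is_file())


if __name__ == "__main__":
    for name in devices_missing_attr():
        print(name)
```

On the reporter's machine this would be expected to list the `i2c-N` adapters that returned ENOENT in the trace, but not `0-0008`, whose attribute was readable.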

Comment 1 Maciej Żenczykowski 2019-02-16 12:58:24 UTC
# gdb powertop
...
Program received signal SIGSEGV, Segmentation fault.
0x000055555559a55b in perf_event::process(void*) ()
(gdb) bt
#0  0x000055555559a55b in perf_event::process(void*) ()
#1  0x000055555559b2d1 in perf_bundle::process() ()
#2  0x0000555555574415 in process_cpu_data() ()
#3  0x000055555556e80a in one_measurement(int, int, char*) ()
#4  0x00005555555643ff in main ()
(gdb) quit
A debugging session is active.

        Inferior 1 [process 29831] will be killed.

Quit anyway? (y or n) y

Comment 2 Maciej Żenczykowski 2019-02-16 13:01:10 UTC
# powertop
...
failed to mmap with 12 (Cannot allocate memory)
failed to mmap with 12 (Cannot allocate memory)
Segmentation fault (core dumped)

(there are tens of these "failed to mmap" messages)

Comment 3 Maciej Żenczykowski 2019-02-16 13:02:57 UTC
# strace -emmap powertop
mmap(NULL, 100890, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fedbc2ba000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fedbc2b8000
mmap(NULL, 29448, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbc2b0000
mmap(0x7fedbc2b2000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7fedbc2b2000
mmap(0x7fedbc2b4000, 4096, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x7fedbc2b4000
mmap(0x7fedbc2b6000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x5000) = 0x7fedbc2b6000
mmap(NULL, 143856, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbc28c000
mmap(0x7fedbc294000, 65536, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x7fedbc294000
mmap(0x7fedbc2a4000, 36864, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x7fedbc2a4000
mmap(0x7fedbc2ad000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x20000) = 0x7fedbc2ad000
mmap(NULL, 258488, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbc24c000
mmap(0x7fedbc256000, 172032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa000) = 0x7fedbc256000
mmap(0x7fedbc280000, 36864, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x34000) = 0x7fedbc280000
mmap(0x7fedbc28a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3d000) = 0x7fedbc28a000
mmap(NULL, 190848, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbc21d000
mmap(0x7fedbc22b000, 61440, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe000) = 0x7fedbc22b000
mmap(0x7fedbc23a000, 53248, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1d000) = 0x7fedbc23a000
mmap(0x7fedbc247000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x29000) = 0x7fedbc247000
mmap(NULL, 180616, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbc1f0000
mmap(0x7fedbc1f8000, 114688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8000) = 0x7fedbc1f8000
mmap(0x7fedbc214000, 24576, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7fedbc214000
mmap(0x7fedbc21b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2a000) = 0x7fedbc21b000
mmap(NULL, 62064, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbc1e0000
mmap(0x7fedbc1e3000, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7fedbc1e3000
mmap(0x7fedbc1eb000, 12288, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb000) = 0x7fedbc1eb000
mmap(0x7fedbc1ee000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x7fedbc1ee000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fedbc1de000
mmap(NULL, 1667232, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbc046000
mmap(0x7fedbc0d0000, 774144, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x8a000) = 0x7fedbc0d0000
mmap(0x7fedbc18d000, 262144, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x147000) = 0x7fedbc18d000
mmap(0x7fedbc1ce000, 49152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x187000) = 0x7fedbc1ce000
mmap(0x7fedbc1da000, 12448, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fedbc1da000
mmap(NULL, 1585472, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fedbbec2000
mmap(0x7fedbbecf000, 655360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x7fedbbecf000
mmap(0x7fedbbf6f000, 872448, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xad000) = 0x7fedbbf6f000
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory)
failed to mmap with 12 (Cannot allocate memory)
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = -1 ENOMEM (Cannot allocate memory)
failed to mmap with 12 (Cannot allocate memory)
...

Comment 4 Maciej Żenczykowski 2019-02-16 13:11:21 UTC
Perhaps the most relevant part of the strace output:

...
perf_event_open({type=PERF_TYPE_TRACEPOINT, size=PERF_ATTR_SIZE_VER0, config=316, ...}, -1, 0, -1, 0) = 3
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\22\7\0\0\0\0\0\0", 32) = 32
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = -1 ENOMEM (Cannot allocate memory)
...

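The failing mmap length can be decomposed as a sanity check. This is only a sketch: it assumes 4 KiB pages (typical on x86-64, not stated in the report), and `split_perf_mmap` is a hypothetical helper name. perf_event_open(2) documents the mapping as one metadata page followed by 2^n data pages, and 528384 bytes fits that layout.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, not stated in the report


def split_perf_mmap(length, page_size=PAGE_SIZE):
    """Split a perf mmap length into (metadata_pages, data_pages).

    Raises ValueError if the length is not 1 + 2^n whole pages.
    """
    total_pages, rem = divmod(length, page_size)
    if rem or total_pages < 2:
        raise ValueError("length is not a whole number of pages >= 2")
    data_pages = total_pages - 1
    # A power of two has exactly one bit set.
    if data_pages & (data_pages - 1):
        raise ValueError("data area is not a power-of-two number of pages")
    return 1, data_pages


print(split_perf_mmap(528384))  # (1, 128)
```

So the request is a well-formed ring buffer of 128 data pages (512 KiB) plus the header page, which suggests the ENOMEM comes from the kernel side rather than from a malformed length.
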
Comment 6 Maciej Żenczykowski 2019-02-21 11:59:30 UTC
Indeed, the likely fix is:

https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.20.11

commit d26f63553b274355a18c515b891927b916beeba5
Author: Ingo Molnar <mingo>
Date:   Wed Feb 13 07:57:02 2019 +0100

    perf/core: Fix impossible ring-buffer sizes warning
    
    commit 528871b456026e6127d95b1b2bd8e3a003dc1614 upstream.
    
    The following commit:
    
      9dff0aa95a32 ("perf/core: Don't WARN() for impossible ring-buffer sizes")
    
    results in perf recording failures with larger mmap areas:
    
      root@skl:/tmp# perf record -g -a
      failed to mmap with 12 (Cannot allocate memory)
    
    The root cause is that the following condition is buggy:
    
            if (order_base_2(size) >= MAX_ORDER)
                    goto fail;
    
    The problem is that @size is in bytes and MAX_ORDER is in pages,
    so the right test is:
    
            if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
                    goto fail;
    
    Fix it.
    
    Reported-by: "Jin, Yao" <yao.jin.com>
    Bisected-by: Borislav Petkov <bp>
    Analyzed-by: Peter Zijlstra <peterz>
    Cc: Julien Thierry <julien.thierry>
    Cc: Mark Rutland <mark.rutland>
    Cc: Alexander Shishkin <alexander.shishkin.com>
    Cc: Arnaldo Carvalho de Melo <acme>
    Cc: Jiri Olsa <jolsa>
    Cc: Linus Torvalds <torvalds>
    Cc: Namhyung Kim <namhyung>
    Cc: Peter Zijlstra <peterz>
    Cc: Thomas Gleixner <tglx>
    Cc: Greg Kroah-Hartman <gregkh>
    Cc: <stable.org>
    Fixes: 9dff0aa95a32 ("perf/core: Don't WARN() for impossible ring-buffer sizes")
    Signed-off-by: Ingo Molnar <mingo>
    Signed-off-by: Greg Kroah-Hartman <gregkh>

Comment 7 Maciej Żenczykowski 2019-02-21 12:12:33 UTC
4.20.8 worked, 4.20.10 didn't, 4.20.11 works on a machine without the RTX 2080 Ti.

So the fix is confirmed.

Comment 8 Ben Cotton 2019-10-31 19:55:31 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

