Created attachment 1575695 [details]
Linux version 5.1.5-300.fc30.x86_64

Description of the problem:
The kernel stalls: no TTY is available and there is no response to Ctrl+Alt+Del.

Problematic kernel version:
Linux version 5.1.5-300.fc30.x86_64

Last working version:
Kernel version 5.0.17 is the last working version so far.

The version problem started:
All kernels from 5.1.0 onward have this issue.

Steps to reproduce the problem:
Install kernel version 5.1+ on a system with an RX 580 8GB GPU, a Z77 chipset, and an i7 3770 processor.

The latest rawhide kernel (kernel-5.2.0-0.rc1.git2.2.fc31.x86_64) also exhibits this problem. No external modules are in use. The kernel log for version 5.1.5 is attached.
With amdgpu.dpm=0 on the kernel command line, the kernel boots. On the rawhide kernel, however, booting with amdgpu.dpm=0 produces this error:

kernel: [drm] amdgpu kernel modesetting enabled.
kernel: CRAT table not found
kernel: Virtual CRAT table created for CPU
kernel: Parsing CRAT table with 1 nodes
kernel: Creating topology SYSFS entries
kernel: Topology: Add CPU node
kernel: Finished initializing topology
kernel: amdgpu 0000:04:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
kernel: amdgpu 0000:04:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
kernel: amdgpu 0000:04:00.0: remove_conflicting_pci_framebuffers: bar 5: 0xf7800000 -> 0xf783ffff
kernel: checking generic (e0000000 300000) vs hw (e0000000 10000000)
kernel: fb0: switching to amdgpudrmfb from EFI VGA
kernel: Console: switching to colour dummy device 80x25
kernel: amdgpu 0000:04:00.0: vgaarb: deactivate vga console
kernel: [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1DA2:0xE387 0xE7).
kernel: [drm] register mmio base: 0xF7800000
kernel: [drm] register mmio size: 262144
kernel: [drm] add ip block number 0 <vi_common>
kernel: [drm] add ip block number 1 <gmc_v8_0>
kernel: [drm] add ip block number 2 <tonga_ih>
kernel: [drm] add ip block number 3 <gfx_v8_0>
kernel: [drm] add ip block number 4 <sdma_v3_0>
kernel: [drm] add ip block number 5 <powerplay>
kernel: [drm] add ip block number 6 <dm>
kernel: [drm] add ip block number 7 <uvd_v6_0>
kernel: [drm] add ip block number 8 <vce_v3_0>
kernel: kfd kfd: skipped device 1002:67df, PCI rejects atomics
kernel: [drm] UVD is enabled in VM mode
kernel: [drm] UVD ENC is enabled in VM mode
kernel: [drm] VCE enabled in VM mode
kernel: resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]
kernel: caller pci_map_rom+0x6a/0x17d mapping multiple BARs
kernel: amdgpu 0000:04:00.0: No more image in the PCI ROM
kernel: ATOM BIOS: 113-1E3870U-O45
kernel: [drm] RAS INFO: ras initialized successfully, hardware ability[0] ras_mask[0]
kernel: [drm] vm size is 128 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
kernel: amdgpu 0000:04:00.0: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
kernel: amdgpu 0000:04:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
kernel: [drm] Detected VRAM RAM=8192M, BAR=256M
kernel: [drm] RAM width 256bits GDDR5
kernel: [TTM] Zone kernel: Available graphics memory: 12350340 KiB
kernel: [TTM] Zone dma32: Available graphics memory: 2097152 KiB
kernel: [TTM] Initializing pool allocator
kernel: [TTM] Initializing DMA pool allocator
kernel: [drm] amdgpu: 8192M of VRAM memory ready
kernel: [drm] amdgpu: 8192M of GTT memory ready.
kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
kernel: [drm] Chained IB support enabled!
kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
kernel: BUG: unable to handle page fault for address: ffffa5bd8394f650
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 606549067 P4D 606549067 PUD 0
kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 6 PID: 461 Comm: systemd-udevd Not tainted 5.2.0-0.rc1.git1.1.vanilla.knurd.1.fc30.x86_64 #1
kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./G1.Sniper 3, BIOS F8k 04/29/2013
kernel: RIP: 0010:bw_calcs_data_update_from_pplib.isra.0+0x378/0x4d0 [amdgpu]
kernel: Code: 00 00 5b 5d 41 5c 41 5d 41 5e c3 48 8b 7d 00 4c 89 f2 be 02 00 00 00 e8 26 bf f9 ff 8b 04 24 4c 8b 23 be e8 03 00 00 83 e8 01 <8b> 7c 84 04 e8 6f 4d fb ff be e8 03 00 00 49 89 44 24 60 8b 04 24
kernel: RSP: 0018:ffffa5b98394f650 EFLAGS: 00010297
kernel: RAX: 00000000ffffffff RBX: ffff928b34cb92d8 RCX: 0000000000000000
kernel: RDX: ffffa5b98394f58c RSI: 00000000000003e8 RDI: ffff928b39c12800
kernel: RBP: ffff928b34cb9208 R08: 0000000000000020 R09: 000000032a000000
kernel: R10: 00000003ce000000 R11: 0000001770000000 R12: ffff928b3ac0b300
kernel: R13: ffffa5b98394f76c R14: ffffa5b98394f650 R15: ffffffffc0839d60
kernel: FS: 00007f1133ad1940(0000) GS:ffff928b46b80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffffa5bd8394f650 CR3: 00000005faf54004 CR4: 00000000001606e0
kernel: Call Trace:
kernel: dce112_create_resource_pool+0x6de/0x700 [amdgpu]
kernel: dc_create_resource_pool+0x16c/0x220 [amdgpu]
kernel: ? dal_gpio_service_create+0x92/0x110 [amdgpu]
kernel: dc_create+0x219/0x620 [amdgpu]
kernel: ? amdgpu_cgs_create_device+0x23/0x50 [amdgpu]
kernel: amdgpu_dm_init+0xeb/0x160 [amdgpu]
kernel: dm_hw_init+0xe/0x20 [amdgpu]
kernel: amdgpu_device_init.cold+0x128d/0x161f [amdgpu]
kernel: ? kmalloc_order+0x14/0x30
kernel: amdgpu_driver_load_kms+0x88/0x270 [amdgpu]
kernel: drm_dev_register+0x111/0x150 [drm]
kernel: amdgpu_pci_probe+0xbd/0x120 [amdgpu]
kernel: ? __pm_runtime_resume+0x58/0x80
kernel: local_pci_probe+0x42/0x80
kernel: pci_device_probe+0x115/0x190
kernel: really_probe+0xf0/0x390
kernel: driver_probe_device+0xb6/0x100
kernel: device_driver_attach+0x53/0x60
kernel: __driver_attach+0x8a/0x150
kernel: ? device_driver_attach+0x60/0x60
kernel: bus_for_each_dev+0x78/0xc0
kernel: bus_add_driver+0x14a/0x1e0
kernel: driver_register+0x6c/0xb0
kernel: ? 0xffffffffc09b9000
kernel: do_one_initcall+0x46/0x1f4
kernel: ? _cond_resched+0x15/0x30
kernel: ? kmem_cache_alloc_trace+0x154/0x1c0
kernel: ? do_init_module+0x23/0x230
kernel: do_init_module+0x5c/0x230
kernel: load_module+0x22eb/0x28e0
kernel: ? __do_sys_init_module+0x16e/0x1a0
kernel: __do_sys_init_module+0x16e/0x1a0
kernel: do_syscall_64+0x5b/0x180
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7f1134ad1bae
kernel: Code: 48 8b 0d dd 42 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d aa 42 0c 00 f7 d8 64 89 01 48
kernel: RSP: 002b:00007ffe9cb83118 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
kernel: RAX: ffffffffffffffda RBX: 0000563b364ce650 RCX: 00007f1134ad1bae
kernel: RDX: 0000563b364b50a0 RSI: 00000000006dfa2e RDI: 0000563b36d998b0
kernel: RBP: 0000563b36d998b0 R08: 0000563b364ba730 R09: 0000000000000001
kernel: R10: 0000000000000002 R11: 0000000000000246 R12: 0000563b364b50a0
kernel: R13: 0000000000000006 R14: 0000563b364c9fa0 R15: 0000000000000000
kernel: Modules linked in: amdgpu(+) amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crc32c_intel serio_raw drm e1000e(+) alx mdio video wmi vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
kernel: CR2: ffffa5bd8394f650
kernel: ---[ end trace e14f412d43dd70ae ]---
kernel: RIP: 0010:bw_calcs_data_update_from_pplib.isra.0+0x378/0x4d0 [amdgpu]
kernel: Code: 00 00 5b 5d 41 5c 41 5d 41 5e c3 48 8b 7d 00 4c 89 f2 be 02 00 00 00 e8 26 bf f9 ff 8b 04 24 4c 8b 23 be e8 03 00 00 83 e8 01 <8b> 7c 84 04 e8 6f 4d fb ff be e8 03 00 00 49 89 44 24 60 8b 04 24
kernel: RSP: 0018:ffffa5b98394f650 EFLAGS: 00010297
kernel: RAX: 00000000ffffffff RBX: ffff928b34cb92d8 RCX: 0000000000000000
kernel: RDX: ffffa5b98394f58c RSI: 00000000000003e8 RDI: ffff928b39c12800
kernel: RBP: ffff928b34cb9208 R08: 0000000000000020 R09: 000000032a000000
kernel: R10: 00000003ce000000 R11: 0000001770000000 R12: ffff928b3ac0b300
kernel: R13: ffffa5b98394f76c R14: ffffa5b98394f650 R15: ffffffffc0839d60
kernel: FS: 00007f1133ad1940(0000) GS:ffff928b46b80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: ffffa5bd8394f650 CR3: 00000005faf54004 CR4: 00000000001606e0
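
A possible reading of the oops above, offered as a sketch and not as a confirmed diagnosis: the marked bytes `<8b> 7c 84 04` in the Code line appear to decode as a 32-bit load from RSP + RAX*4 + 4, and the preceding `83 e8 01` as `sub $1,%eax`. If that decoding is right, RAX = 0xffffffff would be consistent with an array index of "count - 1" where the count was 0. The snippet below only redoes the address arithmetic from the logged registers and compares the result with CR2. The register values come from the log; the instruction decoding and the helper name `effective_address` are my own assumptions.

```python
# Register values copied from the oops in this report.
RSP = 0xFFFFA5B98394F650
RAX = 0x00000000FFFFFFFF
CR2 = 0xFFFFA5BD8394F650  # faulting address reported by the kernel

MASK64 = (1 << 64) - 1


def effective_address(base, index, scale, disp):
    """Compute base + index*scale + disp, wrapped to 64 bits (x86-64 style)."""
    return (base + index * scale + disp) & MASK64


# Assumed decoding of `8b 7c 84 04`: mov 0x4(%rsp,%rax,4),%edi
fault_addr = effective_address(RSP, RAX, scale=4, disp=4)

# Assumed decoding of `83 e8 01`: sub $1,%eax. In 32-bit arithmetic,
# 0 - 1 wraps around to 0xffffffff, which is the value seen in RAX.
wrapped_index = (0 - 1) & 0xFFFFFFFF

print(hex(fault_addr))        # computed effective address
print(fault_addr == CR2)      # does it match the reported fault address?
print(hex(fault_addr - RSP))  # distance of the bad read from the stack pointer
print(wrapped_index == RAX)
```

Under these assumptions the computed address equals CR2 and lies 0x400000000 bytes above RSP, which is why the read lands on an unmapped page.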
My hardware is as follows:
CPU: i7 3770 at stock clock
Motherboard: Gigabyte G1.Sniper 3, latest BIOS available
RAM: 24 GB DDR3 at 1600 MHz
GPU: RX 580 8GB (Sapphire), latest VBIOS

I tried mainline stable kernel 5.1.6 and the results are the same. The display hangs when the amdgpu driver loads. I am unable to determine whether booting continues or hangs as well: disk activity stops after a couple of seconds and it is not possible to switch TTYs. Ctrl+Alt+Del is unresponsive as well.

The problem goes away when amdgpu.dpm=0 is used, but in that case dynamic power scaling is not available, the GPU is stuck at a low clock, and graphics performance is abysmal. GPU temperature and fan speed utilities do not work either.

Here is an excerpt of the problematic log lines:

Jun 02 09:54:05 kernel: amdgpu: [powerplay] last message was failed ret is 65535
Jun 02 09:54:06 kernel: amdgpu: [powerplay] failed to send message 15b ret is 65535
Jun 02 09:54:06 kernel: hrtimer: interrupt took 287743313 ns
Jun 02 09:54:06 kernel: clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
Jun 02 09:54:06 kernel: clocksource: 'hpet' wd_now: 628dd7b wd_last: 5fef431 mask: ffffffff
Jun 02 09:54:06 kernel: clocksource: 'tsc' cs_now: 254aa24747 cs_last: 25104a5bfd mask: ffffffffffffffff
Jun 02 09:54:06 kernel: tsc: Marking TSC unstable due to clocksource watchdog
Jun 02 09:54:07 kernel: amdgpu: [powerplay] last message was failed ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay] failed to send message 148 ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay] last message was failed ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay] failed to send message 145 ret is 65535
Jun 02 09:54:08 kernel: amdgpu: [powerplay] last message was failed ret is 65535
Jun 02 09:54:08 kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Jun 02 09:54:08 kernel: sched_clock: Marking unstable (8791691311, 362291)<-(8817904668, -25851212) Jun 02 09:54:08 kernel: amdgpu: [powerplay] failed to send message 146 ret is 65535 Jun 02 09:54:08 kernel: hid-generic 0003:09DA:FC7C.0003: input,hidraw2: USB HID v1.11 Mouse [COMPANY USB Device] on usb-0000:00:1a.0-1.5.3/input0 Jun 02 09:54:09 kernel: hid-generic 0003:09DA:FC7C.0004: hiddev97,hidraw3: USB HID v1.11 Device [COMPANY USB Device] on usb-0000:00:1a.0-1.5.3/input1 Jun 02 09:54:11 kernel: clocksource: Switched to clocksource hpet Jun 02 09:54:13 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:13 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:14 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:15 kernel: amdgpu: [powerplay] failed 
to send message 260 ret is 65535 Jun 02 09:54:15 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:15 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:15 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] last message was failed ret is 65535 Jun 02 09:54:16 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535 Jun 02 09:54:17 kernel: [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0 Jun 02 09:54:17 kernel: EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null) Jun 02 09:54:20 kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110). Jun 02 09:54:21 kernel: [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110). Any help is appreciated. Also let me know if I can help in any way.
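
For anyone triaging a longer journal than the excerpt above, the repeated powerplay failures can be summarized per message ID. This is a minimal sketch, assuming the log lines keep the exact "failed to send message <id> ret is <n>" wording quoted in this report; the function name `tally_failed_messages` and the small `sample` list are illustrative, with the sample lines copied from the excerpt.

```python
import re
from collections import Counter

# Matches the "failed to send message" lines as they are worded in this report.
FAILED_MSG = re.compile(r"\[powerplay\] failed to send message (\w+) ret is (\d+)")


def tally_failed_messages(lines):
    """Return a Counter mapping powerplay message ID -> number of failures."""
    counts = Counter()
    for line in lines:
        match = FAILED_MSG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the excerpt in this report.
sample = [
    "Jun 02 09:54:05 kernel: amdgpu: [powerplay] last message was failed ret is 65535",
    "Jun 02 09:54:06 kernel: amdgpu: [powerplay] failed to send message 15b ret is 65535",
    "Jun 02 09:54:06 kernel: hrtimer: interrupt took 287743313 ns",
    "Jun 02 09:54:07 kernel: amdgpu: [powerplay] failed to send message 148 ret is 65535",
    "Jun 02 09:54:07 kernel: amdgpu: [powerplay] failed to send message 145 ret is 65535",
    "Jun 02 09:54:08 kernel: amdgpu: [powerplay] failed to send message 146 ret is 65535",
    "Jun 02 09:54:13 kernel: amdgpu: [powerplay] failed to send message 260 ret is 65535",
]

counts = tally_failed_messages(sample)
for message_id, count in counts.most_common():
    print(message_id, count)
```

To run it against a saved log, pass the open file object (for example the output of `journalctl -k` written to a file) in place of `sample`; the "last message was failed" lines are deliberately ignored because they carry no message ID.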
Created attachment 1576462 [details] Linux version 5.1.6-350.vanilla.knurd.1.fc30.x86_64
Bug report progress is here: https://bugs.freedesktop.org/show_bug.cgi?id=110822
The problem still exists in 5.1.7 and 5.1.8 from the updates-testing repo, and also in 5.1.8 and 5.2.0-0.rc3.git3.1 from the vanilla Fedora repo.
This message is a reminder that Fedora 30 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 30 reaches end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.