Bug 1911009
Summary: | Black screen and unresponsive system involving amdgpu starting when booting 5.10 kernels | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Matt Fagnani <matt.fagnani> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | rawhide | CC: | acaringi, adscvr, airlied, bskeggs, hdegoede, itamar, jarodwilson, jeremy, jglisse, jonathan, josef, kernel-maint, lgoncalv, linville, masami256, mchehab, ptalbert, steved |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2021-01-10 01:34:35 UTC | Type: | Bug |
Description
Matt Fagnani
2020-12-26 23:15:51 UTC
Created attachment 1742164 [details]
trace image from boot with 5.10.0-0.rc6.20201204git34816d20f173.92.fc34
5.10.3 is affected by the same problem with the default kernel command line. When I booted 5.10.3 with amdgpu.dc=0, a null pointer dereference in dc_commit_state in amdgpu happened while amdgpu was starting. The boot completed with amdgpu.dc=0.

Dec 29 15:21:08 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Dec 29 15:21:08 kernel: #PF: supervisor instruction fetch in kernel mode
Dec 29 15:21:08 kernel: #PF: error_code(0x0010) - not-present page
Dec 29 15:21:08 kernel: PGD 0 P4D 0
Dec 29 15:21:08 kernel: Oops: 0010 [#1] SMP NOPTI
Dec 29 15:21:08 kernel: CPU: 2 PID: 356 Comm: plymouthd Not tainted 5.10.3-200.fc33.x86_64 #1
Dec 29 15:21:08 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
Dec 29 15:21:08 kernel: RIP: 0010:0x0
Dec 29 15:21:08 kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Dec 29 15:21:08 kernel: RSP: 0018:ffffa0cdc05938c8 EFLAGS: 00010286
Dec 29 15:21:08 kernel: RAX: 0000000000000000 RBX: ffff8d638f2c01b8 RCX: ffff8d638a86a000
Dec 29 15:21:08 kernel: RDX: 0000000000000000 RSI: 00000000000005cf RDI: ffff8d6388d99420
Dec 29 15:21:08 kernel: RBP: ffff8d638f2c0000 R08: ffffa0cdc05938c4 R09: 0000000000000001
Dec 29 15:21:08 kernel: R10: 0000000000000004 R11: 0000000000000003 R12: 0000000000000000
Dec 29 15:21:08 kernel: R13: 0000000000000000 R14: ffff8d638b16ec00 R15: ffff8d638e870000
Dec 29 15:21:08 kernel: FS: 00007f0defc53f40(0000) GS:ffff8d6477500000(0000) knlGS:0000000000000000
Dec 29 15:21:08 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 29 15:21:08 kernel: CR2: ffffffffffffffd6 CR3: 00000001097ca000 CR4: 00000000001506e0
Dec 29 15:21:08 kernel: Call Trace:
Dec 29 15:21:08 kernel: dc_commit_state+0x823/0xa20 [amdgpu]
Dec 29 15:21:08 kernel: ? drm_calc_timestamping_constants+0x195/0x1f0 [drm]
Dec 29 15:21:08 kernel: amdgpu_dm_atomic_commit_tail+0x527/0x2420 [amdgpu]
Dec 29 15:21:08 kernel: ? amdgpu_move_blit+0xbc/0x200 [amdgpu]
Dec 29 15:21:08 kernel: ? amdgpu_bo_move+0x9f/0x290 [amdgpu]
Dec 29 15:21:08 kernel: ? ttm_bo_handle_move_mem+0xb4/0x460 [ttm]
Dec 29 15:21:08 kernel: ? ttm_bo_validate+0x121/0x130 [ttm]
Dec 29 15:21:08 kernel: ? dm_plane_helper_prepare_fb+0x18b/0x220 [amdgpu]
Dec 29 15:21:08 kernel: ? _cond_resched+0x16/0x40
Dec 29 15:21:08 kernel: ? _cond_resched+0x16/0x40
Dec 29 15:21:08 kernel: ? __wait_for_common+0x2b/0x130
Dec 29 15:21:08 kernel: commit_tail+0x94/0x130 [drm_kms_helper]
Dec 29 15:21:08 kernel: drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
Dec 29 15:21:08 kernel: drm_atomic_helper_set_config+0x70/0xb0 [drm_kms_helper]
Dec 29 15:21:08 kernel: drm_mode_setcrtc+0x1d3/0x6f0 [drm]
Dec 29 15:21:08 kernel: ? avc_has_extended_perms+0x18d/0x3e0
Dec 29 15:21:08 kernel: ? drm_mode_getcrtc+0x180/0x180 [drm]
Dec 29 15:21:08 kernel: drm_ioctl_kernel+0x86/0xd0 [drm]
Dec 29 15:21:08 kernel: drm_ioctl+0x20f/0x3a0 [drm]
Dec 29 15:21:08 kernel: ? drm_mode_getcrtc+0x180/0x180 [drm]
Dec 29 15:21:08 kernel: amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
Dec 29 15:21:08 kernel: __x64_sys_ioctl+0x83/0xb0
Dec 29 15:21:08 kernel: do_syscall_64+0x33/0x40
Dec 29 15:21:08 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 29 15:21:08 kernel: RIP: 0033:0x7f0defb3538b
Dec 29 15:21:08 kernel: Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd ba 0c 00 f7 d8 64 89 01 48
Dec 29 15:21:08 kernel: RSP: 002b:00007fff1fdf1898 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 29 15:21:08 kernel: RAX: ffffffffffffffda RBX: 00007fff1fdf18d0 RCX: 00007f0defb3538b
Dec 29 15:21:08 kernel: RDX: 00007fff1fdf18d0 RSI: 00000000c06864a2 RDI: 0000000000000009
Dec 29 15:21:08 kernel: RBP: 00000000c06864a2 R08: 0000000000000000 R09: 0000563833617a10
Dec 29 15:21:08 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000049
Dec 29 15:21:08 kernel: R13: 0000000000000009 R14: 0000563833617960 R15: 00005638336179a0
Dec 29 15:21:08 kernel: Modules linked in: hid_logitech_hidpp hid_logitech_dj amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 ghash_clmulni_intel gpu_sched ttm i2c_algo_bit drm_kms_helper serio_raw cec drm r8169 xhci_pci xhci_pci_renesas wmi video hid_multitouch fuse
Dec 29 15:21:08 kernel: CR2: 0000000000000000
Dec 29 15:21:08 kernel: ---[ end trace 744138fdca27bd9c ]---
Dec 29 15:21:08 kernel: RIP: 0010:0x0
Dec 29 15:21:08 kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Dec 29 15:21:08 kernel: RSP: 0018:ffffa0cdc05938c8 EFLAGS: 00010286
Dec 29 15:21:08 kernel: RAX: 0000000000000000 RBX: ffff8d638f2c01b8 RCX: ffff8d638a86a000
Dec 29 15:21:08 kernel: RDX: 0000000000000000 RSI: 00000000000005cf RDI: ffff8d6388d99420
Dec 29 15:21:08 kernel: RBP: ffff8d638f2c0000 R08: ffffa0cdc05938c4 R09: 0000000000000001
Dec 29 15:21:08 kernel: R10: 0000000000000004 R11: 0000000000000003 R12: 0000000000000000
Dec 29 15:21:08 kernel: R13: 0000000000000000 R14: ffff8d638b16ec00 R15: ffff8d638e870000
Dec 29 15:21:08 kernel: FS: 00007f0defc53f40(0000) GS:ffff8d6477500000(0000) knlGS:0000000000000000
Dec 29 15:21:08 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 29 15:21:08 kernel: CR2: ffffffffffffffd6 CR3: 00000001097ca000 CR4: 00000000001506e0

A warning involving amdgpu and systemd-backlight happened shortly after that. I've seen this warning before when booting 5.9 kernels where systemd-backlight failed to start so I'm unsure if it's related to the black screen and unresponsive system problem.

Dec 29 15:22:16 kernel: ------------[ cut here ]------------
Dec 29 15:22:16 kernel: WARNING: CPU: 2 PID: 619 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2548 dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
Dec 29 15:22:16 kernel: Modules linked in: soundcore fjes(-) i2c_scmi hp_wireless acpi_cpufreq zram ip_tables hid_logitech_hidpp hid_logitech_dj amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel iommu_v2 ghash_clmulni_intel gpu_sched ttm i2c_algo_bit drm_kms_helper serio_raw cec drm r8169 xhci_pci xhci_pci_renesas wmi video hid_multitouch fuse
Dec 29 15:22:16 kernel: CPU: 2 PID: 619 Comm: systemd-backlig Tainted: G D 5.10.3-200.fc33.x86_64 #1
Dec 29 15:22:16 kernel: Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.52 12/03/2019
Dec 29 15:22:16 kernel: RIP: 0010:dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
Dec 29 15:22:16 kernel: Code: 70 03 00 00 31 c0 48 8d 96 c0 01 00 00 48 8b 0a 48 85 c9 74 06 48 3b 59 08 74 20 83 c0 01 48 81 c2 d8 04 00 00 83 f8 06 75 e3 <0f> 0b 45 31 e4 5b 44 89 e0 5d 41 5c 41 5d 41 5e c3 48 98 48 69 c0
Dec 29 15:22:16 kernel: RSP: 0018:ffffa0cdc0a77e08 EFLAGS: 00010246
Dec 29 15:22:16 kernel: RAX: 0000000000000006 RBX: ffff8d638b16ec00 RCX: 0000000000000000
Dec 29 15:22:16 kernel: RDX: ffff8d638e8c1ed0 RSI: ffff8d638e8c0000 RDI: 0000000000000000
Dec 29 15:22:16 kernel: RBP: ffff8d638e870000 R08: 0000000000000032 R09: 000000000000000a
Dec 29 15:22:16 kernel: R10: 000000000000000a R11: f000000000000000 R12: 0000000000003b01
Dec 29 15:22:16 kernel: R13: 0000000000000000 R14: 0000000000003be1 R15: ffff8d6380f550e0
Dec 29 15:22:16 kernel: FS: 00007f151b4bc000(0000) GS:ffff8d6477500000(0000) knlGS:0000000000000000
Dec 29 15:22:16 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 29 15:22:16 kernel: CR2: 000055dfffaf18f8 CR3: 0000000102068000 CR4: 00000000001506e0
Dec 29 15:22:16 kernel: Call Trace:
Dec 29 15:22:16 kernel: amdgpu_dm_backlight_update_status+0xb4/0xc0 [amdgpu]
Dec 29 15:22:16 kernel: backlight_device_set_brightness+0x6e/0x110
Dec 29 15:22:16 kernel: brightness_store+0x3b/0x50
Dec 29 15:22:16 kernel: kernfs_fop_write+0xce/0x1b0
Dec 29 15:22:16 kernel: vfs_write+0xc3/0x270
Dec 29 15:22:16 kernel: ksys_write+0x4f/0xc0
Dec 29 15:22:16 kernel: do_syscall_64+0x33/0x40
Dec 29 15:22:16 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 29 15:22:16 kernel: RIP: 0033:0x7f151b5bf297
Dec 29 15:22:16 kernel: Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
Dec 29 15:22:16 kernel: RSP: 002b:00007fff7481c928 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Dec 29 15:22:16 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f151b5bf297
Dec 29 15:22:16 kernel: RDX: 0000000000000003 RSI: 00007fff7481ca10 RDI: 0000000000000004
Dec 29 15:22:16 kernel: RBP: 00007fff7481ca10 R08: 0000000000000000 R09: 0000000000000000
Dec 29 15:22:16 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
Dec 29 15:22:16 kernel: R13: 000055dfffadb650 R14: 0000000000000003 R15: 00007f151b692720
Dec 29 15:22:16 kernel: ---[ end trace 744138fdca27bd9d ]---

Created attachment 1743062 [details]
journalctl output for boot with default kernel parameters ending with a black screen and unresponsive system
The traces with the null pointer dereference and warning in amdgpu in my previous comment were for the first boot of 5.10.3 with the default kernel command line parameters which ended with the black screen and unresponsive system. I just didn't see them until the next successful boot with 5.10.3 and amdgpu.dc=0. I'm attaching the journal for the first boot with the null pointer dereference and black screen problem.
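For anyone checking whether their own failed boot hit the same path, here is a minimal sketch (not part of the original report) that pulls the call-trace frames out of saved journal text, for example the output of `journalctl -k -b -1` from the next successful boot. The function name `call_trace_frames`, the regular expression, and the sample text are illustrative assumptions; the pattern is only matched against the line shapes quoted in this bug and may need adjusting for other kernels.

```python
import re

# Assumed frame shape, taken from the logs quoted in this bug, e.g.
#   "... kernel: ? drm_calc_timestamping_constants+0x195/0x1f0 [drm]"
# A leading "? " marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def call_trace_frames(journal_text):
    """Return (symbol, module, reliable) tuples for every call-trace frame.

    A trace starts at a line ending in "Call Trace:" and stops at the
    first following line that does not look like a frame.
    """
    frames = []
    in_trace = False
    for line in journal_text.splitlines():
        if line.rstrip().endswith("Call Trace:"):
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            in_trace = False
            continue
        frames.append(
            (match["symbol"], match["module"], match["unreliable"] is None)
        )
    return frames


# Sample lines copied from the trace in this report.
sample = """\
Dec 29 15:21:08 kernel: Call Trace:
Dec 29 15:21:08 kernel: dc_commit_state+0x823/0xa20 [amdgpu]
Dec 29 15:21:08 kernel: ? drm_calc_timestamping_constants+0x195/0x1f0 [drm]
Dec 29 15:21:08 kernel: amdgpu_dm_atomic_commit_tail+0x527/0x2420 [amdgpu]
Dec 29 15:21:08 kernel: RIP: 0033:0x7f0defb3538b
"""

for symbol, module, reliable in call_trace_frames(sample):
    print(symbol, module, "reliable" if reliable else "unreliable")
```

Seeing `dc_commit_state` from `amdgpu` as the first reliable frame would suggest the same null pointer dereference as the one reported here.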
This problem appears to have been fixed in 5.10.5, which has booted normally each time with the default kernel parameters. 5.10.4 was affected by this problem. I reported this problem on 12/30 at https://gitlab.freedesktop.org/drm/amd/-/issues/1421 Thanks.