1. Please describe the problem:

When starting up the system (after rebooting or powering it off), the GRUB menu is shown. After selecting the 6.17.11 kernel, the screen remains black until I press ESC, which shows an error:

[ ***] (2 of 2) Job dev-mapper-fedora_fedovice/start running (5s / no limit)
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page

The boot process stalls indefinitely on dracut-initqueue.service and dev-mapper-fedora_fedovice. About 25% of the time there is also a kernel oops message related to amdgpu.

2. What is the Version-Release number of the kernel:

6.17.11-300.fc43

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

The problem started with 6.17.11. 6.17.10 and 6.17.9 both work.

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

I can reproduce it on every start. Steps:
- Power on the system
- Select the 6.17.11 kernel
- The system fails to boot

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

The problem also occurs with 6.17.12-300 and 6.18.0-65.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

Nothing is logged to the journal when using the problematic kernels. I'll attach what's displayed on my screen.

Reproducible: Always
Created attachment 2118624
6.17.11 error with oops message

This is the output from my screen, since journalctl logs nothing when using 6.17.11 or later.
Two comments...

FIRST comment: I am experiencing the same issue, as shown below (the 'Job' lines get a bit mangled with each other on the screen):

[ *** ] (1 of 3) Job dev-mapper-fedora_coppvice/start running (9s / no limit)
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
[*** ] (2 of 3) Job dracut-initqueue.service/start running (10s / no limit)
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 0 P4D 0
Oops: Oops: 0010 [#2] SMP NOPTI
CPU: 4 UID: 0 PID: 399 Comm: kworker/4:2 Tainted: G S D 6.17.11-300.fc43.x86_64 #1 PREEMPT(lazy)
Tainted: [S]=CPU_OUT_OF_SPEC, [D]=DIE
Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
Workqueue: events amdgpu_tlb_fence_work [amdgpu]
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffd0ac80ae7de0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000008001 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000008001 RDI: ffff8d35d4c00000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: ffff8d35d4c00000
FS: 0000000000000000(0000) GS:ffff8d3d57a47000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000110f19000 CR4: 00000000000406f0
Call Trace:
<TASK>
amdgpu_gmc_flush_gpu_tlb_pasid+0xd9/0x400 [amdgpu]
amdgpu_tlb_fence_work+0x6e/0xe0 [amdgpu]
process_one_work+0x192/0x350
worker_thread+0x25a/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xfc/0x240
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0xf4/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: amdgpu amdxcp gpu_sched drm_panel_backlight_quirks drm_buddy radeon drm_ttm_helper ttm video drm_exec i2c_algo_bit drm_suballoc_helper polyval_clmulni drm_display_helper ghash_clmulni_intel sp5100_tco cec wmi ntsync i2c_dev fuse
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffd0ac80ae7de0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000008000 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000008000 RDI: ffff8d35d4c00000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: ffff8d35d4c00000
FS: 0000000000000000(0000) GS:ffff8d3d57a47000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000110f19000 CR4: 00000000000406f0
[*** ] (3 of 3) Job dev-disk-by\x2duuid-72tart running (1min 11s / no limit)

SECOND comment: I also notice that the boot hangs, waiting forever on the motherboard SATA port connected to a DVD drive (interface ata5). Other drives load successfully. The console endlessly repeats messages like the extract below (the 'Job' lines get a bit mangled on the screen):

ata5.00: qc timeout after 5000 msecs (cmd 0xa1)
ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ *** ] (3 of 3) Job dev-disk-by\x2duuid-72ice/start running (13s / no limit)
[ *** ] (1 of 3) Job dracut-initqueue.service/start running (14s / no limit)
[ *] (2 of 3) Job dev-mapper-fedora_coppice/start running (15s / no limit)
ata5: SATA link up 1.5Gbps (SStatus 113 SControl 300)
ata5.00: configured for PIO0
ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0x6 frozen
ata5.00: irq_stat 0x08000000, interface fatal error
ata5.00: cmd a0/00:00:00:58:01/00:00:00:00:00/a0 tag 8 pio 16728 in
Get configuration 46 00 00 00 00 00 00 01 58 00res 50/00:03:00:08:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
ata5.00: status: { DRDY }
[ *** ] etc...

Boot parameters used are:

kernel="/boot/vmlinuz-6.17.11-300.fc43.x86_64"
args="ro resume=UUID=72fe53d3-a0ec-4bfa-aeaf-d1f29d0cedcb rd.lvm.lv=fedora_copper00/root rd.lvm.lv=fedora_copper00/swap rhgb quiet radeon.core=0 amd_iommu=on iommu=pt psi=1 amdgpu.ppfeaturemask=0xffffffff $tuned_params"
root="/dev/mapper/fedora_copper00-root"
initrd="/boot/initramfs-6.17.11-300.fc43.x86_64.img $tuned_initrd"
title="Fedora Linux (6.17.11-300.fc43.x86_64) 43 (Workstation Edition)"
id="aca48782b6b1487aa208ca8fa9f174bd-6.17.11-300.fc43.x86_64"

Where:
$tuned_params profile: balanced
$tuned_initrd : "-initrd"

Base Board Information:
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: M5A99FX PRO R2.0
Version: Rev 1.xx
Serial Number: 141134705300622

Screenshots can be provided if required.
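For anyone comparing boot entries between machines, here is a minimal Python sketch that pulls the per-module parameters out of an `args=` string like the one quoted above. The helper name `module_params` is made up for illustration, and the sketch simply treats any `name.param=value` token as a module parameter, which is an assumption: dracut options such as `rd.lvm.lv=` share that shape, so the result is filtered down to the two GPU drivers afterwards.

```python
import shlex

def module_params(cmdline):
    """Group `module.param=value` kernel arguments by the part before the first dot."""
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if not sep or "." not in key:
            continue  # plain flags (ro, quiet) and core options (iommu=pt)
        module, _, name = key.partition(".")
        params.setdefault(module, {})[name] = value
    return params

# The args string quoted in the comment above.
ARGS = ("ro resume=UUID=72fe53d3-a0ec-4bfa-aeaf-d1f29d0cedcb "
        "rd.lvm.lv=fedora_copper00/root rd.lvm.lv=fedora_copper00/swap "
        "rhgb quiet radeon.core=0 amd_iommu=on iommu=pt psi=1 "
        "amdgpu.ppfeaturemask=0xffffffff $tuned_params")

# Keep only the GPU drivers; "rd" is dracut, not a kernel module.
gpu = {m: p for m, p in module_params(ARGS).items() if m in ("amdgpu", "radeon")}
print(gpu)
```

This only shows which options were passed on the command line; it says nothing about whether the drivers accept or act on them.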
####
#### Same behaviour observed following the recent 6.17.12-300.fc43.x86_64 update, as shown below (copied from console photo):
####
[ *** ] (1 of 3) Job dev-mapper-fedora-coppvice/start running (9s / no limit)
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
[*** ] (2 of 3) Job dracut-initqueue.service/start running (10s / no limit)
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 0 P4D 0
Oops: Oops: 0010 [#2] SMP NOPTI
CPU: 2 UID: 0 PID: 227 Comm: kworker/2:2 Tainted: G S D 6.17.12-300.fc43.x86_64 #1 PREEMPT(lazy)
Tainted: [S]=CPU_OUT_OF_SPEC, [D]=DIE
Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
Workqueue: events amdgpu_tlb_fence_work [amdgpu]
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffd33000427de0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000008001 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000008001 RDI: ffff8b6359a80000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: ffff8b6359a80000
FS: 0000000000000000(0000) GS:ffff8b6aca347000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000110b65000 CR4: 00000000000406f0
Call Trace:
<TASK>
amdgpu_gmc_flush_gpu_tlb_pasid+0xd9/0x400 [amdgpu]
amdgpu_tlb_fence_work+0x6e/0xe0 [amdgpu]
process_one_work+0x192/0x350
worker_thread+0x25a/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xfc/0x240
? __pfx_worker_thread+0x10/0x10
? __pfx_worker_thread+0x10/0x10
ret_from_fork+0xf4/0x110
? __pfx_worker_thread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: amdgpu amdxcp gpu_sched drm_panel_backlight_quirks drm_buddy radeon drm_ttm_helper ttm video drm_exec i2c_algo_bit drm_suballoc_helper polyval_clmulni ghash_clmulni_intel drm_display_helper sp5100_tco cec wmi ntsync i2c_dev fuse
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffd3300092fde0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000008000 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000008000 RDI: ffff8b6359a80000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000004
R10: ffff8b63458dd380 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: ffff8b6359a80000
FS: 0000000000000000(0000) GS:ffff8b6aca347000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000110b65000 CR4: 00000000000406f0
[ *** ] (3 of 3) Job dev-disk-by\x2duuid-72tart running (3min 29s / no limit)
####
#### Boot loader config:
####
root@fedora:/boot/loader/entries# cat aca48782b6b1487aa208ca8fa9f174bd-6.17.12-300.fc43.x86_64.conf
title Fedora Linux (6.17.12-300.fc43.x86_64) 43 (Workstation Edition)
version 6.17.12-300.fc43.x86_64
linux /vmlinuz-6.17.12-300.fc43.x86_64
initrd /initramfs-6.17.12-300.fc43.x86_64.img $tuned_initrd
options root=/dev/mapper/fedora_copper00-root ro resume=UUID=72fe53d3-a0ec-4bfa-aeaf-d1f29d0cedcb rd.lvm.lv=fedora_copper00/root rd.lvm.lv=fedora_copper00/swap rhgb quiet radeon.core=0 amd_iommu=on iommu=pt psi=1 amdgpu.ppfeaturemask=0xffffffff $tuned_params
grub_users $grub_users
grub_arg --unrestricted
grub_class fedora
####
#### Will retry with the boot option amdgpu.ppfeaturemask=0xffffffff removed, since it runs the GPU outside of spec.
#### Will also try the alternate amdgpu feature masks 0xfffd7fff and 0xfffd3fff.
####
#### Removing the boot option amdgpu.ppfeaturemask=0xffffffff does not change the behaviour; neither does enabling PTI (Page Table Isolation).

[ OK ] Stopped systemd-vconsole-setup.service - Virtual Console Setup.
Stopping systemd-vconsole-setup.service - Virtual Console Setup...
Starting systemd-vconsole-setup.service - Virtual Console Setup...
BUG: kernel NULL pointer dereference, address: 0000000000000000
[ OK ] Finished systemd-vconsole-setup.service - Virtual Console Setup.
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 0 P4D 0
Oops: Oops: 0010 [#2] SMP PTI
CPU: 2 UID: 0 PID: 288 Comm: kworker/2:2 Tainted: G D 6.17.12-300.fc43.x86_64 #1 PREEMPT(lazy)
Tainted: [D]=DIE
Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
Workqueue: events amdgpu_tlb_fence_work [amdgpu]
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffce7980977de0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000008001 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000008001 RDI: ffff8c17d6400000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: ffff8c17d6400000
FS: 0000000000000000(0000) GS:ffff8c1f2d347000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000111fda000 CR4: 00000000000406f0
Call Trace:
<TASK>
amdgpu_gmc_flush_gpu_tlb_pasid+0xd9/0x400 [amdgpu]
amdgpu_tlb_fence_work+0x6e/0xe0 [amdgpu]
process_one_work+0x192/0x350
worker_thread+0x25a/0x3a0
? __pfx_worker_thread+0x10/0x10
kthread+0xfc/0x240
? __pfx_worker_thread+0x10/0x10
? __pfx_worker_thread+0x10/0x10
ret_from_fork+0xf4/0x110
? __pfx_worker_thread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: amdgpu amdxcp gpu_sched drm_panel_backlight_quirks drm_buddy radeon drm_ttm_helper ttm video drm_exec i2c_algo_bit drm_suballoc_helper polyval_clmulni drm_display_helper ghash_clmulni_intel sp5100_tco cec wmi ntsync i2c_dev fuse
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffffce7980937de0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000008000 RCX: 0000000000000001
RDX: 0000000000000002 RSI: 0000000000008000 RDI: ffff8c17d6400000
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000004
R10: ffff8c17c1328000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000002 R14: 0000000000000000 R15: ffff8c17d6400000
FS: 0000000000000000(0000) GS:ffff8c1f2d347000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000111fda000 CR4: 00000000000406f0
[* ] (1 of 3) Job dev-disk-by\x2duuid-72fe53d3\x2da0ec\x2d4bfa\x2daeaf\x2dd1f29d0cedcb.device/start running (38s / no limit)
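Since these traces are being retyped from console photos, a small script can help check that two transcriptions show the same call chain. The following is a rough Python sketch, not an established tool: the function name `call_trace` and the regular expression are my own, and it only understands the `symbol+0xoff/0xlen [module]` frame format seen in the oopses above. Frames prefixed with `?` are the ones the kernel marks as unreliable.

```python
import re

# One stack frame: optional "? " marker, symbol+offset/length, optional [module].
FRAME = re.compile(
    r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)

def call_trace(oops_text):
    """Return (symbol, module, reliable) for each frame between <TASK> and </TASK>."""
    match = re.search(r"<TASK>(.*?)</TASK>", oops_text, re.S)
    if not match:
        return []
    frames = []
    for m in FRAME.finditer(match.group(1)):
        unreliable, symbol, module = m.groups()
        frames.append((symbol, module, unreliable is None))
    return frames

# Shortened copy of the trace from the 6.17.12 oops above.
SAMPLE = """Call Trace:
 <TASK>
 amdgpu_gmc_flush_gpu_tlb_pasid+0xd9/0x400 [amdgpu]
 amdgpu_tlb_fence_work+0x6e/0xe0 [amdgpu]
 process_one_work+0x192/0x350
 worker_thread+0x25a/0x3a0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xfc/0x240
 ret_from_fork+0xf4/0x110
 ret_from_fork_asm+0x1a/0x30
 </TASK>"""

frames = call_trace(SAMPLE)
amdgpu_frames = [sym for sym, mod, ok in frames if mod == "amdgpu" and ok]
print(amdgpu_frames)
```

With RIP at 0x0 and an instruction-fetch fault, the first reliable frame is presumably the function that jumped through a NULL function pointer, which here is `amdgpu_gmc_flush_gpu_tlb_pasid` in every transcription.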
I think this is a new amdgpu issue affecting support for an existing Radeon GPU, in my case specifically:

$ lspci | grep -i 'VGA'
VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland PRO [Radeon R7 240/340 / Radeon 520] (rev 87)

I think backwards compatibility with this GPU was lost between 6.17.10-300.fc43.x86_64 and 6.17.11-300.fc43.x86_64.

On 6.17.10-300.fc43.x86_64 I have checked all loaded modules with modinfo and determined that all are GPL or dual licensed (not proprietary) and that all report as in-tree.

The full list of loaded modules is:

$ lsmod
Module Size Used by
uinput 32768 0
snd_seq_dummy 12288 0
snd_hrtimer 12288 1
nf_conntrack_netbios_ns 12288 1
nf_conntrack_broadcast 12288 1 nf_conntrack_netbios_ns
nft_fib_inet 12288 1
nft_fib_ipv4 12288 1 nft_fib_inet
nft_fib_ipv6 12288 1 nft_fib_inet
nft_fib 12288 3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nft_reject_inet 12288 10
nf_reject_ipv4 12288 1 nft_reject_inet
nf_reject_ipv6 24576 1 nft_reject_inet
nft_reject 12288 1 nft_reject_inet
nft_ct 28672 9
nft_chain_nat 12288 3
nf_nat 65536 1 nft_chain_nat
nf_conntrack 212992 4 nf_nat,nft_ct,nf_conntrack_netbios_ns,nf_conntrack_broadcast
nf_defrag_ipv6 24576 1 nf_conntrack
nf_defrag_ipv4 12288 1 nf_conntrack
nf_tables 430080 327 nft_ct,nft_reject_inet,nft_fib_ipv6,nft_fib_ipv4,nft_chain_nat,nft_reject,nft_fib,nft_fib_inet
qrtr 57344 2
sunrpc 921600 1
binfmt_misc 28672 1
uvcvideo 192512 1
snd_hda_codec_alc662 20480 1
snd_hda_codec_realtek_lib 65536 1 snd_hda_codec_alc662
snd_hda_codec_atihdmi 20480 1
uvc 12288 1 uvcvideo
edac_mce_amd 40960 0
snd_hda_codec_generic 139264 2 snd_hda_codec_alc662,snd_hda_codec_realtek_lib
videobuf2_vmalloc 20480 1 uvcvideo
snd_hda_codec_hdmi 65536 1 snd_hda_codec_atihdmi
at24 28672 0
videobuf2_memops 16384 1 videobuf2_vmalloc
snd_usb_audio 684032 2
snd_hda_intel 73728 4
kvm_amd 253952 0
videobuf2_v4l2 40960 1 uvcvideo
videobuf2_common 102400 4 videobuf2_vmalloc,videobuf2_v4l2,uvcvideo,videobuf2_memops
snd_usbmidi_lib 57344 1 snd_usb_audio
videodev 421888 2 videobuf2_v4l2,uvcvideo
snd_hda_codec 233472 6 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_alc662,snd_hda_codec_realtek_lib,snd_hda_codec_atihdmi
snd_ump 49152 1 snd_usb_audio
vfat 24576 1
eeepc_wmi 12288 0
fat 126976 1 vfat
mc 94208 6 videodev,snd_usb_audio,videobuf2_v4l2,uvcvideo,videobuf2_common
snd_rawmidi 53248 2 snd_usbmidi_lib,snd_ump
kvm 1495040 1 kvm_amd
asus_wmi 122880 1 eeepc_wmi
snd_hda_core 159744 7 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_alc662,snd_hda_codec,snd_hda_codec_realtek_lib,snd_hda_codec_atihdmi
sparse_keymap 12288 1 asus_wmi
snd_intel_dspcfg 45056 1 snd_hda_intel
platform_profile 20480 1 asus_wmi
irqbypass 16384 1 kvm
snd_intel_sdw_acpi 16384 1 snd_intel_dspcfg
rfkill 45056 4 asus_wmi
snd_hwdep 24576 2 snd_usb_audio,snd_hda_codec
wmi_bmof 12288 0
mxm_wmi 12288 0
r8169 151552 0
snd_seq 135168 7 snd_seq_dummy
pcspkr 12288 0
acpi_cpufreq 32768 0
fam15h_power 20480 0
k10temp 16384 0
snd_seq_device 16384 3 snd_seq,snd_ump,snd_rawmidi
realtek 53248 1
snd_pcm 212992 5 snd_hda_codec_hdmi,snd_hda_intel,snd_usb_audio,snd_hda_codec,snd_hda_core
i2c_piix4 40960 0
snd_timer 57344 3 snd_seq,snd_hrtimer,snd_pcm
i2c_smbus 20480 1 i2c_piix4
snd 155648 28 snd_hda_codec_generic,snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_usb_audio,snd_usbmidi_lib,snd_hda_codec,snd_timer,snd_hda_codec_realtek_lib,snd_ump,snd_pcm,snd_rawmidi
soundcore 12288 1 snd
joydev 36864 0
loop 49152 0
nfnetlink 20480 3 nf_tables
zram 73728 1
lz4hc_compress 20480 1 zram
lz4_compress 24576 1 zram
amdgpu 20701184 13
amdxcp 12288 1 amdgpu
gpu_sched 69632 1 amdgpu
drm_panel_backlight_quirks 12288 1 amdgpu
drm_buddy 32768 1 amdgpu
radeon 2457600 0
drm_ttm_helper 16384 3 amdgpu,radeon
ttm 135168 3 amdgpu,radeon,drm_ttm_helper
video 81920 3 asus_wmi,amdgpu,radeon
polyval_clmulni 12288 0
drm_exec 12288 2 amdgpu,radeon
ghash_clmulni_intel 12288 0
i2c_algo_bit 20480 2 amdgpu,radeon
drm_suballoc_helper 20480 2 amdgpu,radeon
drm_display_helper 331776 2 amdgpu,radeon
sp5100_tco 20480 0
cec 106496 2 drm_display_helper,amdgpu
wmi 32768 4 video,asus_wmi,wmi_bmof,mxm_wmi
ntsync 20480 0
i2c_dev 28672 0
fuse 278528 5
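To make the amdgpu/radeon relationship in that listing easier to see, here is a short Python sketch that parses `lsmod`-style text and reports the use counts of both drivers plus the helper modules they share. The function name `parse_lsmod` is hypothetical, and the sample is just a handful of rows copied from the listing above rather than the full output.

```python
def parse_lsmod(text):
    """Map module name -> (size, use count, list of modules using it)."""
    modules = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skips the "Module Size Used by" header
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[1]), int(fields[2]), users)
    return modules

# A few rows copied from the lsmod listing above.
SAMPLE = """Module Size Used by
amdgpu 20701184 13
radeon 2457600 0
ttm 135168 3 amdgpu,radeon,drm_ttm_helper
drm_exec 12288 2 amdgpu,radeon
cec 106496 2 drm_display_helper,amdgpu
"""

mods = parse_lsmod(SAMPLE)
# Helper modules that list both GPU drivers in their "Used by" column.
shared = sorted(
    name for name, (_, _, users) in mods.items()
    if {"amdgpu", "radeon"} <= set(users)
)
print("amdgpu use count:", mods["amdgpu"][1])
print("radeon use count:", mods["radeon"][1])
print("shared helpers:", shared)
```

In the listing above, radeon is loaded with a use count of 0 while amdgpu has 13, which suggests amdgpu is the driver bound to the card on the working kernel.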
This should be fixed by https://gitlab.freedesktop.org/drm/amd/-/issues/4744

Somehow the bad commit got backported to the 6.17 kernel, but the fix has only been applied to the 6.19 kernel for now.