Description of problem:
After installing from the F26 LiveCD with the 4.11 kernel, everything worked fine: the system booted to a graphical desktop and behaved normally. After running dnf update and receiving the 4.13.4 kernel update, the system hangs on boot (black screen, no mouse/keyboard input) and needs to be manually power cycled. Booting with nomodeset as a kernel parameter works. Using the 4.11 kernel continues to work fine.

Version-Release number of selected component (if applicable):
kernel 4.13.4-200.fc26.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Turn on the computer and select the Fedora 4.13 kernel from GRUB.

Actual results:
Almost immediate hard lockup: black screen, no response to keyboard/mouse, etc.

Expected results:
The system should boot up as normal.

Additional info:
No debug info is attached, as no logs are written because the crash/hang occurs very early in the boot process.
System is a Dell Precision T3500 workstation (Intel Xeon W3520). The video card is a Sapphire Radeon RX560 2GB (Polaris 11, I think). The monitor is connected via HDMI and is an LG 29UM68-P with 2560x1080 resolution.
Booting the kernel with modprobe.blacklist=amdgpu results in the kernel booting and a graphical desktop being displayed, on the framebuffer driver I guess. Running 'modprobe amdgpu' from a terminal window then results in amdgpu being loaded and driving the display, which is kinda neat, but I digress.

There are some ominous-looking log messages that I have no context to decipher, e.g.

[ 152.785718] [drm] BIOS signature incorrect 68 60
[ 152.785725] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000

and

[ 153.175522] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[ 153.178378] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
[ 153.180053] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table!
[ 153.180057] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table!

and

[ 153.182548] amdgpu: [powerplay] failed to send message 309 ret is 254
[ 153.182566] amdgpu: [powerplay] failed to send pre message 14e ret is 254

but as the display seems functional I don't know if they are relevant.

Complete log:

[ 152.735657] [drm] amdgpu kernel modesetting enabled.
[ 152.759584] AMD IOMMUv2 driver by Joerg Roedel <jroedel>
[ 152.759586] AMD IOMMUv2 functionality not available on this system
[ 152.769634] CRAT table not found
[ 152.769637] Finished initializing topology ret=0
[ 152.769668] kfd kfd: Initialized module
[ 152.770494] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1DA2:0xE348 0xCF).
[ 152.770506] [drm] register mmio base: 0xF7DC0000 [ 152.770507] [drm] register mmio size: 262144 [ 152.770519] [drm] probing gen 2 caps for device 8086:340a = 393d02/0 [ 152.770521] [drm] probing mlw for device 8086:340a = 393d02 [ 152.770528] [drm] UVD is enabled in VM mode [ 152.770529] [drm] VCE enabled in VM mode [ 152.785718] [drm] BIOS signature incorrect 68 60 [ 152.785725] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000 [ 152.785747] ATOM BIOS: 113-34830H2-U02 [ 152.785755] [drm] GPU post is not needed [ 152.785900] [drm] vm size is 64 GB, block size is 13-bit [ 152.812952] amdgpu 0000:02:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used) [ 152.812957] amdgpu 0000:02:00.0: GTT: 3072M 0x0000000080000000 - 0x000000013FFFFFFF [ 152.812966] [drm] Detected VRAM RAM=2048M, BAR=256M [ 152.812968] [drm] RAM width 128bits GDDR5 [ 152.813156] [TTM] Zone kernel: Available graphics memory: 6147840 kiB [ 152.813159] [TTM] Zone dma32: Available graphics memory: 2097152 kiB [ 152.813161] [TTM] Initializing pool allocator [ 152.813170] [TTM] Initializing DMA pool allocator [ 152.813212] [drm] amdgpu: 2048M of VRAM memory ready [ 152.813215] [drm] amdgpu: 3072M of GTT memory ready. [ 152.813234] [drm] GART: num cpu pages 786432, num gpu pages 786432 [ 152.814463] [drm] PCIE GART of 3072M enabled (table at 0x0000000000040000). [ 152.814491] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 152.814492] [drm] Driver supports precise vblank timestamp query. [ 152.814525] amdgpu 0000:02:00.0: amdgpu: using MSI. [ 152.814545] [drm] amdgpu: irq initialized. 
[ 153.012926] amdgpu: [powerplay] amdgpu: powerplay sw initialized [ 153.013233] [drm] AMDGPU Display Connectors [ 153.013235] [drm] Connector 0: [ 153.013236] [drm] DP-1 [ 153.013237] [drm] HPD5 [ 153.013238] [drm] DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b [ 153.013239] [drm] Encoders: [ 153.013240] [drm] DFP1: INTERNAL_UNIPHY1 [ 153.013241] [drm] Connector 1: [ 153.013242] [drm] HDMI-A-1 [ 153.013242] [drm] HPD3 [ 153.013244] [drm] DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877 [ 153.013244] [drm] Encoders: [ 153.013245] [drm] DFP2: INTERNAL_UNIPHY1 [ 153.013246] [drm] Connector 2: [ 153.013247] [drm] DVI-D-1 [ 153.013247] [drm] HPD4 [ 153.013249] [drm] DDC: 0x4878 0x4878 0x4879 0x4879 0x487a 0x487a 0x487b 0x487b [ 153.013249] [drm] Encoders: [ 153.013250] [drm] DFP3: INTERNAL_UNIPHY [ 153.075091] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000080000008, cpu addr 0xffff90e64152d008 [ 153.075363] amdgpu 0000:02:00.0: fence driver on ring 1 use gpu addr 0x0000000080000018, cpu addr 0xffff90e64152d018 [ 153.075557] amdgpu 0000:02:00.0: fence driver on ring 2 use gpu addr 0x0000000080000028, cpu addr 0xffff90e64152d028 [ 153.075777] amdgpu 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000080000038, cpu addr 0xffff90e64152d038 [ 153.075982] amdgpu 0000:02:00.0: fence driver on ring 4 use gpu addr 0x0000000080000048, cpu addr 0xffff90e64152d048 [ 153.076183] amdgpu 0000:02:00.0: fence driver on ring 5 use gpu addr 0x0000000080000058, cpu addr 0xffff90e64152d058 [ 153.076394] amdgpu 0000:02:00.0: fence driver on ring 6 use gpu addr 0x0000000080000068, cpu addr 0xffff90e64152d068 [ 153.076596] amdgpu 0000:02:00.0: fence driver on ring 7 use gpu addr 0x0000000080000078, cpu addr 0xffff90e64152d078 [ 153.076787] amdgpu 0000:02:00.0: fence driver on ring 8 use gpu addr 0x0000000080000088, cpu addr 0xffff90e64152d088 [ 153.076849] amdgpu 0000:02:00.0: fence driver on ring 9 use gpu addr 0x000000008000009c, cpu 
addr 0xffff90e64152d09c [ 153.085088] amdgpu 0000:02:00.0: fence driver on ring 10 use gpu addr 0x00000000800000ac, cpu addr 0xffff90e64152d0ac [ 153.085396] amdgpu 0000:02:00.0: fence driver on ring 11 use gpu addr 0x00000000800000bc, cpu addr 0xffff90e64152d0bc [ 153.106522] [drm] Found UVD firmware Version: 1.79 Family ID: 16 [ 153.107258] amdgpu 0000:02:00.0: fence driver on ring 12 use gpu addr 0x000000000082d420, cpu addr 0xffffa3dd8a85a420 [ 153.114177] [drm] Found VCE firmware Version: 52.4 Binary ID: 3 [ 153.114503] amdgpu 0000:02:00.0: fence driver on ring 13 use gpu addr 0x00000000800000dc, cpu addr 0xffff90e64152d0dc [ 153.114573] amdgpu 0000:02:00.0: fence driver on ring 14 use gpu addr 0x00000000800000ec, cpu addr 0xffff90e64152d0ec [ 153.175522] amdgpu: [powerplay] [AVFS] Something is broken. See log! [ 153.178378] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table! [ 153.180053] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table! [ 153.180057] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table! 
[ 153.182548] amdgpu: [powerplay] failed to send message 309 ret is 254 [ 153.182566] amdgpu: [powerplay] failed to send pre message 14e ret is 254 [ 153.194792] [drm] ring test on 0 succeeded in 17 usecs [ 153.195385] [drm] ring test on 9 succeeded in 10 usecs [ 153.195409] [drm] ring test on 1 succeeded in 9 usecs [ 153.195423] [drm] ring test on 2 succeeded in 3 usecs [ 153.195458] [drm] ring test on 3 succeeded in 2 usecs [ 153.195493] [drm] ring test on 4 succeeded in 2 usecs [ 153.195514] [drm] ring test on 5 succeeded in 8 usecs [ 153.195533] [drm] ring test on 6 succeeded in 2 usecs [ 153.195582] [drm] ring test on 7 succeeded in 2 usecs [ 153.195607] [drm] ring test on 8 succeeded in 3 usecs [ 153.195687] [drm] ring test on 10 succeeded in 7 usecs [ 153.195696] [drm] ring test on 11 succeeded in 7 usecs [ 153.222474] [drm] ring test on 12 succeeded in 1 usecs [ 153.222476] [drm] UVD initialized successfully. [ 153.323526] [drm] ring test on 13 succeeded in 5 usecs [ 153.323540] [drm] ring test on 14 succeeded in 2 usecs [ 153.323542] [drm] VCE initialized successfully. 
[ 153.323993] [drm] ib test on ring 0 succeeded [ 153.324221] [drm] ib test on ring 1 succeeded [ 153.324297] [drm] ib test on ring 2 succeeded [ 153.324373] [drm] ib test on ring 3 succeeded [ 153.324449] [drm] ib test on ring 4 succeeded [ 153.324526] [drm] ib test on ring 5 succeeded [ 153.324580] [drm] ib test on ring 6 succeeded [ 153.324633] [drm] ib test on ring 7 succeeded [ 153.324745] [drm] ib test on ring 8 succeeded [ 153.826804] [drm] ib test on ring 9 succeeded [ 153.826892] [drm] ib test on ring 10 succeeded [ 153.826933] [drm] ib test on ring 11 succeeded [ 153.828267] [drm] ib test on ring 12 succeeded [ 153.828545] [drm] ib test on ring 13 succeeded [ 153.944300] [drm] fb mappable at 0xE0A37000 [ 153.944303] [drm] vram apper at 0xE0000000 [ 153.944305] [drm] size 11059200 [ 153.944307] [drm] fb depth is 24 [ 153.944308] [drm] pitch is 10240 [ 153.944623] fbcon: amdgpudrmfb (fb0) is primary device [ 153.944916] Console: switching to colour frame buffer device 320x67 [ 153.944924] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device [ 153.949159] [drm] Initialized amdgpu 3.18.0 20150101 for 0000:02:00.0 on minor 0
Moving to have the graphics team take a look
It looks to me like there is a problem loading firmware or otherwise initialising the card from the initrd, which is not evident when modprobing the driver from the main system. The amdgpu module can be loaded and run the display without a problem if it is modprobed after the kernel has booted (I do this by passing modprobe.blacklist=amdgpu to the kernel and modprobing amdgpu after boot).

Note this discrepancy:

lsinitrd /mnt/initrd_debug/initramfs-4.13.4-200.fc26.x86_64.img |grep polaris11
-rw-r--r-- 2 root root 8832 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_ce.bin
-rw-r--r-- 1 root root 130228 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_k_smc.bin
-rw-r--r-- 1 root root 32724 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mc.bin
-rw-r--r-- 1 root root 17024 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_me.bin
-rw-r--r-- 2 root root 262784 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r-- 2 root root 0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r-- 1 root root 17024 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_pfp.bin
-rw-r--r-- 1 root root 23184 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_rlc.bin
-rw-r--r-- 2 root root 12692 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_sdma1.bin
-rw-r--r-- 2 root root 12692 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_sdma.bin
-rw-r--r-- 1 root root 130196 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_smc.bin
-rw-r--r-- 1 root root 130196 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_smc_sk.bin
-rw-r--r-- 3 root root 0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r-- 3 root root 0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_vce.bin

ls /lib/firmware/amdgpu/ -l |grep polaris11
-rw-r--r--. 1 root root 8832 Sep 21 01:05 polaris11_ce.bin
-rw-r--r--. 1 root root 130228 Sep 21 01:05 polaris11_k_smc.bin
-rw-r--r--. 1 root root 32724 Sep 21 01:05 polaris11_mc.bin
-rw-r--r--. 1 root root 17024 Sep 21 01:05 polaris11_me.bin
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec2.bin
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec.bin
-rw-r--r--. 1 root root 17024 Sep 21 01:05 polaris11_pfp.bin
-rw-r--r--. 1 root root 23184 Sep 21 01:05 polaris11_rlc.bin
-rw-r--r--. 1 root root 12692 Sep 21 01:05 polaris11_sdma1.bin
-rw-r--r--. 1 root root 12692 Sep 21 01:05 polaris11_sdma.bin
-rw-r--r--. 1 root root 130196 Sep 21 01:05 polaris11_smc.bin
-rw-r--r--. 1 root root 130196 Sep 21 01:05 polaris11_smc_sk.bin
-rw-r--r--. 1 root root 369696 Sep 21 01:05 polaris11_uvd.bin
-rw-r--r--. 1 root root 166816 Sep 21 01:05 polaris11_vce.bin

polaris11_mec.bin in the initrd firmware image has zero length, along with polaris11_uvd.bin and polaris11_vce.bin. I think this may be the problem?
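The comparison above can also be done mechanically. The following is a minimal Python sketch, not a tool used in this report: it parses two `ls -l`-style listings and reports entries whose sizes differ. The function names (`parse_listing`, `size_mismatches`) are made up for illustration, and the sample data is a trimmed excerpt of the listings above. The link count is printed alongside each mismatch so it is easy to see whether the zero-length entries are also the multiply-linked ones.

```python
import os


def parse_listing(text):
    """Map file basename -> (link_count, size) from `ls -l`-style lines."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected layout: mode links owner group size month day time path
        if len(fields) < 9 or not fields[1].isdigit() or not fields[4].isdigit():
            continue
        entries[os.path.basename(fields[8])] = (int(fields[1]), int(fields[4]))
    return entries


def size_mismatches(initrd_text, disk_text):
    """Return (name, initrd_links, initrd_size, disk_size) where sizes differ."""
    initrd = parse_listing(initrd_text)
    disk = parse_listing(disk_text)
    return sorted(
        (name, links, size, disk[name][1])
        for name, (links, size) in initrd.items()
        if name in disk and disk[name][1] != size
    )


# Trimmed excerpt of the two listings in the comment above.
INITRD = """\
-rw-r--r-- 2 root root 262784 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r-- 2 root root 0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r-- 1 root root 17024 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_pfp.bin
-rw-r--r-- 3 root root 0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_uvd.bin
"""
DISK = """\
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec2.bin
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec.bin
-rw-r--r--. 1 root root 17024 Sep 21 01:05 polaris11_pfp.bin
-rw-r--r--. 1 root root 369696 Sep 21 01:05 polaris11_uvd.bin
"""

for name, links, initrd_size, disk_size in size_mismatches(INITRD, DISK):
    print(f"{name}: initrd={initrd_size} (links={links}) disk={disk_size}")
```

In this excerpt every mismatching entry has a link count above 1 in the initrd listing, which is worth keeping in mind before concluding that the data itself is missing.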
Let's move the comments to report #1499570, as the bug is identical and involves the same card, i.e. the RX 560 in this case.

*** This bug has been marked as a duplicate of bug 1499570 ***
Reopening to facilitate the report
*** Bug 1499570 has been marked as a duplicate of this bug. ***
OK, I went way down the rabbit hole on this one, and this is possibly a very serious issue. It would be great if a maintainer could look at this and let me know if I am way off base here, or if this is potentially going to break a *lot* of stuff.

It looks very much to me like there is a problem with either dracut, cpio, or the way dracut chains multiple cpio archives together. What seems to be happening is that the firmware files, and possibly random other files, are being corrupted (I note a number of zero-length files in the initrd images produced by dracut on my system; the firmware is just the thing that most obviously breaks boot).

I tried a bunch of things, different compression settings etc., but adding the --nohardlink option when creating an initrd with dracut is the only thing I have found that gives me correct output here. Note that I modified the dracut shell script to remove the --quiet option, which is why it displays block counts.

e.g.

[root@dogtooth /]# dracut --force --nohardlink
115270 blocks
[root@dogtooth /]# lsinitrd /boot/initramfs-4.13.4-200.fc26.x86_64.img |grep polaris11
-rw-r--r-- 1 root root 8832 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_ce.bin
-rw-r--r-- 1 root root 130228 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_k_smc.bin
-rw-r--r-- 1 root root 32724 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mc.bin
-rw-r--r-- 1 root root 17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_me.bin
-rw-r--r-- 1 root root 262784 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r-- 1 root root 262784 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r-- 1 root root 17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_pfp.bin
-rw-r--r-- 1 root root 23184 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_rlc.bin
-rw-r--r-- 1 root root 12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma1.bin
-rw-r--r-- 1 root root 12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma.bin
-rw-r--r-- 1 root root 130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc.bin
-rw-r--r-- 1 root root 130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc_sk.bin
-rw-r--r-- 1 root root 369696 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r-- 1 root root 166816 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_vce.bin

vs

[root@dogtooth /]# dracut --force
105655 blocks
[root@dogtooth /]# lsinitrd /boot/initramfs-4.13.4-200.fc26.x86_64.img |grep polaris11
-rw-r--r-- 2 root root 8832 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_ce.bin
-rw-r--r-- 1 root root 130228 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_k_smc.bin
-rw-r--r-- 1 root root 32724 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mc.bin
-rw-r--r-- 1 root root 17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_me.bin
-rw-r--r-- 2 root root 262784 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r-- 2 root root 0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r-- 1 root root 17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_pfp.bin
-rw-r--r-- 1 root root 23184 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_rlc.bin
-rw-r--r-- 2 root root 12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma1.bin
-rw-r--r-- 2 root root 12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma.bin
-rw-r--r-- 1 root root 130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc.bin
-rw-r--r-- 1 root root 130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc_sk.bin
-rw-r--r-- 3 root root 0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r-- 3 root root 0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_vce.bin
If I see correctly, the difference seems to be 0 file sizes for

-rw-r--r-- 2 root root 0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r-- 3 root root 0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r-- 3 root root 0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_vce.bin

in the second output.

Reassigning to dracut. Harald, can you tell us whether this is a dracut problem or somewhere else? Thanks.
Could you please attach the initramfs image for further inspection?
$ cat ~/ttt.img | cpio --extract --verbose --quiet --list
drwxrwxr-x 1 harald harald 0 Oct 12 14:33 .
-rwxr-xr-x 2 harald harald 0 Oct 12 14:32 true
-rwxr-xr-x 2 harald harald 32536 Oct 12 14:32 true2

$ cat ~/ttt.img | cpio --extract --verbose
.
true
cpio: true2 linked to true
true2
65 blocks

$ ls -l
total 64
-rwxr-xr-x 2 harald harald 32536 12. Okt 14:43 true
-rwxr-xr-x 2 harald harald 32536 12. Okt 14:43 true2
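The transcript above shows a hard-linked pair listed as 0 bytes and 32536 bytes, yet both files extract at 32536 bytes. That is consistent with how the "newc" cpio format is commonly written: the data of a hard-linked file is attached to only one of its entries, and the other links carry a zero filesize. The following Python sketch illustrates the idea on a tiny in-memory archive. It is an illustration under assumptions, not dracut or cpio code: the helper names are hypothetical, the inode number 42 is arbitrary, and links are matched on the inode field alone (real tools also consider the device numbers).

```python
MAGIC = b"070701"  # magic of the "newc" cpio header
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check")
HEADER_LEN = 6 + 8 * len(FIELDS)  # 110 bytes: magic plus 13 hex fields


def pad4(n):
    """Bytes of padding needed to reach the next multiple of 4."""
    return (4 - n % 4) % 4


def make_entry(name, ino, nlink, data=b"", mode=0o100755):
    """Build one newc entry: header, NUL-terminated name, then data."""
    name_bytes = name.encode() + b"\0"
    values = dict.fromkeys(FIELDS, 0)
    values.update(ino=ino, mode=mode, nlink=nlink,
                  filesize=len(data), namesize=len(name_bytes))
    out = MAGIC + b"".join(b"%08X" % values[f] for f in FIELDS) + name_bytes
    out += b"\0" * pad4(len(out))
    return out + data + b"\0" * pad4(len(data))


def read_entries(archive):
    """Yield (name, ino, nlink, data) for every entry before the trailer."""
    pos = 0
    while True:
        assert archive[pos:pos + 6] == MAGIC, "not a newc header"
        raw = archive[pos + 6:pos + HEADER_LEN]
        hdr = {f: int(raw[i * 8:(i + 1) * 8], 16) for i, f in enumerate(FIELDS)}
        name_end = pos + HEADER_LEN + hdr["namesize"]
        name = archive[pos + HEADER_LEN:name_end - 1].decode()
        if name == "TRAILER!!!":
            return
        data_start = name_end + pad4(name_end)
        data_end = data_start + hdr["filesize"]
        yield name, hdr["ino"], hdr["nlink"], archive[data_start:data_end]
        pos = data_end + pad4(data_end)


def effective_sizes(archive):
    """Map name -> (size shown in a listing, size after resolving links)."""
    entries = list(read_entries(archive))
    size_by_inode = {ino: len(data) for _, ino, _, data in entries if data}
    return {name: (len(data), size_by_inode.get(ino, 0))
            for name, ino, _, data in entries}


# Mirror the true/true2 example: two links to one inode, data on one entry.
ARCHIVE = (make_entry("true", ino=42, nlink=2)
           + make_entry("true2", ino=42, nlink=2, data=b"x" * 32536)
           + make_entry("TRAILER!!!", ino=0, nlink=1, mode=0))

for name, (listed, resolved) in effective_sizes(ARCHIVE).items():
    print(f"{name}: listed={listed} resolved={resolved}")
```

Under these assumptions, a zero size in a listing for an entry with a link count above 1 does not by itself show that the file will be empty once the archive is unpacked.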
To see if the kernel unpacks the cpio correctly, boot with "rd.break=cmdline" on the kernel command line and then check:

$ ls -l /usr/lib/firmware/amdgpu
(In reply to Harald Hoyer from comment #10)
> could you please attach the initramfs image for further inspection?

Not needed anymore.
Booting with rd.break=cmdline and listing those files shows that they are unpacked with the correct length, so possibly this is a red herring. These files are not hard links, at least according to stat's inode reporting, though they do have the same name and md5sum as other files in the directory.
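The check described above (same inode means hard link, same checksum only means same content) can be sketched in Python as follows. This is an illustrative sketch rather than anything run in this report: `classify` and `demo` are hypothetical names, the demo files are created in a temporary directory, and SHA-256 is used in place of the md5sum mentioned in the comment.

```python
import hashlib
import os
import shutil
import tempfile
from collections import defaultdict


def classify(directory):
    """Return (hard_link_groups, same_content_groups) for regular files."""
    by_inode = defaultdict(list)
    by_digest = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        # Files sharing (device, inode) are hard links to the same data.
        by_inode[(st.st_dev, st.st_ino)].append(name)
        with open(path, "rb") as fh:
            by_digest[hashlib.sha256(fh.read()).hexdigest()].append(name)
    hard_links = [names for names in by_inode.values() if len(names) > 1]
    same_content = [names for names in by_digest.values() if len(names) > 1]
    return hard_links, same_content


def demo():
    """Build a small directory with one hard link and one plain copy."""
    workdir = tempfile.mkdtemp()
    try:
        original = os.path.join(workdir, "a.bin")
        with open(original, "wb") as fh:
            fh.write(b"firmware payload")
        os.link(original, os.path.join(workdir, "b.bin"))       # hard link
        shutil.copy(original, os.path.join(workdir, "c.bin"))   # separate copy
        with open(os.path.join(workdir, "d.bin"), "wb") as fh:
            fh.write(b"something else")
        return classify(workdir)
    finally:
        shutil.rmtree(workdir)


hard_links, same_content = demo()
print("hard links:  ", hard_links)
print("same content:", same_content)
```

Files that appear only in the second group are independent copies with identical contents, which matches the situation reported for the unpacked initramfs.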
Further looking at the boot message, I found these reports: [ 2.023970] [drm] amdgpu kernel modesetting enabled. [ 2.025696] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1DA2:0xE348 0xCF). [ 2.025707] [drm] register mmio base: 0xFBF80000 [ 2.025707] [drm] register mmio size: 262144 [ 2.025722] [drm] probing gen 2 caps for device 10de:778 = 313d02/0 [ 2.025727] [drm] probing mlw for device 10de:778 = 313d02 [ 2.025732] [drm] UVD is enabled in VM mode [ 2.025733] [drm] VCE enabled in VM mode [ 2.025975] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff [ 2.026613] ATOM BIOS: 113-13483HM-U01 [ 2.026624] [drm] GPU post is not needed [ 2.026644] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 4-bit [ 2.026695] amdgpu 0000:02:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used) [ 2.026696] amdgpu 0000:02:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF [ 2.026699] [drm] Detected VRAM RAM=4096M, BAR=256M [ 2.026700] [drm] RAM width 128bits GDDR5 [ 2.026782] [TTM] Zone kernel: Available graphics memory: 4086272 kiB [ 2.026782] [TTM] Zone dma32: Available graphics memory: 2097152 kiB [ 2.026783] [TTM] Initializing pool allocator [ 2.026786] [TTM] Initializing DMA pool allocator [ 2.026809] [drm] amdgpu: 4096M of VRAM memory ready [ 2.026810] [drm] amdgpu: 4096M of GTT memory ready. [ 2.026818] [drm] GART: num cpu pages 65536, num gpu pages 65536 [ 2.026865] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000). [ 2.026937] amdgpu 0000:02:00.0: amdgpu: using MSI. [ 2.026954] [drm] amdgpu: irq initialized. 
[ 2.026971] amdgpu: [powerplay] amdgpu: powerplay sw initialized [ 2.026987] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_pfp_2.bin failed with error -2 [ 2.027018] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_me_2.bin failed with error -2 [ 2.027045] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_ce_2.bin failed with error -2 [ 2.027094] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_mec_2.bin failed with error -2 [ 2.027237] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_mec2_2.bin failed with error -2 [ 2.027439] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0xffffbd9241413040 [ 2.027520] amdgpu 0000:02:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0xffffbd92414130c0 [ 2.027571] amdgpu 0000:02:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0xffffbd9241413140 [ 2.027635] amdgpu 0000:02:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0xffffbd92414131c0 [ 2.027687] amdgpu 0000:02:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0xffffbd9241413240 [ 2.027758] amdgpu 0000:02:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0xffffbd92414132c0 [ 2.027811] amdgpu 0000:02:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0xffffbd9241413340 [ 2.027867] amdgpu 0000:02:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0xffffbd92414133c0 [ 2.027918] amdgpu 0000:02:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0xffffbd9241413440 [ 2.027944] amdgpu 0000:02:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0xffffbd92414134e0 [ 2.029002] amdgpu 0000:02:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0xffffbd9241413560 [ 2.029069] amdgpu 0000:02:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0xffffbd92414135e0 
[ 2.029276] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[ 2.033262] amdgpu 0000:02:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e5420, cpu addr 0xffffbd9241e5a420
[ 2.033399] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[ 2.033517] amdgpu 0000:02:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0xffffbd92414136e0
[ 2.033559] amdgpu 0000:02:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0xffffbd9241413760

This seems to suggest a firmware bug on Polaris11, i.e. the RX 560.
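To make the "Direct firmware load ... failed with error -2" lines in the log above easier to read, here is a small Python sketch that extracts the firmware names and translates the error number. The function name `failed_firmware` is hypothetical and the sample is a short excerpt of the log in this comment. Error -2 corresponds to ENOENT, meaning the requested file was not found. Note that the log shows the driver continuing its initialisation after these messages, so the thread does not establish whether they are related to the hang.

```python
import errno
import re

# Matches the kernel firmware loader message seen in the log above.
PATTERN = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def failed_firmware(log_text):
    """Return (firmware_path, errno_name) for each failed firmware load."""
    return [
        (m.group(1), errno.errorcode.get(-int(m.group(2)), "unknown"))
        for m in PATTERN.finditer(log_text)
    ]


# Short excerpt of the boot log quoted in this comment.
LOG = """\
[ 2.026971] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[ 2.026987] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_pfp_2.bin failed with error -2
[ 2.027018] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_me_2.bin failed with error -2
[ 2.027439] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0xffffbd9241413040
"""

for path, name in failed_firmware(LOG):
    print(f"{path}: {name}")
```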
I added netconsole to the initrd image, and gathered a log of a failed (black screen, hang) 'modprobe amdgpu' from the initrd commandline. No obvious error stands out but the hang would appear to occur immediately before the: amdgpu: [powerplay] amdgpu: powerplay sw initialized line that is output when the driver is successfully modprobed outside the initrd environment, which reinforces the idea that there is a problem with the firmware. Possibly this is a clue for someone to look at? [ 753.282494] [drm] amdgpu kernel modesetting enabled. [ 753.285376] AMD IOMMUv2 driver by Joerg Roedel <jroedel> [ 753.285534] AMD IOMMUv2 functionality not available on this system [ 753.294760] CRAT table not found [ 753.294913] Finished initializing topology ret=0 [ 753.295096] kfd kfd: Initialized module [ 753.295847] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1DA2:0xE348 0xCF). [ 753.296143] [drm] register mmio base: 0xF7DC0000 [ 753.296297] [drm] register mmio size: 262144 [ 753.296460] [drm] probing gen 2 caps for device 8086:340a = 393d02/0 [ 753.296619] [drm] probing mlw for device 8086:340a = 393d02 [ 753.296782] [drm] UVD is enabled in VM mode [ 753.296934] [drm] VCE enabled in VM mode [ 753.325614] [drm] BIOS signature incorrect 73 7 [ 753.325772] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000 [ 753.326054] ATOM BIOS: 113-34830H2-U02 [ 753.326220] [drm] GPU post is not needed [ 753.326511] [drm] vm size is 64 GB, block size is 13-bit [ 753.326703] amdgpu 0000:02:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used) [ 753.326968] amdgpu 0000:02:00.0: GTT: 3072M 0x0000000080000000 - 0x000000013FFFFFFF [ 753.327239] [drm] Detected VRAM RAM=2048M, BAR=256M [ 753.327394] [drm] RAM width 128bits GDDR5 [ 753.327653] [TTM] Zone kernel: Available graphics memory: 6147830 kiB [ 753.327812] [TTM] Zone dma32: Available graphics memory: 2097152 kiB [ 753.327969] [TTM] Initializing pool allocator [ 753.328137] 
[TTM] Initializing DMA pool allocator [ 753.328311] [drm] amdgpu: 2048M of VRAM memory ready [ 753.328466] [drm] amdgpu: 3072M of GTT memory ready. [ 753.328628] [drm] GART: num cpu pages 786432, num gpu pages 786432 [ 753.329981] [drm] PCIE GART of 3072M enabled (table at 0x0000000000040000). [ 753.330143] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 753.330277] [drm] Driver supports precise vblank timestamp query. [ 753.330439] amdgpu 0000:02:00.0: amdgpu: using MSI. [ 753.330588] [drm] amdgpu: irq initialized.
Here's another interesting (if a little baffling) data point. I get exactly the same 'black screen, immediate hang' when modprobing amdgpu after boot, with exactly the same log as above, if the card is attached to my ultrawide monitor in 'PBP' mode, i.e. with a 1280x1080 resolution. If amdgpu is modprobed after boot when the monitor is not in 'PBP' mode, i.e. with a 2560x1080 display mode (initialised to a 16:9 resolution (1920x1080?) by the framebuffer driver, which does not give a full-width image on 2560x1080), there is no hang and amdgpu comes up as normal.

So, to summarise my symptoms:

amdgpu loading from initrd, 1280x1080: FAIL HANG
amdgpu loading from initrd, 2560x1080: FAIL HANG
amdgpu modprobed after boot, 1280x1080: FAIL HANG
amdgpu modprobed after boot, 2560x1080: WORKS FINE
Switching back to AMD drivers. Pete, could you perhaps try Fedora 27 Beta?
Exact same issue here, using Fedora 27 pre-release (upgraded from Fedora 26). I can only boot with the "rescue" option in GRUB, which uses kernel 4.11. System: Dell Precision Tower 7810 Xeon E5 v3/Core i7, NVIDIA GF119. The system also won't boot if I try to use the proprietary drivers, fwiw.
I have the same problem (Polaris RX480). Adding modprobe.blacklist=amdgpu let me boot properly. Fedora 27 pre-release.
(In reply to Andrea Mastellone from comment #20)
> I have the same problem (Polaris RX480). Adding modprobe.blacklist=amdgpu
> let me boot properly. Fedora 27 pre-release.

Doing modprobe amdgpu from a text console after the kernel has booted hangs the system :(
Windows 10 on the same host works flawlessly :/
(In reply to Kamil Páral from comment #18) > Switching back to AMD drivers. > > Pete, could you perhaps try Fedora 27 Beta? In Fedora 27 pre-release it is the same situation (Polaris 10, RX480 card).
(In reply to Kamil Páral from comment #18)
> Switching back to AMD drivers.
>
> Pete, could you perhaps try Fedora 27 Beta?

I could, if you could point to some update, patch or resolved issue around amdgpu, the initrd module loading or the kernel that might indicate there has been some change made in Fedora 27 Beta that would fix the problem?
Machine still hangs on boot with latest F26 kernel (4.13.10)
(In reply to pete.marchingcubes from comment #24) > Machine still hangs on boot with latest F26 kernel (4.13.10) Same here.
Both the reporter and I have the same graphics card with an identical issue, along with other Polaris owners and some Nvidia card users as well.
Interestingly enough, a kernel built in a COPR repository (https://copr.fedorainfracloud.org/coprs/mystro256/amd-staging-kernel/), based on patches from one of the AMD developers (https://cgit.freedesktop.org/~agd5f/linux/?h=amd-mainline-hybrid-4.11) that are yet to be upstreamed, ran fine, suggesting a driver issue.
(In reply to Luya Tshimbalanga from comment #26) > Both reporter and I have same graphic cards with identical issue along > Polaris owners and some Nvidia cards users as well. > Interesting enough, a built kernel from COPR repository > (https://copr.fedorainfracloud.org/coprs/mystro256/amd-staging-kernel/) > based from one of AMD developers patches > (https://cgit.freedesktop.org/~agd5f/linux/?h=amd-mainline-hybrid-4.11) yet > to be upstream ran fines suggesting a driver issue Interesting, but it is not available to F27...
(In reply to Andrea Mastellone from comment #27)
> Interesting, but it is not available to F27...

I contacted the maintainer, who will release a new version by next week because of a busy schedule. You can install the kernel from the F26 version just fine.
(In reply to Luya Tshimbalanga from comment #28) > (In reply to Andrea Mastellone from comment #27) > > > Interesting, but it is not available to F27... > > I contacted the maintainer who will release new version by next week due to > busy schedule. You can install kernel from F26 version just fine. Unfortunately that kernel does not work on my PC ! :(
A colleague of mine (fzatlouk) tested F27 Workstation Live with his Radeon 480, and had no problems with it. So this might affect just certain cards.
(In reply to Andrea Mastellone from comment #29)
> Unfortunately that kernel does not work on my PC ! :(

I just found this COPR repository: https://copr.fedorainfracloud.org/coprs/nadmartin/mesa/

Could you try to see if that works for you?

(In reply to Kamil Páral from comment #30)
> A colleague of mine (fzatlouk) tested F27 Workstation Live with his Radeon
> 480, and had no problems with it. So this might affect just certain cards.

RX 480 is Polaris 10 while RX 560 is Polaris 11.
(In reply to Luya Tshimbalanga from comment #31)
> RX 480 is Polaris 10 while RX 560 is Polaris 11(In reply to Andrea
> Mastellone from comment #29)
> > Unfortunately that kernel does not work on my PC ! :(
>
> I just found this COPR repository:
> https://copr.fedorainfracloud.org/coprs/nadmartin/mesa/
>
> Could you try to see if that works for you?
>

OK, thank you for the kind suggestion. What packages should I install? There are various ones besides the kernel itself.
(In reply to Luya Tshimbalanga from comment #31)
> RX 480 is Polaris 10 while RX 560 is Polaris 11(In reply to Andrea
> Mastellone from comment #29)
> > Unfortunately that kernel does not work on my PC ! :(
>
> I just found this COPR repository:
> https://copr.fedorainfracloud.org/coprs/nadmartin/mesa/
>
> Could you try to see if that works for you?
>

YES! It worked!!! Thank you very much.
The boot was slow (usual messages:

[ 75.537069] amdgpu: [powerplay] failed to send message 146 ret is 0
[ 76.347047] amdgpu: [powerplay] failed to send pre message 145 ret is 0
[ 76.756723] amdgpu: [powerplay] failed to send message 145 ret is 0
[ 77.575421] amdgpu: [powerplay] failed to send pre message 146 ret is 0
[ 77.985350] amdgpu: [powerplay] failed to send message 146 ret is 0
[ 78.795324] amdgpu: [powerplay] failed to send pre message 145 ret is 0
[ 79.200189] amdgpu: [powerplay] failed to send message 145 ret is 0
[ 80.009924] amdgpu: [powerplay] failed to send pre message 146 ret is 0
[ 80.415060] amdgpu: [powerplay] failed to send message 146 ret is 0
[ 81.225029] amdgpu: [powerplay] failed to send pre message 145 ret is 0
[ 81.629890] amdgpu: [powerplay] failed to send message 145 ret is 0

and many more like these), but finally the system came up.
I just updated to 4.14.0-1.fc27.x86_64. The issue is no longer present.

glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: AMD POLARIS11 (DRM 3.19.0 / 4.14.0-1.fc27.x86_64, LLVM 4.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.2.4
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.2.4
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.2.4
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:

Does it work for the reporter?
It doesn't look like kernel 4.14 is in Fedora 26, so no idea. I have pinned my kernel to 4.11 to keep the machine booting. I will update this bug if/when I move to Fedora 27.
Still hangs on boot with latest F26 kernel (4.13.12).
Kernel 4.14 will come to F26 with the .2 release, i.e. 4.14.2.
On Fedora 27 I get the same system hang after selecting an entry in the GRUB menu. The bug started with one of the 4.13.x kernels after an update. The card is an AMD REDWOOD XT (MSI Radeon HD 5670). It happens with the F27 4.13 kernels and also with 4.14.3. About one time in 4-5 resets it boots normally, and sometimes it boots normally twice in a row. Before the fc27 4.14 kernel I had installed elrepo 4.14.x kernels, and those always booted normally.
Seems to boot up OK now with 4.14.5, though display turns off and enters powersave mode immediately after kernel starts. When sddm starts the screen comes back to life. So mostly this is fixed.
Shall we close this bug report, since the reporter mentioned that the issue is resolved? I can also confirm the fix, having an identical card.
I am now using kernel 4.15 and it has better support than 4.14. The bug can be considered closed since kernel 4.13 is no longer used, but it is not certain that it is fixed.
Closing per comment 40.