1499580 – 4.13 kernel hangs on boot when attempting to modeset with AMD RX560

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	xorg-x11-drv-ati
Sub Component:
Version:	26
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	X/OpenGL Maintenance List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1499570
Depends On:
Blocks:

Reported:	2017-10-08 22:54 UTC by pete.marchingcubes
Modified:	2018-01-15 10:05 UTC
CC List:	41 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2018-01-15 10:05:50 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description pete.marchingcubes 2017-10-08 22:54:57 UTC

Description of problem:

After installing from the F26 LiveCD with 4.11 kernel, everything worked fine - system booted to a graphical desktop and was normal.

After running dnf update, and receiving the 4.13.4 kernel update, the system hangs on boot (black screen, no mouse/keyboard input) and needs to be manually power cycled.

Booting with nomodeset as a kernel parameter works.

Using the 4.11 kernel continues to work fine.


Version-Release number of selected component (if applicable):

kernel 4.13.4-200.fc26.x86_64

How reproducible:

100%

Steps to Reproduce:
1. turn on computer, select fedora 4.13 kernel from GRUB

Actual results:

almost immediate hard lockup - black screen, no response to keyboard/mouse etc.

Expected results:

system should boot up as normal

Additional info:

No debug info attached, as no logs are written because the crash/hang occurs very early in the boot process.

Comment 1 pete.marchingcubes 2017-10-08 22:58:02 UTC

System is a Dell Precision T3500 workstation (Intel Xeon W3520); the video card is a Sapphire Radeon RX560 2GB (Polaris 11, I think).

Monitor is connected via HDMI, and is an LG 29UM68-P with 2560x1080 resolution

Comment 2 pete.marchingcubes 2017-10-08 23:36:14 UTC

Booting the kernel with modprobe.blacklist=amdgpu results in the kernel booting and a graphical desktop displayed on the framebuffer driver I guess.

'modprobe amdgpu' from a terminal window then results in amdgpu being loaded and running the display, which is kinda neat, but I digress.

There are some ominous looking log messages that I have no context to decipher

e.g.

[  152.785718] [drm] BIOS signature incorrect 68 60
[  152.785725] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000

and

[  153.175522] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[  153.178378] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
[  153.180053] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table!
[  153.180057] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table!


and

[  153.182548] amdgpu: [powerplay] 
                failed to send message 309 ret is 254 
[  153.182566] amdgpu: [powerplay] 
                failed to send pre message 14e ret is 254 

but as the display seems functional I don't know if they are relevant.

Complete log:


[  152.735657] [drm] amdgpu kernel modesetting enabled.
[  152.759584] AMD IOMMUv2 driver by Joerg Roedel <jroedel>
[  152.759586] AMD IOMMUv2 functionality not available on this system
[  152.769634] CRAT table not found
[  152.769637] Finished initializing topology ret=0
[  152.769668] kfd kfd: Initialized module
[  152.770494] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1DA2:0xE348 0xCF).
[  152.770506] [drm] register mmio base: 0xF7DC0000
[  152.770507] [drm] register mmio size: 262144
[  152.770519] [drm] probing gen 2 caps for device 8086:340a = 393d02/0
[  152.770521] [drm] probing mlw for device 8086:340a = 393d02
[  152.770528] [drm] UVD is enabled in VM mode
[  152.770529] [drm] VCE enabled in VM mode
[  152.785718] [drm] BIOS signature incorrect 68 60
[  152.785725] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
[  152.785747] ATOM BIOS: 113-34830H2-U02
[  152.785755] [drm] GPU post is not needed
[  152.785900] [drm] vm size is 64 GB, block size is 13-bit
[  152.812952] amdgpu 0000:02:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
[  152.812957] amdgpu 0000:02:00.0: GTT: 3072M 0x0000000080000000 - 0x000000013FFFFFFF
[  152.812966] [drm] Detected VRAM RAM=2048M, BAR=256M
[  152.812968] [drm] RAM width 128bits GDDR5
[  152.813156] [TTM] Zone  kernel: Available graphics memory: 6147840 kiB
[  152.813159] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[  152.813161] [TTM] Initializing pool allocator
[  152.813170] [TTM] Initializing DMA pool allocator
[  152.813212] [drm] amdgpu: 2048M of VRAM memory ready
[  152.813215] [drm] amdgpu: 3072M of GTT memory ready.
[  152.813234] [drm] GART: num cpu pages 786432, num gpu pages 786432
[  152.814463] [drm] PCIE GART of 3072M enabled (table at 0x0000000000040000).
[  152.814491] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[  152.814492] [drm] Driver supports precise vblank timestamp query.
[  152.814525] amdgpu 0000:02:00.0: amdgpu: using MSI.
[  152.814545] [drm] amdgpu: irq initialized.
[  153.012926] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[  153.013233] [drm] AMDGPU Display Connectors
[  153.013235] [drm] Connector 0:
[  153.013236] [drm]   DP-1
[  153.013237] [drm]   HPD5
[  153.013238] [drm]   DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b
[  153.013239] [drm]   Encoders:
[  153.013240] [drm]     DFP1: INTERNAL_UNIPHY1
[  153.013241] [drm] Connector 1:
[  153.013242] [drm]   HDMI-A-1
[  153.013242] [drm]   HPD3
[  153.013244] [drm]   DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877
[  153.013244] [drm]   Encoders:
[  153.013245] [drm]     DFP2: INTERNAL_UNIPHY1
[  153.013246] [drm] Connector 2:
[  153.013247] [drm]   DVI-D-1
[  153.013247] [drm]   HPD4
[  153.013249] [drm]   DDC: 0x4878 0x4878 0x4879 0x4879 0x487a 0x487a 0x487b 0x487b
[  153.013249] [drm]   Encoders:
[  153.013250] [drm]     DFP3: INTERNAL_UNIPHY
[  153.075091] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000080000008, cpu addr 0xffff90e64152d008
[  153.075363] amdgpu 0000:02:00.0: fence driver on ring 1 use gpu addr 0x0000000080000018, cpu addr 0xffff90e64152d018
[  153.075557] amdgpu 0000:02:00.0: fence driver on ring 2 use gpu addr 0x0000000080000028, cpu addr 0xffff90e64152d028
[  153.075777] amdgpu 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000080000038, cpu addr 0xffff90e64152d038
[  153.075982] amdgpu 0000:02:00.0: fence driver on ring 4 use gpu addr 0x0000000080000048, cpu addr 0xffff90e64152d048
[  153.076183] amdgpu 0000:02:00.0: fence driver on ring 5 use gpu addr 0x0000000080000058, cpu addr 0xffff90e64152d058
[  153.076394] amdgpu 0000:02:00.0: fence driver on ring 6 use gpu addr 0x0000000080000068, cpu addr 0xffff90e64152d068
[  153.076596] amdgpu 0000:02:00.0: fence driver on ring 7 use gpu addr 0x0000000080000078, cpu addr 0xffff90e64152d078
[  153.076787] amdgpu 0000:02:00.0: fence driver on ring 8 use gpu addr 0x0000000080000088, cpu addr 0xffff90e64152d088
[  153.076849] amdgpu 0000:02:00.0: fence driver on ring 9 use gpu addr 0x000000008000009c, cpu addr 0xffff90e64152d09c
[  153.085088] amdgpu 0000:02:00.0: fence driver on ring 10 use gpu addr 0x00000000800000ac, cpu addr 0xffff90e64152d0ac
[  153.085396] amdgpu 0000:02:00.0: fence driver on ring 11 use gpu addr 0x00000000800000bc, cpu addr 0xffff90e64152d0bc
[  153.106522] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[  153.107258] amdgpu 0000:02:00.0: fence driver on ring 12 use gpu addr 0x000000000082d420, cpu addr 0xffffa3dd8a85a420
[  153.114177] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[  153.114503] amdgpu 0000:02:00.0: fence driver on ring 13 use gpu addr 0x00000000800000dc, cpu addr 0xffff90e64152d0dc
[  153.114573] amdgpu 0000:02:00.0: fence driver on ring 14 use gpu addr 0x00000000800000ec, cpu addr 0xffff90e64152d0ec
[  153.175522] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[  153.178378] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
[  153.180053] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table!
[  153.180057] amdgpu: [powerplay] VDDCI is larger than max VDDCI in VDDCI Voltage Table!
[  153.182548] amdgpu: [powerplay] 
                failed to send message 309 ret is 254 
[  153.182566] amdgpu: [powerplay] 
                failed to send pre message 14e ret is 254 
[  153.194792] [drm] ring test on 0 succeeded in 17 usecs
[  153.195385] [drm] ring test on 9 succeeded in 10 usecs
[  153.195409] [drm] ring test on 1 succeeded in 9 usecs
[  153.195423] [drm] ring test on 2 succeeded in 3 usecs
[  153.195458] [drm] ring test on 3 succeeded in 2 usecs
[  153.195493] [drm] ring test on 4 succeeded in 2 usecs
[  153.195514] [drm] ring test on 5 succeeded in 8 usecs
[  153.195533] [drm] ring test on 6 succeeded in 2 usecs
[  153.195582] [drm] ring test on 7 succeeded in 2 usecs
[  153.195607] [drm] ring test on 8 succeeded in 3 usecs
[  153.195687] [drm] ring test on 10 succeeded in 7 usecs
[  153.195696] [drm] ring test on 11 succeeded in 7 usecs
[  153.222474] [drm] ring test on 12 succeeded in 1 usecs
[  153.222476] [drm] UVD initialized successfully.
[  153.323526] [drm] ring test on 13 succeeded in 5 usecs
[  153.323540] [drm] ring test on 14 succeeded in 2 usecs
[  153.323542] [drm] VCE initialized successfully.
[  153.323993] [drm] ib test on ring 0 succeeded
[  153.324221] [drm] ib test on ring 1 succeeded
[  153.324297] [drm] ib test on ring 2 succeeded
[  153.324373] [drm] ib test on ring 3 succeeded
[  153.324449] [drm] ib test on ring 4 succeeded
[  153.324526] [drm] ib test on ring 5 succeeded
[  153.324580] [drm] ib test on ring 6 succeeded
[  153.324633] [drm] ib test on ring 7 succeeded
[  153.324745] [drm] ib test on ring 8 succeeded
[  153.826804] [drm] ib test on ring 9 succeeded
[  153.826892] [drm] ib test on ring 10 succeeded
[  153.826933] [drm] ib test on ring 11 succeeded
[  153.828267] [drm] ib test on ring 12 succeeded
[  153.828545] [drm] ib test on ring 13 succeeded
[  153.944300] [drm] fb mappable at 0xE0A37000
[  153.944303] [drm] vram apper at 0xE0000000
[  153.944305] [drm] size 11059200
[  153.944307] [drm] fb depth is 24
[  153.944308] [drm]    pitch is 10240
[  153.944623] fbcon: amdgpudrmfb (fb0) is primary device
[  153.944916] Console: switching to colour frame buffer device 320x67
[  153.944924] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
[  153.949159] [drm] Initialized amdgpu 3.18.0 20150101 for 0000:02:00.0 on minor 0

Comment 3 Laura Abbott 2017-10-09 14:51:46 UTC

Moving to have the graphics team take a look

Comment 4 pete.marchingcubes 2017-10-10 19:36:26 UTC

It looks to me like there is a problem loading firmware or otherwise initialising the card from the initrd, which is not evident when modprobing the driver from the main system.

The amdgpu module can be loaded and run the display without a problem if it is modprobed after the kernel has booted (I do this by passing modprobe.blacklist=amdgpu to the kernel, and modprobing amdgpu after boot).

Note this discrepancy:

lsinitrd /mnt/initrd_debug/initramfs-4.13.4-200.fc26.x86_64.img |grep polaris11
-rw-r--r--   2 root     root         8832 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_ce.bin
-rw-r--r--   1 root     root       130228 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_k_smc.bin
-rw-r--r--   1 root     root        32724 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mc.bin
-rw-r--r--   1 root     root        17024 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_me.bin
-rw-r--r--   2 root     root       262784 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r--   2 root     root            0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r--   1 root     root        17024 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_pfp.bin
-rw-r--r--   1 root     root        23184 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_rlc.bin
-rw-r--r--   2 root     root        12692 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_sdma1.bin
-rw-r--r--   2 root     root        12692 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_sdma.bin
-rw-r--r--   1 root     root       130196 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_smc.bin
-rw-r--r--   1 root     root       130196 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_smc_sk.bin
-rw-r--r--   3 root     root            0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r--   3 root     root            0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_vce.bin

ls /lib/firmware/amdgpu/ -l |grep polaris11
-rw-r--r--. 1 root root   8832 Sep 21 01:05 polaris11_ce.bin
-rw-r--r--. 1 root root 130228 Sep 21 01:05 polaris11_k_smc.bin
-rw-r--r--. 1 root root  32724 Sep 21 01:05 polaris11_mc.bin
-rw-r--r--. 1 root root  17024 Sep 21 01:05 polaris11_me.bin
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec2.bin
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec.bin
-rw-r--r--. 1 root root  17024 Sep 21 01:05 polaris11_pfp.bin
-rw-r--r--. 1 root root  23184 Sep 21 01:05 polaris11_rlc.bin
-rw-r--r--. 1 root root  12692 Sep 21 01:05 polaris11_sdma1.bin
-rw-r--r--. 1 root root  12692 Sep 21 01:05 polaris11_sdma.bin
-rw-r--r--. 1 root root 130196 Sep 21 01:05 polaris11_smc.bin
-rw-r--r--. 1 root root 130196 Sep 21 01:05 polaris11_smc_sk.bin
-rw-r--r--. 1 root root 369696 Sep 21 01:05 polaris11_uvd.bin
-rw-r--r--. 1 root root 166816 Sep 21 01:05 polaris11_vce.bin


polaris11_mec.bin in the initrd firmware image has zero length, as do polaris11_uvd.bin and polaris11_vce.bin. I think this may be the problem?
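
The comparison between the two listings can also be done mechanically. What follows is a minimal Python sketch, assuming both listings have been saved as text in the `ls -l` style shown in this comment. The helper names `parse_sizes` and `size_mismatches` are hypothetical and are not part of dracut or lsinitrd.

```python
import os


def parse_sizes(listing):
    """Map file basename -> size from `ls -l`-style lines (size is field 5)."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # Expected fields: mode, links, owner, group, size, month, day, time, path
        if len(fields) < 9 or not fields[4].isdigit():
            continue
        sizes[os.path.basename(fields[-1])] = int(fields[4])
    return sizes


def size_mismatches(initrd_listing, disk_listing):
    """Return {name: (initrd_size, disk_size)} for files whose sizes differ."""
    initrd = parse_sizes(initrd_listing)
    disk = parse_sizes(disk_listing)
    return {
        name: (initrd[name], disk[name])
        for name in sorted(initrd)
        if name in disk and initrd[name] != disk[name]
    }


# Three lines copied from each listing in this comment.
INITRD = """\
-rw-r--r--   2 root     root       262784 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r--   2 root     root            0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r--   3 root     root            0 Aug 12 00:19 usr/lib/firmware/amdgpu/polaris11_uvd.bin
"""
DISK = """\
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec2.bin
-rw-r--r--. 1 root root 262784 Sep 21 01:05 polaris11_mec.bin
-rw-r--r--. 1 root root 369696 Sep 21 01:05 polaris11_uvd.bin
"""

for name, (in_initrd, on_disk) in size_mismatches(INITRD, DISK).items():
    print(f"{name}: {in_initrd} bytes in initrd, {on_disk} bytes on disk")
```

A size mismatch in a listing is only a hint: it shows what the listing tool reports, not necessarily what the kernel unpacks at boot.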

Comment 5 Luya Tshimbalanga 2017-10-10 22:35:26 UTC

Let's move the comments to report #1499570, as the bug is identical and involves the same card, i.e. the RX 560 in this case.

*** This bug has been marked as a duplicate of bug 1499570 ***

Comment 6 Luya Tshimbalanga 2017-10-11 01:15:29 UTC

Reopening to facilitate the report

Comment 7 Luya Tshimbalanga 2017-10-11 01:16:59 UTC

*** Bug 1499570 has been marked as a duplicate of this bug. ***

Comment 8 pete.marchingcubes 2017-10-11 21:53:15 UTC

OK, I went way down the rabbit hole on this one, and this is possibly a very serious issue. It would be great if a maintainer could look at this and let me know if I am way off base here, or if this is potentially going to break a *lot* of stuff?

It looks very much to me like there is a problem with either dracut, cpio, or the way dracut chains multiple cpio archives together.

What seems to be happening is that the firmware files, and possibly random other files, are being corrupted (I note a number of zero-length files in the initrd images produced by dracut on my system - the firmware is just the thing that most obviously breaks boot).

I tried a bunch of things, different compression settings etc., but adding the --nohardlink option when creating an initrd with dracut is the only thing I have found that gives me correct output here.

Note: I modified the dracut shell script to remove the --quiet option, which is why it displays block counts.

e.g.

[root@dogtooth /]# dracut --force --nohardlink 
115270 blocks
[root@dogtooth /]# lsinitrd /boot/initramfs-4.13.4-200.fc26.x86_64.img |grep polaris11
-rw-r--r--   1 root     root         8832 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_ce.bin
-rw-r--r--   1 root     root       130228 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_k_smc.bin
-rw-r--r--   1 root     root        32724 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mc.bin
-rw-r--r--   1 root     root        17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_me.bin
-rw-r--r--   1 root     root       262784 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r--   1 root     root       262784 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r--   1 root     root        17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_pfp.bin
-rw-r--r--   1 root     root        23184 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_rlc.bin
-rw-r--r--   1 root     root        12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma1.bin
-rw-r--r--   1 root     root        12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma.bin
-rw-r--r--   1 root     root       130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc.bin
-rw-r--r--   1 root     root       130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc_sk.bin
-rw-r--r--   1 root     root       369696 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r--   1 root     root       166816 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_vce.bin


vs

[root@dogtooth /]# dracut --force
105655 blocks
[root@dogtooth /]# lsinitrd /boot/initramfs-4.13.4-200.fc26.x86_64.img |grep polaris11
-rw-r--r--   2 root     root         8832 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_ce.bin
-rw-r--r--   1 root     root       130228 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_k_smc.bin
-rw-r--r--   1 root     root        32724 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mc.bin
-rw-r--r--   1 root     root        17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_me.bin
-rw-r--r--   2 root     root       262784 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec2.bin
-rw-r--r--   2 root     root            0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r--   1 root     root        17024 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_pfp.bin
-rw-r--r--   1 root     root        23184 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_rlc.bin
-rw-r--r--   2 root     root        12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma1.bin
-rw-r--r--   2 root     root        12692 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_sdma.bin
-rw-r--r--   1 root     root       130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc.bin
-rw-r--r--   1 root     root       130196 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_smc_sk.bin
-rw-r--r--   3 root     root            0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r--   3 root     root            0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_vce.bin

Comment 9 Kamil Páral 2017-10-12 10:59:11 UTC

If I see correctly, the difference seems to be 0 file sizes for
-rw-r--r--   2 root     root            0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_mec.bin
-rw-r--r--   3 root     root            0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_uvd.bin
-rw-r--r--   3 root     root            0 Sep 21 01:05 usr/lib/firmware/amdgpu/polaris11_vce.bin

in the second output.

Reassigning to dracut. Harald, can you tell us whether this is a dracut problem or somewhere else? Thanks.

Comment 10 Harald Hoyer 2017-10-12 12:31:07 UTC

could you please attach the initramfs image for further inspection?

Comment 11 Harald Hoyer 2017-10-12 12:44:41 UTC

$ cat  ~/ttt.img | cpio --extract --verbose --quiet --list
drwxrwxr-x   1 harald   harald          0 Oct 12 14:33 .
-rwxr-xr-x   2 harald   harald          0 Oct 12 14:32 true
-rwxr-xr-x   2 harald   harald      32536 Oct 12 14:32 true2

$ cat  ~/ttt.img | cpio --extract --verbose 
.
true
cpio: true2 linked to true
true2
65 blocks

$ ls -l
total 64
-rwxr-xr-x 2 harald harald 32536 12. Okt 14:43 true
-rwxr-xr-x 2 harald harald 32536 12. Okt 14:43 true2
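
In the listing above, `true` shows size 0 with a link count of 2 while `true2` carries the 32536 bytes, yet both names extract at full size. That suggests a zero size next to a link count above 1 can simply mark a hard link whose data is stored under another name. Below is a rough Python sketch of that reading, applied to `ls -l`-style listing lines. The helper `classify_zero_size` is hypothetical, and the heuristic is an assumption drawn from this one example, not a documented cpio guarantee.

```python
def classify_zero_size(listing):
    """Split zero-length file entries by link count.

    Returns (placeholders, suspicious): entries with more than one link are
    treated as likely hard-link placeholders, single-link entries as suspicious.
    """
    placeholders, suspicious = [], []
    for line in listing.splitlines():
        fields = line.split()
        # Skip short lines and directories (mode string starts with "d").
        if len(fields) < 9 or fields[0].startswith("d"):
            continue
        if not (fields[1].isdigit() and fields[4].isdigit()):
            continue
        links, size, name = int(fields[1]), int(fields[4]), fields[-1]
        if size == 0:
            (placeholders if links > 1 else suspicious).append(name)
    return placeholders, suspicious


# The listing from this comment.
LISTING = """\
drwxrwxr-x   1 harald   harald          0 Oct 12 14:33 .
-rwxr-xr-x   2 harald   harald          0 Oct 12 14:32 true
-rwxr-xr-x   2 harald   harald      32536 Oct 12 14:32 true2
"""

placeholders, suspicious = classify_zero_size(LISTING)
print("likely hard-link placeholders:", placeholders)
print("zero-length with a single link:", suspicious)
```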

Comment 12 Harald Hoyer 2017-10-12 12:46:22 UTC

To see if the kernel unpacks the cpio correctly boot with "rd.break=cmdline" on the kernel command line and then check

$ ls -l /usr/lib/firmware/amdgpu

Comment 13 Harald Hoyer 2017-10-12 12:49:00 UTC

(In reply to Harald Hoyer from comment #10)
> could you please attach the initramfs image for further inspection?

not needed anymore

Comment 14 pete.marchingcubes 2017-10-12 19:28:32 UTC

Booting with rd.break=cmdline and listing those files shows they are unpacked with the correct length, so possibly this is a red herring.

These files are not hard links, at least according to stat's inode reporting, though they do have the same md5sum as other files in the directory.
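
The distinction between hard links (same inode) and independent copies (same content, different inode) can be checked in bulk. The following is a minimal Python sketch; `group_files` is a hypothetical helper, the demo files are created in a temporary directory, and SHA-256 is swapped in for md5sum purely as a content digest.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def group_files(paths):
    """Group paths by (device, inode) and, separately, by content digest."""
    by_inode = defaultdict(list)
    by_digest = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        by_inode[(st.st_dev, st.st_ino)].append(path)
        with open(path, "rb") as fh:
            by_digest[hashlib.sha256(fh.read()).hexdigest()].append(path)
    return by_inode, by_digest


with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.bin")
    path_b = os.path.join(tmp, "b.bin")
    path_c = os.path.join(tmp, "c.bin")
    with open(path_a, "wb") as fh:
        fh.write(b"firmware")
    os.link(path_a, path_b)        # b.bin is a hard link to a.bin
    with open(path_c, "wb") as fh:
        fh.write(b"firmware")      # c.bin is an independent copy
    by_inode, by_digest = group_files([path_a, path_b, path_c])
    # Group sizes: which names share an inode, which share content.
    hard_linked = sorted(len(group) for group in by_inode.values())
    same_content = sorted(len(group) for group in by_digest.values())

print("inode group sizes:", hard_linked)
print("content group sizes:", same_content)
```

Files that land in the same content group but in different inode groups are duplicates rather than hard links, which matches the situation described in this comment.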

Comment 15 Luya Tshimbalanga 2017-10-14 23:03:36 UTC

Further looking at the boot message, I found these reports:

[    2.023970] [drm] amdgpu kernel modesetting enabled.
[    2.025696] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1DA2:0xE348 0xCF).
[    2.025707] [drm] register mmio base: 0xFBF80000
[    2.025707] [drm] register mmio size: 262144
[    2.025722] [drm] probing gen 2 caps for device 10de:778 = 313d02/0
[    2.025727] [drm] probing mlw for device 10de:778 = 313d02
[    2.025732] [drm] UVD is enabled in VM mode
[    2.025733] [drm] VCE enabled in VM mode
[    2.025975] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    2.026613] ATOM BIOS: 113-13483HM-U01
[    2.026624] [drm] GPU post is not needed
[    2.026644] [drm] vm size is 64 GB, block size is 13-bit, fragment size is 4-bit
[    2.026695] amdgpu 0000:02:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[    2.026696] amdgpu 0000:02:00.0: GTT: 256M 0x0000000000000000 - 0x000000000FFFFFFF
[    2.026699] [drm] Detected VRAM RAM=4096M, BAR=256M
[    2.026700] [drm] RAM width 128bits GDDR5
[    2.026782] [TTM] Zone  kernel: Available graphics memory: 4086272 kiB
[    2.026782] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    2.026783] [TTM] Initializing pool allocator
[    2.026786] [TTM] Initializing DMA pool allocator
[    2.026809] [drm] amdgpu: 4096M of VRAM memory ready
[    2.026810] [drm] amdgpu: 4096M of GTT memory ready.
[    2.026818] [drm] GART: num cpu pages 65536, num gpu pages 65536
[    2.026865] [drm] PCIE GART of 256M enabled (table at 0x000000F400040000).
[    2.026937] amdgpu 0000:02:00.0: amdgpu: using MSI.
[    2.026954] [drm] amdgpu: irq initialized.
[    2.026971] amdgpu: [powerplay] amdgpu: powerplay sw initialized
[    2.026987] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_pfp_2.bin failed with error -2
[    2.027018] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_me_2.bin failed with error -2
[    2.027045] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_ce_2.bin failed with error -2
[    2.027094] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_mec_2.bin failed with error -2
[    2.027237] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_mec2_2.bin failed with error -2
[    2.027439] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000000400040, cpu addr 0xffffbd9241413040
[    2.027520] amdgpu 0000:02:00.0: fence driver on ring 1 use gpu addr 0x00000000004000c0, cpu addr 0xffffbd92414130c0
[    2.027571] amdgpu 0000:02:00.0: fence driver on ring 2 use gpu addr 0x0000000000400140, cpu addr 0xffffbd9241413140
[    2.027635] amdgpu 0000:02:00.0: fence driver on ring 3 use gpu addr 0x00000000004001c0, cpu addr 0xffffbd92414131c0
[    2.027687] amdgpu 0000:02:00.0: fence driver on ring 4 use gpu addr 0x0000000000400240, cpu addr 0xffffbd9241413240
[    2.027758] amdgpu 0000:02:00.0: fence driver on ring 5 use gpu addr 0x00000000004002c0, cpu addr 0xffffbd92414132c0
[    2.027811] amdgpu 0000:02:00.0: fence driver on ring 6 use gpu addr 0x0000000000400340, cpu addr 0xffffbd9241413340
[    2.027867] amdgpu 0000:02:00.0: fence driver on ring 7 use gpu addr 0x00000000004003c0, cpu addr 0xffffbd92414133c0
[    2.027918] amdgpu 0000:02:00.0: fence driver on ring 8 use gpu addr 0x0000000000400440, cpu addr 0xffffbd9241413440
[    2.027944] amdgpu 0000:02:00.0: fence driver on ring 9 use gpu addr 0x00000000004004e0, cpu addr 0xffffbd92414134e0
[    2.029002] amdgpu 0000:02:00.0: fence driver on ring 10 use gpu addr 0x0000000000400560, cpu addr 0xffffbd9241413560
[    2.029069] amdgpu 0000:02:00.0: fence driver on ring 11 use gpu addr 0x00000000004005e0, cpu addr 0xffffbd92414135e0
[    2.029276] [drm] Found UVD firmware Version: 1.79 Family ID: 16
[    2.033262] amdgpu 0000:02:00.0: fence driver on ring 12 use gpu addr 0x000000f4001e5420, cpu addr 0xffffbd9241e5a420
[    2.033399] [drm] Found VCE firmware Version: 52.4 Binary ID: 3
[    2.033517] amdgpu 0000:02:00.0: fence driver on ring 13 use gpu addr 0x00000000004006e0, cpu addr 0xffffbd92414136e0
[    2.033559] amdgpu 0000:02:00.0: fence driver on ring 14 use gpu addr 0x0000000000400760, cpu addr 0xffffbd9241413760

This seems to suggest a firmware bug on Polaris11, i.e. the RX 560.
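
Error -2 in the `Direct firmware load` lines corresponds to ENOENT (no such file), so those messages say the `_2` firmware variants were not found; they do not by themselves show that a file was corrupt. Here is a small Python sketch that pulls the file names and error names out of such log text. The helper `failed_firmware` is hypothetical, and the regular expression is an assumption based only on the message format quoted in this comment.

```python
import errno
import re

# Matches lines such as:
#   amdgpu 0000:02:00.0: Direct firmware load for amdgpu/x.bin failed with error -2
FW_FAIL = re.compile(r"Direct firmware load for (\S+) failed with error (-?\d+)")


def failed_firmware(log_text):
    """Return [(firmware_path, errno_name)] for each failed firmware load."""
    results = []
    for match in FW_FAIL.finditer(log_text):
        code = abs(int(match.group(2)))  # kernel reports negative errno values
        results.append((match.group(1), errno.errorcode.get(code, "UNKNOWN")))
    return results


# Two lines copied from the log in this comment.
LOG = """\
[    2.026987] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_pfp_2.bin failed with error -2
[    2.027018] amdgpu 0000:02:00.0: Direct firmware load for amdgpu/polaris11_me_2.bin failed with error -2
"""

for path, name in failed_firmware(LOG):
    print(f"{path}: {name}")
```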

Comment 16 pete.marchingcubes 2017-10-22 22:04:04 UTC

I added netconsole to the initrd image, and gathered a log of a failed (black screen, hang) 'modprobe amdgpu' from the initrd commandline.

No obvious error stands out but the hang would appear to occur immediately before the:

amdgpu: [powerplay] amdgpu: powerplay sw initialized

line that is output when the driver is successfully modprobed outside the initrd environment, which reinforces the idea that there is a problem with the firmware.

Possibly this is a clue for someone to look at?


[  753.282494] [drm] amdgpu kernel modesetting enabled.
[  753.285376] AMD IOMMUv2 driver by Joerg Roedel <jroedel>
[  753.285534] AMD IOMMUv2 functionality not available on this system
[  753.294760] CRAT table not found
[  753.294913] Finished initializing topology ret=0
[  753.295096] kfd kfd: Initialized module
[  753.295847] [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67FF 0x1DA2:0xE348 0xCF).
[  753.296143] [drm] register mmio base: 0xF7DC0000
[  753.296297] [drm] register mmio size: 262144
[  753.296460] [drm] probing gen 2 caps for device 8086:340a = 393d02/0
[  753.296619] [drm] probing mlw for device 8086:340a = 393d02
[  753.296782] [drm] UVD is enabled in VM mode
[  753.296934] [drm] VCE enabled in VM mode
[  753.325614] [drm] BIOS signature incorrect 73 7
[  753.325772] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000
[  753.326054] ATOM BIOS: 113-34830H2-U02
[  753.326220] [drm] GPU post is not needed
[  753.326511] [drm] vm size is 64 GB, block size is 13-bit
[  753.326703] amdgpu 0000:02:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
[  753.326968] amdgpu 0000:02:00.0: GTT: 3072M 0x0000000080000000 - 0x000000013FFFFFFF
[  753.327239] [drm] Detected VRAM RAM=2048M, BAR=256M
[  753.327394] [drm] RAM width 128bits GDDR5
[  753.327653] [TTM] Zone  kernel: Available graphics memory: 6147830 kiB
[  753.327812] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[  753.327969] [TTM] Initializing pool allocator
[  753.328137] [TTM] Initializing DMA pool allocator
[  753.328311] [drm] amdgpu: 2048M of VRAM memory ready
[  753.328466] [drm] amdgpu: 3072M of GTT memory ready.
[  753.328628] [drm] GART: num cpu pages 786432, num gpu pages 786432
[  753.329981] [drm] PCIE GART of 3072M enabled (table at 0x0000000000040000).
[  753.330143] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[  753.330277] [drm] Driver supports precise vblank timestamp query.
[  753.330439] amdgpu 0000:02:00.0: amdgpu: using MSI.
[  753.330588] [drm] amdgpu: irq initialized.
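
Finding where a failed log stops relative to a successful one can be automated. This is a minimal Python sketch, assuming both logs are available as text with the usual `[ seconds.micros]` kernel timestamp prefix. The helpers `messages` and `first_missing` are hypothetical, and the sample lines are the tails of the successful and failed logs in this report.

```python
import re

# Leading kernel timestamp, e.g. "[  753.330588] "
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def messages(log_text):
    """Strip the leading timestamp from each non-empty log line."""
    return [
        TIMESTAMP.sub("", line).rstrip()
        for line in log_text.splitlines()
        if line.strip()
    ]


def first_missing(good_log, failed_log):
    """Return the first message in the good log that the failed log lacks."""
    seen = set(messages(failed_log))
    for msg in messages(good_log):
        if msg not in seen:
            return msg
    return None


GOOD = """\
[  152.814525] amdgpu 0000:02:00.0: amdgpu: using MSI.
[  152.814545] [drm] amdgpu: irq initialized.
[  153.012926] amdgpu: [powerplay] amdgpu: powerplay sw initialized
"""
FAILED = """\
[  753.330439] amdgpu 0000:02:00.0: amdgpu: using MSI.
[  753.330588] [drm] amdgpu: irq initialized.
"""

print(first_missing(GOOD, FAILED))
```

On the full logs, lines whose values vary between runs (for example the `BIOS signature incorrect` line) would be reported first, so such lines would need to be normalised or skipped before comparing.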

Comment 17 pete.marchingcubes 2017-10-22 22:45:25 UTC

Here's another interesting (if a little baffling) data point.

I get exactly the same 'black screen, immediate hang', with exactly the same log as above, when modprobing amdgpu after boot if the card is attached to my ultrawide monitor in 'PBP' mode, i.e. with a 1280x1080 resolution.

If amdgpu is modprobed after boot when the monitor is not in 'PBP' mode, i.e. in its 2560x1080 display mode (the framebuffer driver initialises it to a 16:9 resolution, 1920x1080?, which does not give a full-width image on 2560x1080), there is no hang and amdgpu comes up as normal.

So - to summarise my symptoms

amdgpu loading from initrd  1280x1080: FAIL HANG
amdgpu loading from initrd  2560x1080: FAIL HANG
amdgpu modprobed after boot 1280x1080: FAIL HANG
amdgpu modprobed after boot 2560x1080: WORKS FINE

Comment 18 Kamil Páral 2017-10-23 13:06:59 UTC

Switching back to AMD drivers.

Pete, could you perhaps try Fedora 27 Beta?

Comment 19 Xan López 2017-10-27 14:04:58 UTC

Exact same issue here, using Fedora 27 pre-release (upgraded from Fedora 26). I can only boot with the "rescue" option in GRUB, which uses kernel 4.11. System: Dell Precision Tower 7810 Xeon E5 v3/Core i7, NVIDIA GF119. The system also won't boot if I try to use the proprietary drivers, fwiw.

Comment 20 Andrea Mastellone 2017-11-01 15:33:30 UTC

I have the same problem (Polaris RX480). Adding modprobe.blacklist=amdgpu let me boot properly. Fedora 27 pre-release.
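
To check whether this workaround is actually in effect on a booted system, the kernel command line can be inspected for the parameter. The following is only a minimal Python sketch: the function name `blacklisted_modules` and the sample command line are made up for illustration, and on a real system the string would typically come from /proc/cmdline.

```python
def blacklisted_modules(cmdline):
    """Return the module names given via modprobe.blacklist= on a kernel command line."""
    modules = []
    for token in cmdline.split():
        if token.startswith("modprobe.blacklist="):
            # The value is a comma-separated list of module names.
            value = token.split("=", 1)[1]
            modules.extend(name for name in value.split(",") if name)
    return modules


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz-4.13.4-200.fc26.x86_64 ro modprobe.blacklist=amdgpu"
print(blacklisted_modules(sample))  # prints ['amdgpu']
```

If `amdgpu` is missing from the result, the parameter was not picked up at boot and the driver may still be loaded from the initrd.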

Comment 21 Andrea Mastellone 2017-11-01 15:38:54 UTC

(In reply to Andrea Mastellone from comment #20)
> I have the same problem (Polaris RX480). Adding modprobe.blacklist=amdgpu
> let me boot properly. Fedora 27 pre-release.

Doing modprobe amdgpu from a text console after the kernel has booted hangs the system :( Windows 10 on the same host works flawlessly :/

Comment 22 Andrea Mastellone 2017-11-03 15:03:04 UTC

(In reply to Kamil Páral from comment #18)
> Switching back to AMD drivers.
> 
> Pete, could you perhaps try Fedora 27 Beta?

In Fedora 27 pre-release it is the same situation (Polaris 10, RX480 card).

Comment 23 pete.marchingcubes 2017-11-03 22:27:52 UTC

(In reply to Kamil Páral from comment #18)
> Switching back to AMD drivers.
> 
> Pete, could you perhaps try Fedora 27 Beta?

I could, if you could point to some update, patch or resolved issue around amdgpu, the initrd module loading or the kernel that might indicate there has been some change made in Fedora 27 Beta that would fix the problem?

Comment 24 pete.marchingcubes 2017-11-03 22:40:30 UTC

Machine still hangs on boot with latest F26 kernel (4.13.10)

Comment 25 Luya Tshimbalanga 2017-11-04 02:41:17 UTC

(In reply to pete.marchingcubes from comment #24)
> Machine still hangs on boot with latest F26 kernel (4.13.10)

Same here.

Comment 26 Luya Tshimbalanga 2017-11-04 04:07:52 UTC

The reporter and I have the same graphics card with an identical issue, along with other Polaris owners and some Nvidia card users as well.
Interestingly enough, a kernel built from a COPR repository (https://copr.fedorainfracloud.org/coprs/mystro256/amd-staging-kernel/), based on one of the AMD developers' patch sets (https://cgit.freedesktop.org/~agd5f/linux/?h=amd-mainline-hybrid-4.11) that is yet to be upstreamed, ran fine, suggesting a driver issue.

Comment 27 Andrea Mastellone 2017-11-04 15:00:35 UTC

(In reply to Luya Tshimbalanga from comment #26)
> Both reporter and I have same graphic cards with identical issue along
> Polaris owners and some Nvidia cards users as well. 
> Interesting enough, a built kernel from COPR repository
> (https://copr.fedorainfracloud.org/coprs/mystro256/amd-staging-kernel/)
> based from one of AMD developers patches
> (https://cgit.freedesktop.org/~agd5f/linux/?h=amd-mainline-hybrid-4.11) yet
> to be upstream ran fines suggesting a driver issue

Interesting, but it is not available to F27...

Comment 28 Luya Tshimbalanga 2017-11-04 18:42:39 UTC

(In reply to Andrea Mastellone from comment #27)

> Interesting, but it is not available to F27...

I contacted the maintainer, who will release a new version by next week due to a busy schedule. You can install the kernel from the F26 version just fine.

Comment 29 Andrea Mastellone 2017-11-05 18:35:39 UTC

(In reply to Luya Tshimbalanga from comment #28)
> (In reply to Andrea Mastellone from comment #27)
> 
> > Interesting, but it is not available to F27...
> 
> I contacted the maintainer who will release new version by next week due to
> busy schedule. You can install kernel from F26 version just fine.

Unfortunately that kernel does not work on my PC ! :(

Comment 30 Kamil Páral 2017-11-06 15:13:08 UTC

A colleague of mine (fzatlouk) tested F27 Workstation Live with his Radeon 480, and had no problems with it. So this might affect just certain cards.

Comment 31 Luya Tshimbalanga 2017-11-08 07:31:18 UTC

(In reply to Andrea Mastellone from comment #29)
> Unfortunately that kernel does not work on my PC ! :(

I just found this COPR repository:
https://copr.fedorainfracloud.org/coprs/nadmartin/mesa/

Could you try to see if that works for you?



(In reply to Kamil Páral from comment #30)
> A colleague of mine (fzatlouk) tested F27 Workstation Live with his Radeon
> 480, and had no problems with it. So this might affect just certain cards.

RX 480 is Polaris 10 while RX 560 is Polaris 11.

Comment 32 Andrea Mastellone 2017-11-08 20:59:02 UTC

(In reply to Luya Tshimbalanga from comment #31)
> RX 480 is Polaris 10 while RX 560 is Polaris 11(In reply to Andrea
> Mastellone from comment #29)
> > Unfortunately that kernel does not work on my PC ! :(
> 
> I just found this COPR repository:
> https://copr.fedorainfracloud.org/coprs/nadmartin/mesa/
> 
> Could you try to see if that works for you?
> 

OK, thank you for the kind suggestion. What packages should I install? There are various ones, besides the kernel itself.

Comment 33 Andrea Mastellone 2017-11-11 19:21:58 UTC

(In reply to Luya Tshimbalanga from comment #31)
> RX 480 is Polaris 10 while RX 560 is Polaris 11(In reply to Andrea
> Mastellone from comment #29)
> > Unfortunately that kernel does not work on my PC ! :(
> 
> I just found this COPR repository:
> https://copr.fedorainfracloud.org/coprs/nadmartin/mesa/
> 
> Could you try to see if that works for you?
> 

YES! It worked!!! Thank you very much. The boot was slow (usual messages:
[   75.537069] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[   76.347047] amdgpu: [powerplay] 
                failed to send pre message 145 ret is 0 
[   76.756723] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
[   77.575421] amdgpu: [powerplay] 
                failed to send pre message 146 ret is 0 
[   77.985350] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[   78.795324] amdgpu: [powerplay] 
                failed to send pre message 145 ret is 0 
[   79.200189] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
[   80.009924] amdgpu: [powerplay] 
                failed to send pre message 146 ret is 0 
[   80.415060] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[   81.225029] amdgpu: [powerplay] 
                failed to send pre message 145 ret is 0 
[   81.629890] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
and many more like these), but finally the system came up.
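
When a log contains many of these messages, a per-message tally is easier to compare between boots than the raw output. The following is a rough Python sketch under stated assumptions: the function name `count_powerplay_failures` is invented for illustration, the sample is the first three messages quoted above, and the thread does not say what message IDs 145 and 146 mean, so the code only counts them.

```python
import re
from collections import Counter

# Matches "failed to send message 146 ret is 0" as well as
# "failed to send pre message 145 ret is 0".
PATTERN = re.compile(r"failed to send (pre )?message (\d+) ret is (\d+)")


def count_powerplay_failures(log_text):
    """Count failures per (kind, message id); kind is 'pre' or 'send'."""
    counts = Counter()
    for match in PATTERN.finditer(log_text):
        kind = "pre" if match.group(1) else "send"
        counts[(kind, int(match.group(2)))] += 1
    return counts


# First three messages from the log excerpt above.
SAMPLE = """\
[   75.537069] amdgpu: [powerplay]
                failed to send message 146 ret is 0
[   76.347047] amdgpu: [powerplay]
                failed to send pre message 145 ret is 0
[   76.756723] amdgpu: [powerplay]
                failed to send message 145 ret is 0
"""

for (kind, msg_id), n in sorted(count_powerplay_failures(SAMPLE).items()):
    print(kind, msg_id, n)
```

The pattern is matched against the whole text rather than line by line because each message is split across two lines in the kernel log.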

Comment 34 Luya Tshimbalanga 2017-11-16 09:06:17 UTC

I just updated to 4.14.0-1.fc27.x86_64. The issue is no longer present.

glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: AMD POLARIS11 (DRM 3.19.0 / 4.14.0-1.fc27.x86_64, LLVM 4.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.2.4
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.2.4
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.2.4
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:


Does it work for the reporter?

Comment 35 pete.marchingcubes 2017-11-18 21:21:13 UTC

It doesn't look like kernel 4.14 is in Fedora 26, so no idea. I have pinned my kernel to 4.11 to keep the machine booting.

I will update this bug if/when I move to Fedora 27.

Comment 36 pete.marchingcubes 2017-11-18 21:43:53 UTC

Still hangs on boot with latest F26 kernel (4.13.12).

Comment 37 Luya Tshimbalanga 2017-11-18 22:44:25 UTC

Kernel 4.14 will come to F26 with its second point release, i.e. 4.14.2.

Comment 38 Agris 2017-12-11 12:04:26 UTC

Fedora 27: same system hang after selecting an entry in the GRUB menu. The bug started with one of the 4.13.x kernels after an update.
AMD REDWOOD XT (MSI Radeon HD 5670).
It happens with the F27 4.13 kernels and also with 4.14.3.

About one time in 4-5 resets it boots normally. Sometimes it can boot normally twice in a row.

Before the fc27 4.14 kernel I had installed ELRepo 4.14.x kernels, and those always booted normally.

Comment 39 pete.marchingcubes 2017-12-18 08:18:49 UTC

Seems to boot up OK now with 4.14.5, though display turns off and enters powersave mode immediately after kernel starts. 

When sddm starts the screen comes back to life. So mostly this is fixed.

Comment 40 Luya Tshimbalanga 2018-01-15 05:56:45 UTC

Shall we close this bug report, as the reporter mentioned the issue is resolved? I can also confirm the fix, having an identical card.

Comment 41 Andrea Mastellone 2018-01-15 08:12:36 UTC

I am now using kernel 4.15 and it has better support than 4.14. The bug can be considered closed since kernel 4.13 is no longer used, but it is not certain that it is fixed.

Comment 42 Kamil Páral 2018-01-15 10:05:50 UTC

Closing per comment 40.
