1742960 – Frozen display (green) on AMD Ryzen 5 2400G

Summary: Frozen display (green) on AMD Ryzen 5 2400G

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	30
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-08-18 03:47 UTC by Suvayu
Modified:	2020-05-27 07:59 UTC
CC List:	17 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2020-05-26 18:25:22 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments:
dmesg log (3.30 MB, text/plain), 2019-08-18 03:47 UTC, Suvayu

Description Suvayu 2019-08-18 03:47:04 UTC

Created attachment 1605403
dmesg log

1. Please describe the problem:

When I boot up, everything seems fine: I can move the mouse, type into the login dialog box, switch to a text terminal and log in, etc.

But the moment I actually log in to a graphical desktop, the screen turns green and everything becomes unresponsive.  I can't even switch to a text terminal.  I have also been unable to log in remotely.  I have to hard shutdown the machine at this point.

2. What is the Version-Release number of the kernel:

This problem happens with the 5.2.x series of kernels.  The latest I have tried is: kernel-5.2.8-200.fc30.x86_64

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
      https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

I first experienced this in: kernel-5.2.5-200.fc30.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

1. Boot on a machine with Ryzen 5 2400G
2. Try to log in to a graphical desktop session

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
      ``sudo dnf update --enablerepo=rawhide kernel``:

I couldn't install the kernel from Rawhide due to GPG errors:

# dnf update --releasever=rawhide --enablerepo=rawhide kernel
...
Key imported successfully
Import of key(s) didn't help, wrong key(s)?
...

I tried looking for the keys on https://getfedora.org/en/security/, but importing https://getfedora.org/static/fedora.gpg makes no difference.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
      issue occurred on a previous boot, use the journalctl ``-b`` flag.

I have attached the full output of `journalctl -b -1 --no-hostname -k > dmesg.txt` (-b -1 was with the problematic kernel), but here are some excerpts:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 475 at drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1401 dcn_bw_update_from_pplib.cold+0x73/0x9c [amdgpu]
Modules linked in: amdgpu(+) hid_logitech_hidpp(+) amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper crc32c_intel drm r8169 uas usb_storage hid_logitech_dj>
CPU: 2 PID: 475 Comm: systemd-udevd Not tainted 5.2.8-200.fc30.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. AB350M-Gaming 3/AB350M-Gaming 3-CF, BIOS F23d 04/17/2018
RIP: 0010:dcn_bw_update_from_pplib.cold+0x73/0x9c [amdgpu]
Code: 48 8b 93 e0 02 00 00 db 42 78 83 f9 02 77 37 b8 02 00 00 00 8d 71 ff e9 1a 67 f7 ff 48 c7 c7 f8 e3 74 c0 31 c0 e8 9b 62 a9 ef <0f> 0b e9 94 67 f7 ff 48 c7>
RSP: 0018:ffffab0fc36f76b0 EFLAGS: 00010246
RAX: 0000000000000024 RBX: ffff94f56f8b6000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff94f580a97900
RBP: ffff94f56fa2e980 R08: 0000000000000001 R09: 00000000000003a8
R10: ffffffffb1bec958 R11: 0000000000000003 R12: ffffab0fc36f7750
R13: 0000000000000001 R14: 000000000000000a R15: ffffab0fc36f78d8
FS:  00007ff98b60d940(0000) GS:ffff94f580a80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005631e9403d68 CR3: 00000007f2f52000 CR4: 00000000003406e0
Call Trace:
 dcn10_create_resource_pool+0x975/0xa30 [amdgpu]
 ? lock_timer_base+0x61/0x80
 ? _cond_resched+0x15/0x30
 ? kmem_cache_alloc_trace+0x154/0x1c0
 ? firmware_parser_create+0x17e/0x5e0 [amdgpu]
 dc_create_resource_pool+0x188/0x230 [amdgpu]
 ? dal_gpio_service_create+0x95/0xe0 [amdgpu]
 dc_create+0x219/0x5e0 [amdgpu]
 ? amdgpu_cgs_create_device+0x23/0x50 [amdgpu]
 amdgpu_dm_init+0xeb/0x160 [amdgpu]
 dm_hw_init+0xe/0x20 [amdgpu]
 amdgpu_device_init.cold+0x128d/0x161f [amdgpu]
 amdgpu_driver_load_kms+0x88/0x270 [amdgpu]
 drm_dev_register+0x111/0x150 [drm]
 amdgpu_pci_probe+0xbd/0x120 [amdgpu]
 ? __pm_runtime_resume+0x58/0x80
 local_pci_probe+0x42/0x80
 pci_device_probe+0xfd/0x190
 really_probe+0xf0/0x380
 driver_probe_device+0x59/0xd0
 device_driver_attach+0x53/0x60
 __driver_attach+0x8a/0x150
 ? device_driver_attach+0x60/0x60
 bus_for_each_dev+0x78/0xc0
 bus_add_driver+0x14a/0x1e0
 driver_register+0x6c/0xb0
 ? 0xffffffffc089c000
 do_one_initcall+0x46/0x1f4
 ? _cond_resched+0x15/0x30
 ? kmem_cache_alloc_trace+0x154/0x1c0
 ? do_init_module+0x23/0x230
 load_module+0x233b/0x2930
 ? __do_sys_init_module+0x16e/0x1a0
 ? _cond_resched+0x15/0x30
 __do_sys_init_module+0x16e/0x1a0
 do_syscall_64+0x5f/0x1a0
 ? page_fault+0x8/0x30
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7ff98c60fd5e
Code: 48 8b 0d 2d 41 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3>
RSP: 002b:00007ffdc2c09908 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 00005631e842a090 RCX: 00007ff98c60fd5e
RDX: 00007ff98c26484d RSI: 00000000007058e6 RDI: 00005631e8cfe480
RBP: 00005631e8cfe480 R08: 00005631e8433700 R09: 0000000000000006
R10: 0000000000000007 R11: 0000000000000246 R12: 00007ff98c26484d
R13: 0000000000000001 R14: 00005631e8416580 R15: 00005631e8445e70
---[ end trace e68627bbb9265691 ]---

It still manages to load amdgpu, as subsequently I see this:

[drm] Initialized amdgpu 3.32.0 20150101 for 0000:06:00.0 on minor 0

However, further down, there is an unending stream of backtraces that continues to the very end of the log.  I assume that is the point where I hard shutdown the machine.  An example from these backtraces:

------------[ cut here ]------------
WARNING: CPU: 4 PID: 206 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Modules linked in: xt_CHECKSUM xt_MASQUERADE tun bridge stp llc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ip6t>
 crc32c_intel drm r8169 uas usb_storage hid_logitech_dj wmi video pinctrl_amd i2c_dev
CPU: 4 PID: 206 Comm: kworker/u32:7 Tainted: G        W         5.2.8-200.fc30.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. AB350M-Gaming 3/AB350M-Gaming 3-CF, BIOS F23d 04/17/2018
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x229 [amdgpu]
Code: 83 c8 ff e9 01 b1 f9 ff 48 c7 c7 08 01 75 c0 e8 24 51 a9 ef 0f 0b 83 c8 ff e9 eb b0 f9 ff 48 c7 c7 08 01 75 c0 e8 0e 51 a9 ef <0f> 0b 80 bb 93 01 00 00 00>
RSP: 0018:ffffab0fc35e7b58 EFLAGS: 00010246
RAX: 0000000000000024 RBX: ffff94f56f8b6000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff94f580b17900
RBP: ffff94f56f8b6000 R08: 0000000000000001 R09: 0000000000000493
R10: ffffffffb1bf268c R11: 0000000000000003 R12: ffff94f550a081b8
R13: 0000000000000000 R14: ffff94f550a081b8 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffff94f580b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3ef9429000 CR3: 00000007a655a000 CR4: 00000000003406e0
Call Trace:
 dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
 dc_commit_updates_for_stream+0x84c/0xc10 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xa79/0x1940 [amdgpu]
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? _cond_resched+0x15/0x30
 ? wait_for_completion_timeout+0x38/0x170
 ? finish_task_switch+0x7a/0x2a0
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 process_one_work+0x19d/0x380
 worker_thread+0x50/0x3b0
 kthread+0xfb/0x130
 ? process_one_work+0x380/0x380
 ? kthread_park+0x80/0x80
 ret_from_fork+0x22/0x40
---[ end trace e68627bbb9265692 ]---
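Since the attached log is several megabytes of near-identical backtraces, it may help to summarise which source locations are emitting the warnings. Below is a minimal Python sketch, assuming the log is the plain `dmesg.txt` produced by the `journalctl` command above and that every splat starts with a `WARNING: CPU: ... PID: ... at <file>:<line> <function>+...` header like the two excerpts. The names `count_warning_sites` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Header line of a kernel WARNING splat, e.g.
# "WARNING: CPU: 4 PID: 206 at drivers/.../dcn10_hw_sequencer.c:854 func+0xc/0x229 [amdgpu]"
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<path>\S+):(?P<line>\d+) (?P<func>[^\s+]+)"
)


def count_warning_sites(log_text):
    """Count WARNING splats per (source file name, line number, function)."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        # Keep only the file name; the full path contains "../" segments.
        filename = match.group("path").rsplit("/", 1)[-1]
        site = (filename, int(match.group("line")), match.group("func"))
        counts[site] += 1
    return counts


def summarize(path):
    """Print one line per warning site, most frequent first (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        counts = count_warning_sites(fh.read())
    for (filename, line, func), n in counts.most_common():
        print(f"{n:6d}  {filename}:{line}  {func}")
```

Calling `summarize("dmesg.txt")` would then show at a glance whether the flood comes from a single site (such as `dcn10_hw_sequencer.c:854`) or from several.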

Hardware info:
$ sudo inxi -C -G
CPU:       Topology: Quad Core model: AMD Ryzen 5 2400G with Radeon Vega Graphics bits: 64 type: MT MCP L2 cache: 2048 KiB
           Speed: 2015 MHz min/max: 1600/3600 MHz Core speeds (MHz): 1: 1477 2: 1483 3: 1419 4: 1444 5: 1454 6: 1425 7: 1596
           8: 1597
Graphics:  Device-1: AMD Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] driver: amdgpu v: kernel
           Display: server: Fedora Project X.org 1.20.5 driver: amdgpu,ati unloaded: fbdev,modesetting,vesa
           resolution: 1920x1080~60Hz
           OpenGL: renderer: AMD RAVEN (DRM 3.30.0 5.1.20-300.fc30.x86_64 LLVM 8.0.0) v: 4.5 Mesa 19.1.4

Extra info:
I use lightdm as my login manager, and XFCE as my desktop environment.

$ rpm -q lightdm
lightdm-1.28.0-7.fc30.x86_64

$ rpm -q xfwm4
xfwm4-4.13.4-1.fc30.x86_64
$ rpm -q xfdesktop
xfdesktop-4.13.6-1.fc30.x86_64

Comment 1 Suvayu 2019-08-18 07:01:47 UTC

Correction: after it hangs, I can log in remotely and shut down.

I also tried vanilla kernels from Thorsten's repos, same issue.
- 5.2.9 from kernel-vanilla-stable
- 5.3.0 (rc4) from kernel-vanilla-mainline (I believe this is the closest to rawhide)

Comment 2 Suvayu 2019-11-03 03:43:21 UTC

I have now tried booting with:
- 5.3.7 from the fedora repos
- 5.4.0 (rc5) from kernel-vanilla-mainline

No luck.  What do I do to get someone to have a look?  F31 is out, but I can't upgrade because of this.

Comment 3 Justin M. Forbes 2020-03-03 16:37:40 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There are a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 30 kernel bugs.

Fedora 30 has now been rebased to 5.5.7-100.fc30.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 31, and are still experiencing this issue, please change the version to Fedora 31.

If you experience different issues, please open a new bug report for those.

Comment 4 Suvayu 2020-03-09 16:10:06 UTC

Hi,

Sorry about the delay.  Unfortunately, I cannot provide any more info, as I do not have access to this particular machine; it's in a different country.

That said, since then I have tried installing Fedora 31 (not 30) on another Ryzen with Vega IGP (ThinkPad T495s), and I have faced similar issues.

Hardware: AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx (-MT MCP-)
Kernel: 5.5.7-200.fc31.x86_64 x86_64
Login Manager: LightDM
Desktop: XFCE

I have tried most of the 5.x kernels that have been pushed to Fedora updates since January 2020.  Initially I could get a login screen, but logging in would lead to a hard lock with a black screen.  The journal would show an unending sequence of kernel backtraces like the one below (this one was with kernel 5.4.20 from the end of February):

------------[ cut here ]------------
WARNING: CPU: 5 PID: 60 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:967 dcn10_verify_allow_pstate_change_high+0x32/0x290 [amdgpu]
Modules linked in: ccm xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter bridge stp llc ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables overlay ip6table_filter ip6_tables iptable_filter sunrpc squashfs zstd_decompress loop edac_mce_amd kvm_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic iwlmvm snd_hda_codec_hdmi snd_hda_intel irqbypass snd_intel_dspcfg snd_hda_codec snd_usb_audio mac80211 snd_hda_core btusb crct10dif_pclmul snd_usbmidi_lib btrtl snd_rawmidi btbcm crc32_pclmul snd_hwdep uvcvideo btintel snd_seq ghash_clmulni_intel videobuf2_vmalloc videobuf2_memops snd_seq_device videobuf2_v4l2 libarc4 bluetooth pcspkr videobuf2_common wmi_bmof k10temp videodev snd_pcm iwlwifi thinkpad_acpi
 ecdh_generic ipmi_devintf sp5100_tco snd_timer ledtrig_audio mc joydev ecc i2c_piix4 snd_pci_acp3x cfg80211 snd ucsi_acpi typec_ucsi ipmi_msghandler rtsx_pci_ms soundcore memstick typec rfkill i2c_scmi acpi_cpufreq ip_tables amdgpu amd_iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper drm hid_logitech_hidpp rtsx_pci_sdmmc mmc_core crc32c_intel nvme rtsx_pci r8169 nvme_core serio_raw wmi video pinctrl_amd hid_logitech_dj fuse
CPU: 5 PID: 60 Comm: kworker/u32:1 Not tainted 5.4.20-200.fc31.x86_64 #1
Hardware name: LENOVO 20QJCTO1WW/20QJCTO1WW, BIOS R13ET39W(1.13 ) 10/11/2019
Workqueue: events_unbound commit_work [drm_kms_helper]
RIP: 0010:dcn10_verify_allow_pstate_change_high+0x32/0x290 [amdgpu]
Code: 8b 87 10 03 00 00 48 89 fb 48 8b b8 b0 01 00 00 e8 53 14 01 00 84 c0 0f 85 55 02 00 00 80 3d 64 02 27 00 00 0f 85 4b 02 00 00 <0f> 0b 80 bb a3 01 00 00 00 0f 84 39 02 00 00 48 8b 83 10 03 00 00
RSP: 0018:ffffb50f40347ad8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8b3d648f0000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8b3d70f57908 RDI: ffff8b3d70f57908
RBP: ffff8b3d648f0000 R08: ffff8b3d70f57908 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000001 R12: ffff8b3d313b01b8
R13: ffff8b3d648f0000 R14: 0000000000000004 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8b3d70f40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdef0009168 CR3: 000000036777c000 CR4: 00000000003406e0
Call Trace:
 dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
 dc_commit_updates_for_stream+0xf34/0x14b0 [amdgpu]
 ? amdgpu_display_get_crtc_scanoutpos+0x85/0x190 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0xb5e/0x1d70 [amdgpu]
 ? __schedule+0x2da/0x730
 ? ttwu_do_wakeup+0x19/0x140
 ? schedule+0x39/0xa0
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? _cond_resched+0x15/0x30
 ? wait_for_completion_timeout+0x38/0x170
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? commit_tail+0x94/0x110 [drm_kms_helper]
 commit_tail+0x94/0x110 [drm_kms_helper]
 process_one_work+0x1b5/0x360
 worker_thread+0x50/0x3c0
 kthread+0xf9/0x130
 ? process_one_work+0x360/0x360
 ? kthread_park+0x90/0x90
 ret_from_fork+0x22/0x40
---[ end trace 682c1b2b8324159b ]---
[drm] pstate TEST_DEBUG_DATA: 0x36F60000
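Most of the entries in the trace above are prefixed with `?`, which in kernel stack dumps marks an address found on the stack that the unwinder could not confirm as part of the call chain. A small Python sketch for keeping only the unmarked frames follows. It assumes the splat text has no timestamp prefixes (as with the `--no-hostname -k` excerpts here, pasted as-is), and `reliable_frames` is a made-up name.

```python
def reliable_frames(splat_text):
    """Return the function names of Call Trace frames not marked with '?'."""
    frames = []
    in_trace = False
    for raw in splat_text.splitlines():
        line = raw.strip()
        if line == "Call Trace:":
            in_trace = True
            continue
        if line.startswith("---[ end trace"):
            in_trace = False
            continue
        # Skip everything outside the trace, '?' frames, and register dumps.
        if not in_trace or line.startswith("?") or "+0x" not in line:
            continue
        # "func+0x69/0x70 [module]" -> "func"
        frames.append(line.split("+0x", 1)[0])
    return frames
```

Applied to the trace above, this should leave only the unmarked frames, starting with `dcn10_pipe_control_lock.part.0` and ending with `ret_from_fork`.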

In the meantime I found a workaround on the freedesktop GitLab instance (I can't find the issue right now).  If I disable compositing (with `xfconf-query -c xfwm4 -p /general/use_compositing -s false`), I can log in to a functioning desktop.  The behaviour has since changed: I still can't log in, but instead of a hard lock with a black screen, LightDM segfaults.  Curiously, I can no longer spot a kernel backtrace.  The workaround of disabling compositing still works.  Should I update this bug report with more info, or is it better to open a new issue?
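For anyone who wants to script the compositing workaround, here is a minimal Python sketch. It only wraps the `xfconf-query` invocation quoted above and uses no other options. The function names are made up for illustration, and it assumes the script runs as the affected user with that user's Xfce configuration reachable.

```python
import subprocess


def compositing_command(enabled):
    """Build the xfconf-query argument list that sets xfwm4 compositing."""
    value = "true" if enabled else "false"
    return [
        "xfconf-query",
        "-c", "xfwm4",                     # channel, as in the command above
        "-p", "/general/use_compositing",  # property, as in the command above
        "-s", value,                       # new value
    ]


def set_compositing(enabled):
    """Run the command; raises CalledProcessError if xfconf-query fails."""
    subprocess.run(compositing_command(enabled), check=True)
```

`set_compositing(False)` corresponds to the workaround; `set_compositing(True)` would restore compositing for re-testing with a newer kernel.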

Comment 5 Ben Cotton 2020-04-30 20:22:09 UTC

This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 6 Ben Cotton 2020-05-26 18:25:22 UTC

Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 7 Suvayu 2020-05-27 07:59:55 UTC

The Fedora bugzilla has been such a joke.  Over the last year and a half I have reported several bugs, and *all* have gone this way, me talking to myself (on several occasions I have even tried to highlight them on IRC).  I have been a Fedora user for over 10 years, never has it been this bad.  Any day I would take the old community, struggling with drivers, pulseaudio, and systemd, and actually getting a response and fixing the problems together over this kind of neglect.  Disgraceful.  Maybe I should move to Arch.
