Bug 1596342 - amdgpu crash in X
Summary: amdgpu crash in X
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 28
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-28 16:35 UTC by Matthew Wilson
Modified: 2019-05-28 23:20 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-05-28 23:20:33 UTC
Type: Bug
Embargoed:


Attachments
dmesg from boot -> X -> restart X (91.68 KB, text/plain)
2018-06-28 16:35 UTC, Matthew Wilson

Description Matthew Wilson 2018-06-28 16:35:27 UTC
Created attachment 1455382
dmesg from boot -> X -> restart X

Description of problem:
After updating to kernel 4.17.2-200.fc28.x86_64, booting up either:

1. leaves me with blank screens (monitors go to sleep, cannot be woken)
2. lets me log in, but X immediately becomes unresponsive (can type, but not click; restarting X then works)


Version-Release number of selected component (if applicable):

kernel-4.17.2-200.fc28.x86_64
xorg-x11-drv-amdgpu-18.0.1-1.fc28.x86_64

AMD RX570
AMD Ryzen 5 1600X

How reproducible:
100%

Steps to Reproduce:
1. Boot up the computer
2. Wait for boot to complete
3. Try logging in
4. Launch an application (e.g. a terminal)

Actual results:

X becomes unresponsive

Expected results:

Works as per kernel-4.16.16-300.fc28.x86_64.

Additional info:

Works perfectly when booting back to 4.16.16-300 or earlier.

Relevant trace:

[   44.629726] Call Trace:
[   44.629786]  dce110_stream_encoder_dp_blank+0x12c/0x1a0 [amdgpu]
[   44.629839]  core_link_disable_stream+0x54/0x270 [amdgpu]
[   44.629891]  dce110_reset_hw_ctx_wrap+0xb9/0x1b0 [amdgpu]
[   44.629944]  dce110_apply_ctx_to_hw+0x52/0xa30 [amdgpu]
[   44.629999]  ? dce_pipe_control_lock+0x184/0x1f0 [amdgpu]
[   44.630050]  dc_commit_state+0x2f5/0x590 [amdgpu]
[   44.630107]  amdgpu_dm_atomic_commit_tail+0x351/0xcf0 [amdgpu]
[   44.630112]  ? __wake_up_common_lock+0x89/0xc0
[   44.630114]  ? _cond_resched+0x15/0x30
[   44.630117]  ? wait_for_completion_timeout+0x3a/0x190
[   44.630118]  ? wait_for_completion_interruptible+0x35/0x1d0
[   44.630120]  ? _cond_resched+0x15/0x30
[   44.630174]  ? dm_plane_helper_cleanup_fb+0x120/0x120 [amdgpu]
[   44.630182]  commit_tail+0x3d/0x70 [drm_kms_helper]
[   44.630190]  drm_atomic_helper_commit+0x103/0x110 [drm_kms_helper]
[   44.630204]  drm_framebuffer_remove+0x2cc/0x3e0 [drm]
[   44.630220]  drm_mode_rmfb_work_fn+0x4f/0x60 [drm]
[   44.630223]  process_one_work+0x187/0x340
[   44.630225]  worker_thread+0x1c7/0x380
[   44.630228]  ? pwq_unbound_release_workfn+0xd0/0xd0
[   44.630229]  kthread+0x112/0x130
[   44.630231]  ? kthread_create_worker_on_cpu+0x70/0x70
[   44.630233]  ret_from_fork+0x22/0x40

Full dmesg attached.

Comment 1 jon 2018-07-18 03:15:31 UTC
I'm experiencing a similar problem.  I can boot the 4.17 kernels, currently using 4.17.6-200.  I don't have a pre-4.17 kernel I can try.

It runs for a few hours then freezes completely with the same error message:

Jul 17 21:30:23 pc003 kernel: [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956
Jul 17 21:30:23 pc003 kernel: WARNING: CPU: 1 PID: 652 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:195 generic_reg_wait+0xe7/0x160 [amdgpu]
Jul 17 21:30:23 pc003 kernel: Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 devlink tun rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache nf_connt>
Jul 17 21:30:23 pc003 kernel:  crc32_pclmul snd_hda_core snd_rawmidi snd_hwdep snd_seq ghash_clmulni_intel snd_seq_device snd_pcm joydev snd_timer snd sp5100_tco k10temp soundcore i2c_piix4 >
Jul 17 21:30:23 pc003 kernel: CPU: 1 PID: 652 Comm: kworker/1:3 Tainted: G        W         4.17.6-200.fc28.x86_64 #1
Jul 17 21:30:23 pc003 kernel: Hardware name: Micro-Star International Co., Ltd. MS-7A34/B350 TOMAHAWK ARCTIC (MS-7A34), BIOS H.D0 05/02/2018
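
For anyone comparing logs across the reports here, the REG_WAIT message encodes a poll delay and a retry count, so the nominal time the driver polled before giving up is simply delay × tries (10us * 3000 = 30 ms in the line above). A minimal Python sketch for pulling those fields out of a journal or dmesg line follows. The regular expression is based only on the message format quoted in this bug, and `parse_reg_wait` is a hypothetical helper name, not part of any amdgpu tooling.

```python
import re

# Pattern derived from the REG_WAIT messages quoted in this report (an assumption:
# other kernel versions may format the message differently).
REG_WAIT_RE = re.compile(
    r"REG_WAIT timeout (?P<delay_us>\d+)us \* (?P<tries>\d+) tries - "
    r"(?P<func>\w+) line:(?P<line>\d+)"
)


def parse_reg_wait(log_line):
    """Return the fields of a REG_WAIT timeout message, or None if the line does not match."""
    m = REG_WAIT_RE.search(log_line)
    if m is None:
        return None
    delay_us = int(m.group("delay_us"))
    tries = int(m.group("tries"))
    return {
        "func": m.group("func"),          # function that was waiting on the register
        "line": int(m.group("line")),     # source line reported by the driver
        "delay_us": delay_us,
        "tries": tries,
        "total_ms": delay_us * tries / 1000,  # nominal polling budget in milliseconds
    }


sample = (
    "Jul 17 21:30:23 pc003 kernel: [drm:generic_reg_wait [amdgpu]] *ERROR* "
    "REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956"
)
print(parse_reg_wait(sample))
```

Running this over a saved log makes it easier to see whether the timeouts always come from the same function or from different ones.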

Comment 2 jon 2018-07-18 03:16:14 UTC
Hardware: 

MSI Tomahawk B350 Arctic
Ryzen 1700X
AMD RX 560

Comment 3 Matthew Wilson 2018-09-25 14:12:06 UTC
Still occurring regularly with 4.18.9-200.fc28.x86_64.

I'm wondering if this is related to the system freezes I get: see bug 1594595 or https://bugzilla.kernel.org/show_bug.cgi?id=199925

I've tried the amdgpu.dc=0 kernel flag, and will report back once I have some stability data.
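
When testing a flag like this, it helps to confirm that it actually reached the running kernel, which can be read from /proc/cmdline. A minimal Python sketch, assuming a simple whitespace-separated command line with no quoted values: `kernel_param` and `read_cmdline` are hypothetical helper names, and the sample command line is illustrative rather than taken from the affected machine.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns '' for a bare flag without '=', and None if the parameter is absent.
    If the parameter appears more than once, the last occurrence is returned.
    Quoted values containing spaces are not handled (simplifying assumption).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


# Illustrative command line, not from the reporter's system.
sample_cmdline = "BOOT_IMAGE=/vmlinuz-4.18.9-200.fc28.x86_64 ro rhgb quiet amdgpu.dc=0"
print(kernel_param(sample_cmdline, "amdgpu.dc"))
```

On a live system, `kernel_param(read_cmdline(), "amdgpu.dc")` would report the value in effect for the current boot.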

Comment 4 James McDermott 2018-11-19 21:01:56 UTC
I am no longer seeing this with 4.19.2-300.fc29.x86_64. I'm running a Ryzen 2600 and a Vega 56.

Any time power saving blanked the screen, I was getting:
[drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 6000 tries - dce_mi_allocate_dmif line:599

Again, it seems this is possibly resolved in the 4.19.2-300.fc29 kernel.

Comment 5 jon 2018-11-25 15:05:01 UTC
James:

I'm guessing that's a Fedora 29 kernel? I will try building 4.19.2 from source. I was seeing the same behavior in 4.19.1, so maybe it was fixed in .2.

I will give it a shot, thanks.

Comment 6 James McDermott 2018-11-25 16:23:59 UTC
(In reply to jon from comment #5)
> James:
> 
> I'm guessing that's a Fedora 29 kernel? I will try building 4.19.2 from
> source. I was seeing the same behavior in 4.19.1, so maybe it was fixed in .2.
> 
> I will give it a shot, thanks.

Yes, that is the latest Fedora 29 kernel. Fedora 28 also has 4.19.2-200.fc28.x86_64, but I don't have an easy way to test the Fedora 28 kernel.

--James

Comment 7 jon 2018-11-25 16:45:40 UTC
Thanks, James.  I've installed 4.19.2 from the official repos and I'll see if I have better luck with that.  If not, I've got a vanilla 4.19.4 kernel compiled from source that I'll try next.

Comment 8 Ben Cotton 2019-05-02 20:15:55 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 9 Ben Cotton 2019-05-28 23:20:33 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

