Bug 1592110 - Video output on AMD Vega 56 GPU stopped working after today's kernel upgrade
Summary: Video output on AMD Vega 56 GPU stopped working after today's kernel upgrade
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-06-17 06:52 UTC by Mikhail
Modified: 2018-07-30 14:56 UTC
27 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 14:56:32 UTC
Type: Bug
Embargoed:


Attachments
system log (287.48 KB, text/x-vhdl)
2018-06-17 07:26 UTC, Mikhail
no flags
system log (346.56 KB, text/plain)
2018-07-02 15:31 UTC, Mikhail
no flags
system log (4.18.0-0.rc3.git3.1.fc29.x86_64) (351.96 KB, text/plain)
2018-07-09 18:33 UTC, Mikhail
no flags

Description Mikhail 2018-06-17 06:52:10 UTC
Description of problem:
After today's kernel upgrade, video output on the AMD Vega 56 GPU stopped working; instead, the monitor says "no signal".

Video output on Intel GPU works fine.

Version-Release number of selected component (if applicable):
$ uname -r
4.18.0-0.rc0.git10.1.fc29.x86_64

$ rpm -qa | grep kernel | sort
abrt-addon-kerneloops-2.10.10-1.fc29.x86_64
kernel-4.18.0-0.rc0.git10.1.fc29.x86_64
kernel-4.18.0-0.rc0.git5.1.fc29.x86_64
kernel-4.18.0-0.rc0.git9.1.fc29.x86_64
kernel-core-4.18.0-0.rc0.git10.1.fc29.x86_64
kernel-core-4.18.0-0.rc0.git5.1.fc29.x86_64
kernel-core-4.18.0-0.rc0.git9.1.fc29.x86_64
kernel-headers-4.18.0-0.rc0.git10.1.fc29.x86_64
kernel-modules-4.18.0-0.rc0.git10.1.fc29.x86_64
kernel-modules-4.18.0-0.rc0.git5.1.fc29.x86_64
kernel-modules-4.18.0-0.rc0.git9.1.fc29.x86_64
kernel-modules-extra-4.18.0-0.rc0.git10.1.fc29.x86_64
kernel-modules-extra-4.18.0-0.rc0.git5.1.fc29.x86_64
kernel-modules-extra-4.18.0-0.rc0.git9.1.fc29.x86_64
libreport-plugin-kerneloops-2.9.5-1.fc29.x86_64


The latest working kernel is 4.18.0-0.rc0.git9.1.fc29.x86_64

Comment 1 Mikhail 2018-06-17 07:26:03 UTC
Created attachment 1452338
system log

Comment 2 Jeremy Cline 2018-06-21 14:37:44 UTC
Thanks for the report. I'm re-assigning this so the graphics team can take a look.

4.18.0-0.rc0.git9.1 was built from upstream commit 2837461dbe6f4a9acc0d86f88825888109211c99 and 4.18.0-0.rc0.git10.1 was built from upstream commit 4c5e8fc62d6a63065eeae80808c498d1dcfea4f4, so if you could git bisect the kernel between those revisions that would be very helpful.

Comment 3 Mikhail 2018-06-21 21:17:12 UTC
Jeremy, thanks for the response.
How can I bisect without turning Fedora into Slackware?
I mean, it is not good to compile the kernel directly from git and install it with "make modules_install && make install".
The best option would be if the Fedora packaging tool `fedpkg` could bisect and build RPM packages for testing automatically.

Comment 4 Jeremy Cline 2018-06-22 13:00:48 UTC
Hi Mikhail,

Unfortunately, at the moment the best way to do it is via git. There's some documentation[0] to make it simpler, but it's not an automated task. There used to be a tool called 'fedbisect', but it's not maintained and definitely no longer works. I've been mulling over reviving it (or something like it), though.

[0] https://docs.fedoraproject.org/quick-docs/en-US/kernel/troubleshooting.html#bisecting-the-kernel
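
For readers new to the workflow, `git bisect` can be rehearsed on a throwaway repository before spending hours on kernel builds. The script below is a minimal sketch, not the procedure from the Fedora documentation: it assumes `git` is installed, and the file `state.txt`, the strings "video ok" and "no signal", and the choice of commit 13 as the breaking change are all hypothetical stand-ins for "video output works". As in comment 6, the bad revision is passed first and the good one second.

```shell
#!/bin/sh
# Rehearse `git bisect` on a throwaway repository (hypothetical demo,
# not the kernel tree): commits 1-12 write "video ok", and commit 13
# onward writes "no signal", so commit 13 is the first bad commit.
set -eu

repo=$(mktemp -d)
trap 'cd / && rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"

for i in $(seq 1 20); do
    if [ "$i" -ge 13 ]; then
        echo "no signal" > state.txt
    else
        echo "video ok" > state.txt
    fi
    echo "$i" > counter.txt   # keeps every commit non-empty
    git add state.txt counter.txt
    git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # root commit: known good
bad=$(git rev-parse HEAD)                   # tip: known bad

# Bad revision first, good revision second, as in comment 6.
git bisect start "$bad" "$good" >/dev/null

# The test command exits 0 (good) if "video ok" is present, 1 (bad) otherwise.
git bisect run sh -c 'grep -q "video ok" state.txt' >/dev/null

# When bisection finishes, refs/bisect/bad names the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"
git bisect reset >/dev/null 2>&1
```

On the kernel tree, the automated `git bisect run` step does not really apply: each candidate has to be built, installed, and booted, then marked by hand with `git bisect good` or `git bisect bad`, starting from `git bisect start 4c5e8fc62d6a63065eeae80808c498d1dcfea4f4 2837461dbe6f4a9acc0d86f88825888109211c99` as recorded in comment 6.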

Comment 5 Mikhail 2018-06-30 20:16:06 UTC
Hmm

$ git bisect bad
bfdec234047889f4f6af1ec45c7c502a4405b3fb is the first bad commit
commit bfdec234047889f4f6af1ec45c7c502a4405b3fb
Author: Harry Wentland <harry.wentland>
Date:   Fri May 18 17:07:06 2018 -0400

    drm/amd/display: Implement dm_pp_get_clock_levels_by_type_with_latency
    
    This is required so we use the correct minimum clocks for Vega. Without
    this pplib will never be able to enter the lowest clock states.
    
    Signed-off-by: Harry Wentland <harry.wentland>
    Acked-by: Alex Deucher <alexander.deucher>
    Signed-off-by: Alex Deucher <alexander.deucher>

:040000 040000 80634f8876ec51a1e6fb2e7e4c59de6bbe244544 c8ff74b360beb1436bf8a274d8cfb1584582aa99 M	drivers

Comment 6 Mikhail 2018-06-30 20:18:43 UTC
$ git bisect log
# bad: [4c5e8fc62d6a63065eeae80808c498d1dcfea4f4] Merge tag 'linux-kselftest-4.18-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
# good: [2837461dbe6f4a9acc0d86f88825888109211c99] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect start '4c5e8fc62d6a63065eeae80808c498d1dcfea4f4' '2837461dbe6f4a9acc0d86f88825888109211c99'
# good: [b5d903c2d656e9bc54bc76554a477d796a63120d] Merge branch 'akpm' (patches from Andrew)
git bisect good b5d903c2d656e9bc54bc76554a477d796a63120d
# bad: [a0b2ac29415bb44d1c212184c1385a1abe68db5c] drm/amdgpu: fix the missed vcn fw version report
git bisect bad a0b2ac29415bb44d1c212184c1385a1abe68db5c
# bad: [0b19fdc45feffd7569c081fe32a258df3c8ebb9b] drm/amd/display: fix dscl_manual_ratio_init
git bisect bad 0b19fdc45feffd7569c081fe32a258df3c8ebb9b
# bad: [4c6530fd66399182d0332c5ed821ea473bdcd7c3] drm/amdgpu: remove unnecessary scheduler entity for VCN
git bisect bad 4c6530fd66399182d0332c5ed821ea473bdcd7c3
# bad: [10dd2b865393bb45526ca342fe69207341f89fd5] drm/amd/display: Fix wrong latency assignment for VEGA clock levels
git bisect bad 10dd2b865393bb45526ca342fe69207341f89fd5
# bad: [adea72c5046f7faffff969ece04c3f31e669edf4] drm/amdgpu: vcn_v1_0_is_idle() can be static
git bisect bad adea72c5046f7faffff969ece04c3f31e669edf4
# bad: [bfdec234047889f4f6af1ec45c7c502a4405b3fb] drm/amd/display: Implement dm_pp_get_clock_levels_by_type_with_latency
git bisect bad bfdec234047889f4f6af1ec45c7c502a4405b3fb
# first bad commit: [bfdec234047889f4f6af1ec45c7c502a4405b3fb] drm/amd/display: Implement dm_pp_get_clock_levels_by_type_with_latency

Comment 7 Jeremy Cline 2018-07-02 13:57:36 UTC
Thanks for bisecting. I see a problem in that patch that's already been fixed in upstream commit 10dd2b865393b, can you try a kernel with that patch included? It should be in rc1 and greater.

Comment 8 Mikhail 2018-07-02 15:29:11 UTC
(In reply to Jeremy Cline from comment #7)
> Thanks for bisecting. I see a problem in that patch that's already been
> fixed in upstream commit 10dd2b865393b, can you try a kernel with that patch
> included? It should be in rc1 and greater.

I tried 4.18.0-0.rc2.git4.1.fc29.x86_64, and the issue is still not fixed.

Comment 9 Mikhail 2018-07-02 15:31:05 UTC
Created attachment 1455984
system log

Comment 10 Burstien 2018-07-03 22:02:04 UTC
Hello,

I have experienced the same issue as the reporter. 

GPU: Sapphire Vega 64 Nitro+ Limited Edition.
Monitor: AOC G2460PG

In my case, the last working kernel was 4.16.16. After updating to kernel 4.17.2 (not rawhide), the "no display" issue appeared.
When the GPU was connected to the monitor using a DisplayPort cable, the message displayed prior to loading into the OS was:

[drm:dm_logger_write [amdgpu]] *ERROR* No EDID read.

After that, the monitor displayed a "no signal" message and turned off.

With rawhide kernels 4.18-rc1 and 4.18-rc3, the message changed to:

[drm:dc_link_detect [amdgpu]] *ERROR* No EDID read

But this time, instead of the monitor displaying "no signal" and then turning off, the picture on the monitor gets frozen with the error message still displayed, and the terminal cursor is stuck (not blinking as it usually does when the monitor gets refreshed with new output).

I have also connected the GPU to my TV (LG 32LG60UR) using an HDMI cable, and the display works with both kernels 4.17.2 and 4.18-rc3 (I can't recall testing with 4.18-rc1).

Hope this information helps.

Comment 11 Mikhail 2018-07-09 18:32:00 UTC
Any updates? Has anybody responded?
One more week without a new 4.18 kernel.

Comment 12 Mikhail 2018-07-09 18:33:45 UTC
Created attachment 1457562
system log (4.18.0-0.rc3.git3.1.fc29.x86_64)

Comment 13 Jeremy Cline 2018-07-09 20:15:37 UTC
I emailed upstream (and CCed you, so you should see any responses), but I've not heard anything yet. Last week was a holiday in the US and a bank holiday in other places, so I wouldn't be surprised if people were out all last week and are still catching up on email.

Comment 14 Burstien 2018-07-26 07:43:04 UTC
(In reply to Burstien from comment #10)

An update on my situation:

I used the latest rawhide-nodebug kernel to date (4.18.0-0.rc6.git0.1.fc29.x86_64), and the monitor now works fine. The "No EDID read" message has disappeared.

The last non-working kernel I checked was:
kernel-4.18.0-0.rc4.git0.1.fc29.x86_64

Comment 15 Mikhail 2018-07-28 08:25:26 UTC
I can confirm the problem was fixed in 4.18.0-0.rc5.git1.1.

Anyway, if anyone is interested in the patch, it is here: https://bugs.freedesktop.org/show_bug.cgi?id=107082#c3

Comment 16 Laura Abbott 2018-07-30 14:56:32 UTC
Thanks for letting us know.

