Bug 1656257

Summary: After today's kernel upgrade, the AMD Vega 56 GPU no longer switches video mode
Product: [Fedora] Fedora
Reporter: Mikhail <mikhail.v.gavrilov>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: rawhide
CC: airlied, awilliam, bskeggs, ego.cordatus, ewk, hdegoede, ichavero, itamar, jarodwilson, jcline, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mikhail.v.gavrilov, mjg59, pjones, rharwood, robatino, steved, yaneti
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-12-17 17:53:05 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1574713, 1574714
Attachments:
Description                           Flags
How looks like output on my monitor   none
dmesg                                 none
system log                            none

Description Mikhail 2018-12-05 04:12:40 UTC
Created attachment 1511526 [details]
How looks like output on my monitor

Description of problem:
Today, after a kernel upgrade, the AMD Vega 56 video card stopped switching from text mode to graphics mode.
The output stays in text mode. The last visible line is:
[    3.687872] fb0: switching to amdgpudrmfb from EFI VGA

The first kernel with this issue is 4.20.0-0.rc5.git0.1.fc30.x86_64.
The last working kernel is 4.20.0-0.rc4.git2.1.fc30.x86_64.

Comment 1 Mikhail 2018-12-05 04:13:26 UTC
Created attachment 1511527 [details]
dmesg

Comment 2 Mikhail 2018-12-05 04:13:54 UTC
Created attachment 1511528 [details]
system log

Comment 3 Fedora Blocker Bugs Application 2018-12-05 10:52:39 UTC
Proposed as a Blocker and Freeze Exception for 30-beta by Fedora user mikhail using the blocker tracking app because:

  Because of this bug, it is impossible to log in to a system with an AMD GPU

Comment 4 Mikhail 2018-12-08 10:49:13 UTC
Which kernel commit corresponds to the 4.20.0-0.rc5.git0.1.fc30.x86_64 package?

If I understand the changelog correctly:

* Mon Dec 03 2018 Justin M. Forbes <jforbes> - 4.20.0-0.rc5.git0.1
- Linux v4.20-rc5

* Mon Dec 03 2018 Justin M. Forbes <jforbes>
- Disable debugging options.

this is tag v4.20-rc5, which corresponds to commit 2595646791c319cadfdbf271563aac97d0843dc7, but when I built a vanilla kernel at this commit while bisecting, I did not experience the issue.

So I suppose the problem is in some Fedora patch.


$ git bisect log
git bisect start
# bad: [2595646791c319cadfdbf271563aac97d0843dc7] Linux 4.20-rc5
git bisect bad 2595646791c319cadfdbf271563aac97d0843dc7
# good: [94f371cb73944b410a269d570d6946c042f2ddd0] Merge tag 'acpi-4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 94f371cb73944b410a269d570d6946c042f2ddd0
# good: [08be37b798921af207e78082fe261a6ca8be5024] mm/gup: finish consolidating error handling
git bisect good 08be37b798921af207e78082fe261a6ca8be5024
# good: [880584176ed7875117a5ba76cf316cb60f7ad30b] Merge tag 'for-linus-20181201' of git://git.kernel.dk/linux-block
git bisect good 880584176ed7875117a5ba76cf316cb60f7ad30b
# good: [292974c5acae330186cbf5a833385f666aeb12c0] Merge tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good 292974c5acae330186cbf5a833385f666aeb12c0
# good: [292974c5acae330186cbf5a833385f666aeb12c0] Merge tag 'for-linus-4.20a-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
git bisect good 292974c5acae330186cbf5a833385f666aeb12c0
# good: [adb97bcdbdb2d42c90b5f11e08a9b5fbc017e5d7] Merge tag 'v4.20-rockchip-dts64fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into fixes
git bisect good adb97bcdbdb2d42c90b5f11e08a9b5fbc017e5d7
# good: [89acb56db4979e55380839c815566ddb9a01949b] Merge tag 'davinci-fixes-for-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes
git bisect good 89acb56db4979e55380839c815566ddb9a01949b
# good: [91e43395820baad80248987608216c35da9df65b] Merge branch 'fixes-dts' into omap-for-v4.20/fixes
git bisect good 91e43395820baad80248987608216c35da9df65b
# good: [7e76e65ce7e9405a9608e1b806be58a6cbf4a737] MAINTAINERS: Remove unused Qualcomm SoC mailing list
git bisect good 7e76e65ce7e9405a9608e1b806be58a6cbf4a737
# good: [bfed4d730823440d0da0cd21554efc2de831627d] Merge tag 'imx-fixes-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
git bisect good bfed4d730823440d0da0cd21554efc2de831627d
# good: [bfed4d730823440d0da0cd21554efc2de831627d] Merge tag 'imx-fixes-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
git bisect good bfed4d730823440d0da0cd21554efc2de831627d
# good: [6a512726090a5cfd8d5cd41652d2b98a222854b8] Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 6a512726090a5cfd8d5cd41652d2b98a222854b8
# first bad commit: [2595646791c319cadfdbf271563aac97d0843dc7] Linux 4.20-rc5


But vanilla commit 2595646791c319cadfdbf271563aac97d0843dc7 is also good for me.
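
When a vanilla build of a commit is good but the distribution package built from the same tag is bad, the difference may lie in the build configuration rather than in a source patch. One way to check is to compare the two kernels' config files (on Fedora these are normally installed as /boot/config-<version>). The following is only a minimal sketch: the function names and the sample option CONFIG_EXAMPLE_FEATURE are hypothetical and are not taken from this report.

```python
def parse_kconfig(text):
    """Return {option: value} from kernel .config text; disabled options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" is how a .config records a disabled option.
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(good_text, bad_text):
    """Return {option: (good_value, bad_value)} for every option whose value differs."""
    good, bad = parse_kconfig(good_text), parse_kconfig(bad_text)
    changed = {}
    for name in sorted(set(good) | set(bad)):
        # An option missing from one file is treated as disabled there.
        g, b = good.get(name, "n"), bad.get(name, "n")
        if g != b:
            changed[name] = (g, b)
    return changed


# Hypothetical sample data; in practice, read the two config files from disk.
good_cfg = "CONFIG_SMP=y\n# CONFIG_EXAMPLE_FEATURE is not set\n"
bad_cfg = "CONFIG_SMP=y\nCONFIG_EXAMPLE_FEATURE=y\n"
print(diff_kconfig(good_cfg, bad_cfg))
```

A short list of changed options can then be tested one at a time by rebuilding the good kernel with each option flipped.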

Comment 5 Yanko Kaneti 2018-12-12 12:10:55 UTC
It's probably CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y, turned on after the rc5 Fedora kernels.

Adding mem_encrypt=off to the kernel command line makes the boot work as usual here.
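
To confirm that the workaround actually reached the running kernel, the boot command line can be inspected. This is a minimal sketch, not a definitive check: the function names are hypothetical, and treating the last occurrence of a repeated option as the effective one is an assumption of the sketch.

```python
def mem_encrypt_setting(cmdline):
    """Return the value of the last mem_encrypt= token on a command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("mem_encrypt="):
            # Keep overwriting so that the last occurrence wins (assumption).
            value = token.split("=", 1)[1]
    return value


def running_kernel_setting(path="/proc/cmdline"):
    """Read the running kernel's command line and report its mem_encrypt value."""
    with open(path) as f:
        return mem_encrypt_setting(f.read())
```

After booting with the workaround, `running_kernel_setting()` should return `"off"`; `None` means the option was not passed at all.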

Comment 6 Mikhail 2018-12-12 21:20:41 UTC
I confirm that I am able to reproduce this issue with a vanilla kernel after recompiling it with CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y.
mem_encrypt=off also works as a workaround for me.

This commit is to blame: https://src.fedoraproject.org/rpms/kernel/c/f8216ee47abe1997e9fc7efb3d819d6105d898f5?branch=master

Comment 7 Mikhail 2018-12-13 20:01:12 UTC
Peter, why did you enable CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y?

I asked the AMD developers whether they know about a problem between `CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y` and their GPUs.

And I got a short answer:

Yes, that is a known limitation.

Almost all GPU drivers are incompatible with memory encryption.

As a workaround you can either turn off memory encryption or force the slowpath for memory allocation in the GPU drivers.

Comment 8 Adam Williamson 2018-12-17 17:47:09 UTC
Well, the commit message says:

"This makes it so users don't have to do mem_encrypt=1 to enable SEV VMs."

which is a reference to AMD's 'secure encrypted virtualization' tech: https://developer.amd.com/sev/

so...that's why. He wants to let people use that feature (without a special kernel arg).

However, if that's going to break most AMD GPUs (per the reply Mikhail got), I'd agree it's unacceptable and probably a release blocker.

Comment 9 Jeremy Cline 2018-12-17 17:53:05 UTC
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is off again starting in 4.20-rc6-82-g65e08c5e8631.
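
The version string above appears to follow the `git describe` layout (nearest tag, number of commits since that tag, abbreviated commit hash prefixed with `g`); that reading is an assumption. Under it, a minimal sketch for splitting such a string into its parts could look like this (the function name is hypothetical):

```python
import re


def parse_describe(version):
    """Split '<tag>-<count>-g<hash>' into (tag, count, hash); return None if it does not match."""
    m = re.fullmatch(r"(?P<tag>.+)-(?P<count>\d+)-g(?P<commit>[0-9a-f]+)", version)
    if m is None:
        return None
    return m.group("tag"), int(m.group("count")), m.group("commit")


print(parse_describe("4.20-rc6-82-g65e08c5e8631"))
```

Read this way, the option is off again from the build that sits 82 commits after the rc6 tag, at commit 65e08c5e8631.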