Bug 2020152 - amdgpu broken after latest kernel
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 34
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2021-11-04 08:06 UTC by Davide Corrado
Modified: 2022-06-07 22:50 UTC (History)
Last Closed: 2022-06-07 22:50:34 UTC
Type: Bug

Description Davide Corrado 2021-11-04 08:06:52 UTC
1. Please describe the problem:
amdgpu is not correctly initialized with the latest kernel (5.14.15-200).


2. What is the Version-Release number of the kernel:
5.14.15-200 does not work; previous versions work


3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

works with 5.14.14-200.fc34.x86_64 and previous kernels


4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:

Workstation with an A10 APU, a 4K monitor, and amdgpu. The kernel is started with:

vmlinuz-5.14.14-200.fc34.x86_64 root=/dev/mapper/rootvg-rootlv ro rd.lvm.lv=rootvg/rootlv rhgb quiet radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.bapm=1 amdgpu.dc=1

amdgpu=1 is required because I don't get a screen in 4K without it. I debugged this years ago, and it always worked until today.
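
To see at a glance which module parameters a given boot actually received, the command line above can be split into per-module options. This is a minimal sketch, not part of the original report: the helper name `module_params` is hypothetical, and it only handles simple `module.param=value` tokens without quoting.

```python
def module_params(cmdline: str, module: str) -> dict:
    """Collect `module.param=value` tokens from a kernel command line.

    Hypothetical helper for illustration; quoted values containing
    spaces are not handled.
    """
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            key, value = token[len(prefix):].split("=", 1)
            params[key] = value
    return params


# Command line copied from the report (working 5.14.14 boot).
CMDLINE = (
    "BOOT_IMAGE=(hd3,gpt2)/vmlinuz-5.14.14-200.fc34.x86_64 "
    "root=/dev/mapper/rootvg-rootlv ro rd.lvm.lv=rootvg/rootlv rhgb quiet "
    "radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.bapm=1 amdgpu.dc=1"
)

print(module_params(CMDLINE, "amdgpu"))  # {'cik_support': '1', 'bapm': '1', 'dc': '1'}
print(module_params(CMDLINE, "radeon"))  # {'cik_support': '0'}
```

On a running system the same function could be fed the contents of `/proc/cmdline`.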

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:

didn't try

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

nope


7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.


Please check the differences:
previous kernel (dmesg | grep amdgpu)


WORKING

[    0.000000] Command line: BOOT_IMAGE=(hd3,gpt2)/vmlinuz-5.14.14-200.fc34.x86_64 root=/dev/mapper/rootvg-rootlv ro rd.lvm.lv=rootvg/rootlv rhgb quiet radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.bapm=1 amdgpu.dc=1
[    0.203879] Kernel command line: BOOT_IMAGE=(hd3,gpt2)/vmlinuz-5.14.14-200.fc34.x86_64 root=/dev/mapper/rootvg-rootlv ro rd.lvm.lv=rootvg/rootlv rhgb quiet radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.bapm=1 amdgpu.dc=1
[    4.206856] [drm] amdgpu kernel modesetting enabled.
[    4.216807] amdgpu: Topology: Add APU node [0x0:0x0]
[    4.216853] fb0: switching to amdgpudrmfb from EFI VGA
[    4.216972] amdgpu 0000:00:01.0: vgaarb: deactivate vga console
[    4.217109] amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    4.237791] amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from ROM BAR
[    4.237794] amdgpu: ATOM BIOS: 113-SPEC-102
[    4.237836] amdgpu 0000:00:01.0: amdgpu: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
[    4.237840] amdgpu 0000:00:01.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[    4.237882] [drm] amdgpu: 1024M of VRAM memory ready
[    4.237884] [drm] amdgpu: 3072M of GTT memory ready.
[    4.277730] [drm] amdgpu: dpm initialized
[    4.553864] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    4.553903] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
[    4.553905] kfd kfd: amdgpu: Error initializing iommuv2
[    4.554374] kfd kfd: amdgpu: device 1002:130f NOT added due to errors
[    4.554390] amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 8
[    4.558231] fbcon: amdgpu (fb0) is primary device
[    4.558235] amdgpu 0000:00:01.0: [drm] fb0: amdgpu frame buffer device
[    4.564843] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:00:01.0 on minor 0
[    8.160980] snd_hda_intel 0000:00:01.1: bound 0000:00:01.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])


NOT WORKING (LATEST KERNEL)
[    0.000000] Command line: BOOT_IMAGE=(hd3,gpt2)/vmlinuz-5.14.15-200.fc34.x86_64 root=/dev/mapper/rootvg-rootlv ro rd.lvm.lv=rootvg/rootlv rhgb quiet radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.bapm=1 amdgpu.dc=1
[    0.205000] Kernel command line: BOOT_IMAGE=(hd3,gpt2)/vmlinuz-5.14.15-200.fc34.x86_64 root=/dev/mapper/rootvg-rootlv ro rd.lvm.lv=rootvg/rootlv rhgb quiet radeon.cik_support=0 amdgpu.cik_support=1 amdgpu.bapm=1 amdgpu.dc=1
[    3.848372] [drm] amdgpu kernel modesetting enabled.
[    3.858316] amdgpu: Topology: Add APU node [0x0:0x0]
[    3.858365] fb0: switching to amdgpudrmfb from EFI VGA
[    3.858476] amdgpu 0000:00:01.0: vgaarb: deactivate vga console
[    3.858618] amdgpu 0000:00:01.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    3.878215] amdgpu 0000:00:01.0: amdgpu: Fetched VBIOS from ROM BAR
[    3.878218] amdgpu: ATOM BIOS: 113-SPEC-102
[    3.878264] amdgpu 0000:00:01.0: amdgpu: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
[    3.878267] amdgpu 0000:00:01.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[    3.878313] [drm] amdgpu: 1024M of VRAM memory ready
[    3.878315] [drm] amdgpu: 3072M of GTT memory ready.
[    3.916012] [drm] amdgpu: dpm initialized
[    4.197336] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    4.197375] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
[    4.197377] kfd kfd: amdgpu: Error initializing iommuv2
[    4.197849] kfd kfd: amdgpu: device 1002:130f NOT added due to errors
[    4.197853] kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:130f
[    4.197856] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
[    4.197859] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
[    4.197863] amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
[    4.201402] [drm] amdgpu: ttm finalized


I guess these lines are relevant to the issue:

[    4.197856] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
[    4.197859] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
[    4.197863] amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.

I don't get a screen, but I can log in remotely. If I remove the kernel argument amdgpu.dc=1 with the latest kernel, the system does not boot (it hangs).
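
The comparison of the two dmesg excerpts above can be automated by stripping the timestamps and listing the messages that appear only in the failing boot. The following is a rough sketch under that assumption; the function names are hypothetical, and the sample strings are a short subset of the log lines quoted in the description.

```python
import re

# Matches the leading "[    4.197853] " timestamp that dmesg prints.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(line: str) -> str:
    """Drop the timestamp so identical messages from two boots compare equal."""
    return TIMESTAMP.sub("", line.strip())


def only_in_second(first_log: str, second_log: str) -> list:
    """Return messages from second_log that never occur in first_log, in order."""
    seen = {normalize(l) for l in first_log.splitlines() if l.strip()}
    return [
        normalize(l)
        for l in second_log.splitlines()
        if l.strip() and normalize(l) not in seen
    ]


WORKING = """\
[    4.553905] kfd kfd: amdgpu: Error initializing iommuv2
[    4.554374] kfd kfd: amdgpu: device 1002:130f NOT added due to errors
[    4.554390] amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 8
"""

BROKEN = """\
[    4.197377] kfd kfd: amdgpu: Error initializing iommuv2
[    4.197849] kfd kfd: amdgpu: device 1002:130f NOT added due to errors
[    4.197853] kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:130f
[    4.197856] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
"""

for message in only_in_second(WORKING, BROKEN):
    print(message)
```

Run against the full excerpts, the two "Command line" entries would also be reported, because the kernel version in BOOT_IMAGE differs between the boots.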

Comment 1 Davide Corrado 2021-11-04 08:28:05 UTC
I'm not the first one to report this:

https://www.reddit.com/r/archlinux/comments/qkoolz/linux_51415_breaks_display_modesetting_on_ryzen/

Comment 2 Basil Mohamed Gohar 2021-11-04 14:59:08 UTC
I don't have a way to log in to my laptop remotely at the moment to see the exact point where it stopped, but I also run the amdgpu module, and the 5.14.15-200 boot halted at the point where LUKS would normally prompt me.  Manually selecting 5.14.14-200 from the GRUB menu booted successfully.

All devices in my household and for work use the amdgpu module, so I'd love to help debug this if there's anything I can do to help figure this out.

Comment 3 Davide Corrado 2021-11-07 11:21:36 UTC
Fixed: the IOMMU must be enabled in the BIOS. On my motherboard it is disabled by default. Is this a feature or a bug?
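
One way to check from a running system whether the kernel sees an active IOMMU is to count the entries under `/sys/kernel/iommu_groups`. This is only a hedged sketch: the function name is hypothetical, and an empty or missing directory merely suggests that no IOMMU is active (disabled in the firmware or on the kernel command line); it is not a definitive diagnosis of this bug.

```python
from pathlib import Path


def iommu_group_count(sysfs_dir: str = "/sys/kernel/iommu_groups") -> int:
    """Count IOMMU groups exposed in sysfs; 0 if the directory is absent or empty."""
    root = Path(sysfs_dir)
    if not root.is_dir():
        return 0
    return sum(1 for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    count = iommu_group_count()
    if count:
        print(f"{count} IOMMU group(s) found; the IOMMU appears to be active")
    else:
        print("no IOMMU groups found; the IOMMU may be disabled in the BIOS")
```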

Comment 4 Basil Mohamed Gohar 2021-11-08 16:46:16 UTC
I have experienced the same issue with kernel-5.14.16-201.fc34.x86_64, so no change yet, at least for me.  I have not checked my laptop's IOMMU option in the BIOS, nor do I even know whether this option is exposed.  Kernel version 5.14.14-200.fc34.x86_64 remains the most recent one to work without this issue.  Another kernel upgrade, however, will cause this version to be wiped out, because the default options preserve only n-2 versions of the kernel.
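
The number of kernels kept side by side is presumably governed by DNF's `installonly_limit` option in the `[main]` section of `/etc/dnf/dnf.conf`, which defaults to 3; that link to the "n-2" behaviour described above is an assumption, not something stated in this report. A minimal sketch for reading the effective value (the function name is hypothetical):

```python
import configparser
from pathlib import Path

# DNF's documented default when the option is not set.
DNF_DEFAULT_INSTALLONLY_LIMIT = 3


def installonly_limit(conf_text: str) -> int:
    """Return installonly_limit from dnf.conf text, or DNF's default if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint(
        "main", "installonly_limit", fallback=DNF_DEFAULT_INSTALLONLY_LIMIT
    )


if __name__ == "__main__":
    conf = Path("/etc/dnf/dnf.conf")
    text = conf.read_text() if conf.is_file() else ""
    print("installonly_limit =", installonly_limit(text))
```

Raising the value in `dnf.conf` before the next kernel update would keep the known-good 5.14.14 kernel installed for longer.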

Comment 5 Ben Cotton 2022-05-12 15:56:33 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 6 Ben Cotton 2022-06-07 22:50:34 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.

