Bug 2247154 - Graphical artifacting on resume 7840u
Summary: Graphical artifacting on resume 7840u
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 39
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: Framework
 
Reported: 2023-10-31 01:13 UTC by Michael D
Modified: 2024-04-09 12:40 UTC (History)
CC List: 30 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-04-09 12:40:40 UTC
Type: Bug
Embargoed:


Attachments
dmesg from resume to shutdown for the instance the screen had artifacting. (45.09 KB, text/plain)
2023-10-31 01:13 UTC, Michael D


Links
System ID Private Priority Status Summary Last Updated
freedesktop.org Gitlab drm amd issues 3187 0 None opened Blinking/Flashing white screen after suspend 2024-02-27 17:36:14 UTC

Description Michael D 2023-10-31 01:13:32 UTC
Created attachment 1996302 [details]
dmesg from resume to shutdown for the instance the screen had artifacting.


Description of problem:

Framework Laptop 13

CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 7840U w/ Radeon  780M Graphics



Version-Release number of selected component (if applicable):

- 6.5.6-300.fc39.x86_64
- amd-gpu-firmware-20230919-1.fc39.noarch

How reproducible:
Sporadic, but it appears to happen reliably within a two-day window.

Steps to Reproduce:
1. Use the machine, suspending and resuming throughout the day.
2. Continue to keep the machine charged throughout this usage for several days.
3. On resume, look at the screen or use the machine before suspending again.

Actual results:
On a random resume, the screen will have large white flickering artifacts covering most of the screen.

Expected results:
Suspending and resuming during normal use should work indefinitely, as long as the battery does not die.


Additional info:

Comment 1 mattwork 2023-10-31 01:26:38 UTC
Thanks @midefran 

@mario.limonciello not sure if you have seen this before elsewhere, but I have had no luck reproducing this. Is this something you've seen elsewhere?

Here is our main thread for general context.
https://community.frame.work/t/tracking-graphical-corruption-in-fedora-39-amd-3-03-bios/39073/4

Comment 2 Peter Robinson 2023-10-31 09:57:52 UTC
Why do you think this is firmware and not kernel or mesa?

There's a new set of AMD firmware "from the 5.7 branch" in the linux-firmware-20231030-1 builds; it may be worth seeing if they improve anything.

Comment 3 Mario Limonciello 2023-10-31 15:12:56 UTC
> @mario.limonciello not sure if you have seen this before elsewhere, but I have had no luck reproducing this. Is this something you've seen elsewhere?

FYI I happen to have come across this bug, but it's only by luck.  Please CC me directly if you want me to see bugs.

> Here is our main thread for general context.
> https://community.frame.work/t/tracking-graphical-corruption-in-fedora-39-amd-3-03-bios/39073/4

This thread is really confusing to follow, it seems like it's got a bunch of different threads merged or something.

Anyway - visibly looks "somewhat" similar to the behavior that was observed with S/G in >= 64GB memory cases.
The patches for that have been brought into 6.5.5 though.  So if this type of behavior is happening in 6.5.6 and Framework BIOS 3.03 it's a different issue.

> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc00000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc01000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc02000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc03000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc04000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc05000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc06000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc07000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc08000 flags=0x0000]
> amdgpu 0000:c1:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xfffffc14000 flags=0x0000]

This is something new that I've never seen before on resume.
Is it specifically with certain apps running over suspend?  Or VRAM pressure under suspend?
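
For anyone comparing their own logs against the lines quoted above, here is a minimal Python sketch that pulls the AMD-Vi fault entries out of a saved dmesg and summarizes them per device. The function names and the regular expression are illustrative only; the field layout is assumed from the ten lines quoted here and may not cover other log formats.

```python
import re

# Pattern assumed from the IO_PAGE_FAULT lines quoted in this comment.
# search() is used below, so timestamp or "> " prefixes are tolerated.
FAULT_RE = re.compile(
    r"amdgpu (?P<device>[0-9a-f:.]+): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)


def parse_faults(log_text):
    """Return (device, domain, address) tuples, one per IO_PAGE_FAULT line."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            faults.append(
                (match["device"], int(match["domain"], 16), int(match["address"], 16))
            )
    return faults


def summarize(faults):
    """Map each device to (fault count, lowest address, highest address)."""
    summary = {}
    for device, _domain, address in faults:
        count, low, high = summary.get(device, (0, address, address))
        summary[device] = (count + 1, min(low, address), max(high, address))
    return summary
```

Calling `summarize(parse_faults(text))` on the contents of a saved dmesg file gives a quick view of how many faults each device logged and the address range they span, which may help when comparing reports.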

I suggest reporting this to the AMD DRM bug tracker.  https://gitlab.freedesktop.org/drm/amd/-/issues

Anyone affected by this issue should be able to use amdgpu.sg_display=0 as a workaround until this has a reliable reproducer and root cause.
The unfortunate side effect is this may cause "black screen" when under VRAM pressure.
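
To confirm that the workaround actually took effect after a reboot, a small Python sketch like the following can check the running kernel's command line. The helper names are hypothetical, and it only inspects `/proc/cmdline`; it would not notice the option being set another way, for example through a modprobe configuration file.

```python
from pathlib import Path


def sg_display_from_cmdline(cmdline):
    """Return the last amdgpu.sg_display value on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "amdgpu.sg_display" and sep:
            value = val  # keep scanning so the last occurrence is returned
    return value


def workaround_active(cmdline_path="/proc/cmdline"):
    """True if the running kernel was booted with amdgpu.sg_display=0."""
    return sg_display_from_cmdline(Path(cmdline_path).read_text()) == "0"
```

On Fedora, one common way to add the parameter is `grubby --update-kernel=ALL --args="amdgpu.sg_display=0"` followed by a reboot; `workaround_active()` should then return `True`.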

Comment 4 mattwork 2023-10-31 22:29:01 UTC
Thanks, yeah, that thread is a mess. And that is after we cleaned it up. 

> Is it specifically with certain apps running over suspend? Or VRAM pressure under suspend?

I suspect it's applications, but we're still trying to get everyone to drill down on what they have in common to arrive at this state.

I will have those affected get their findings (as I still can't seem to repro this) to https://gitlab.freedesktop.org/drm/amd/-/issues

Comment 5 Adam Williamson 2023-12-11 16:20:27 UTC
Do we have an AMD gitlab issue for this yet? Thanks!

Comment 6 Michael D 2023-12-13 03:13:51 UTC
Hello Adam,

I have opened an AMD gitlab issue here:
https://gitlab.freedesktop.org/drm/amd/-/issues/3003

I've installed the suggested 6.7.0-0.rc2.22.fc40.x86_64 and have not been able to reproduce the issue since.
I am looking forward to the 6.7 test days to see if it continues to work well when released.

As such, I feel we can close this issue as it seems to be within AMD drm.

Comment 7 Mario Limonciello 2023-12-13 05:56:42 UTC
I don't think it's appropriate to close in Fedora just because it's an "upstream AMD" bug.  If there is a solution that is outlined in the AMD issue the patches should be brought to Fedora, and this is the best way to ensure that happens.

Comment 8 Adam Williamson 2023-12-13 06:37:11 UTC
That will happen naturally as 6.7 comes to F39 then F38, but if Justin feels like it and the patches are identifiable and backportable, he *could* backport them to 6.6 to speed the process a bit. Re-assigning to kernel since it seems like it's a kernel issue.

Comment 9 JF002 2024-02-09 10:50:47 UTC
Hi,

I went to say hello at the Fedora stand at FOSDEM last weekend, and I mentioned how happy I am with Fedora support on my Framework 13 AMD (Ryzen 5 7640U, 32GB RAM). The only issue I have (had?) is this artifacting issue at wake up (the display is all white and blinks when I move the mouse or type on the keyboard). The kind folks told me I should let the developers know about this, so, here I am :)

However, since then I updated the kernel from 6.6.x to 6.7.3.200 and I couldn't reproduce the issue. So this seems to confirm Michael's observations with kernel 6.7. I'll keep an eye on this issue and provide more info in case I reproduce it again.

Thanks again for the amazing support of the Framework AMD on Fedora!

Comment 10 Lukas Ruzicka 2024-02-27 08:05:52 UTC
I am experiencing basically the same issue. After waking up the computer from suspend, both my screens (internal and external) are fully white. There is nothing I can do with it, except switch to a VT and restart the computer. At least I have not come up with a workaround yet. I have pulled the relevant log out of journalctl and I am adding it as an attachment.

Comment 12 Baptiste Mille-Mathias 2024-02-27 09:26:56 UTC
(In reply to Lukas Ruzicka from comment #10)
> There is nothing I can do with it, except switch to VVT and restart the computer.
> At least I did not come up with a workaround yet. I have pulled out the
> process from the journalctl and I am adding it as an attachment.

On the Framework Laptop there is a BIOS setting that puts the iGPU in UMA_GAME_OPTIMIZED mode, which allocates more memory to it.
I don't know whether this option is available from all manufacturers.

Comment 13 Mario Limonciello 2024-02-27 17:35:54 UTC
There are some reports upstream https://gitlab.freedesktop.org/drm/amd/-/issues/3187#note_2294589 that it's fixed in 6.8-rc5.
Can you please cross reference 6.8-rc6 for a test?

Comment 14 Gurney Buchanan 2024-03-26 18:27:33 UTC
I'm experiencing the same issue when plugging my FW13 into a 75 Hz 1440p Samsung display with power delivery.  In my case, I'm able to reproduce this issue 100% of the time!

Comment 15 Michael D 2024-03-28 03:25:11 UTC
(In reply to Mario Limonciello from comment #13)
> There are some reports upstream
> https://gitlab.freedesktop.org/drm/amd/-/issues/3187#note_2294589 that it's
> fixed in 6.8-rc5.
> Can you please cross reference 6.8-rc6 for a test?

Hello,

I have retested against F40 6.8.1-300.fc40.x86_64.
This still happens, and more frequently, on these newer kernels. I can trigger it by suspending for one evening, changing GNOME's scaling factor, or plugging in a USB4 -> DisplayPort monitor.

I have since enabled UMA_GAME_OPTIMIZED and all flickering issues went away on my machine; uptime is currently about 4 days.

Comment 16 mattwork 2024-04-01 21:40:16 UTC
Please cross post any updates for those still seeing this issue to https://gitlab.freedesktop.org/drm/amd/-/issues/3187 so AMD will see it.

Comment 17 Mario Limonciello 2024-04-09 12:40:40 UTC
This is not a kernel bug.  It's confirmed to be a BIOS bug that is fixed by Framework BIOS 3.05.

https://community.frame.work/t/framework-laptop-13-ryzen-7040-bios-3-05-release-and-driver-bundle-beta/48276
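
For readers who want to check whether their machine already has the fixed firmware, here is a minimal Python sketch that compares the reported BIOS version against 3.05. It assumes the firmware exposes a purely numeric dotted string (for example `03.05`) in `/sys/class/dmi/id/bios_version`; that format and the helper names are assumptions, and a non-numeric string would raise `ValueError`.

```python
from pathlib import Path


def parse_bios_version(text):
    """Turn a dotted version string such as '03.05' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fixed_bios(version_text, fixed=(3, 5)):
    """True if the version is at or above the release named in this comment."""
    return parse_bios_version(version_text) >= fixed


def installed_bios_version(path="/sys/class/dmi/id/bios_version"):
    """Read the BIOS version string the kernel exposes through DMI."""
    return Path(path).read_text().strip()
```

On an affected laptop, `has_fixed_bios(installed_bios_version())` returning `False` would suggest the BIOS update is still pending.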

