Bug 1656905

Summary: screen not coming back from dpms
Product: Red Hat Enterprise Linux 8 Reporter: Jiri Koten <jkoten>
Component: mutterAssignee: Jonas Ådahl <jadahl>
Status: CLOSED CURRENTRELEASE QA Contact: Desktop QE <desktop-qa-list>
Severity: high Docs Contact:
Priority: high    
Version: 8.0CC: fmuellner, jwboyer, rstrode, tpelka
Target Milestone: rc   
Target Release: 8.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: mutter-3.28.3-9.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-14 00:55:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1635157, 1657660    
Attachments:
Description Flags
session log
none
inhibit-frame-clockwhen-power-saving.patch
none
renderer-native: Skip eglSwapBuffers when last EGLStream flip was busy
none
inhibit-frame-clockwhen-power-saving.patch
none
avoid-eglstream-deadlock.patch
none
inhibit-frame-clockwhen-power-saving.patch
none
avoid-eglstream-deadlock.patch
none
eglstream-mailbox-mode.patch
none
eglstream-mailbox-mode.patch (v2) none

Description Jiri Koten 2018-12-06 16:04:07 UTC
Created attachment 1512175 [details]
session log

Description of problem:
Running Wayland with nvidia binary drivers, the screen is not coming back from dpms after idle or lock screen.


Version-Release number of selected component (if applicable):
mutter-3.28.3-5.el8
kernel-4.18.0-48.el8
nvidia drivers 410.78

How reproducible:
100%

Steps to Reproduce:
1. Disable the gdm udev rule for nvidia
2. enable nvidia modeset
3. Login to session
4. Lock screen
5. Wake up the screen

Actual results:
The screen remains blank

Expected results:
The screen turns on and shows lock screen

Additional info:

Comment 1 Jonas Ådahl 2019-01-03 16:43:18 UTC
Created attachment 1518187 [details]
inhibit-frame-clockwhen-power-saving.patch

This patch fixes suspend/resume issue. The problem was that we'll skip page flipping when power saving, but might already had scheduled a paint, meaning we'd still eglSwapBuffers(). When resuming, we'd paint again, call eglSwapBuffers(), which would dead lock since there is no room in the FIFO, as the previous buffer was not page flipped out.

Comment 2 Jonas Ådahl 2019-01-03 16:46:37 UTC
Created attachment 1518188 [details]
renderer-native: Skip eglSwapBuffers when last EGLStream flip was busy

This patch fixes a similar dead lock to the power saving one, but one that can't be avoided, but must be worked around.

The problem is that when changing mode, we recreate streams with the new resolution, and the first page flip seems to always fail. We work around this by pretending nothing happened, except avoid calling eglSwapBuffers() until a page flip finally worked. Currently there is no way to get notified when a page flip will actually be possible, so some way of try-until-it-works is currently necessary, but nvidia is looking into alternative ways to solve this.

Comment 3 Ray Strode [halfline] 2019-01-03 17:47:41 UTC
(In reply to Jonas Ådahl from comment #1)
> Created attachment 1518187 [details]
> inhibit-frame-clockwhen-power-saving.patch
Looks like this is just a list of patches, and not the patches

(In reply to Jonas Ådahl from comment #2)
> The problem is that when changing mode, we recreate streams with the new
> resolution, and the first page flip seems to always fail. We work around
> this by pretending nothing happened, except avoid calling eglSwapBuffers()
> until a page flip finally worked. Currently there is no way to get notified
> when a page flip will actually be possible, so some way of
> try-until-it-works is currently necessary, but nvidia is looking into
> alternative ways to solve this.
Hmm so it's not that the next flip fails, but that flips for the next N ms's fail?

I mean, does a sleep (1); call fix it too?

The thing is, we may end up losing drawing with the current approach, I think. there's nothing to enforce another flip after the one we ignored, right?

Comment 4 Jonas Ådahl 2019-01-03 17:48:56 UTC
Created attachment 1518189 [details]
inhibit-frame-clockwhen-power-saving.patch

Now attaching with actual content.

Comment 5 Jonas Ådahl 2019-01-03 17:52:20 UTC
(In reply to Ray Strode [halfline] from comment #3)
> (In reply to Jonas Ådahl from comment #1)
> > Created attachment 1518187 [details]
> > inhibit-frame-clockwhen-power-saving.patch
> Looks like this is just a list of patches, and not the patches
> 
> (In reply to Jonas Ådahl from comment #2)
> > The problem is that when changing mode, we recreate streams with the new
> > resolution, and the first page flip seems to always fail. We work around
> > this by pretending nothing happened, except avoid calling eglSwapBuffers()
> > until a page flip finally worked. Currently there is no way to get notified
> > when a page flip will actually be possible, so some way of
> > try-until-it-works is currently necessary, but nvidia is looking into
> > alternative ways to solve this.
> Hmm so it's not that the next flip fails, but that flips for the next N ms's
> fail?
> 
> I mean, does a sleep (1); call fix it too?
> 
> The thing is, we may end up losing drawing with the current approach, I
> think. there's nothing to enforce another flip after the one we ignored,
> right?

We'll flip somewhat old content (the one from the first eglSwapBuffers() instead of what seems to be always be the second). I'm not sure sleep(1) will avoid the issue, but I guess I can test.

Comment 8 Jonas Ådahl 2019-01-04 10:48:59 UTC
Created attachment 1518330 [details]
avoid-eglstream-deadlock.patch

This is the alternative solution, which does the following:

* If eglstream page flipping fails with 'busy', retry again once after 17ms
* If eglstream page flipping fails for any other reason or on the second try, continue as normal but inhibit eglSwapBuffers() calls

The first part should for the most part already avoid the dead lock, and the second part is more of a backup plan.

Comment 9 Jonas Ådahl 2019-01-07 09:45:54 UTC
Created attachment 1518936 [details]
inhibit-frame-clockwhen-power-saving.patch

Updated power saving frame clock inhibitation patch, as the old one had a bug.

Comment 10 Jonas Ådahl 2019-01-07 09:46:44 UTC
Created attachment 1518938 [details]
avoid-eglstream-deadlock.patch

Updated the eglSwapBuffers() deadlock patch after having rebased it on top of the frame clock inhibitation patch.

Comment 13 Jonas Ådahl 2019-01-11 15:17:31 UTC
Created attachment 1520049 [details]
eglstream-mailbox-mode.patch

After some discussion with an Nvidia engineer, here is an alternative, much less invasive, patch, that avoids the dead locks. It works by changing the EGLStream to operate in mailbox mode, instead of FIFO mode.

Comment 14 Ray Strode [halfline] 2019-01-11 15:23:06 UTC
much nicer.  I think i would drop the g_warning, since it's going to show up in normal operation right ?

I imagine since we're buffering less frames this may have some performance considerations? but I guess it just brings us to par with the other drivers anyway?

Let's do it.

Comment 15 Ray Strode [halfline] 2019-01-11 15:24:25 UTC
I just want to add, that not messing with the frame clock seems like a very good thing from a "less change is less risk" perspective.

Comment 16 Jonas Ådahl 2019-01-11 15:42:12 UTC
Created attachment 1520050 [details]
eglstream-mailbox-mode.patch (v2)

New version of the EGLStream mailbox patch. Changes since last version is only warning on non "busy" flip errors, and swapping the order of the commits. As for risk vs performance, concur that the decreased risk of a lot less invasive changes has bigger value than the (so far non-noticable) risk of worsened performance.

Comment 18 Jiri Koten 2019-02-12 13:16:03 UTC
Verified in mutter-3.28.3-17.el8.

kernel-4.18.0-67.el8
NVidia 410.93