Bug 1656905 - screen not coming back from dpms
Summary: screen not coming back from dpms
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: mutter
Version: 8.0
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: 8.0
Assignee: Jonas Ådahl
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks: 1635157 1657660
TreeView+ depends on / blocked
 
Reported: 2018-12-06 16:04 UTC by Jiri Koten
Modified: 2019-06-14 00:55 UTC (History)
4 users (show)

Fixed In Version: mutter-3.28.3-9.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-14 00:55:29 UTC
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)
session log (34.60 KB, text/x-vhdl)
2018-12-06 16:04 UTC, Jiri Koten
no flags Details
inhibit-frame-clockwhen-power-saving.patch (123 bytes, patch)
2019-01-03 16:43 UTC, Jonas Ådahl
no flags Details | Diff
renderer-native: Skip eglSwapBuffers when last EGLStream flip was busy (2.95 KB, patch)
2019-01-03 16:46 UTC, Jonas Ådahl
no flags Details | Diff
inhibit-frame-clockwhen-power-saving.patch (4.99 KB, patch)
2019-01-03 17:48 UTC, Jonas Ådahl
no flags Details | Diff
avoid-eglstream-deadlock.patch (6.60 KB, patch)
2019-01-04 10:48 UTC, Jonas Ådahl
no flags Details | Diff
inhibit-frame-clockwhen-power-saving.patch (7.71 KB, patch)
2019-01-07 09:45 UTC, Jonas Ådahl
no flags Details | Diff
avoid-eglstream-deadlock.patch (6.60 KB, patch)
2019-01-07 09:46 UTC, Jonas Ådahl
no flags Details | Diff
eglstream-mailbox-mode.patch (3.06 KB, patch)
2019-01-11 15:17 UTC, Jonas Ådahl
no flags Details | Diff
eglstream-mailbox-mode.patch (v2) (2.91 KB, patch)
2019-01-11 15:42 UTC, Jonas Ådahl
no flags Details | Diff

Description Jiri Koten 2018-12-06 16:04:07 UTC
Created attachment 1512175 [details]
session log

Description of problem:
Running Wayland with nvidia binary drivers, the screen is not coming back from dpms after idle or lock screen.


Version-Release number of selected component (if applicable):
mutter-3.28.3-5.el8
kernel-4.18.0-48.el8
nvidia drivers 410.78

How reproducible:
100%

Steps to Reproduce:
1. Disable the gdm udev rule for nvidia
2. enable nvidia modeset
3. Login to session
4. Lock screen
5. Wake up the screen

Actual results:
The screen remains blank

Expected results:
The screen turns on and shows lock screen

Additional info:

Comment 1 Jonas Ådahl 2019-01-03 16:43:18 UTC
Created attachment 1518187 [details]
inhibit-frame-clockwhen-power-saving.patch

This patch fixes suspend/resume issue. The problem was that we'll skip page flipping when power saving, but might already had scheduled a paint, meaning we'd still eglSwapBuffers(). When resuming, we'd paint again, call eglSwapBuffers(), which would dead lock since there is no room in the FIFO, as the previous buffer was not page flipped out.

Comment 2 Jonas Ådahl 2019-01-03 16:46:37 UTC
Created attachment 1518188 [details]
renderer-native: Skip eglSwapBuffers when last EGLStream flip was busy

This patch fixes a similar dead lock to the power saving one, but one that can't be avoided, but must be worked around.

The problem is that when changing mode, we recreate streams with the new resolution, and the first page flip seems to always fail. We work around this by pretending nothing happened, except avoid calling eglSwapBuffers() until a page flip finally worked. Currently there is no way to get notified when a page flip will actually be possible, so some way of try-until-it-works is currently necessary, but nvidia is looking into alternative ways to solve this.

Comment 3 Ray Strode [halfline] 2019-01-03 17:47:41 UTC
(In reply to Jonas Ådahl from comment #1)
> Created attachment 1518187 [details]
> inhibit-frame-clockwhen-power-saving.patch
Looks like this is just a list of patches, and not the patches

(In reply to Jonas Ådahl from comment #2)
> The problem is that when changing mode, we recreate streams with the new
> resolution, and the first page flip seems to always fail. We work around
> this by pretending nothing happened, except avoid calling eglSwapBuffers()
> until a page flip finally worked. Currently there is no way to get notified
> when a page flip will actually be possible, so some way of
> try-until-it-works is currently necessary, but nvidia is looking into
> alternative ways to solve this.
Hmm so it's not that the next flip fails, but that flips for the next N ms's fail?

I mean, does a sleep (1); call fix it too?

The thing is, we may end up losing drawing with the current approach, I think. there's nothing to enforce another flip after the one we ignored, right?

Comment 4 Jonas Ådahl 2019-01-03 17:48:56 UTC
Created attachment 1518189 [details]
inhibit-frame-clockwhen-power-saving.patch

Now attaching with actual content.

Comment 5 Jonas Ådahl 2019-01-03 17:52:20 UTC
(In reply to Ray Strode [halfline] from comment #3)
> (In reply to Jonas Ådahl from comment #1)
> > Created attachment 1518187 [details]
> > inhibit-frame-clockwhen-power-saving.patch
> Looks like this is just a list of patches, and not the patches
> 
> (In reply to Jonas Ådahl from comment #2)
> > The problem is that when changing mode, we recreate streams with the new
> > resolution, and the first page flip seems to always fail. We work around
> > this by pretending nothing happened, except avoid calling eglSwapBuffers()
> > until a page flip finally worked. Currently there is no way to get notified
> > when a page flip will actually be possible, so some way of
> > try-until-it-works is currently necessary, but nvidia is looking into
> > alternative ways to solve this.
> Hmm so it's not that the next flip fails, but that flips for the next N ms's
> fail?
> 
> I mean, does a sleep (1); call fix it too?
> 
> The thing is, we may end up losing drawing with the current approach, I
> think. there's nothing to enforce another flip after the one we ignored,
> right?

We'll flip somewhat old content (the one from the first eglSwapBuffers() instead of what seems to be always be the second). I'm not sure sleep(1) will avoid the issue, but I guess I can test.

Comment 8 Jonas Ådahl 2019-01-04 10:48:59 UTC
Created attachment 1518330 [details]
avoid-eglstream-deadlock.patch

This is the alternative solution, which does the following:

* If eglstream page flipping fails with 'busy', retry again once after 17ms
* If eglstream page flipping fails for any other reason or on the second try, continue as normal but inhibit eglSwapBuffers() calls

The first part should for the most part already avoid the dead lock, and the second part is more of a backup plan.

Comment 9 Jonas Ådahl 2019-01-07 09:45:54 UTC
Created attachment 1518936 [details]
inhibit-frame-clockwhen-power-saving.patch

Updated power saving frame clock inhibitation patch, as the old one had a bug.

Comment 10 Jonas Ådahl 2019-01-07 09:46:44 UTC
Created attachment 1518938 [details]
avoid-eglstream-deadlock.patch

Updated the eglSwapBuffers() deadlock patch after having rebased it on top of the frame clock inhibitation patch.

Comment 13 Jonas Ådahl 2019-01-11 15:17:31 UTC
Created attachment 1520049 [details]
eglstream-mailbox-mode.patch

After some discussion with an Nvidia engineer, here is an alternative, much less invasive, patch, that avoids the dead locks. It works by changing the EGLStream to operate in mailbox mode, instead of FIFO mode.

Comment 14 Ray Strode [halfline] 2019-01-11 15:23:06 UTC
much nicer.  I think i would drop the g_warning, since it's going to show up in normal operation right ?

I imagine since we're buffering less frames this may have some performance considerations? but I guess it just brings us to par with the other drivers anyway?

Let's do it.

Comment 15 Ray Strode [halfline] 2019-01-11 15:24:25 UTC
I just want to add, that not messing with the frame clock seems like a very good thing from a "less change is less risk" perspective.

Comment 16 Jonas Ådahl 2019-01-11 15:42:12 UTC
Created attachment 1520050 [details]
eglstream-mailbox-mode.patch (v2)

New version of the EGLStream mailbox patch. Changes since last version is only warning on non "busy" flip errors, and swapping the order of the commits. As for risk vs performance, concur that the decreased risk of a lot less invasive changes has bigger value than the (so far non-noticable) risk of worsened performance.

Comment 18 Jiri Koten 2019-02-12 13:16:03 UTC
Verified in mutter-3.28.3-17.el8.

kernel-4.18.0-67.el8
NVidia 410.93


Note You need to log in before you can comment on or make changes to this bug.