Bug 1367666 - [Wayland] Stability is worse compared to X11 session due to intolerance for display server or gnome-shell crashes
Status: NEW
Product: Fedora
Classification: Fedora
Component: gnome-shell
Version: rawhide
Hardware: Unspecified Unspecified
Priority: unspecified Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Owen Taylor
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened, Tracking
Duplicates: 1479408
Depends On:
Blocks: WaylandRelated
Reported: 2016-08-17 03:45 EDT by Christian Stadelmann
Modified: 2018-07-17 02:13 EDT
CC List: 32 users

Last Closed: 2016-10-18 08:15:14 EDT
Type: Bug


External Trackers
Tracker ID Priority Status Summary Last Updated
GNOME Bugzilla 759538 None None None 2018-04-25 11:31 EDT

Description Christian Stadelmann 2016-08-17 03:45:15 EDT
Description of problem:
On X11, when the X server dies, gnome-shell is able to restart it. On wayland, when the wayland compositor dies, all GUI applications crash and the session dies, so all work is lost.

On X11, when gnome-shell crashes, it can be restarted. Apart from some windows not remembering their previous position or workspace (happens sometimes), no work is lost. On wayland, when gnome-shell crashes, all GUI applications crash as a result and the session dies, so all work is lost.

This is a regression in GNOME+Wayland sessions compared to GNOME+X11 sessions. Since both Wayland and gnome-shell happen to crash a lot, this regression seriously affects the user experience with GNOME. Please consider working on this before making GNOME+Wayland the default.

Version-Release number of selected component (if applicable):
Any gnome-shell version up to 3.20.x (3.21.x not tested) with wayland support.
Any wayland version up to 1.10.0 (newer versions not yet tested).

How reproducible:
Always due to design.

Steps to Reproduce:
1. Press Alt+F2
2. Enter "r" and confirm with the [Enter] key

Actual results:
On wayland, gnome-shell tells you "Restarting is not available on Wayland".

Expected results:
Restart gnome-shell
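
Before following the steps, it can help to confirm which session type is running, since the restart command behaves differently on X11 and Wayland. The following is a minimal sketch, assuming the login session exports the XDG_SESSION_TYPE environment variable; the helper name restart_supported is made up for illustration and is not part of gnome-shell.

```python
import os


def restart_supported(environ=None):
    """Return True when Alt+F2 "r" is expected to restart the shell.

    Hypothetical helper: it only inspects XDG_SESSION_TYPE and does not
    talk to gnome-shell at all.
    """
    env = os.environ if environ is None else environ
    # XDG_SESSION_TYPE is typically "x11" or "wayland" in a graphical session.
    return env.get("XDG_SESSION_TYPE", "").lower() == "x11"


if __name__ == "__main__":
    print("restart via Alt+F2 'r' expected to work:", restart_supported())
```

Passing a dictionary instead of the real environment makes the check easy to try without logging in to a different session.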

Additional info:
To get an idea of how often gnome-shell crashes, have a look at the retrace server:
https://retrace.fedoraproject.org/faf/problems/?component_names=gnome-shell&associate=__None&daterange=2016-08-03%3A2016-08-17&bug_filter=None&function_names=&binary_names=&source_file_names=&since_version=&since_release=&to_version=&to_release=
https://retrace.fedoraproject.org/faf/stats/last_week/#Fedora%2024 (gnome-shell is often within the top 5 of bugs per Fedora release)
https://retrace.fedoraproject.org/faf/reports/?component_names=gnome-shell&associate=__None&first_occurrence_daterange=&last_occurrence_daterange=&order_by=last_occurrence

Thanks to Sébastien Wilmet for bringing this up.

One idea to achieve this goal is to split up gnome-shell into two processes:
1. the compositor / window manager
2. the GUI which is using crash-risky technologies like JavaScript
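
The proposed split can be illustrated with a toy supervisor. This is only a sketch under stated assumptions: the long-lived parent stands in for the compositor / window manager, the child started with subprocess stands in for the crash-prone GUI, and the names UI_CHILD and supervise_ui as well as the scripted exit codes are invented for the example. It says nothing about how gnome-shell or mutter are actually structured.

```python
import subprocess
import sys

# Hypothetical stand-in for the GUI process: it exits with the status it is
# given, so a non-zero status plays the role of a crash.
UI_CHILD = "import sys; sys.exit(int(sys.argv[1]))"


def supervise_ui(scripted_exits):
    """Run the GUI child, restarting it after every abnormal exit.

    `scripted_exits` lists how each successive run of the child ends; it is
    test scaffolding only. Returns (crash count, surviving window list).
    """
    # State owned by the long-lived process; a GUI crash never touches it.
    windows = ["terminal", "editor"]
    crashes = 0
    for code in scripted_exits:
        result = subprocess.run([sys.executable, "-c", UI_CHILD, str(code)])
        if result.returncode == 0:
            break  # clean exit: no restart needed
        crashes += 1  # abnormal exit: loop around and start the GUI again
    return crashes, windows


if __name__ == "__main__":
    print(supervise_ui([1, 1, 0]))
```

The point of the design is that the window list lives in the process that keeps running, so restarting the GUI part does not cost the user any open windows.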
Comment 1 Christian Stadelmann 2016-08-17 03:59:06 EDT
An old bug that was closed by disabling restart on Wayland: https://bugzilla.gnome.org/show_bug.cgi?id=741665
Comment 2 Olivier Fourdan 2016-08-17 05:21:07 EDT
(In reply to Christian Stadelmann from comment #0)
> On X11, when the X server dies, gnome-shell is able to restart it. On
> wayland, when the wayland compositor dies, all GUI applications crash and
> the session dies, so all work is lost.

Nope, that sentence is not accurate. On X11, if the X server dies, all X11 applications lose their connection to the X server, including the session manager, so the entire session goes down with it. It's easy to try: log in to GNOME on X11 and kill the X server.

If only the window manager dies, though, the session manager can restart it, so that is not as bad as the X server crashing.

On Wayland, the window manager (mutter/gnome-shell) also plays the role of Wayland compositor, so losing the compositor on Wayland takes the session with it (just like losing the X server on X11).
Comment 3 Olivier Fourdan 2016-08-17 05:25:02 EDT
Another issue that affects mutter/gnome-shell is its dependency on Xwayland: if Xwayland dies in GNOME on Wayland, mutter/gnome-shell dies too and the rest of the session follows.

Weston, for example, can survive a crash of Xwayland: X11 apps are lost, but native Wayland clients continue to work.

See https://bugzilla.gnome.org/show_bug.cgi?id=759538
Comment 4 Matthias Clasen 2016-08-17 09:35:01 EDT
(In reply to Christian Stadelmann from comment #0)

> One idea to achieve this goal is to split up gnome-shell into two processes:
> 1. the compositor / window manager
> 2. the GUI which is using crash-risky technologies like JavaScript

Is it possible to architect wayland compositors in this way ? Certainly. Is it feasible to change gnome-shell in this way as a 'bug fix' ? Almost certainly not.

Contributions to gnome-shell stability are more than welcome, but as it is, the suggestion is WONTFIX, I think.
Comment 5 Jonas Ådahl 2016-08-18 09:33:37 EDT
(In reply to Matthias Clasen from comment #4)
> (In reply to Christian Stadelmann from comment #0)
> 
> > One idea to achieve this goal is to split up gnome-shell into two processes:
> > 1. the compositor / window manager
> > 2. the GUI which is using crash-risky technologies like JavaScript
> 
> Is it possible to architect wayland compositors in this way ? Certainly. Is
> it feasible to change gnome-shell in this way as a 'bug fix' ? Almost
> certainly not.
> 
> Contributions to gnome-shell stability are more than welcome, but as it is,
> the suggestion is WONTFIX, I think.

I know I have been talking about this before but I think eventually we really want to do the compositing/UI split. Not now, since it's a huge undertaking and will involve a massive amount of work, but eventually it's the solution that I think we should work towards - long term. The reason is not only because of stability, but for responsiveness and other things as well.

I think for this bug, a WONTFIX is appropriate. There is no single "fix" for stability to do here. We just need to hunt down as many mutter/gnome-shell crashes as we can.
Comment 6 Jonas Ådahl 2016-08-18 09:39:46 EDT
FWIW, for just making Alt+F2 "r" work, we could probably do something similar to what Enlightenment does: https://blogs.s-osg.org/recovery-journey-discovery/ . It would involve protocol work as well as compositor and client support. Making it work for Xwayland is a completely different story: we would need to design the protocol to take Xwayland into account, implement support in Xwayland, and rewrite how Xwayland and mutter integrate so that Xwayland can survive a mutter restart. So it's also a non-trivial task that would take a lot of resources, and it wouldn't work for all clients anyway.
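
To make the "compositor and client support" part concrete, here is a toy in-memory model of a client that survives a compositor restart by replaying its own state. It is a sketch only: ToyCompositor, ToyClient and CompositorGone are invented names, no real Wayland protocol or Enlightenment code is involved, and Xwayland is ignored entirely.

```python
class CompositorGone(Exception):
    """Raised when a client talks to a compositor that has exited."""


class ToyCompositor:
    def __init__(self):
        self.alive = True
        self.surfaces = {}  # surface id -> title; lost when the process dies

    def create_surface(self, surface_id, title):
        if not self.alive:
            raise CompositorGone()
        self.surfaces[surface_id] = title


class ToyClient:
    def __init__(self, compositor):
        self.compositor = compositor
        self.windows = {}  # client-side copy that outlives the compositor

    def open_window(self, surface_id, title):
        # Record the window locally first, so it can be replayed even if
        # the compositor turns out to be gone.
        self.windows[surface_id] = title
        self.compositor.create_surface(surface_id, title)

    def reconnect(self, new_compositor):
        """Attach to a fresh compositor and re-announce every known window."""
        self.compositor = new_compositor
        for surface_id, title in self.windows.items():
            new_compositor.create_surface(surface_id, title)


if __name__ == "__main__":
    old = ToyCompositor()
    client = ToyClient(old)
    client.open_window(1, "editor")
    old.alive = False  # simulate the compositor restarting
    new = ToyCompositor()
    client.reconnect(new)
    print(new.surfaces)
```

The sketch works only because the client keeps everything needed to rebuild its windows, which is why such a scheme needs support in every client toolkit and would not cover clients that lack it.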
Comment 7 Christian Stadelmann 2016-08-19 17:41:54 EDT
(In reply to Jonas Ådahl from comment #5)
> I know I have been talking about this before but I think eventually we
> really want to do the compositing/UI split.

Any upstream plans on this?

> I think for this bug, a WONTFIX is appropriate. There is no single "fix" for
> stability to do here. We just need to hunt down as many mutter/gnome-shell
> crashes as we can.

This won't help much unless gnome-shell gets something like the "Session Recovery Extension" from the article you linked to. Right now, even if gnome-shell@wayland had just 10% of the bugs gnome-shell@x11 has (or had), the user experience would suffer. The current design on Wayland is "crash-prone", but it doesn't have to be this way.

(In reply to Jonas Ådahl from comment #6)
> FWIW, for just making Alt-F2 "r" work, we could probably do something
> similar to what Enlightenment does:
> https://blogs.s-osg.org/recovery-journey-discovery/

That looks like a possible resolution for this bug.
Comment 8 Roger Odle 2016-08-20 14:01:28 EDT
(In reply to Matthias Clasen from comment #4)
> (In reply to Christian Stadelmann from comment #0)
> 
> > One idea to achieve this goal is to split up gnome-shell into two processes:
> > 1. the compositor / window manager
> > 2. the GUI which is using crash-risky technologies like JavaScript
> 
> Is it possible to architect wayland compositors in this way ? Certainly. Is
> it feasible to change gnome-shell in this way as a 'bug fix' ? Almost
> certainly not.
> 
> Contributions to gnome-shell stability are more than welcome, but as it is,
> the suggestion is WONTFIX, I think.

It should be possible, or why use Wayland? Separation is necessary for a good, reliable system. Hunting down bugs is a good thing, but that does not replace the need for a good recovery strategy. I have been experiencing a memory leak in gnome-shell since Fedora 23. This results in a general slowdown. I recover the performance by restarting gnome-shell, and I can do this without crashing applications. I am not sure what the issue is, but it seems to have something to do with the GPU drivers leaking memory when playing video content in Firefox. This issue has been reported elsewhere, but that is not my point. My point is that gnome-shell must have the ability to perform a warm restart.

The compositor part of gnome-shell needs to be the simplest, most bullet-proof thing you can imagine. Keep the features to a minimum and separate the window instances from the gnome-shell GUI stuff. Keep enough information so that the gnome-shell GUI stuff can find the windows again by discovery when it restarts. This could be done by having these compositor resources owned by the applications. Then gnome-shell just needs to query the running applications to rebuild the desktop. It might be as simple as sending each application a "window-damage" event. Above all, keep the display and keyboard alive to support system maintenance and recovery.
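
The "rebuild by discovery" idea can be sketched in a few lines. Everything here is an assumption made for illustration: the App and ShellUI classes, the describe_windows query and the workspace numbers are invented, and no such interface exists in gnome-shell as far as this thread shows.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """A running application that owns the description of its windows."""
    name: str
    windows: list = field(default_factory=list)  # (title, workspace) pairs

    def describe_windows(self):
        # Hypothetical query a restarted shell could send to each app.
        return [(self.name, title, workspace) for title, workspace in self.windows]


class ShellUI:
    """The restartable GUI part; it holds no state that cannot be rebuilt."""

    def __init__(self):
        self.by_workspace = {}

    def rebuild(self, apps):
        """Reconstruct the workspace layout by asking every running app."""
        self.by_workspace = {}
        for app in apps:
            for name, title, workspace in app.describe_windows():
                self.by_workspace.setdefault(workspace, []).append(f"{name}: {title}")
        return self.by_workspace


if __name__ == "__main__":
    apps = [
        App("firefox", [("Bug 1367666", 1)]),
        App("gedit", [("notes.txt", 1), ("todo.txt", 2)]),
    ]
    # A freshly started ShellUI knows nothing until it queries the apps.
    print(ShellUI().rebuild(apps))
```

Because the shell side starts empty and derives everything from the applications, a crash of the GUI part loses nothing that a restart cannot recover.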
Comment 9 Matthias Clasen 2016-10-18 08:15:14 EDT
Not going to happen in f24
Comment 10 Jean-François Fortin Tam 2016-10-28 09:16:19 EDT
Well, this will also affect F25 and any further version until this is fixed. It was reported upstream as https://bugzilla.gnome.org/show_bug.cgi?id=741665 , but just disabling the user's ability to restart the shell does not fix the core problem; it only hides it.
Comment 11 Olivier Crête 2016-10-28 09:19:18 EDT
Maybe we should investigate implementing something similar to what EFL-Wayland has? That is, have app startup handled by something other than the compositor (systemd-user would be the obvious candidate here), and then add some code so that apps can reconnect to the compositor after it crashes.

Something like:
https://blogs.s-osg.org/recovery-journey-discovery/

To a lot of people, this is a pretty bad blocker to using Wayland, as gnome-shell will never be stable enough to be a "never crashes" component of the system.
Comment 12 Anass Ahmed 2016-10-28 10:04:12 EDT
If this is feasible before F25, it would be great, but I think it would take a lot of rewriting and refactoring of code that was never written with this feature in mind.

I believe that GNOME Shell on Wayland is now very stable. It hasn't crashed on me except one time, when I tried to use HDMI in the middle of a session and it returned me to GDM; after that it worked just fine.

But it's annoying that I lose the shell extensions every time one of them crashes, and that I need to save my work and close all apps to log out and log in again in order to restore them. I know it's not GNOME Shell's fault that some extensions are buggy, but those extensions make our workflow more usable and I need some of them in my work.

I used to keep my session up for maybe a month without rebooting the laptop on Xorg, because if GNOME Shell crashed, or one of the extensions caused others to be disabled, or GNOME Shell just got slow over time, I could just use `Alt + F2 + 'r'` to fix the issue and move on.
Comment 13 Matthias Clasen 2016-10-28 10:57:40 EDT
(In reply to Olivier Crête from comment #11)

> To a lot of people, this is a pretty bad blocker to using Wayland, as
> gnome-shell will never be stable enough to be a "never crashes" component of
> the system.

Pretty negative attitude... good luck engineering workarounds then. As far as I am concerned, this is still WONTFIX. I'll leave this bug open now, but don't expect action here.
Comment 14 Matthias Clasen 2016-10-28 11:10:33 EDT
The EFL thing doesn't actually help for crashes, and it is quite a bit of engineering just for a restart button...
Comment 15 Matthew Miller 2016-10-31 09:11:58 EDT
Matthias, where's the best place for upstream discussion of this?
Comment 16 Nate Case 2017-04-07 11:38:51 EDT
I've had about 1-2 gnome-shell crashes per month in Fedora 25.  It's extremely frustrating when it happens since it kills everything I'm working on.  From a functional standpoint I'd say it's about 75% as painful as a kernel crash, whereas a gnome-shell crash under Xorg was < 5% as painful as a kernel crash.

In general, achieving crash independence between applications and gnome-shell is a great goal, and Gnome 3 + Wayland's current behavior is a regression in this regard.  Remember how painful web browsing was before we had separate processes per tab/page?  It was a killer feature when introduced.

I agree with Matthias, though, that it makes sense for this to be a WONTFIX in the Red Hat / Fedora bugzillas. Fixing this is a big upstream change that should probably be discussed on https://bugzilla.gnome.org/ or gnome-shell-list@gnome.org.
Comment 17 Matthew Miller 2017-11-03 15:34:55 EDT
*** Bug 1479408 has been marked as a duplicate of this bug. ***
Comment 18 Matthew Miller 2017-11-03 15:41:53 EDT
Could we consider prioritizing this? I currently see 332 open abrt bugs against gnome-shell, and the FAF report consistently shows thousands of crashes a week. I don't think gnome-shell is particularly crashy as software goes, but it's in a uniquely prominent position, and the fact that this goes from "minor blip" to "lose your session" is a cause for concern in the Wayland shift.

https://retrace.fedoraproject.org/faf/summary/?component_names=gnome-shell&daterange=2016-11-04%3A2017-11-03&resolution=w

We're also seeing a lot of F27 crash reports (see chart) despite F27 still being in beta and having an install base about ¹⁄₁₀₀th of F26 or F25. I'm concerned that this will lead to negative perception of the release overall.
Comment 19 Jonas Thiem 2017-11-03 19:07:07 EDT
For what it's worth, this issue and the lack of pen support in Xwayland are the only reasons I'm still running the GNOME Xorg session on all of my Fedora computers.
Comment 20 Christian Stadelmann 2017-11-04 08:20:28 EDT
(In reply to Matthew Miller from comment #18)
> Could we consider prioritizing this? I currently see 332 open abrt bugs
> against gnome-shell, and the FAF report consistently shows thousands of
> crashes a week. I don't think gnome-shell is particularly crashy as software
> goes, but it's in a uniquely prominent position, and the fact that this goes
> from "minor blip" to "lose your session" is a cause for concern in the
> Wayland shift.

+1, so I've nominated it.
Comment 21 Jan Kurik 2017-11-09 07:05:42 EST
This bug has been accepted on the list of Prioritized bugs: https://meetbot.fedoraproject.org/fedora-meeting/2017-11-08/fedora_prioritized_bugs_and_issues.2017-11-08-15.06.log.html#l-30
Comment 22 Fedora End Of Life 2017-11-16 14:42:19 EST
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Comment 23 Jan Kurik 2018-04-25 15:35:18 EDT
As this is an architectural issue and upstream is aware of it and working on it, we are removing this bug from the Prioritized list.
Comment 24 Jonas Thiem 2018-04-27 06:42:03 EDT
Will this include considerations to do a universal compositor <-> wm protocol for this, and unification of KDE/GNOME/... compositor efforts into a single project?

Right now there seems to be a lot of duplicate efforts (which at least to me as an outsider looks like a bad idea), and all the smaller window managers like xfce, i3, ... without the manpower to play along with this appear to be left behind.
Comment 25 Matthew Miller 2018-04-27 08:58:26 EDT
@Jonas I'll certainly bring up that idea with the developers, but I think that's out of scope for this bug.
