Red Hat Bugzilla – Bug 1250040
Sudden crashes of gnome-shell (possibly when new window opened)
Last modified: 2016-12-20 09:20:31 EST
Since upgrading to F22 I've experienced regular (every day or two) crashes of Gnome shell. I was running F21 for nearly a year on the same hardware without issue.
Anecdotally, it happens when I'm opening a new "window". e.g. immediately after clicking on "Document Viewer" when opening an attachment in Evolution; or opening the overview and starting typing to find an application.
abrt doesn't pick anything up. I *don't* get a Gnome "oops something went wrong" message. It just immediately drops out to a black screen and after some flickering I'm back at the gdm login screen.
1) Where should I be looking to gather debug information? I don't see anything in /var/log/messages, or when searching for gnome-shell with journalctl.
2) How should I reset my Gnome configuration to get a clean/as-new config? See below for what I tried to delete on upgrade. I'm happy to test from a clean gnome config, but testing with a new user is harder given this takes a day or two to crop up and this is my working machine (though it does reliably happen within a few days).
Thoughts about possible low incident setup that may be less tested on my laptop: upgraded from F21; has a touchscreen; rarely shut down, nearly always suspend to RAM.
When this started happening after upgrade I nuked my: .cache, .config/dconf/user, .gnome, .gnome2, .thumbnails.
Anecdotally, it feels like I get a longer period of stability if I delete .config/dconf/user -- I've done this a couple of times more since the upgrade -- but I don't have hard data for that, just a gut feeling. Could be coincidence.
I am experiencing the same gnome-shell crashes. abrt also doesn't pick up anything, and it does seem like it's at a time when a new window (was about to) open. Two conditions where I've noticed it:
- Click on firefox link that prompts for application to open file; pick gedit; gnome-shell crash.
- SELinux AVC notification; hover over notification, it expands with button that offers to open the SELinux AVC denial viewer application. Click on that, gnome-shell crashes.
The second (click to view AVC denial) has crashed gnome-shell every time I tried it. Ironically I can't figure out how to cause an AVC denial on purpose, help on that needed if it helps debug this.
System is also F22 upgraded from F21 where this crash was not happening. No touchscreen here, but I do use a second/external monitor that's configured as the primary when connected.
Another condition where I've just experienced it: in a terminal window, running
$ evince ~/path/to/files/*.pdf
Still nothing picked up by abrt (and I'm still not sure what debug to gather/check).
Another anecdotal observation: I'm pretty sure that before the crash there wasn't the notification filled circle beside my clock on the top bar; but after the crash and Gnome restart there was a notification circle.
I've had a period of a couple of weeks where everything seemed stable (and I thought the issue might have been resolved), but alas no. I'll check my package upgrade log and see if there's anything I can try reverting; and also consider any usage changes during the period (I was on leave from work; which doesn't change much in terms of applications used, although my diary had much less in [fewer notifications?])
Note that I have had abrt pick up bug #1243011 previously, but only once. The behaviour here has happened many times, so I don't know if these are related.
Indeed, this sounds awfully like Bug 1243011. Looks like there's a common cause that doesn't get caught by ABRT. See my comment on that bug about how to get at least system messages to get more details.
Noting that I've therefore added logs and commentary for bug #1243011.
And also wondering whether anything came through in updates in the last few days that have exacerbated the situation. For the last couple of days I've had this occur several times a day (perhaps after every resume). See my speculation in the other bug that it might possibly be load correlated.
I've moved the description back here, as bug #1243011 is about the after effects.
(In reply to Owen Taylor from bug #1243011 comment #33)
> It's interesting that you are hitting crashes (that particular crash?) so
> much - from our incoming bug reports this is not typical - so there's
> probably something about your setup which is making the bug more likely. Are
> you using any extensions?
I did have the "Launch new instance" option on, now turned off; and the "Lock Keys" extension installed but disabled, now removed.
The crash still occurs. It did happen again some time after a period of the laptop feeling laggy (later, but in the same session).
(In reply to Owen Taylor from comment #34)
> backtraces - this crash looks like if object initialization fails a
> subsequent garbage collection might crash like that.
"WARNING: Lost name on bus: org.gnome.SessionManager" is consistently the moment in the log when the problem occurs, every time.
(In reply to Kevin R. Page from comment #5)
> I've moved the description back here, as bug #1243011 is about the after
> effects.
> (In reply to Owen Taylor from bug #1243011 comment #33)
> > It's interesting that you are hitting crashes (that particular crash?) so
> > much - from our incoming bug reports this is not typical - so there's
> > probably something about your setup which is making the bug more likely. Are
> > you using any extensions?
> I did have the "Launch new instance" option on, now turned off; and the
> "Lock Keys" extension installed but disabled, now removed.
> The crash still occurs. It did happen again (some time after, but same
> session) a period of the laptop feeling laggy.
> (In reply to Owen Taylor from comment #34)
> > backtraces - this crash looks like if object initialization fails a
> > subsequent garbage collection might crash like that.
> search for? (also see my logs attached to bug #1243011).
> "WARNING: Lost name on bus: org.gnome.SessionManager" is consistently the
> moment in the log when the problem occurs, every time.
Looking at those logs, I suspect that the abrt backtrace in https://bugzilla.redhat.com/show_bug.cgi?id=1243011#c32 is independent from the regular crashes you are seeing. That crash probably didn't even log you out from the session. If you can find the exact point in your systemd logs where that specific crash occurred, then a JS backtrace would look like:
Oct 22 15:22:04 unused gnome-session: (gnome-shell:1554): Gjs-WARNING **: JS ERROR: TypeError: a._connection is null
Oct 22 15:22:04 unused gnome-session: NMVPNSection<.setActiveConnections/<@resource:///org/gnome/shell/ui/status/network.js:1534
Oct 22 15:22:04 unused gnome-session: NMVPNSection<.setActiveConnections@resource:///org/gnome/shell/ui/status/network.js:1533
Oct 22 15:22:04 unused gnome-session: wrapper@resource:///org/gnome/gjs/modules/lang.js:169
Oct 22 15:22:04 unused gnome-session: NMApplet<._syncVPNConnections@resource:///org/gnome/shell/ui/status/network.js:1822
Oct 22 15:22:04 unused gnome-session: wrapper@resource:///org/gnome/gjs/modules/lang.js:169
(Not exactly this - this is just a random trace I found in my log - but a Gjs-WARNING **: JS ERROR line followed by a number of lines giving the detail of the backtrace.)
My best guess for the regular crashes is that your X server is crashing, and that's taking the GNOME session down - the "Lost name on bus" message seems like that would be one of the first messages in that case.
What I don't understand is why there is no logging from the X server in the logs and no indication of an abnormal exit of the X server.
* What graphics card and drivers are you using?
* When/with what Fedora version was this machine originally installed? (The timestamp on /root/anaconda-ks.cfg will tell you if you forgot)
* Are current logs found in /var/log/Xorg.<n>.log? Do you see any indication of an abnormal exit there?
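For the last question, a minimal sketch of how the Xorg logs could be checked. The function name is made up for illustration, and it assumes the usual Xorg log format where error lines carry an `(EE)` tag, optionally after a `[ timestamp]` prefix. As far as I understand it, on recent GDM the X server output may instead go to the journal or to ~/.local/share/xorg/, so old files in /var/log are not conclusive on their own.

```shell
# xorg_errors FILE...
# Print the error-level "(EE)" lines from the given Xorg log files,
# each prefixed with the file it came from. The legend line near the
# top of every log ("(WW) warning, (EE) error, ...") is skipped because
# it does not start with the tag.
xorg_errors() {
    grep -H -E '^(\[[ 0-9.]+\] )?\(EE\)' "$@"
}

# Typical use (paths as mentioned in this report):
#   ls -l /var/log/Xorg.*.log          # are the timestamps current?
#   xorg_errors /var/log/Xorg.*.log    # any errors near the crash?
```

If the timestamps from `ls -l` are months old, the X server is probably not writing there any more and the journal is the better place to look.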
Thanks for your comments and help. I put journald logs in:
and subsequent comments, but I can post more if needed. I'd done a system log dump just after the crash on a number of other occasions.
There's a Gjs warning in https://bugzilla.redhat.com/attachment.cgi?id=1077350
although after the crash. I'm not seeing any JS messages near the problem. This is a grep for js from the last log I took:
Oct 21 17:35:04 wordsworth gnome-session: (gnome-shell:1548): Gjs-WARNING **: JS ERROR: could not get remote objects for service org.gnome.SettingsDaemon.Smartcard path /org/gnome/SettingsDaemon/Smartcard: Gio.DBusError: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SettingsDaemon.Smartcard was not provided by any .service files
Oct 21 17:35:04 wordsworth gnome-session: _proxyInvoker/asyncCallback@resource:///org/gnome/gjs/modules/overrides/Gio.js:83
Oct 21 23:55:58 wordsworth firefox.desktop: pbu_isWindowPrivate@resource://gre/modules/PrivateBrowsingUtils.jsm:25:14
Oct 21 23:55:58 wordsworth firefox.desktop: nsBrowserAccess.prototype.openURI@chrome://browser/content/browser.js:15876:21
Oct 21 23:57:23 wordsworth firefox.desktop: pbu_isWindowPrivate@resource://gre/modules/PrivateBrowsingUtils.jsm:25:14
Oct 21 23:57:23 wordsworth firefox.desktop: nsBrowserAccess.prototype.openURI@chrome://browser/content/browser.js:15876:21
The crash then happened at Oct 22 10:49:43 (going by the "Lost name on bus" message), i.e. these were all from the day before.
Laptop is a Thinkpad X210 with Intel i915 "Haswell-ULT Integrated Graphics Controller". Thoughts about possible low incident setup that may be less tested on my laptop: rarely shut down, nearly always suspend to RAM; home directories are on ecryptfs automounted via pam.
anaconda-ks is dated 22nd October 2014, which tallies with my recollection of installing Fedora 20, which was stable for me, as was Fedora 21.
My Xorg logs look pretty stale:
-rw-r--r--. 1 root root 67451 Feb 6 2015 Xorg.0.log
-rw-r--r--. 1 root root 601472 Feb 2 2015 Xorg.0.log.old
-rw-r--r--. 1 root root 33015 Dec 12 2014 Xorg.1.log
One other anecdotal piece of information: this always seems to happen when launching a new window. It's definitely happened when opening an evince window (sometimes from a firefox download dialog, sometimes from a terminal). I'm pretty sure it's happened for other applications, and definitely from the overview. I'm pretty sure it's never happened when opening a new firefox window (where one already exists). Mentioning this as the sample size is now large enough that I'd have thought it would have happened by now if any window could trigger it.
The only time this stopped happening for any significant period was when I was on leave. At that time I had disabled my work email account in Evolution (exchange). It could be coincidence, and obviously it shouldn't blow out the shell even if it is connected. Evolution does seem to use a lot of resource reconnecting; and I can't remember an occurrence of the crash that wasn't preceded by a laggy feeling machine for a period (even if a few hours earlier).
Owen: I've experienced similar crashes. I'm using the intel GPU driver. That's indeed quite puzzling, as gnome-session and gnome-shell appear to exit without any error or crash.
I had reported it at https://bugzilla.gnome.org/show_bug.cgi?id=752722. I blamed journalctl because I had seen it crash, though I'm not at all sure. What's certain is that memory pressure and/or high I/O load is a factor, as Kevin noted. This might explain why I've not seen any crashes for a while, having stopped working with large data in RAM for a few weeks.
Also to note I perceive this as being less frequent over the last couple of weeks. I think I got from last Friday to Tuesday without a crash. There has been at least one kernel update applied before that. Could be coincidence, of course.
(In reply to Kevin R. Page from comment #9)
> Also to note I perceive this as being less frequent over the last couple of
> weeks. I think I got from last Friday to Tuesday without a crash. There has
> been at least one kernel update applied before that. Could be coincidence,
> of course.
Do you mean you have experienced the crash at least once since the kernel update? That wouldn't be a good sign. Other than that, I also haven't observed the crash for some time, quite possibly due to kernel updates, and Philipp said the same on bug 1243011.
(In reply to Milan Bouchet-Valat from comment #10)
> (In reply to Kevin R. Page from comment #9)
> > Also to note I perceive this as being less frequent over the last couple of
> > weeks. I think I got from last Friday to Tuesday without a crash. There has
> > been at least one kernel update applied before that. Could be coincidence,
> > of course.
> Do you mean you have experienced the crash at least once since the kernel
> update? That wouldn't be a good sign. Other than that, I also haven't
> observed the crash for some time, quite possibly due to kernel updates, and
> Philipp said the same on bug 1243011,
For the previous update, which I was referring to at the time (4.1.10), no, I'm afraid I had one crash in the period since its update. But this was much more stable than the preceding few weeks, when things were much worse (typically 1 crash/day).
Since then there's been another kernel update (4.2.3). No crash so far.
To report this has just happened again, so not fixed by the natural course of updates. Happened after/during memory pressure and/or high I/O load after resuming from suspend.
Created attachment 1088337 [details]
gdb trace during hang
I've just seen it again too, so it's not fixed by newer kernels.
One interesting point is that the GDM gnome-shell hung using 100% CPU for about one minute. I got a gdb trace showing the activity was related to D-Bus. Then the process received a SIGPIPE when calling write(), and got back to being usable again soon or immediately after. Maybe the bug is in D-Bus after all? That would explain the sudden "lost name on bus" message.
Having updated to F23 late last week, I'm afraid this issue is still occurring.
gnome-session-binary: WARNING: Lost name on bus: org.gnome.SessionManager
is still the logged message at the point of the crash, with no logged message in the moment immediately before.
This wasn't happening on F21 during many months of use; it started immediately following the upgrade to F22.
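For anyone wanting to check their own logs for the same pattern, a minimal sketch of pulling out the lines leading up to that message. It assumes the journal has first been saved to a text file (for example with `journalctl -b -1 > crash.log` for the previous boot, which needs a persistent journal). The function name and the 20-line default are only illustrative.

```shell
# context_before_crash FILE [LINES]
# Print the LINES lines (default 20) leading up to each
# "Lost name on bus: org.gnome.SessionManager" message in a saved
# journal dump, followed by the message itself.
context_before_crash() {
    grep -B "${2:-20}" 'Lost name on bus: org.gnome.SessionManager' "$1"
}

# Typical use:
#   journalctl -b -1 > crash.log
#   context_before_crash crash.log 50
```

If nothing of interest shows up in the preceding lines, that at least confirms the "no logged message immediately before" observation for that crash.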
Created attachment 1096455 [details]
After upgrading to F23 I've tried running Gnome on Wayland to see if that would resolve the crashes. It hasn't.
See attached for a log of the crash tonight, again shortly after suspend, and I strongly suspect occurring at the point the captive portal window should have been created. But this time running Gnome on Wayland.
The "wordsworth gnome-session-binary: WARNING: Lost name on bus: org.gnome.SessionManager" is the moment of the crash. There is a segfault just following, though.
The laptop had actually been up much longer than I've recently been experiencing. This might be due to running on Wayland as I was for this session, or it could of course just be coincidence.
Created attachment 1103814 [details]
journalctl at moment of crash
Well, the bad news is this problem is still happening.
The better news is that:
- it's happening less frequently. I think I got through about 2 weeks without issue before today's crash
- interestingly Evolution was killed by OOM just before the shell was blown away; I've not seen this before. There was ~15 minutes of a heavily loaded system (which always seems to foreshadow the crash), then I had control of my desktop back for ~1 minute, in which I noticed Evolution had exited, before the shell suddenly quit "as usual".
- the system log (attached) is potentially more informative than usual. The heavy load started around 09:08. There are warnings from Evo:
Dec 09 09:03:22 wordsworth evolution.desktop: (evolution:9838): evolution-util-WARNING **: Event already in progress.
a gdm-x-session backtrace:
Dec 09 09:15:29 wordsworth /usr/libexec/gdm-x-session: (EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
some rt monitor warnings:
Dec 09 09:16:14 wordsworth rtkit-daemon: The canary thread is apparently starving. Taking action.
kernel out of memory warnings onwards from:
Dec 09 09:22:26 wordsworth kernel: Unable to purge GPU memory due lock contention.
A kernel memory report with a big number for Evo:
Dec 09 09:22:26 wordsworth kernel: [ 9838] 1976 9838 4683259 1578922 7141 22 1867920 0 evolution
Then some more Evo related warnings before the lost connection when the shell crashes:
Dec 09 09:25:59 wordsworth kernel: Pid 25468(evolution-addre) over core_pipe_limit
Dec 09 09:25:59 wordsworth kernel: Skipping core dump
Dec 09 09:26:02 wordsworth gnome-session-binary: WARNING: Lost name on bus: org.gnome.SessionManager
Full log attached. Happy to upload more if helpful.
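To put the "big number for Evo" in more familiar units, here is a rough conversion of that kernel task-dump row. It is a sketch under two assumptions that should be checked against the header line in the full log: that the columns after the `[pid]` are uid, tgid, total_vm, rss, nr_ptes, nr_pmds, swapents, oom_score_adj and name, and that the memory columns count 4 KiB pages. The function name is illustrative.

```shell
# oom_row_to_gib [PAGE_KIB]
# Read kernel OOM task-dump rows on stdin and print total_vm, rss and
# swapents converted from pages to GiB. PAGE_KIB defaults to 4
# (assumed page size). Only the first word of the process name is kept.
oom_row_to_gib() {
    # Drop everything up to and including the "[ pid]" bracket,
    # then convert the page counts in columns 3, 4 and 7.
    sed 's/.*\] *//' | LC_ALL=C awk -v page_kib="${1:-4}" '
        NF >= 9 {
            gib = page_kib / (1024 * 1024)
            printf "%s: total_vm %.2f GiB, rss %.2f GiB, swap %.2f GiB\n",
                   $9, $3 * gib, $4 * gib, $7 * gib
        }'
}

# Typical use:
#   grep ' evolution$' crash.log | oom_row_to_gib
```

Under those assumptions the row above works out to roughly 6 GiB resident and about 7 GiB swapped out for Evolution alone, which would fit with the heavy swapping and lag before the crash.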
I have good news for you: I faced the same issue with Fedora 23 on a Dell laptop.
I got gnome-session crashes every day when typing to find an application or opening a new window. What gave me the hint to solve the issue was when it crashed during a dnf update...
What I saw (for a reason that I don't understand yet) was a lot of duplicate packages on my system, present in both i686 and x86_64 architectures... maybe one application messed it all up recently (steam ??)
To solve the issue I removed the duplicate i686 intel driver package and kept only the x86_64 one (sudo dnf list installed | grep libva-intel)
Then I followed step by step this recent blog post (seems we are not alone):
Hope this solves it for you too!
Laurent, thank you for the tip, I will give this a try.
Also noting that the incidence of this problem has been noticeably lower in the last few weeks anyway.
I did indeed have a number of i686 packages installed, which I have removed (listed below for reference).
It looks like most, if not all, of these packages are pulled in as dependencies for the skype.i586 package. I'll need to reinstall that at some point, but will certainly test for as long as I can without these packages.
Even if there is causation, this would seem like a bug, surely?
This is occurring less frequently for me of late, although I suspect this may be due to a reduction in whatever is causing the high load situation than a change in the underlying bug.
It happened again today, so it is still ongoing. Same context: recent high load/swapping/lag; in overview about to open a new application; telltale "Lost name on bus" in logs at point of crash.
I've taken the suggestion to remove i686 packages as circumstantial. I did have to reinstall a subset of the i686 packages a couple of weeks ago to get my hplip printer drivers to work -- if there's a solid reason to try further testing with the i686 packages uninstalled please say (obviously I'd need to find a workaround to not being able to print directly for multiple weeks).
See Bug 1300212. I really suspect this is related to systemd-journald aborting on high I/O load.
Just for information, I have not had any more occurrences of the issue since my last comment (and there have certainly been a lot of display driver/kernel updates since then). I do indeed also still have i686 packages, so it seems they are not the root cause; maybe it was just related to the removal of dupes found by:
dnf repoquery --duplicated
I don't know how I can help you as I can't reproduce it anymore (even under heavy load on my laptop). I will post here if it comes back.
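For anyone who wants to check the other case discussed in this bug (the same package installed for both i686 and x86_64, which as far as I know is not what `--duplicated` reports), a minimal sketch follows. It assumes a list of `name.arch` lines on stdin, such as `rpm -qa --qf '%{NAME}.%{ARCH}\n'` produces; the function name is made up.

```shell
# multiarch_names
# Read "name.arch" lines on stdin and print, sorted, the package names
# that appear with both the i686 and the x86_64 architecture.
multiarch_names() {
    awk '
        {
            name = $0; sub(/\.[^.]*$/, "", name)   # strip trailing ".arch"
            arch = $0; sub(/.*\./, "", arch)       # keep only the arch
        }
        arch == "i686"   { i686[name] = 1 }
        arch == "x86_64" { x64[name] = 1 }
        END { for (name in i686) if (name in x64) print name }
    ' | LC_ALL=C sort
}

# Typical use:
#   rpm -qa --qf '%{NAME}.%{ARCH}\n' | multiarch_names
```

Having both architectures installed is often legitimate (32-bit applications pull them in as dependencies), so the output is only a list of candidates to look at, not of things that are wrong.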
(In reply to Milan Bouchet-Valat from comment #20)
> See Bug 1300212. I really suspect this is related to systemd-journald
> aborting on high I/O load.
Thanks. Is there any way I can test to confirm this is the root cause (or not)?
Might I assume that increasing the watchdog timeout could therefore be a workaround?
(also noting that bug 1300212 "most likely indicates some other issues on the system" without any specific check)
I don't have any details, unfortunately, but I have repeatedly seen systemd messages in the logs at the same time as the session logout (bug 1297229, bug 1199442).
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.
Thank you for reporting this bug and we are sorry it could not be fixed.