Since upgrading to F22 I've experienced regular (every day or two) crashes of Gnome shell. I was running F21 for nearly a year on the same hardware without issue.

Anecdotally, it happens when I'm opening a new "window", e.g. immediately after clicking on "Document Viewer" when opening an attachment in Evolution, or when opening the overview and starting to type to find an application.

abrt doesn't pick anything up. I *don't* get a Gnome "oops something went wrong" message. It just immediately drops out to a black screen and, after some flickering, I'm back at the gdm login screen.

1) Where should I be looking to gather debug information? I don't see anything in /var/log/messages or when searching for gnome-shell with journalctl.
2) How should I reset my Gnome configuration to get a clean/as-new config? See below for what I tried to delete on upgrade.

I'm happy to test from a clean Gnome config, but testing with a new user is harder given this takes a day or two to crop up and this is my working machine (though it does reliably happen within a few days).

Thoughts about possible low-incidence setup that may be less tested on my laptop: upgraded from F21; has a touchscreen; rarely shut down, nearly always suspend to RAM.

When this started happening after the upgrade I nuked my: .cache, .config/dconf/user, .gnome, .gnome2, .thumbnails.

Anecdotally, it feels like I get a longer period of stability if I delete .config/dconf/user -- I've done this a couple of times more since the upgrade -- but I don't have hard data for that, just a gut feeling. Could be coincidence.

Thank you.
I am experiencing the same gnome-shell crashes. abrt also doesn't pick up anything, and it does seem to happen at a time when a new window is about to open. Two conditions where I've noticed it:

- Click on a Firefox link that prompts for an application to open the file; pick gedit; gnome-shell crashes.
- SELinux AVC notification: hover over the notification and it expands with a button that offers to open the SELinux AVC denial viewer application. Click on that, and gnome-shell crashes.

The second (click to view AVC denial) has crashed gnome-shell every time I tried it. Ironically, I can't figure out how to cause an AVC denial on purpose; help with that would be welcome if it helps debug this.

The system is also F22 upgraded from F21, where this crash was not happening. No touchscreen here, but I do use a second/external monitor that's configured as the primary when connected.
Another condition in which I've just experienced it: in a terminal window, running

$ evince ~/path/to/files/*.pdf

Still nothing picked up by abrt (and I'm still not sure what debug information to gather/check).

Another anecdotal observation: I'm pretty sure that before the crash there wasn't the notification filled circle beside my clock on the top bar; but after the crash and Gnome restart there was a notification circle.

I've had a period of a couple of weeks where everything seemed stable (and I thought the issue might have been resolved), but alas no. I'll check my package upgrade log and see if there's anything I can try reverting, and also consider any usage changes during the period (I was on leave from work, which doesn't change much in terms of applications used, although my diary had much less in it [fewer notifications?]).

Note that I have had abrt pick up bug #1243011 previously, but only once. The behaviour here has happened many times, so I don't know if these are related.
Indeed, this sounds awfully like Bug 1243011. Looks like there's a common cause that doesn't get caught by ABRT. See my comment on that bug about how to get at least system messages to get more details.
Noting that I've therefore added logs and commentary for bug #1243011. And also wondering whether anything came through in updates in the last few days that has exacerbated the situation. For the last couple of days I've had this occur several times a day (perhaps after every resume). See my speculation in the other bug that it might possibly be load correlated.
I've moved the description back here, as bug #1243011 is about the after effects.

(In reply to Owen Taylor from bug #1243011 comment #33)
> It's interesting that you are hitting crashes (that particular crash?) so
> much - from our incoming bug reports this is not typical - so there's
> probably something about your setup which is making the bug more likely. Are
> you using any extensions?

I did have the "Launch new instance" option on, now turned off; and the "Lock Keys" extension installed but disabled, now removed.

The crash still occurs. It did happen again following (some time after, but in the same session) a period of the laptop feeling laggy.

(In reply to Owen Taylor from comment #34)
> I'd also be interested if your your journalctl output shows any Javascript
> backtraces - this crash looks like if object initialization fails a
> subsequent garbage collection might crash like that.

I can't see anything relating to Javascript -- is there anything specific to search for? (also see my logs attached to bug #1243011).

"WARNING: Lost name on bus: org.gnome.SessionManager" is consistently the moment in the log when the problem occurs, every time.
(In reply to Kevin R. Page from comment #5)
> I've moved the description back here, as bug #1243011 is about the after
> effects.
>
> (In reply to Owen Taylor from bug #1243011 comment #33)
> > It's interesting that you are hitting crashes (that particular crash?) so
> > much - from our incoming bug reports this is not typical - so there's
> > probably something about your setup which is making the bug more likely. Are
> > you using any extensions?
>
> I did have the "Launch new instance" option on, now turned off; and the
> "Lock Keys" extension installed but disabled, now removed.
>
> The crash still occurs. It did happen again (some time after, but same
> session) a period of the laptop feeling laggy.
>
> (In reply to Owen Taylor from comment #34)
> > I'd also be interested if your your journalctl output shows any Javascript
> > backtraces - this crash looks like if object initialization fails a
> > subsequent garbage collection might crash like that.
>
> I can't see anything relating to Javascript -- is there anything specific to
> search for? (also see my logs attached to bug #1243011).
>
> "WARNING: Lost name on bus: org.gnome.SessionManager" is consistently the
> moment in the log when the problem occurs, every time.

Looking at those logs, I suspect that the abrt backtrace in https://bugzilla.redhat.com/show_bug.cgi?id=1243011#c32 is independent from the regular crashes you are seeing. That crash probably didn't even log you out from the session.
If you can find the exact point in your systemd logs where that specific crash occurred, then a JS backtrace would look like:

Oct 22 15:22:04 unused gnome-session[1547]: (gnome-shell:1554): Gjs-WARNING **: JS ERROR: TypeError: a._connection is null
Oct 22 15:22:04 unused gnome-session[1547]: NMVPNSection<.setActiveConnections/<@resource:///org/gnome/shell/ui/status/network.js:1534
Oct 22 15:22:04 unused gnome-session[1547]: NMVPNSection<.setActiveConnections@resource:///org/gnome/shell/ui/status/network.js:1533
Oct 22 15:22:04 unused gnome-session[1547]: wrapper@resource:///org/gnome/gjs/modules/lang.js:169
Oct 22 15:22:04 unused gnome-session[1547]: NMApplet<._syncVPNConnections@resource:///org/gnome/shell/ui/status/network.js:1822
Oct 22 15:22:04 unused gnome-session[1547]: wrapper@resource:///org/gnome/gjs/modules/lang.js:169

(Not exactly this, as it is just a random trace I found in my log, but a "Gjs-WARNING **: JS ERROR" line followed by a number of lines giving the detail of the backtrace.)

My best guess for the regular crashes is that your X server is crashing, and that's taking the GNOME session down - the "Lost name on bus" message seems like it would be one of the first messages in that case. What I don't understand is why there is no logging from the X server in the logs and no indication of an abnormal exit of the X server.

* What graphics card and drivers are you using?
* When/with what Fedora version was this machine originally installed? (The timestamp on /root/anaconda-ks.cfg will tell you if you forgot)
* Are current logs found in /var/log/Xorg.<n>.log? Do you see any indication of an abnormal exit there?
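The backtrace shape described above (a "Gjs-WARNING **: JS ERROR" line followed by frame lines of the form function@resource:...:LINE) can be picked out of saved journal text mechanically. The following is only a rough sketch: the function name extract_js_backtraces and both regular expressions are illustrative assumptions derived from the sample trace in this comment, not anything provided by gjs or systemd.

```python
import re

# Start of a gjs error, as it appears in the sample journal excerpt.
JS_ERROR = re.compile(r"Gjs-WARNING \*\*: JS ERROR: (?P<msg>.*)$")
# A stack frame ends the line and looks like "function@resource:///file.js:LINE".
FRAME = re.compile(r"(?P<frame>\S*@\S+:\d+)\s*$")


def extract_js_backtraces(lines):
    """Return a list of (error_message, [frames]) found in journal text lines.

    Frames are only collected while they directly follow a JS ERROR line,
    so unrelated frame-like lines (e.g. from Firefox) are ignored.
    """
    traces = []
    current = None
    for line in lines:
        start = JS_ERROR.search(line)
        if start:
            current = (start.group("msg").strip(), [])
            traces.append(current)
            continue
        frame = FRAME.search(line)
        if current is not None and frame:
            current[1].append(frame.group("frame"))
        else:
            # Any other line ends the backtrace being collected.
            current = None
    return traces
```

It could be fed the lines of a file saved from journalctl; an empty result near the crash timestamp would suggest that no JS backtrace was logged at that point.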
Thanks for your comments and help.

I put journald logs in https://bugzilla.redhat.com/show_bug.cgi?id=1243011#c27 and subsequent comments, but I can post more if needed. I'd done a system log dump just after the crash on a number of other occasions.

There's a Gjs warning in https://bugzilla.redhat.com/attachment.cgi?id=1077350 although after the crash. I'm not seeing any JS messages near the problem. This is a grep for js from the last log I took:

Oct 21 17:35:04 wordsworth gnome-session[1477]: (gnome-shell:1548): Gjs-WARNING **: JS ERROR: could not get remote objects for service org.gnome.SettingsDaemon.Smartcard path /org/gnome/SettingsDaemon/Smartcard: Gio.DBusError: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SettingsDaemon.Smartcard was not provided by any .service files
Oct 21 17:35:04 wordsworth gnome-session[1477]: _proxyInvoker/asyncCallback@resource:///org/gnome/gjs/modules/overrides/Gio.js:83
Oct 21 23:55:58 wordsworth firefox.desktop[3790]: pbu_isWindowPrivate@resource://gre/modules/PrivateBrowsingUtils.jsm:25:14
Oct 21 23:55:58 wordsworth firefox.desktop[3790]: nsBrowserAccess.prototype.openURI@chrome://browser/content/browser.js:15876:21
Oct 21 23:57:23 wordsworth firefox.desktop[3790]: pbu_isWindowPrivate@resource://gre/modules/PrivateBrowsingUtils.jsm:25:14
Oct 21 23:57:23 wordsworth firefox.desktop[3790]: nsBrowserAccess.prototype.openURI@chrome://browser/content/browser.js:15876:21

The crash then happened at Oct 22 10:49:43 (from the "Lost name on bus" message), i.e. these were all the day before.

Laptop is a Thinkpad X210 with Intel i915 "Haswell-ULT Integrated Graphics Controller".

Thoughts about possible low-incidence setup that may be less tested on my laptop: rarely shut down, nearly always suspend to RAM; home directories are on ecryptfs automounted via pam.

anaconda-ks is dated 22nd October 2014, which tallies with my recollection of installing Fedora 20, which was stable for me, as was Fedora 21.
My Xorg logs look pretty stale:

-rw-r--r--. 1 root root  67451 Feb  6  2015 Xorg.0.log
-rw-r--r--. 1 root root 601472 Feb  2  2015 Xorg.0.log.old
-rw-r--r--. 1 root root  33015 Dec 12  2014 Xorg.1.log

One other anecdotal piece of information: this always seems to happen when launching a new window. It's definitely happened when opening an evince window (sometimes from a firefox download dialog, sometimes from a terminal). I'm pretty sure it's happened for other applications, and definitely from the overview. I'm pretty sure it's never happened when opening a new firefox window (where one already exists). I mention this as the sample size is now large enough that I'd have thought it would have happened by now if it could happen for any window.

The only time this stopped happening for any significant period was when I was on leave. At that time I had disabled my work email account in Evolution (exchange). It could be coincidence, and obviously it shouldn't blow out the shell even if it is connected. Evolution does seem to use a lot of resources when reconnecting, and I can't remember an occurrence of the crash that wasn't preceded by a laggy-feeling machine for a period (even if a few hours earlier).
Owen: I've experienced similar crashes. I'm using the Intel GPU driver. That's indeed quite puzzling, as gnome-session and gnome-shell appear to exit without any error or crash. I had reported it at https://bugzilla.gnome.org/show_bug.cgi?id=752722

I blamed journalctl there because I had seen it crash, though I'm not sure at all. What's certain is that memory pressure and/or high I/O load is a factor, as Kevin noted. This might explain why I've not seen any crashes for a while, having stopped working with large data in RAM for a few weeks.
Also to note I perceive this as being less frequent over the last couple of weeks. I think I got from last Friday to Tuesday without a crash. There has been at least one kernel update applied before that. Could be coincidence, of course.
(In reply to Kevin R. Page from comment #9)
> Also to note I perceive this as being less frequent over the last couple of
> weeks. I think I got from last Friday to Tuesday without a crash. There has
> been at least one kernel update applied before that. Could be coincidence,
> of course.

Do you mean you have experienced the crash at least once since the kernel update? That wouldn't be a good sign. Other than that, I also haven't observed the crash for some time, quite possibly due to kernel updates, and Philipp said the same on bug 1243011.
(In reply to Milan Bouchet-Valat from comment #10)
> (In reply to Kevin R. Page from comment #9)
> > Also to note I perceive this as being less frequent over the last couple of
> > weeks. I think I got from last Friday to Tuesday without a crash. There has
> > been at least one kernel update applied before that. Could be coincidence,
> > of course.
>
> Do you mean you have experienced the crash at least once since the kernel
> update? That wouldn't be a good sign. Other than that, I also haven't
> observed the crash for some time, quite possibly due to kernel updates, and
> Philipp said the same on bug 1243011,

For the previous update, which I was referring to at the time (4.1.10): I'm afraid I did have one crash in the period since its update. But this was much more stable than the preceding few weeks, when things were much worse (typically 1 crash/day).

Since then there's been another kernel update (4.2.3). No crash so far.
To report this has just happened again, so not fixed by the natural course of updates. Happened after/during memory pressure and/or high I/O load after resuming from suspend.
Created attachment 1088337: gdb trace during hang

I've just seen it again too, so it's not fixed by newer kernels. One interesting point is that the GDM gnome-shell hung using 100% CPU for about one minute. I got a gdb trace showing the activity was related to D-Bus. Then the process received a SIGPIPE when calling write(), and got back to being usable again soon or immediately after.

Maybe the bug is in D-Bus after all? That would explain the sudden "lost name on bus" message.
Having updated to F23 late last week, I'm afraid this issue is still occurring. gnome-session-binary[2420]: WARNING: Lost name on bus: org.gnome.SessionManager is still the logged message at the point of the crash, with no logged message in the moment immediately before. This wasn't happening on F21 during many months of use; it started immediately following the upgrade to F22.
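Since the "Lost name on bus" line is the one consistent marker, one way to compare crashes is to pull out the lines logged just before each occurrence. This is a minimal sketch under stated assumptions: context_before_marker and its before parameter are hypothetical names, and the input is assumed to be plain journal text already saved to a file (for example from journalctl -b).

```python
from collections import deque

# The message that consistently marks the moment of the crash in this report.
MARKER = "Lost name on bus: org.gnome.SessionManager"


def context_before_marker(lines, marker=MARKER, before=20):
    """Yield (preceding_lines, marker_line) for every line containing marker.

    Only the last `before` lines are kept in memory, so this also works on
    large log files read line by line.
    """
    recent = deque(maxlen=before)
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            yield list(recent), line
        recent.append(line)
```

Passing an open file object as lines yields one tuple per crash in that file, which makes it easier to see whether the same messages (OOM reports, X server errors, journald restarts) recur in the run-up each time.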
Created attachment 1096455: journalctl log

After upgrading to F23 I've tried running Gnome on Wayland to see if that would resolve the crashes. It hasn't.

See attached for a log of the crash tonight, again shortly after suspend, and I strongly suspect occurring at the point the captive portal window should have been created. But this time running Gnome on Wayland.

The "wordsworth gnome-session-binary[1751]: WARNING: Lost name on bus: org.gnome.SessionManager" line is the moment of the crash. There is a segfault just following, though.

The laptop had actually been up much longer than I've recently been experiencing. This might be due to running on Wayland as I was for this session, or it could of course just be coincidence.
Created attachment 1103814: journalctl at moment of crash

Well, the bad news is this problem is still happening. The better news is that:

- it's happening less frequently. I think I got through about 2 weeks without issue before today's crash.
- interestingly, Evolution was killed by OOM just before the shell was blown away; I've not seen this before. There was ~15 minutes of a heavily loaded system (which always seems to foreshadow the crash), then I had control of my desktop back for ~1 minute, in which I noticed Evolution had exited, before the shell suddenly quit "as usual".
- the system log (attached) is potentially more informative than usual.

The heavy load started around 09:08. There are warnings from Evo:

Dec 09 09:03:22 wordsworth evolution.desktop[9838]: (evolution:9838): evolution-util-WARNING **: Event already in progress.

a gdm-x-session backtrace:

Dec 09 09:15:29 wordsworth /usr/libexec/gdm-x-session[2353]: (EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.

some rt monitor warnings:

Dec 09 09:16:14 wordsworth rtkit-daemon[976]: The canary thread is apparently starving. Taking action.

kernel out of memory warnings onwards from:

Dec 09 09:22:26 wordsworth kernel: Unable to purge GPU memory due lock contention.

A kernel memory report with a big number for Evo:

Dec 09 09:22:26 wordsworth kernel: [ 9838] 1976 9838 4683259 1578922 7141 22 1867920 0 evolution

Then some more Evo related warnings before the lost connection when the shell crashes:

Dec 09 09:25:59 wordsworth kernel: Pid 25468(evolution-addre) over core_pipe_limit
Dec 09 09:25:59 wordsworth kernel: Skipping core dump
Dec 09 09:26:02 wordsworth gnome-session-binary[2471]: WARNING: Lost name on bus: org.gnome.SessionManager

Full log attached. Happy to upload more if helpful.
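To put the "big number for Evo" in the kernel memory report above into more familiar units, the row can be converted from pages to GiB. This is a sketch resting on two assumptions that are not confirmed by the excerpt: that the columns follow the order "pid uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name" (the header line in the full log should be checked), and that pages are 4 KiB. The names parse_oom_row and PAGE_KIB are illustrative only.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, as is usual on x86_64

# Assumed column order of the kernel's OOM task table (verify against the
# header line printed just above the rows in the full log):
# [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+"
    r"(?P<nr_pmds>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)\s*$"
)


def pages_to_gib(pages):
    """Convert a page count to GiB using the assumed page size."""
    return int(pages) * PAGE_KIB / (1024 * 1024)


def parse_oom_row(line):
    """Parse one task row of an OOM report; return None if it does not match."""
    m = ROW.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_gib": pages_to_gib(m.group("total_vm")),
        "rss_gib": pages_to_gib(m.group("rss")),
        "swap_gib": pages_to_gib(m.group("swapents")),
    }
```

Under those assumptions, the evolution row above would correspond to roughly 6 GiB resident and roughly 7 GiB swapped out, which would be consistent with the heavy swapping and lag described before the crash.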
Hi Kevin,

I have good news for you: I faced the same issue with Fedora 23 on a Dell laptop. I got gnome-session crashes every day when typing to find an application or opening a new window.

What gave me the hint to solve the issue was when it crashed during a dnf update. What I saw (for a reason that I don't understand yet) was a lot of duplicate packages on my system, present in both the i686 and x86_64 architectures; maybe one application messed it all up recently (Steam?).

To solve the issue I removed the duplicate i686 Intel driver package and kept only the x86_64 one (sudo dnf list installed | grep libva-intel).

Then I followed, step by step, this recent blog post (it seems we are not alone): http://pknowles.heuristic42.com/2015/10/fixing-duplicate-dnf-packages.html

I hope this will solve it for you too!
Laurent, thank you for the tip, I will give this a try. Also noting that the incidence of this problem has been noticeably lower in the last few weeks anyway. I did indeed have a number of i686 packages installed, which I have removed (listed below for reference). It looks like most, if not all, of these packages are pulled in as dependencies for the skype.i586 package. I'll need to reinstall that at some point, but will certainly test for as long as I can without these packages. Even if there is causation, this would seem like a bug, surely? alsa-lib i686 alsa-plugins-pulseaudio i686 audit-libs i686 bzip2-libs i686 cairo i686 cdparanoia-libs i686 clucene09-core i686 cracklib i686 dbus-libs i686 elfutils-libelf i686 elfutils-libs i686 expat i686 flac-libs i686 fontconfig i686 freetype i686 glib2 i686 glibc i686 graphite2 i686 gsm i686 gstreamer i686 gstreamer-plugins-base i686 gstreamer1 i686 gstreamer1-plugins-base i686 harfbuzz i686 jbigkit-libs i686 json-c i686 keyutils-libs i686 krb5-libs i686 lcms-libs i686 lcms2 i686 libICE i686 libSM i686 libX11 i686 libXScrnSaver i686 libXau i686 libXcursor i686 libXdamage i686 libXext i686 libXfixes i686 libXft i686 libXi i686 libXinerama i686 libXrandr i686 libXrender i686 libXtst i686 libXv i686 libXxf86vm i686 libacl i686 libasyncns i686 libattr i686 libcap i686 libcom_err i686 libdatrie i686 libdb i686 libdrm i686 libedit i686 libffi i686 libgcrypt i686 libgpg-error i686 libidn i686 libjpeg-turbo i686 libmng i686 libogg i686 libpciaccess i686 libpng i686 libseccomp i686 libselinux i686 libsndfile i686 libstdc++ i686 libthai i686 libtheora i686 libtiff i686 libuuid i686 libverto i686 libvisual i686 libvorbis i686 libwayland-client i686 libwayland-server i686 libwebp i686 libxcb i686 libxml2 i686 libxshmfence i686 libxslt i686 llvm-libs i686 mesa-libEGL i686 mesa-libGL i686 mesa-libgbm i686 mesa-libglapi i686 mmdtsdec i686 ncurses-libs i686 nss-softokn-freebl i686 openssl-libs i686 orc i686 pam i686 pango i686 pcre i686 
pixman i686 proj i686 pulseaudio-libs i686 qt i686 qt-mobility-common i686 qt-mobility-location i686 qt-mobility-sensors i686 qt-x11 i686 qtwebkit i686 readline i686 sqlite i686 systemd-libs i686 tcp_wrappers-libs i686 texlive-kpathsea-lib i686 xz-libs i686 zlib i686
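For anyone wanting to check their own system for packages installed in more than one architecture, like the i686 set listed above, the following sketch groups package names by architecture. It assumes the input text has one "name arch" pair per line, which the query rpm -qa --qf '%{NAME} %{ARCH}\n' should produce; the function name multiarch_packages is hypothetical.

```python
from collections import defaultdict


def multiarch_packages(rpm_output):
    """Map package name -> sorted list of arches, for names seen in 2+ arches.

    rpm_output is expected to hold one "name arch" pair per line.
    Lines that do not have exactly two fields are skipped, as are
    architecture-independent (noarch) packages.
    """
    arches = defaultdict(set)
    for line in rpm_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, arch = parts
        if arch == "noarch":
            continue
        arches[name].add(arch)
    return {name: sorted(a) for name, a in arches.items() if len(a) > 1}
```

Note that this only reports the same package name present in several architectures; it does not detect multiple versions of the same name and architecture, which is a different kind of duplicate.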
This is occurring less frequently for me of late, although I suspect this may be due to a reduction in whatever is causing the high load situation rather than a change in the underlying bug.

It happened again today, so it is still ongoing. Same context: recent high load/swapping/lag; in the overview, about to open a new application; telltale "Lost name on bus" in the logs at the point of the crash.

I've taken the suggestion to remove i686 packages as circumstantial. I did have to reinstall a subset of the i686 packages a couple of weeks ago to get my hplip printer drivers to work -- if there's a solid reason to try further testing with the i686 packages uninstalled, please say (obviously I'd need to find a workaround for not being able to print directly for multiple weeks).
See Bug 1300212. I really suspect this is related to systemd-journald aborting on high I/O load.
Just for information, I have not had any more occurrences of the issue since my last comment (and there have certainly been a lot of display driver/kernel updates since that time).

I do indeed also have i686 packages installed, so it seems that is not the root cause; maybe the fix was just related to the removal of the dupes found by:

dnf repoquery --duplicated

I don't know how I can help you as I can't reproduce it anymore (even under heavy load on my laptop). I will post here if it comes back.
(In reply to Milan Bouchet-Valat from comment #20)
> See Bug 1300212. I really suspect this is related to systemd-journald
> aborting on high I/O load.

Thanks. Is there any way I can test to confirm this is the root cause (or not)? Might I assume that increasing the watchdog timeout could therefore be a workaround?

(also noting that bug 1300212 "most likely indicates some other issues on the system" without any specific check)
I don't have any details, unfortunately, but I have repeatedly seen systemd messages in the logs at the same time as the session logout (bug 1297229, bug 1199442).
This message is a reminder that Fedora 23 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 23. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '23'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 23 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.