Bug 1593782

Summary: gnome-shell crash after mutter update
Product: Red Hat Enterprise Linux 7 Reporter: Tomas Hudziec <thudziec>
Component: gnome-shell Assignee: Florian Müllner <fmuellner>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.6 CC: jbastian, jkoten, rstrode, thudziec, tpelka
Target Milestone: alpha Keywords: TestBlocker
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: mutter-3.28.2-4.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-30 10:26:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
backtrace of gnome-shell
none
backtrace of gnome-shell with debuginfos
none
backtrace of gnome-shell with gnome-desktop3 debuginfo
none
/var/log/messages
none
valgrind log file
none
valgrind log file with edited dconf package
none

Description Tomas Hudziec 2018-06-21 15:09:14 UTC
Created attachment 1453528 [details]
backtrace of gnome-shell

Description of problem:
gnome-shell crashes after mutter update, either right after yum update or after gnome-shell restart (alt+f2, r).

Version-Release number of selected component (if applicable):
mutter-3.28.2-2.el7.x86_64
gnome-shell-3.28.2-1.el7.x86_64
xorg-x11-server-Xorg-1.20.0-0.1.el7.x86_64
gnome-desktop3-3.28.2-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. update mutter to 3.28.2-2.el7.x86_64
2. restart gnome-shell

Actual results:
Program received signal SIGSEGV, Segmentation fault.
0x00007fffbc6f9f73 in gnome_wall_clock_init () from /lib64/libgnome-desktop-3.so.17

Expected results:
no crash occurs

Additional info:
00:02.0 VGA compatible controller [0300]: Intel Corporation Skylake GT2 [HD Graphics 520] [8086:1916] (rev 07)

After downgrading mutter to version 3.28.2-1.el7.x86_64 problem is solved.

Comment 2 Tomas Pelka 2018-06-21 15:11:43 UTC
I can see this on 00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02) too

Comment 3 Tomas Pelka 2018-06-25 07:08:28 UTC
Booting with selinux=0 doesn't help.

Comment 4 Ray Strode [halfline] 2018-06-26 13:41:42 UTC
do you have a file /etc/localtime ?  is it readable?

can you install gnome-desktop3 debuginfo and get an updated backtrace ?
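
The two questions about /etc/localtime can be answered in one step. The following is only a minimal sketch; `check_tzfile` is a hypothetical helper name, not part of any package mentioned in this bug.

```shell
# Hypothetical helper: report whether a timezone file resolves and is readable.
# Defaults to /etc/localtime when no argument is given.
check_tzfile() {
    tzfile="${1:-/etc/localtime}"
    if [ ! -e "$tzfile" ]; then
        # -e follows symlinks, so a dangling symlink is reported as missing too.
        echo "missing: $tzfile"
        return 1
    elif [ ! -r "$tzfile" ]; then
        echo "unreadable: $tzfile"
        return 2
    fi
    # Show the fully resolved target, since /etc/localtime is usually a symlink.
    echo "ok: $tzfile -> $(readlink -f "$tzfile")"
}
```

Running `check_tzfile` with no argument checks /etc/localtime; a non-zero exit status means the file is missing or unreadable for the current user.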

Comment 5 Ray Strode [halfline] 2018-06-26 13:44:25 UTC
what's the output of 

$ cat /proc/sys/fs/inotify/max_*

on your system?

Comment 6 Tomas Pelka 2018-06-26 14:19:05 UTC
$ ls -lZ /etc/localtime
lrwxrwxrwx. root root system_u:object_r:locale_t:s0    /etc/localtime -> ../usr/share/zoneinfo/Europe/Prague

$ cat /proc/sys/fs/inotify/max_* 
16384
128
8192

Comment 7 Ray Strode [halfline] 2018-06-26 20:43:45 UTC
thanks. i'm still interested in the updated backtrace with gnome-desktop3 debuginfo installed. i'm also curious whether autologin works. I don't think this is related to mutter: the changes between -1 and -2 are pretty small and look unrelated.

Also what's the output of 

$ stat -L /etc/localtime

Comment 8 Tomas Pelka 2018-06-27 07:49:32 UTC
$ stat -L /etc/localtime
  File: ‘/etc/localtime’
  Size: 2312      	Blocks: 8          IO Block: 4096   regular file
Device: fd04h/64772d	Inode: 4402        Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:locale_t:s0
Access: 2018-06-27 09:24:40.850056985 +0200
Modify: 2018-05-07 09:03:35.000000000 +0200
Change: 2018-06-21 08:24:10.354316496 +0200
 Birth: -

Tomas or I will get the backtrace as soon as we are in the mood to crash the UI :P Kidding, we will get it today.

Comment 9 Tomas Hudziec 2018-06-27 15:12:30 UTC
Created attachment 1455066 [details]
backtrace of gnome-shell with debuginfos

I installed some debuginfos, but the one for gnome-desktop3 could not be found: "Could not find debuginfo for main pkg: gnome-desktop3-3.28.2-2.el7.x86_64". Will try again tomorrow.

Comment 10 Tomas Pelka 2018-06-27 15:31:55 UTC
It seems to be in brew (https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=704838); please try to install it from there.

Comment 11 Tomas Hudziec 2018-06-28 10:10:13 UTC
Created attachment 1455229 [details]
backtrace of gnome-shell with gnome-desktop3 debuginfo

Comment 12 Tomas Pelka 2018-07-12 09:08:57 UTC
Ray, any other ideas?

Comment 13 Ray Strode [halfline] 2018-07-13 13:57:32 UTC
So the crash is here:

```
0x00007fffbd2f9f73 in gnome_wall_clock_init (self=0x161d4d0) at gnome-wall-clock.c:74
74  self->priv->timezone = g_time_zone_new_local ();
```

That suggests that `self->priv` (or `self`) is `free`'d.  But `self->priv` literally just got allocated in the line before.

My best guess is that `g_time_zone_new_local ()` is `free`ing a dangling pointer or something like that.

I guess what we really need is a `valgrind` run…

Can you open up `/usr/share/applications/org.gnome.Shell.desktop`

and change the `Exec` line from:

```
Exec=/usr/bin/gnome-shell
```

to

```
Exec=valgrind /usr/bin/gnome-shell
```

and then post `/var/log/messages` after reproducing
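
The Exec change can also be scripted. This is only a sketch: `wrap_exec_with_valgrind` is a hypothetical helper, it assumes GNU sed (for `-i`), and it has to run as root when pointed at the system desktop file.

```shell
# Hypothetical helper: prefix the Exec command of a desktop file with valgrind.
wrap_exec_with_valgrind() {
    desktop_file="$1"
    # Keep a one-time backup so the original Exec line can be restored later.
    [ -e "$desktop_file.orig" ] || cp -p "$desktop_file" "$desktop_file.orig"
    # Rewrite Exec= lines, skipping any that are already wrapped.
    sed -i '/^Exec=valgrind /!s|^Exec=|Exec=valgrind |' "$desktop_file"
}
```

Calling `wrap_exec_with_valgrind /usr/share/applications/org.gnome.Shell.desktop` makes the edit described above; copying the `.orig` backup over the file undoes it.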

Comment 14 Tomas Hudziec 2018-07-16 12:59:36 UTC
Created attachment 1459159 [details]
/var/log/messages

Messages from the valgrind run of gnome-shell.

Comment 15 Tomas Hudziec 2018-07-16 15:29:24 UTC
Created attachment 1459210 [details]
valgrind log file

Log file produced with the following command as the Exec line
in /usr/share/applications/org.gnome.Shell.desktop:
valgrind --log-file=/tmp/shell-%p.log /usr/bin/gnome-shell

Comment 16 Ray Strode [halfline] 2018-07-16 17:26:41 UTC
So your log shows gnome-shell getting killed instead of crashing.  Someone on irc said this to me the other day:

<hedgepigdaniel[m]> Have a look at the three MRs here: https://gitlab.gnome.org/GNOME/dconf/merge_requests
<hedgepigdaniel[m]> if you imagine that subscribing to changed notifications from within JS in gnome-shell can in itself cause changed signals to be emitted, and that all over gnome-shell (and extensions) things are done in response to changed signals... hopefully you can see the issue?
<hedgepigdaniel[m]> The most annoying part is that in my case (and I think many others) the only time its really guaranteed to happen is when you run the shell inside valgrind - in which case the shell never starts at all, it just gets stuck in infinite loops handling false changed signals

So, mind giving those merge requests a shot and seeing if they make valgrind work better?

Task info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=17196363

Comment 17 Tomas Hudziec 2018-07-16 18:04:53 UTC
Created attachment 1459232 [details]
valgrind log file with edited dconf package

I installed the edited dconf package and reproduced the issue with valgrind.

Comment 20 Tomas Pelka 2018-07-18 11:10:18 UTC
Just FYI, it seems the change

 Fix support for external monitor configurations - Resolves: #1585230

introduced the issue, as the -1 mutter was OK.

Comment 21 Tomas Pelka 2018-07-18 14:11:48 UTC
Marking as an ALPHA blocker; we should either resolve it or include the -1 build in Alpha.

Comment 22 Ray Strode [halfline] 2018-07-18 14:18:38 UTC
but the external monitor fix wasn't added until -3 mutter, and comment 0 said the problem happens in -2 mutter

Comment 23 Ray Strode [halfline] 2018-07-18 16:47:42 UTC
so this ended up just being a buildroot issue.  the 7.6 gnome-desktop3 wasn't pushed to the main buildroot, and mutter wasn't built against the rhel-7.6-gnome buildroot, so gnome-shell ended up mapping two different copies of the gnome-desktop library
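
One way to spot this kind of double mapping on an affected system is to list the distinct libgnome-desktop objects in the process's memory map. The following is only a sketch; `mapped_gnome_desktop_libs` is a hypothetical helper, and it assumes the usual six-column /proc/<pid>/maps layout with the pathname in the last column.

```shell
# Hypothetical helper: list the distinct libgnome-desktop objects that appear
# in a maps file such as /proc/<pid>/maps (pathname is the sixth column).
mapped_gnome_desktop_libs() {
    awk '$6 ~ /libgnome-desktop-3\.so/ { print $6 }' "$1" | sort -u
}
```

For example, `mapped_gnome_desktop_libs "/proc/$(pgrep -o -x gnome-shell)/maps"` would print one path per mapped copy; more than one line of output would point to the situation described above.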

Comment 25 Ray Strode [halfline] 2018-07-21 00:09:32 UTC
*** Bug 1606655 has been marked as a duplicate of this bug. ***

Comment 26 Tomas Hudziec 2018-07-30 13:09:16 UTC
Does not occur with mutter-3.28.2-4.el7 and higher. Switching to verified.

Comment 28 errata-xmlrpc 2018-10-30 10:26:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3140