Bug 1540986 - gnome-shell fails to start Xwayland with stalled entries in /tmp/.X11-unix/
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: mutter
Version: 7.5
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Florian Müllner
QA Contact: Desktop QE
Depends On:
Blocks:
Reported: 2018-02-01 08:13 EST by Martin Krajnak
Modified: 2018-05-17 10:39 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-10 09:11:47 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
core dump (944.39 KB, application/x-gzip)
2018-02-01 08:16 EST, Martin Krajnak
backtrace and core dump (4.93 MB, application/zip)
2018-02-01 10:34 EST, Martin Krajnak
joutnalctl (233.91 KB, text/x-vhdl)
2018-02-01 12:07 EST, Martin Krajnak


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0770 None None None 2018-04-10 09:12 EDT

Description Martin Krajnak 2018-02-01 08:13:11 EST
Description of problem:
I have 3 users on a laptop. Logging in to the Wayland session as the first user ends up back on the login screen; see the following log from journalctl:

Feb 01 13:23:57 localhost.localdomain accounts-daemon[815]: g_dbus_interface_skeleton_unexport: assertion 'interface_->priv->connections != NULL' failed
Feb 01 13:23:58 localhost.localdomain kernel: radeon_dp_aux_transfer_native: 158 callbacks suppressed
Feb 01 13:23:58 localhost.localdomain gnome-shell[2594]: Failed to apply DRM plane transform 0: Invalid argument
Feb 01 13:23:58 localhost.localdomain gnome-shell[2594]: Failed to apply DRM plane transform 0: Invalid argument
Feb 01 13:23:58 localhost.localdomain gnome-shell[2594]: failed to bind to /tmp/.X11-unix/X1: Address already in use
Feb 01 13:23:58 localhost.localdomain kernel: traps: gnome-shell[2594] trap int3 ip:7f3179f94381 sp:7fff8c8b51b0 error:0
Feb 01 13:23:58 localhost.localdomain gnome-shell[2594]: Failed to start X Wayland
Feb 01 13:23:58 localhost.localdomain abrt-hook-ccpp[2602]: Process 2594 (gnome-shell) of user 1000 killed by SIGTRAP - dumping core
Feb 01 13:23:58 localhost.localdomain gnome-session[2520]: gnome-session-binary[2520]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5
Feb 01 13:23:58 localhost.localdomain gnome-session-binary[2520]: WARNING: Application 'org.gnome.Shell.desktop' killed by signal 5
Feb 01 13:23:58 localhost.localdomain gnome-session-binary[2520]: Unrecoverable failure in required component org.gnome.Shell.desktop
Feb 01 13:23:58 localhost.localdomain org.gtk.vfs.Daemon[2515]: A connection to the bus can't be made
Feb 01 13:23:58 localhost.localdomain gdm-password][2488]: pam_unix(gdm-password:session): session closed for user test
Feb 01 13:23:58 localhost.localdomain gdm[1307]: GdmDisplay: display lasted 1.869975 seconds
Feb 01 13:23:59 localhost.localdomain systemd-logind[821]: Removed session 3.
Feb 01 13:23:59 localhost.localdomain systemd[1]: Removed slice User Slice of test.
Feb 01 13:23:59 localhost.localdomain systemd[1]: Stopping User Slice of test.
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: Duplicate: core backtrace
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: DUP_OF_DIR: /var/spool/abrt/ccpp-2018-02-01-12:26:47-3795
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: Deleting problem directory ccpp-2018-02-01-13:23:58-2594 (dup of ccpp-2018-02-01-12:26:47-3795)
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: Email address of sender was not specified. Would you like to do so now? If not, 'user@localhost' is to be used [y/N] 
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: Email address of receiver was not specified. Would you like to do so now? If not, 'root@localhost' is to be used [y/N] 
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: Sending an email...
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: Sending a notification email to: root@localhost
Feb 01 13:23:59 localhost.localdomain postfix/pickup[1861]: 6F63661F3271: uid=0 from=<user@localhost>
Feb 01 13:23:59 localhost.localdomain abrt-server[2603]: Email was sent to: root@localhost

If I log in as one of the other users, the login always succeeds, but after running xrandr I found that the session is actually running on X.
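
As a complement to checking with xrandr, the session type can usually be read from the environment. The following is a minimal Python sketch, assuming the login manager exports XDG_SESSION_TYPE (as pam_systemd normally does) and that the environment reflects the session that actually started. The function name session_type is hypothetical.

```python
import os


def session_type(environ=os.environ):
    """Return "wayland", "x11", or "unknown" for the given environment."""
    value = environ.get("XDG_SESSION_TYPE", "").lower()
    if value in ("wayland", "x11"):
        return value
    # Fallback heuristic: check WAYLAND_DISPLAY first, because a Wayland
    # session usually also sets DISPLAY for Xwayland clients.
    if environ.get("WAYLAND_DISPLAY"):
        return "wayland"
    if environ.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(session_type())
```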

I've also tried running mutter -r wayland; it ran fine, but without gnome-shell visible.


Version-Release number of selected component (if applicable):
kernel-3.10.0-838.el7.x86_64
xorg-x11-server-Xwayland-1.19.5-2.el7.x86_64
gnome-session-wayland-session-3.26.1-10.el7.x86_64
gnome-shell-3.26.2-2.el7.x86_64


How reproducible:
always

Steps to Reproduce:
1. Install gnome-session-wayland-session
2. Log out or reboot
3. Change the session to Wayland
4. Try to log in

Actual results:
The login attempt ends up back on the login screen or in the X session.

Expected results:
Wayland session should load successfully 

Additional info:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Thames [Radeon HD 7550M/7570M/7650M] [1002:6841]
Comment 2 Martin Krajnak 2018-02-01 08:16 EST
Created attachment 1389493
core dump
Comment 3 Olivier Fourdan 2018-02-01 10:12:35 EST
The core file (attachment 1389493) gives:

(gdb) bt
#0  0x00007fb65cdcb1b7 in raise () from /usr/lib64/libc.so.6
#1  0x00007fb65cdcc8a8 in abort () from /usr/lib64/libc.so.6
#2  0x000000000058f1da in OsAbort () at utils.c:1361
#3  0x0000000000594ce3 in AbortServer () at log.c:877
#4  0x0000000000595b2d in FatalError (f=f@entry=0x59cf78 "failed to read Wayland events: %s\n") at log.c:1015
#5  0x0000000000424a42 in xwl_read_events (xwl_screen=0x1dec750) at xwayland.c:594
#6  0x000000000058cef2 in ospoll_wait (ospoll=0x1de3130, timeout=<optimized out>) at ospoll.c:412
#7  0x0000000000585fb3 in WaitForSomething (are_ready=0) at WaitFor.c:226
#8  0x00000000005531e1 in Dispatch () at dispatch.c:422
#9  0x000000000055744a in dix_main (argc=11, argv=0x7fff7802e158, envp=<optimized out>) at main.c:287
#10 0x00007fb65cdb7385 in __libc_start_main () from /usr/lib64/libc.so.6
#11 0x00000000004240fe in _start ()

This is typical of the Wayland compositor dying: the backtrace shows that the compositor is dead and Xwayland cannot read from the Wayland socket.

I think this is the problem:

Feb 01 13:23:58 localhost.localdomain kernel: traps: gnome-shell[2594] trap int3 ip:7f3179f94381 sp:7fff8c8b51b0 error:0
Comment 4 Olivier Fourdan 2018-02-01 10:16:16 EST
Feb 01 13:23:58 localhost.localdomain abrt-hook-ccpp[2602]: Process 2594 (gnome-shell) of user 1000 killed by SIGTRAP - dumping core

Can you provide this core file from gnome-shell please?
Comment 5 Martin Krajnak 2018-02-01 10:34 EST
Created attachment 1389600
backtrace and core dump

I think that should be it
Comment 6 Olivier Fourdan 2018-02-01 11:37:26 EST
Now that's funny, gnome-shell fails with “Failed to start X Wayland” and Xwayland fails because it can't talk to gnome-shell...

I wonder if that could be a dupe of bug 1529175. Can you please provide the full journalctl logs?
Comment 7 Martin Krajnak 2018-02-01 12:07 EST
Created attachment 1389636
joutnalctl

Here it is. Just note that I wasn't able to retrieve the logs from previous boots, so I reproduced the crash again.
Comment 8 Olivier Fourdan 2018-02-01 12:46:13 EST
Can you check (for that given user) if you have:

  ~/.config/gnome-session/saved-session/org.gnome.Shell.desktop

And if so, paste its content in here?
Comment 9 Olivier Fourdan 2018-02-02 05:47:20 EST
Alternatively, can you reproduce with a freshly created user?

1. Create a new user on the system
2. Select that user in gdm
3. Choose the session “GNOME on Wayland”

That would rule out a possible session issue.
Comment 10 Tomas Pelka 2018-02-02 05:50:02 EST
(In reply to Olivier Fourdan from comment #8)
> Can you check (for that given user) if you have:
> 
>   ~/.config/gnome-session/saved-session/org.gnome.Shell.desktop
> 
> And if so, paste its content in here?

~/.config/gnome-session/saved-session/ is empty for all three users (test, test2, test3), so there are no saved sessions.
Comment 11 Olivier Fourdan 2018-02-02 06:38:18 EST
There might be stalled entries in “/tmp/.X11-unix/” which prevent Xwayland from starting. What do “ls /tmp/.X11-unix/” and “fuser /tmp/.X11-unix/X?” give?

If these X1, X2 entries are not used by any process, can you try removing them and then retry logging in to GNOME on Wayland?
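
The same check can be scripted. The following is a minimal Python sketch, not part of any tool mentioned in this report: it tries to connect to each socket file in the directory and treats a refused connection as a sign that no process is listening. The function name find_stale_x11_sockets is hypothetical, and the sketch only reports entries; it does not remove anything.

```python
import errno
import os
import socket
import stat


def find_stale_x11_sockets(directory="/tmp/.X11-unix"):
    """Return names of socket files in `directory` that nothing listens on."""
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only consider socket files; skip anything else in the directory.
        if not stat.S_ISSOCK(os.lstat(path).st_mode):
            continue
        probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            probe.connect(path)
        except OSError as exc:
            # ECONNREFUSED: the file exists but no process accepts on it.
            if exc.errno == errno.ECONNREFUSED:
                stale.append(name)
        finally:
            probe.close()
    return stale


if __name__ == "__main__":
    for entry in find_stale_x11_sockets():
        print("possibly stale:", entry)
```

Other errors, such as a permission failure, are deliberately ignored here, so an entry that cannot be probed is not reported as stale.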
Comment 12 Martin Krajnak 2018-02-02 07:06:56 EST
Yes, that was it: after removing the X1 and X2 entries, the Wayland session starts successfully for all users.
Comment 13 Olivier Fourdan 2018-02-02 07:58:53 EST
Xwayland (like all Xorg servers) has the “-displayfd” option, which lets Xwayland pick an available display and report its number to the caller via the provided fd.

mutter should use this to avoid dealing (badly) with stalled entries in /tmp/.X11-unix/ as in this issue.
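
To illustrate the -displayfd handshake, here is a minimal Python sketch (mutter itself is written in C, and this is not its code). The caller creates a pipe, passes the number of the write end to the server, and reads the display number back from the read end. The names start_with_displayfd and make_command are hypothetical.

```python
import os
import subprocess


def start_with_displayfd(make_command):
    """Start a server and return (process, display_number).

    `make_command(fd)` is a hypothetical callback returning the argv list
    for the server, given the number of the write end of the pipe.
    """
    read_fd, write_fd = os.pipe()
    try:
        # pass_fds keeps write_fd open, under the same number, in the child.
        proc = subprocess.Popen(make_command(write_fd), pass_fds=(write_fd,))
    except OSError:
        os.close(read_fd)
        raise
    finally:
        # Close our copy so the read below sees EOF if the child exits early.
        os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        line = reader.readline()  # the server writes "<number>\n"
    if not line.strip():
        proc.wait()
        raise RuntimeError("server exited without reporting a display")
    return proc, int(line)
```

With Xwayland, the callback might return something like ["Xwayland", "-displayfd", str(fd)]; a real invocation needs further arguments and a running Wayland compositor, which are omitted here.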
Comment 14 Olivier Fourdan 2018-02-02 08:29:31 EST
I should have looked more closely at the code: mutter is already using -displayfd, so the problem lies in the way mutter tries to find the display.
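
One way a display search can tolerate leftover socket files is to move on to the next number whenever binding fails with “Address already in use”. The Python sketch below illustrates that idea only; it is an assumption for illustration, not the change that was shipped for this bug, and the name bind_first_free_display and its default range are hypothetical.

```python
import errno
import os
import socket


def bind_first_free_display(directory="/tmp/.X11-unix", first=0, last=32):
    """Bind a listening socket to the first usable X<n> path in `directory`.

    Returns (display_number, listening_socket).
    """
    for number in range(first, last + 1):
        path = os.path.join(directory, "X%d" % number)
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.bind(path)
        except OSError as exc:
            sock.close()
            if exc.errno == errno.EADDRINUSE:
                # The path already exists, live or stale: try the next number.
                continue
            raise
        sock.listen(1)
        return number, sock
    raise RuntimeError("no free display number in range")
```

This approach never deletes an existing entry, so it cannot break a running server, at the cost of leaving stale files in place.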
Comment 17 Martin Krajnak 2018-02-08 09:47:55 EST
mutter-3.26.2-8.el7.x86_64
gnome-shell-3.26.2-3.el7.x86_64
gnome-session-wayland-session-3.26.1-10.el7.x86_64

I can successfully log in to Wayland.
The entries are still present, as verification:
➜  ~ ls /tmp/.X11-unix/
X0  X1  X2
Comment 20 errata-xmlrpc 2018-04-10 09:11:47 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0770
