Bug 1620584

Summary: XWayland crashes on start up and causes the terminal to be unresponsive.
Product: [Fedora] Fedora
Reporter: Lukas Ruzicka <lruzicka>
Component: xorg-x11-server
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 29
CC: alexl, awilliam, bskeggs, caillon+fedoraproject, gmarr, jglisse, john.j5live, lruzicka, namar66, ofourdan, rhughes, richard.shadbolt, robatino, rstrode, sandmann, sgallagh, xgl-maint, znmeb
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: RejectedBlocker
Type: Bug
Attachments:
Description | Flags
Journalctl logs from the described boot process. | none
journalctl log from Silverblue 29 XWayland crash | none
Updated log - much smaller - use this one! | none
Log of the rebase from Silverblue 28 to 29 | none
Traceback generated from Xwayland crash dump | none
journalctl log from Xorg crash | none
gdb traceback from Xorg crash | none

Description Lukas Ruzicka 2018-08-23 08:52:03 UTC
Created attachment 1478112
Journalctl logs from the described boot process.

Description of problem:

When the computer booted into GDM, Xwayland crashed and left the computer partly unresponsive. Not only did GDM fail to start, but I could not switch to any of the virtual consoles using CTRL-ALT-F*. The only key combination it responded to was CTRL-ALT-DEL, which let me reboot the computer.

Version-Release number of selected component (if applicable):

* Fedora 29
* xorg-x11-server-Xwayland-1.20.1-1.fc29.x86_64
* kernel 4.18.1-300.fc29.x86_64
* alternatively used kernel 4.17.14 from F28 -> same result

How reproducible:

Always

Steps to Reproduce:
1. Install Fedora 29 with all updates.
2. Reboot the computer and wait for GDM to start.

Actual results:

The journalctl output contained the following:
====
Aug 23 09:45:08 platypus systemd-coredump[1523]: Process 1198 (Xwayland) of user 42 dumped core.
                                                 
                                                 Stack trace of thread 1198:
                                                 #0  0x00007f328da5753f raise (libc.so.6)
                                                 #1  0x00007f328da41895 abort (libc.so.6)
                                                 #2  0x0000000000594390 OsAbort (Xwayland)
                                                 #3  0x0000000000599629 AbortServer (Xwayland)
                                                 #4  0x000000000059a49d FatalError (Xwayland)
                                                 #5  0x000000000042e61c xwl_log_handler (Xwayland)
                                                 #6  0x00007f328e158790 n/a (libwayland-client.so.0)
                                                 #7  0x00007f328e1544dd wl_proxy_marshal_array_constructor_versioned (libwayland-client.so.0)
                                                 #8  0x00007f328e15474e wl_proxy_marshal_constructor (libwayland-client.so.0)
                                                 #9  0x00007f328e1557f9 wl_display_roundtrip_queue (libwayland-client.so.0)
                                                 #10 0x000000000042eadc xwl_screen_init (Xwayland)
                                                 #11 0x000000000055b9ab AddScreen (Xwayland)
                                                 #12 0x000000000043049e InitOutput (Xwayland)
                                                 #13 0x000000000055f587 dix_main (Xwayland)
                                                 #14 0x00007f328da43413 __libc_start_main (libc.so.6)
                                                 #15 0x000000000042e33e _start (Xwayland)
Aug 23 09:45:08 platypus audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-15>
Aug 23 09:45:09 platypus abrt-server[1531]: Deleting problem directory ccpp-2018-08-23-09:45:08.488710-1198 (dup of ccpp-2018-08-23-09:39:22.607751-4569)
Aug 23 09:45:09 platypus dbus-daemon[822]: [system] Activating service name='org.freedesktop.problems' requested by ':1.131' (uid=0 pid=1571 comm="/usr/bin>
Aug 23 09:45:09 platypus dbus-daemon[822]: [system] Successfully activated service 'org.freedesktop.problems'
====

Expected results:

GDM should start and the system should stay responsive.

Additional info:

I worked around the problem as follows:

1. Booted with the isolate 3 kernel parameter.
2. Replaced gdm with lightdm.
3. The computer now boots with no problems and no trace of a coredump in journalctl.

Comment 1 Fedora Blocker Bugs Application 2018-08-23 08:54:51 UTC
Proposed as a Blocker for 29-beta by Fedora user lruzicka using the blocker tracking app because:

 Since my machine was totally unusable and this behaviour violates the release criteria, I am proposing this as a blocker:
 
"No part of any release-blocking desktop's panel (or equivalent) configuration may crash on startup or be entirely non-functional."

Comment 2 Adam Williamson 2018-08-23 17:30:42 UTC
We at least need the actual full backtrace, I think - can you get the core file out of systemd and get the backtrace from it? We're gonna want to know what the 'FatalError' was...

Comment 3 Stephen Gallagher 2018-08-27 16:44:39 UTC
This is likely hardware-specific, as I'm not seeing this on F29 on my system. So yeah, more information is going to be needed, but right now I'd be -1 blocker, +1 FE unless we start getting a lot of other reports.

Comment 4 Geoffrey Marr 2018-09-04 20:40:27 UTC
Discussed during the 2018-09-04 blocker review meeting: [1]

The decision to classify this bug as a "Rejected Blocker" was made: crashes like this are obviously bad, but so far it does not seem to affect anyone else, and lruzicka could not reproduce it on the initially affected system during the meeting. We will reconsider if more information emerges.

[1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2018-09-04/f29-blocker-review.2018-09-04-16.01.txt

Comment 5 M. Edward (Ed) Borasky 2018-09-14 07:21 UTC
Created attachment 1483204
journalctl log from Silverblue 29 XWayland crash

Comment 6 M. Edward (Ed) Borasky 2018-09-14 07:24:13 UTC
I have this on Silverblue 29. I think it's GPU-related.

I have two machines: a workstation with an AMD Bonaire and a laptop with an Intel Core i5 with integrated graphics. The workstation has this issue (log attached) and the laptop doesn't - it comes up in Silverblue 29 just fine.

Comment 7 M. Edward (Ed) Borasky 2018-09-14 07:29 UTC
Created attachment 1483207
Updated log - much smaller - use this one!

I thought I had pruned out the stuff before the Reboot but I didn't. This one is pruned.

Comment 8 Olivier Fourdan 2018-09-14 07:40:18 UTC
Hmm, neither log is useful for investigating why Xwayland crashed:

  Sep 13 23:58:50 Silverblue systemd-coredump[1112]: Core file was truncated to 2147483648 bytes.
  Sep 13 23:58:51 Silverblue systemd-coredump[1112]: Process 1097 (Xwayland) of user 42 dumped core.
                                                     Stack trace of thread 1102:
                                                     #0  0x00007f8d23907090 n/a (n/a)

And that's it...

But to me this is not the most important question. More important is why this hangs the system.

When Xwayland crashes, gnome-shell/mutter cannot survive and will terminate as well, which will terminate the user session and the system will return to the gdm login screen. If I understand correctly, that did not happen there.

We see that gnome-shell exited and gdm detected it:

  Sep 13 23:58:51 Silverblue gnome-session-binary[1002]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1

Systemd detected it as well:

  Sep 13 23:58:52 Silverblue systemd[910]: Starting Exit the Session...
  Sep 13 23:58:52 Silverblue systemd[1]: user-runtime-dir@42.service: Unit not needed anymore. Stopping.
  Sep 13 23:58:52 Silverblue systemd[1]: Stopped User Manager for UID 42.

So why it did not return to the login screen, I don't know, but that's not Xwayland's fault there...

Comment 9 M. Edward (Ed) Borasky 2018-09-14 07:47:44 UTC
In my case I do have ALT-Fm consoles available so I can capture logfiles and if it left a core dump file somewhere I can snag it and upload it.

Comment 10 Olivier Fourdan 2018-09-14 07:50:32 UTC
User 42 is gdm, IIRC. Previously, gdm was able to detect a crash with Wayland and fall back to X11 in this case; maybe something is broken there?

Comment 11 Olivier Fourdan 2018-09-14 07:53:00 UTC
(In reply to M. Edward (Ed) Borasky from comment #9)
> In my case I do have ALT-Fm consoles available so I can capture logfiles and
> if it left a core dump file somewhere I can snag it and upload it.

You could try coredumpctl, but from what I can see in the logs, that core file is most likely truncated...

  1. install the debuginfo (dnf debuginfo-install xorg-x11-server-Xwayland)
  2. coredumpctl list
     (see if Xwayland is listed there, as present)
  3. coredumpctl gdb Xwayland
  4. bt full
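
The steps above can also be run without an interactive gdb session. The following is a minimal sketch, not a tested procedure for this bug: `capture_backtrace` is a hypothetical helper name, `/usr/bin/Xwayland` is an assumed install path, and it presumes the debuginfo packages are already installed and that `coredumpctl list` shows the core as present (a truncated core will still give an incomplete backtrace).

```shell
# Sketch only: capture_backtrace is a hypothetical helper, not part of systemd or gdb.
# Usage: capture_backtrace [MATCH] [EXECUTABLE] [OUTPUT_FILE]
capture_backtrace() {
    match="${1:-Xwayland}"            # match string handed to coredumpctl
    exe="${2:-/usr/bin/Xwayland}"     # assumed path of the crashed binary
    out="${3:-dump-traceback}"        # file that receives the backtrace
    core="$(mktemp)" || return 1

    # Extract the newest matching core dump from the journal into a file.
    if ! coredumpctl dump "$match" --output="$core"; then
        rm -f "$core"
        return 1
    fi

    # Run gdb in batch mode and save a full backtrace of every thread.
    gdb -batch -ex "thread apply all bt full" "$exe" "$core" > "$out" 2>&1
    rc=$?
    rm -f "$core"
    return "$rc"
}
```

Calling `capture_backtrace Xwayland /usr/bin/Xwayland dump-traceback` would leave the backtrace in `dump-traceback`, ready to attach to the bug.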

Comment 12 Olivier Fourdan 2018-09-14 08:12:39 UTC
Please note that comment #0 is a different issue though. It's not a /crash/ in Xwayland; it's Xwayland aborting because the Wayland compositor (gnome-shell/mutter) is gone, so the communication pipe with the compositor is broken and Xwayland aborts:

  Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: (EE)
  Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: Fatal server error:
  Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: (EE) Error sending request: Broken pipe
  Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: (EE)

The (EE) lines are Xwayland logs, even though they show up as org.gnome.Shell.desktop (because gnome-shell/mutter spawns Xwayland), and the backtrace in comment #0 clearly shows the call to `xwl_log_handler()`.
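
To check whether a given log shows this kind of abort rather than a genuine crash, the Xwayland error lines can be pulled out of the journal. A minimal sketch, assuming the errors appear as "(EE)" or "Fatal server error" lines as in the excerpt above; `xwayland_errors` is a hypothetical helper name:

```shell
# Sketch only: xwayland_errors is a hypothetical helper.
# Reads journal text on stdin and prints only the X server error lines.
xwayland_errors() {
    grep -E '\(EE\)|Fatal server error'
}
```

For example, `journalctl -b -1 --no-pager | xwayland_errors` would filter the journal of the previous boot. A "Broken pipe" error here points at the compositor having gone away first.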

Comment 13 M. Edward (Ed) Borasky 2018-09-14 08:46:05 UTC
By the way - I only have this issue with Silverblue 29 - plain Fedora 29 Workstation installed via the "everything" network installer yesterday comes up with no problems. If you're keeping the Silverblue bugs somewhere else, I can continue there.

Comment 14 M. Edward (Ed) Borasky 2018-09-14 08:47:45 UTC
(In reply to Olivier Fourdan from comment #12)
> Please note that comment #0 is a different issue though, it's not a /crash/
> in Xwayland, it's Xwayland aborting because of the Wayland compositor
> (gnome-shell/mutter) is gone, thus the communication pipe with the Wayland
> compositor is broken and Xwayland aborts:
> 
>   Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: (EE)
>   Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: Fatal server error:
>   Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: (EE) Error sending
> request: Broken pipe
>   Aug 23 09:45:08 platypus org.gnome.Shell.desktop[1092]: (EE)
> 
> (EE) is Xwayland logs even though they show up as org.gnome.Shell.desktop
> (that's because gnome-shell/mutter spawns Xwayland), and the backtrace in
> comment #0 clearly shows the call to `xwl_log_handler()`

Do you know anything about the GPU in the other system?

Comment 15 Olivier Fourdan 2018-09-14 08:55:47 UTC
(In reply to M. Edward (Ed) Borasky from comment #14)
> Do you know anything about the GPU in the other system?

No more than what is provided in this bug. Reading attachment 1478112, I'd say Intel Skylake HD Graphics 520 (https://pci-ids.ucw.cz/read/PC/8086/1916)

Comment 16 M. Edward (Ed) Borasky 2018-09-14 09:39:39 UTC
(In reply to Olivier Fourdan from comment #10)
> User 42 is gdm, iirc, previously, gdm was able to detect a crash with
> Wayland and fallback to X11 in this case, maybe there is something broken
> there?

In my case, even if it fell back to Xorg, I still wouldn't get a greeter because Xorg crashes on this card too. And X leaves better logs than XWayland. ;-)

Comment 17 M. Edward (Ed) Borasky 2018-09-15 04:55:31 UTC
(In reply to Olivier Fourdan from comment #11)
> (In reply to M. Edward (Ed) Borasky from comment #9)
> > In my case I do have ALT-Fm consoles available so I can capture logfiles and
> > if it left a core dump file somewhere I can snag it and upload it.
> 
> You could try coredumpctl, but from what I can see in the logs, that core
> file is most likely truncated...
> 
>   1. install the debuginfo (dnf debuginfo-install xorg-x11-server-Xwayland)
>   2. coredumpctl list
>      (see if Xwayland is listed there, as present)
>   3. coredumpctl gdb Xwayland
>   4. bt full

It's been about a decade since I did any serious low-level troubleshooting, but isn't there some ulimit way to allow bigger core dumps? I've got this on a 300 GB partition. ;-)

Comment 18 Adam Williamson 2018-09-15 05:33:01 UTC
You can configure systemd-coredump's limits in /etc/systemd/coredump.conf. `man coredump.conf` explains the settings. You would probably want to bump ExternalSizeMax to something higher.
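
A minimal sketch of raising those limits with a drop-in file instead of editing coredump.conf directly. It assumes the standard drop-in directory /etc/systemd/coredump.conf.d; the helper name `write_coredump_limits`, the file name `10-large-cores.conf`, and the 8G default are illustrative choices, and writing under /etc needs root.

```shell
# Sketch only: write_coredump_limits is a hypothetical helper.
# Usage: write_coredump_limits [CONF_DIR] [LIMIT]
write_coredump_limits() {
    conf_dir="${1:-/etc/systemd/coredump.conf.d}"   # drop-in directory
    limit="${2:-8G}"                                # illustrative size limit
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/10-large-cores.conf" <<EOF
[Coredump]
# Largest core that systemd-coredump will process at all
ProcessSizeMax=$limit
# Largest core that will be saved to disk
ExternalSizeMax=$limit
EOF
}
```

The new limits apply to crashes that happen after the file is written, so the crash has to be reproduced again to get a complete core.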

Comment 19 M. Edward (Ed) Borasky 2018-09-17 08:19:55 UTC
Good news - upgrading to Silverblue 29 Beta 1.3 fixed this! The display is functioning normally now.

Comment 20 M. Edward (Ed) Borasky 2018-09-20 07:54:42 UTC
The Silverblue Test Day added the scenario of upgrading from Silverblue 28 to 29 via a rebase. I've run that, and this issue still exists when I upgrade. So I'm going ahead with the core dump / backtrace attempt.

Comment 21 M. Edward (Ed) Borasky 2018-09-20 09:02 UTC
Created attachment 1485065
Log of the rebase from Silverblue 28 to 29

Comment 22 M. Edward (Ed) Borasky 2018-09-20 09:08 UTC
Created attachment 1485066
Traceback generated from Xwayland crash dump

Setting

ProcessSizeMax=8G
ExternalSizeMax=8G

in /etc/systemd/coredump.conf got me a complete core dump.

The backtrace was generated with:

# coredumpctl gdb Xwayland | tee dump-traceback
(gdb) bt full

Comment 23 Olivier Fourdan 2018-09-20 09:12:12 UTC
Looks like a DRI driver issue (i.e. Mesa), which would explain why it affects both Xorg and Xwayland (via glamor).

Comment 24 Olivier Fourdan 2018-09-20 09:15:13 UTC
I think you'll need the mesa-debuginfo and mesa-dri-drivers-debuginfo symbols as well to proceed further, because the backtrace is missing those (where the crash occurs).

Comment 25 M. Edward (Ed) Borasky 2018-09-20 09:35:35 UTC
(In reply to Olivier Fourdan from comment #24)
> I think you'll need the mesa-debuginfo and mesa-dri-drivers-debuginfo
> symbols as well to process further, because the backtrace is missing those
> (where the crash occurs)

Yeah - I'm about to install them and capture another backtrace. Any others I need??

Comment 26 M. Edward (Ed) Borasky 2018-09-20 10:52 UTC
Created attachment 1485096
journalctl log from Xorg crash

I had to drop back to Xorg - the Xwayland session doesn't let me Alt-F2 any more. This is the journalctl log.

Comment 27 M. Edward (Ed) Borasky 2018-09-20 10:53 UTC
Created attachment 1485097
gdb traceback from Xorg crash

And the traceback from the Xorg core dump

Comment 28 Shadders 2019-04-30 23:30:38 UTC
Hi,
I just upgraded to Fedora 29 and am getting the same behaviour as in the original report. Switching to lightdm does not fix it.
I re-installed the NVIDIA drivers too, with the same result, but I seemed to get a bit further: the monitors noticed the video signals (one flashes, the other indicates an HDMI signal).
Regards,
Richard.

Comment 29 Shadders 2019-05-01 00:18:31 UTC
Hi,

As an update - Laptop works no problem using internal Intel graphics.

Desktop is Intel i5-4430 3.0GHz, 8GB RAM, Motherboard is Gigabyte Z87xD3H, Graphics GeForce GTX1050. 

Regards,
Shadders.

Comment 30 Shadders 2019-05-01 09:06:59 UTC
Hi, 

Apologies for the spam. 

Upgraded from Fedora 29 (GNOME desktop not working) to Fedora 30; same issue.

Used a Live CD from Linux Format with Fedora 29 and selected basic graphics; this failed to start the GNOME desktop manager.

Changed the BIOS to internal graphics only, with one monitor connected; this too failed to start the GNOME desktop manager for Fedora 29.

Used the Live DVD with Fedora 28, and this booted with no issues.

Regards,
Shadders.

Comment 31 Shadders 2019-05-01 10:14:20 UTC
Hi,
Apologies yet again for the spam. 

Using a Live CD with F29 fails to start in basic graphics mode, which includes using Intel internal graphics. 

Updated the system to F30 using the command line.

After switching the BIOS from the graphics card to internal Intel graphics and back to the graphics card, with only the DVI monitor connected (no HDMI monitor), the system booted. I have since restarted the PC with both monitors connected and it still boots into graphics mode under Fedora 30.

Regards,
Shadders.

Comment 32 Adam Williamson 2019-05-01 15:16:13 UTC
F30 should work with basic graphics - we found several issues in the basic graphics mode path during F30 testing that also affected F29 (but unfortunately weren't escalated during F29 testing). We have fixed them for F30, so hopefully it should work better.