Bug 1516859

Summary: [abrt] xorg-x11-server-Xwayland: xwl_log_handler(): Xwayland killed by SIGABRT
Product: [Fedora] Fedora
Reporter: Christian Stadelmann <fedora>
Component: xorg-x11-server
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 27
CC: alexl, awilliam, bskeggs, caillon+fedoraproject, jglisse, john.j5live, ofourdan, retape, rhughes, rstrode, sandmann, xgl-maint
Hardware: x86_64
OS: Unspecified
URL: https://retrace.fedoraproject.org/faf/reports/bthash/32c202c6f2e51f5f76413a6a163bff75eed17a26
Whiteboard: abrt_hash:3f66ffb13feadec7ffc4a78e66523a30f6b2210a;VARIANT_ID=workstation;
Fixed In Version: mutter-3.26.2-2.fc27
Last Closed: 2018-04-13 16:39:55 UTC
Attachments (flags: none on all):
File: backtrace
File: core_backtrace
File: cpuinfo
File: dso_list
File: limits
File: proc_pid_status

Description Christian Stadelmann 2017-11-23 12:56:38 UTC
Description of problem:
I tried to reproduce bug #1392242 comment #15.

Steps to reproduce:
1. On the GDM login screen, log in to your GNOME (Wayland) session.
2. As soon as possible, switch to another TTY (i.e. before the login has completed).

I guess that the underlying resources (input or output devices) go missing, which causes this bug.

Version-Release number of selected component:
xorg-x11-server-Xwayland-1.19.5-1.fc27

Additional info:
reporter:       libreport-2.9.3
backtrace_rating: 4
cmdline:        /usr/bin/Xwayland :0 -rootless -terminate -core -listen 4 -listen 5 -displayfd 6
crash_function: xwl_log_handler
executable:     /usr/bin/Xwayland
journald_cursor: s=56b603a1a0544164a0009e9af71baa8c;i=3944d;b=fa096e057a74423b9e4932266c55819c;m=18a124f63;t=55ea5c9fe16a6;x=72fcb15805b411f2
kernel:         4.13.13-300.fc27.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp

Truncated backtrace:
Thread no. 1 (10 frames)
 #5 xwl_log_handler at xwayland.c:885
 #6 wl_abort at src/wayland-util.c:416
 #7 wl_proxy_marshal_array_constructor_versioned at src/wayland-client.c:659
 #8 wl_proxy_marshal_array_constructor at src/wayland-client.c:599
 #9 wl_proxy_marshal_constructor at src/wayland-client.c:733
 #10 wl_display_sync at ./protocol/wayland-client-protocol.h:948
 #11 wl_display_roundtrip_queue at src/wayland-client.c:1113
 #12 wl_display_roundtrip at src/wayland-client.c:1150
 #13 xwl_screen_init at xwayland.c:806
 #14 AddScreen at dispatch.c:3916

Comment 1 Christian Stadelmann 2017-11-23 12:56:47 UTC
Created attachment 1358198
File: backtrace

Comment 2 Christian Stadelmann 2017-11-23 12:56:49 UTC
Created attachment 1358199
File: core_backtrace

Comment 3 Christian Stadelmann 2017-11-23 12:56:51 UTC
Created attachment 1358200
File: cpuinfo

Comment 4 Christian Stadelmann 2017-11-23 12:56:52 UTC
Created attachment 1358201
File: dso_list

Comment 5 Christian Stadelmann 2017-11-23 12:56:54 UTC
Created attachment 1358202
File: limits

Comment 6 Christian Stadelmann 2017-11-23 12:56:56 UTC
Created attachment 1358203
File: proc_pid_status

Comment 7 Olivier Fourdan 2017-11-23 13:06:57 UTC
I think this is the issue discussed here:

https://lists.x.org/archives/xorg-devel/2017-October/055025.html

Comment 8 Olivier Fourdan 2017-11-23 14:08:16 UTC
But I don't think wl_output is involved in this case; I don't see outputs being added or removed on VT change.

Comment 9 Olivier Fourdan 2017-11-23 15:07:58 UTC
In this case, the race occurs with the wl_seat (which is created/destroyed on VT switch).

  ...
  [2802049.598] wl_seat(new id wl_pointer@18)
  [2802049.637]  -> wl_display(wl_display@1, 0, "invalid object 18")
  [2802050.464] wl_display(wl_display@1, 0, "invalid object 18")
  (EE) 
  Fatal server error:
  (EE) wl_display@1: error 0: invalid object 18
  (EE)
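To make the failure in the log above concrete, here is a toy C model of the compositor-side lookup. It is not libwayland code: toy_client, toy_dispatch(), TOY_MAX_IDS and TOY_ERROR_INVALID_OBJECT are hypothetical names. It assumes only what the log shows: a request names object id 18, the compositor has no live object for that id (on VT switch the wl_seat, and with it the pointer, can go away), and it answers on wl_display@1 with error 0, "invalid object 18", which Xwayland then treats as a fatal server error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only, not libwayland: all names are hypothetical. */
#define TOY_MAX_IDS 64
#define TOY_ERROR_INVALID_OBJECT 0 /* matches "error 0" in the log */

struct toy_client {
    bool live[TOY_MAX_IDS]; /* object ids the compositor can still resolve */
};

/* Dispatch a request addressed to object `id`. Returns true if the id
 * resolves; otherwise stores the error code in *code, formats the error
 * text into msg, and returns false. */
static bool toy_dispatch(const struct toy_client *c, unsigned id,
                         int *code, char *msg, size_t len)
{
    assert(c != NULL && code != NULL && msg != NULL && len > 0);
    if (id < TOY_MAX_IDS && c->live[id])
        return true;
    *code = TOY_ERROR_INVALID_OBJECT;
    snprintf(msg, len, "invalid object %u", id);
    return false;
}
```

With live[18] cleared, toy_dispatch(&c, 18, ...) returns false and produces "invalid object 18"; once live[18] is set, the same request resolves.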

Comment 10 Olivier Fourdan 2017-11-24 08:55:08 UTC
That said, in this particular case I wonder whether it could be a mutter bug.

Xwayland aborts on a Wayland protocol error when it tries to access an object that no longer exists.

The error is triggered by a wl_seat capabilities update: mutter sends Xwayland a capabilities event (via wl_seat_send_capabilities()) enabling the pointer while the wl_seat has no pointer, hence the Wayland protocol error.

The capabilities are sent from meta_wayland_seat_devices_updated() in meta-wayland-seat.c, which is called on a "device-added" or "device-removed" signal from the device manager.

The capabilities are set or unset based on the MetaWaylandSeat in meta_wayland_seat_set_capabilities(), which is called from meta_wayland_seat_update_capabilities(), which queries the device.

I think there is a potential race here: the device's actual capabilities may have changed once more in between, which would explain the reproducer.

Comment 11 Christian Stadelmann 2017-11-24 09:06:19 UTC
(In reply to Olivier Fourdan from comment #10)
> […]

That sounds like a reasonable explanation to me.

Comment 12 Olivier Fourdan 2017-11-24 15:57:35 UTC
Hmm, no, I think this is one of those cases where the protocol itself is racy, not Xwayland or mutter...

Comment 13 Adam Williamson 2018-04-13 16:32:06 UTC
*** Bug 1523952 has been marked as a duplicate of this bug. ***

Comment 14 Adam Williamson 2018-04-13 16:35:07 UTC
Changed external bug per Olivier's comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1523952#c18 .

Comment 15 Adam Williamson 2018-04-13 16:39:55 UTC
I believe this was fixed by https://bodhi.fedoraproject.org/updates/FEDORA-2017-39b370bebf: the commit mentioned there, cde545462, is precisely the commit that fixed this bug ("wayland-outputs: Delay wl_output destruction"). So, closing. Please re-open if you're still hitting *exactly* this bug.

Comment 16 Adam Williamson 2018-04-13 17:24:25 UTC
Well, I'm not 100% sure about that, so I'm re-opening for now while waiting on confirmation from Olivier. See https://bugzilla.redhat.com/show_bug.cgi?id=1523952#c20 (if you can access it; it's a private bug due to libreport silliness).