Bug 2313799 - [abrt] xorg-x11-server-Xwayland: Xwayland killed by SIGABRT
Summary: [abrt] xorg-x11-server-Xwayland: Xwayland killed by SIGABRT
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-server-Xwayland
Version: 40
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Olivier Fourdan
QA Contact:
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:1796bc6db59fdec16c2d51c5589...
Depends On:
Blocks:
 
Reported: 2024-09-20 15:36 UTC by Luc Lalonde
Modified: 2024-10-04 01:46 UTC (History)

Fixed In Version: xorg-x11-server-Xwayland-24.1.3-1.fc41 xorg-x11-server-Xwayland-24.1.3-1.fc40
Clone Of:
Environment:
Last Closed: 2024-10-04 00:16:12 UTC
Type: ---
Embargoed:


Attachments
File: os_info (699 bytes, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: mountinfo (3.98 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: limits (1.29 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: backtrace (54.57 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: cpuinfo (3.34 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: environ (3.62 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: dso_list (464 bytes, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: core_backtrace (12.38 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: var_log_messages (1.99 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: proc_pid_status (1.55 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: maps (3.93 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde
File: open_fds (3.86 KB, text/plain), 2024-09-20 15:36 UTC, Luc Lalonde


Links
System | ID | Private | Priority | Status | Summary | Last Updated
freedesktop.org Gitlab | xorg xserver issues 1752 | 0 | None | opened | NULL pointer dereference in RemoveHost() | 2024-09-23 08:02:00 UTC
freedesktop.org Gitlab | xorg xserver merge_requests 1701 | 0 | None | opened | os: Fix NULL pointer dereference | 2024-09-23 08:03:01 UTC

Description Luc Lalonde 2024-09-20 15:36:37 UTC
Description of problem:
From SDDM or GDM login screen, choose GNOME-Wayland or Plasma-Wayland.
Unable to access the session. I can only use Plasma-X11 or GNOME-X11.

Version-Release number of selected component:
xorg-x11-server-Xwayland-24.1.2-1.fc40

Additional info:
reporter:       libreport-2.17.15
cgroup:         0::/user.slice/user-12690.slice/user/session.slice/plasma-kwin_wayland.service
uid:            12690
kernel:         6.10.10-200.fc40.x86_64
type:           CCpp
backtrace_rating: 4
reason:         Xwayland killed by SIGABRT
cmdline:        /usr/bin/Xwayland :1 -auth /run/user/12690/xauth_lESENv -listenfd 85 -listenfd 86 -displayfd 77 -rootless -wm 80 -enable-ei-portal
package:        xorg-x11-server-Xwayland-24.1.2-1.fc40
rootdir:        /
runlevel:       N 5
journald_cursor: s=355bd676dc72403f95adb64dbf5e1fe7;i=55d2f5;b=e9ae07474a0840e4ba080f21c9973a31;m=245cfad;t=6228e9792f059;x=70e3b1e192f1a5be
executable:     /usr/bin/Xwayland

Truncated backtrace:
Thread no. 1 (8 frames)
 #10 RemoveHost at ../os/access.c:1448
 #12 DisableLocalUser at ../os/access.c:375
 #13 DisableLocalAccess at ../os/access.c:299
 #14 CheckAuthorization at ../os/auth.c:208
 #15 ClientAuthorized at ../os/connection.c:532
 #16 ProcEstablishConnection at ../dix/dispatch.c:3791
 #17 Dispatch at ../dix/dispatch.c:549
 #18 dix_main at ../dix/main.c:275

Comment 1 Luc Lalonde 2024-09-20 15:36:40 UTC
Created attachment 2047889 [details]
File: os_info

Comment 2 Luc Lalonde 2024-09-20 15:36:41 UTC
Created attachment 2047890 [details]
File: mountinfo

Comment 3 Luc Lalonde 2024-09-20 15:36:42 UTC
Created attachment 2047891 [details]
File: limits

Comment 4 Luc Lalonde 2024-09-20 15:36:43 UTC
Created attachment 2047892 [details]
File: backtrace

Comment 5 Luc Lalonde 2024-09-20 15:36:45 UTC
Created attachment 2047893 [details]
File: cpuinfo

Comment 6 Luc Lalonde 2024-09-20 15:36:46 UTC
Created attachment 2047894 [details]
File: environ

Comment 7 Luc Lalonde 2024-09-20 15:36:47 UTC
Created attachment 2047895 [details]
File: dso_list

Comment 8 Luc Lalonde 2024-09-20 15:36:48 UTC
Created attachment 2047896 [details]
File: core_backtrace

Comment 9 Luc Lalonde 2024-09-20 15:36:49 UTC
Created attachment 2047897 [details]
File: var_log_messages

Comment 10 Luc Lalonde 2024-09-20 15:36:51 UTC
Created attachment 2047898 [details]
File: proc_pid_status

Comment 11 Luc Lalonde 2024-09-20 15:36:52 UTC
Created attachment 2047899 [details]
File: maps

Comment 12 Luc Lalonde 2024-09-20 15:36:53 UTC
Created attachment 2047900 [details]
File: open_fds

Comment 13 Olivier Fourdan 2024-09-20 16:01:23 UTC
While it definitely shouldn't segfault ("client=0x0" and the code assigns "client->errorValue" in the RemoveHost error path), I suspect there is more to it.

That code is more than a decade old (so unlikely to be a recent regression) and this occurs in an error code path.

Comment 14 Luc Lalonde 2024-09-20 17:31:47 UTC
I downgraded to xorg-x11-server-Xwayland-23.2.6-1.fc40.x86_64.

But I see the same problem

Comment 15 Olivier Fourdan 2024-09-23 08:02:00 UTC
(In reply to Luc Lalonde from comment #14)
> I downgraded to xorg-x11-server-Xwayland-23.2.6-1.fc40.x86_64.
> 
> But I see the same problem

Not surprising. As I said, this isn't new at all; probably something changed in the configuration of your system that triggers that (old) bug in the Xserver.

Let's address the NULL pointer dereference and we can move forward from there.

For that purpose, I've prepared a scratch build with the fix I posted upstream, for you to test:

  https://koji.fedoraproject.org/koji/taskinfo?taskID=123814306

(Please note, this is a _scratch build_, meaning the build will be removed soon, so please make sure to grab it while it's there!)

Once you've installed that build, please try again. Hopefully the Xserver (Xwayland in that case) won't segfault anymore and we'll have a chance to better understand what is going on.

Comment 16 Luc Lalonde 2024-09-23 14:33:36 UTC
I can now login to a working Wayland session with Plasma... I can also try Gnome if you want.

I'm at your disposal for other info you might need!

Comment 17 Olivier Fourdan 2024-09-24 13:44:03 UTC
OK, so that's good news! More good news: I have now landed the fix upstream.

So what happens here is that a client tries to connect to the display and the xserver checks whether that client is authorised to connect, so CheckAuthorization() is called. Since an auth file was specified on the command line ("-auth /run/user/12690/xauth_lESENv") and has a valid entry, the xserver disables local access using DisableLocalAccess(), which (through DisableLocalUser()) calls RemoveHost() with a NULL client and the family set to "FamilyServerInterpreted", and that ends up in a NULL pointer dereference.
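
For readers following along, the failure mode described above can be sketched in a few lines of C. This is only an illustration, not the xserver code and not necessarily the shape of the upstream fix: the names SketchClient, sketch_check_addr() and sketch_remove_host() are hypothetical. The only facts taken from this report are that the internal caller passes a NULL client and that the error path assigns client->errorValue.

```c
#include <stddef.h>

/* Hypothetical stand-in for the server's client record (NOT the real type). */
typedef struct {
    unsigned long errorValue;
} SketchClient;

enum { SKETCH_SUCCESS = 0, SKETCH_BAD_VALUE = 2 };

/* Hypothetical address check: returns a negative value when the address
 * is rejected, otherwise the accepted length. */
static int sketch_check_addr(int addr_ok, int length)
{
    return addr_ok ? length : -1;
}

/* Sketch of the error path. Without the NULL test, a call with
 * client == NULL (as an internal caller would make) dereferences a
 * NULL pointer when the address is rejected. */
int sketch_remove_host(SketchClient *client, int family,
                       int addr_ok, int length)
{
    if (sketch_check_addr(addr_ok, length) < 0) {
        if (client != NULL)          /* guard for internal callers */
            client->errorValue = (unsigned long) family;
        return SKETCH_BAD_VALUE;
    }
    return SKETCH_SUCCESS;
}
```

With the guard in place, a rejected address is still reported to the caller through the return value, but the error detail is recorded only when there is a real client to record it on.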

So the fix is trivial, yet I wonder why this has gone unnoticed for so many years, and why it happens on that particular system while I clearly cannot reproduce it locally. Even more so since it is seemingly unrelated to the compositor.

So how come we get to that point where the crash is (was) triggered? 

The only way to get to that (broken) code path is to:

1. Have `ShouldLoadAuth` TRUE in `CheckAuthorization()` [1]
2. Have `CheckAddr(family, pAddr, length)` return a value < 0 in `RemoveHost()` [2]

As we know an `authorization_file` is specified, it seems the only way to get into that code path in 1. is that `stat(authorization_file, &buf)` fails and returns -1.
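
A minimal sketch of condition 1, assuming only what is stated above (a failing stat() on the authorization file leads to a reload being requested). The function name sketch_should_reload_auth() is hypothetical, and the real logic in os/auth.c [1] may take other conditions into account.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 when the authorization file cannot be
 * stat()ed, which in this sketch stands for "ShouldLoadAuth = TRUE".
 * Returns 0 when no file is configured or when stat() succeeds. */
int sketch_should_reload_auth(const char *authorization_file)
{
    struct stat buf;

    if (authorization_file == NULL)
        return 0;
    /* stat() returns -1 on failure (missing file, permission problem,
     * stale or unreachable filesystem, ...). */
    if (stat(authorization_file, &buf) != 0)
        return 1;
    return 0;
}
```

Calling the helper on the path passed with -auth from the same user session would show whether stat() itself is failing on the affected system.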

On top of that, we also need CheckAddr() to return a negative value, which means siCheckAddr() returned < 0 [3].

So I wonder, did you change the network setup of that machine or the filesystem mount options recently?

[1] https://gitlab.freedesktop.org/xorg/xserver/-/blob/xwayland-24.1.2/os/auth.c#L180-185
[2] https://gitlab.freedesktop.org/xorg/xserver/-/blob/xwayland-24.1.2/os/access.c#L1439-1443
[3] https://gitlab.freedesktop.org/xorg/xserver/-/blob/xwayland-24.1.2/os/access.c#L1749-1797

Comment 18 Luc Lalonde 2024-09-24 13:57:08 UTC
No, no changes in network/mount setup on the client machine.  Could there be something in the user's profile causing this?

Comment 19 Olivier Fourdan 2024-09-24 14:16:04 UTC
(In reply to Luc Lalonde from comment #18)
> No, no changes in network/mount setup on the client machine.  Could there be
> something in the user's profile causing this?

That seems unlikely.

Probably unrelated, but I noticed that DISPLAY is set to ":1" in attachment 2047894 [details]. Is that expected, or is it explicitly forced in an environment file?

Comment 20 Luc Lalonde 2024-09-24 14:25:35 UTC
I don't know why I would have set  DISPLAY to ':1' and it's not in my .bashrc, .bash_profile.  Dunno where that's from.  But then again I've had this profile for a few years now.

Comment 21 Olivier Fourdan 2024-09-24 14:31:24 UTC
Well, the Wayland compositor sets the value of DISPLAY, as it's the one spawning Xwayland, so having ":1" set there is perfectly plausible. It would be a problem only if the value were forced from some config file and differed from the actual value set by the compositor.

Comment 22 Olivier Fourdan 2024-09-24 16:33:15 UTC
(In reply to Olivier Fourdan from comment #17)
> […]
> So how come we get to that point where the crash is (was) triggered? 
> 
> The only way to get to that (broken) code path is to:
> 
> 1. Have `ShouldLoadAuth` TRUE in `CheckAuthorization()` [1]
> 2. Have `CheckAddr(family, pAddr, length)` return a value < 0 in
> `RemoveHost()` [2]
> 
> As we know an `authorization_file` is specified, it seems the only way to
> get into that code path in 1. is that `stat(authorization_file, &buf)` fails
> and return -1
> 

Actually… `ShouldLoadAuth` can also be set to TRUE on server reset, if all X11 clients have vanished, so this is a possibility: at startup, a short-lived X11 client gets started, and when that client dies, the Xserver resets, calling ResetAuthorization() from ResetWellKnownSockets().

As for CheckAddr() (or, rather, siCheckAddr()) returning < 0, this would be because the address is rejected, in the form of "si:localuser:username", if I am reading the backtrace from attachment 2047892 [details] correctly.
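
To make the "si:localuser:username" form concrete, here is a small sketch that splits such a string into its type and value parts and returns a negative value when the string is malformed. It works on the textual form only; the helper name sketch_si_split() is hypothetical, and the checks siCheckAddr() actually performs [3] are not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "si:<type>:<value>" into type and value.
 * Returns the value length on success, or -1 if the text is malformed
 * or does not fit in the supplied buffers. */
int sketch_si_split(const char *text,
                    char *type, size_t type_cap,
                    char *value, size_t value_cap)
{
    const char *rest, *sep;
    size_t type_len, value_len;

    if (text == NULL || strncmp(text, "si:", 3) != 0)
        return -1;
    rest = text + 3;
    sep = strchr(rest, ':');            /* separator between type and value */
    if (sep == NULL)
        return -1;
    type_len = (size_t) (sep - rest);
    value_len = strlen(sep + 1);
    if (type_len == 0 || value_len == 0)
        return -1;                      /* empty type or empty value */
    if (type_len >= type_cap || value_len >= value_cap)
        return -1;                      /* would overflow caller buffers */
    memcpy(type, rest, type_len);
    type[type_len] = '\0';
    memcpy(value, sep + 1, value_len + 1);  /* includes the trailing NUL */
    return (int) value_len;
}
```

In this sketch "si:localuser:alice" yields the type "localuser" and the value "alice", while a string with a missing prefix or an empty user name is rejected with -1, mirroring the "returns < 0 on rejection" convention discussed above.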

Did you recently add an "xhost" command at startup, or something like that maybe?

Comment 23 Luc Lalonde 2024-09-24 17:55:19 UTC
No, haven't added a 'xhost' command at startup.

Comment 24 Luc Lalonde 2024-09-25 14:12:13 UTC
I downgraded to the version that causes the segfault for a test... I think that I've found a hint that could help point to the culprit:

There is no segfault for users that have a locally mounted home, but users with NFS homes consistently cannot log in to a Wayland session (segfault).

Is this helpful?

Comment 25 Olivier Fourdan 2024-09-25 14:41:50 UTC
(In reply to Luc Lalonde from comment #24)
> Is this helpful?

It is, to some extent! :)

I was suspecting something along these lines, but I don't think the actual root cause is in the xserver (the xserver crashing clearly is a bug, but that's addressed now).

The corollary question is: what else changed that triggers that bug now?

Comment 26 Fedora Update System 2024-10-02 12:52:03 UTC
FEDORA-2024-86b9498cd6 (xorg-x11-server-Xwayland-24.1.3-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-86b9498cd6

Comment 27 Fedora Update System 2024-10-02 13:13:46 UTC
FEDORA-2024-e9f1a3b79d (xorg-x11-server-Xwayland-24.1.3-1.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-e9f1a3b79d

Comment 28 Fedora Update System 2024-10-03 02:03:09 UTC
FEDORA-2024-86b9498cd6 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-86b9498cd6`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-86b9498cd6

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 29 Fedora Update System 2024-10-03 03:38:06 UTC
FEDORA-2024-e9f1a3b79d has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-e9f1a3b79d`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-e9f1a3b79d

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 30 Fedora Update System 2024-10-04 00:16:12 UTC
FEDORA-2024-86b9498cd6 (xorg-x11-server-Xwayland-24.1.3-1.fc41) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 31 Fedora Update System 2024-10-04 01:46:37 UTC
FEDORA-2024-e9f1a3b79d (xorg-x11-server-Xwayland-24.1.3-1.fc40) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.

