Bug 2249725 - GNOME applications suddenly exit with "Error 22 (Invalid argument) dispatching to Wayland display." and "WL: error in client communication"
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: mesa
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: José Expósito
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On:
Blocks:
Reported: 2023-11-15 01:00 UTC by Adam Williamson
Modified: 2023-11-17 23:48 UTC
CC List: 15 users

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-17 23:48:35 UTC
Type: Bug
Embargoed:


Attachments
log extract with various debugging options enabled (111.50 KB, text/plain)
2023-11-15 01:04 UTC, Adam Williamson


Links
System | ID | Private | Priority | Status | Summary | Last Updated
freedesktop.org Gitlab | mesa mesa issues 10146 | 0 | None | opened | mesa 23.3: GNOME applications suddenly exit with "Error 22 (Invalid argument) dispatching to Wayland display." and "WL: ... | 2023-11-15 01:44:07 UTC
freedesktop.org Gitlab | mesa mesa merge_requests 26220 | 0 | None | merged | zink: allow software rendering only if selected | 2023-11-16 20:12:47 UTC

Description Adam Williamson 2023-11-15 01:00:45 UTC
With mesa-23.3.0~rc2-2.fc40, GNOME applications seem to exit suddenly, at least in qemu VMs (I have not tested on bare metal yet). They do not crash: there is no core dump. openQA is seeing this in various apps: at least gnome-text-editor, gnome-software, and gnome-clocks, and I think some others.

The easiest one to reproduce it with is gnome-text-editor. Just try printing some text to a file. In openQA this causes the app to die immediately more than half the time. I can also reproduce it manually in a normal virt-manager VM very easily. If it does not die the first time, try doing it again, and it will almost always die the second time.

On the console, we see this message:

Gdk-Message: 16:31:02.962: Error 22 (Invalid argument) dispatching to Wayland display.

In the system journal, we see this:

Nov 14 16:31:02 localhost-live gnome-shell[1967]: WL: error in client communication (pid 3562)
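
For anyone triaging saved journal output, the compositor message above can be picked apart mechanically to find which client was disconnected. The following is only a rough sketch: the regular expression, the field names, and the `parse_client_error` helper are illustrative assumptions based on the single line quoted here, not part of mutter, mesa, or openQA.

```python
import re

# Pattern modelled on the one journal line quoted in this report.
# The group names are hypothetical labels chosen for this sketch.
CLIENT_ERROR = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[^\[\s]+)\[(?P<compositor_pid>\d+)\]: "
    r"WL: error in client communication \(pid (?P<client_pid>\d+)\)$"
)


def parse_client_error(line):
    """Return the fields of a matching journal line, or None if it does not match."""
    match = CLIENT_ERROR.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the two pids to integers so they can be compared with process lists.
    fields["compositor_pid"] = int(fields["compositor_pid"])
    fields["client_pid"] = int(fields["client_pid"])
    return fields


sample = (
    "Nov 14 16:31:02 localhost-live gnome-shell[1967]: "
    "WL: error in client communication (pid 3562)"
)
print(parse_client_error(sample))
```

The `client_pid` field is the process the compositor dropped, which in this report is the application that exited.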

I also reproduced with mutter, mesa and libgl debugging enabled, and will attach the 500 lines around the "error in client communication" message from the journal in that case. (Note with all that debugging enabled, it never seemed to die on the first attempt, but *did* die on the second).
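
Cutting a fixed window of lines around the error out of a long journal dump can be scripted. This is a minimal sketch under stated assumptions: the journal has already been saved as a list of text lines, and the `lines_around` helper and its `total` parameter are hypothetical names for illustration, not the method actually used to produce the attachment.

```python
def lines_around(lines, needle, total=500):
    """Return up to `total` lines centred on the first line containing `needle`.

    Returns an empty list when no line contains `needle`. Near the start or
    end of the input the window is clipped rather than padded.
    """
    for index, line in enumerate(lines):
        if needle in line:
            start = max(0, index - total // 2)
            return lines[start:start + total]
    return []


# Synthetic input standing in for a saved journal dump.
journal = [f"line {i}" for i in range(1000)]
journal[600] = "gnome-shell[1967]: WL: error in client communication (pid 3562)"

extract = lines_around(journal, "error in client communication")
print(len(extract))
```

Only the first match is used, which fits a run where the application dies once per capture.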

The previous build of mesa in Rawhide was mesa-23.2.1-1.fc40. If I downgrade to that version, the bug stops happening. I note that as well as the version bump of mesa itself, there's another significant difference between the two: 23.2.1-1.fc40 was built with LLVM 16, 23.3.0~rc2-2.fc40 was built with LLVM 17. I'm currently attempting a rebuild of 23.3.0~rc2 with LLVM 16 to see if the bug happens in that case or not.

Comment 1 Adam Williamson 2023-11-15 01:04:12 UTC
Created attachment 1999473 [details]
log extract with various debugging options enabled

Comment 2 Adam Williamson 2023-11-15 01:27:46 UTC
The crash still happens with my rebuild of mesa 23.3.0~rc2 with LLVM 16, so it seems the issue really is in mesa itself, not an LLVM issue.

Comment 3 Adam Williamson 2023-11-15 01:28:02 UTC
sigh, I mean "the bug" not "the crash". :D

Comment 4 José Expósito 2023-11-15 10:05:48 UTC
I scratch-built 23.3.0~rc3:
https://koji.fedoraproject.org/koji/taskinfo?taskID=109057805

And the issue is still present.

Comment 5 Adam Williamson 2023-11-15 17:08:49 UTC
José has now bisected and identified the issue upstream - https://gitlab.freedesktop.org/mesa/mesa/-/issues/10146#note_2168870 . Big thanks for that.

Comment 6 José Expósito 2023-11-16 11:35:23 UTC
Patch sent upstream:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26220

@awilliam if you don't object, I'd like to wait a few days for feedback on the patch, and then I'll build a new version of mesa with the patch applied.

Comment 7 Adam Williamson 2023-11-16 20:07:44 UTC
It's been merged upstream already, so I guess we could do a backport build now? Thanks!

Comment 8 José Expósito 2023-11-17 10:20:48 UTC
That was fast! I generated a new mesa build:

 - https://koji.fedoraproject.org/koji/taskinfo?taskID=109154034
 - https://bodhi.fedoraproject.org/updates/FEDORA-2023-f5f3c2d7fc

Leaving the issue open for the moment. Once the QA process starts, would you mind validating that GNOME apps are not crashing anymore and closing it, please?

Comment 9 Adam Williamson 2023-11-17 23:48:35 UTC
I think it looks good: we haven't had a failure of the desktop_printing update test since it went stable, and no Workstation live failures in the recently-completed Rawhide compose. Thanks! I'll close it out.

