Bug 2222057

Summary: [abrt] xorg-x11-server-Xwayland: Xwayland killed by SIGABRT
Product: [Fedora] Fedora Reporter: Albert Flügel <af>
Component: mesa Assignee: Adam Jackson <ajax>
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 38 CC: af, ajax, bskeggs, igor.raits, j, lyude, mail, mdaenzer, ofourdan, rhughes, rstrode, tstellar, walter.pete
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/95c45b8a2292e189b3702cfba8efbf4bfc45a7d
Whiteboard: abrt_hash:9f28495d488fa3b80ede8ef2dbd458f3849f3465;VARIANT_ID=;
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description                                          Flags
File: limits                                         none
File: mountinfo                                      none
File: proc_pid_status                                none
File: maps                                           none
File: open_fds                                       none
File: core_backtrace                                 none
File: environ                                        none
File: os_info                                        none
File: dso_list                                       none
File: cpuinfo                                        none
File: backtrace                                      none
File: var_log_messages                               none
core file of Xwayland                                none
journal around the time of the Xwayland crash        none
Xorg log from the latest crash                       none

Description Albert Flügel 2023-07-11 17:38:41 UTC
Description of problem:
I logged in and did a few things in the GUI, mainly viewing a PDF in evince.
Suddenly the screen froze, including the mouse pointer.

Version-Release number of selected component:
xorg-x11-server-Xwayland-22.1.9-2.fc38

Additional info:
reporter:       libreport-2.17.11
runlevel:       N 5
journald_cursor: s=ea7842033cbe4949825a1af9b1594e1e;i=401232;b=00feb968207c4320bb23fe49af3f0fdb;m=129dc351;t=600393f7788a4;x=8e883ecb2a764aec
type:           CCpp
cgroup:         0::/user.slice/user-1679.slice/user/session.slice/org.gnome.Shell
backtrace_rating: 4
uid:            1679
package:        xorg-x11-server-Xwayland-22.1.9-2.fc38
kernel:         6.3.11-200.fc38.x86_64
rootdir:        /
cmdline:        /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/5687/.mutter-Xwaylandauth.IA1Z71 -listenfd 4 -listenfd 5 -displayfd 6 -initfd 7 -byteswappedclients
executable:     /usr/bin/Xwayland
reason:         Xwayland killed by SIGABRT

Truncated backtrace:
Thread no. 1 (7 frames)
 #10 glamor_poly_segment_solid_gl at ../glamor/glamor_segs.c:81
 #11 glamor_poly_segment_gl at ../glamor/glamor_segs.c:131
 #12 glamor_poly_segment at ../glamor/glamor_segs.c:166
 #13 damagePolySegment at ../miext/damage/damage.c:1023
 #14 ProcPolySegment at ../dix/dispatch.c:1909
 #15 Dispatch at ../dix/dispatch.c:550
 #16 dix_main at ../dix/main.c:271

Comment 1 Albert Flügel 2023-07-11 17:38:45 UTC
Created attachment 1975193
File: limits

Comment 2 Albert Flügel 2023-07-11 17:38:46 UTC
Created attachment 1975194
File: mountinfo

Comment 3 Albert Flügel 2023-07-11 17:38:47 UTC
Created attachment 1975195
File: proc_pid_status

Comment 4 Albert Flügel 2023-07-11 17:38:48 UTC
Created attachment 1975196
File: maps

Comment 5 Albert Flügel 2023-07-11 17:38:50 UTC
Created attachment 1975197
File: open_fds

Comment 6 Albert Flügel 2023-07-11 17:38:51 UTC
Created attachment 1975198
File: core_backtrace

Comment 7 Albert Flügel 2023-07-11 17:38:53 UTC
Created attachment 1975199
File: environ

Comment 8 Albert Flügel 2023-07-11 17:38:54 UTC
Created attachment 1975200
File: os_info

Comment 9 Albert Flügel 2023-07-11 17:38:56 UTC
Created attachment 1975201
File: dso_list

Comment 10 Albert Flügel 2023-07-11 17:38:57 UTC
Created attachment 1975202
File: cpuinfo

Comment 11 Albert Flügel 2023-07-11 17:38:59 UTC
Created attachment 1975203
File: backtrace

Comment 12 Albert Flügel 2023-07-11 17:39:00 UTC
Created attachment 1975204
File: var_log_messages

Comment 13 Olivier Fourdan 2023-07-12 08:18:26 UTC
The crash occurs in `glamor_poly_segment_solid_gl()` when trying to
copy data to the VBO space:

```
 68     /* Set up the vertex buffers for the points */
 69 
 70     v = glamor_get_vbo_space(drawable->pScreen,
 71                              (nseg << add_last) * sizeof (xSegment),
 72                              &vbo_offset);
 73 
 74     glEnableVertexAttribArray(GLAMOR_VERTEX_POS);
 75     glVertexAttribPointer(GLAMOR_VERTEX_POS, 2, GL_SHORT, GL_FALSE,
 76                           sizeof(DDXPointRec), vbo_offset);
 77 
 78     if (add_last) {
 79         int i, j;
 80         for (i = 0, j=0; i < nseg; i++) {
 81             v[j++] = segs[i];                  <---- Here
 82             v[j].x1 = segs[i].x2;
 83             v[j].y1 = segs[i].y2;
 84             v[j].x2 = segs[i].x2+1;
 85             v[j].y2 = segs[i].y2;
 86             j++;
 87         }
 88     } else
 89         memcpy(v, segs, nseg * sizeof (xSegment));
 90 
```

We see that `i == 0` and `j == 1`, so this is the very first iteration,
which may indicate that v is actually NULL (unfortunately, we may need
the core file to confirm this).

Can you please check in coredumpctl whether you still have the core file
for this crash?

Comment 14 Albert Flügel 2023-07-13 17:29:52 UTC
Created attachment 1975636
core file of Xwayland

Here it is. Please handle with care.

Comment 15 Albert Flügel 2023-07-13 17:31:30 UTC
This double j++ looks weird to me. Is that really intended?
And if j should be 1 in the first assignment, I'd expect ++j.
However, I'm not deep enough into this code to judge.

Comment 16 Michel Dänzer 2023-07-17 10:24:17 UTC
(In reply to Albert Flügel from comment #15)
> This double j++ looks weird to me. Is that really intended ?

It is. The add_last case produces two output xSegment instances for each input instance.

> And if j should be 1 in the first assignment, i'd expect ++j

That would be incorrect, the pre-increment value is needed there.

This code has been unchanged for many years, and there have been no similar reports in all that time. The root cause is likely somewhere else.

Comment 17 Olivier Fourdan 2023-07-20 07:52:24 UTC
Thanks for the core file!

Hmm, unfortunately the value of `v` as returned by `glamor_get_vbo_space()` is optimized out, so that does not help... :(

Can you please attach the journalctl output from around the time of the issue (hopefully it's still available)? I wonder whether some `GL_OUT_OF_MEMORY` errors were logged.

My current theory is still that `glamor_get_vbo_space()` returned NULL for some reason.

Comment 18 Olivier Fourdan 2023-07-20 08:03:11 UTC
> My current theory is still that `glamor_get_vbo_space()` returned NULL for some reason.

Or maybe not NULL, but at least bogus.

Comment 19 Olivier Fourdan 2023-07-20 08:10:00 UTC
(gdb) bt
#0  0x00007f891e8a9844 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007f891e858abe in raise () from /lib64/libc.so.6
#2  0x00007f891e84187f in abort () from /lib64/libc.so.6
#3  0x000055814483813a in OsAbort () at ../os/utils.c:1362
#4  0x000055814483840e in AbortServer () at ../os/log.c:879
#5  FatalError (f=<optimized out>) at ../os/log.c:1017
#6  0x000055814482e714 in OsSigHandler (unused=<optimized out>, sip=<optimized out>, signo=7) at ../os/osinit.c:156
#7  OsSigHandler (signo=7, sip=<optimized out>, unused=<optimized out>) at ../os/osinit.c:110
#8  <signal handler called>
#9  0x0000558144711563 in glamor_poly_segment_solid_gl (drawable=drawable@entry=0x55814665b0b0, gc=gc@entry=0x5581465591c0, nseg=nseg@entry=191, 
    segs=segs@entry=0x558146503fb0) at ../glamor/glamor_segs.c:81
#10 0x00005581447177cd in glamor_poly_segment_gl (segs=0x558146503fb0, nseg=191, gc=0x5581465591c0, drawable=0x55814665b0b0) at ../glamor/glamor_segs.c:131
#11 glamor_poly_segment (drawable=0x55814665b0b0, gc=0x5581465591c0, nseg=191, segs=0x558146503fb0) at ../glamor/glamor_segs.c:166
#12 0x00005581447b45d0 in damagePolySegment (pDrawable=0x55814665b0b0, pGC=0x5581465591c0, nSeg=191, pSeg=<optimized out>) at ../miext/damage/damage.c:1023
#13 0x000055814476284a in ProcPolySegment (client=0x558146503b00) at ../dix/dispatch.c:1909
#14 0x0000558144769e87 in Dispatch () at ../dix/dispatch.c:550
#15 0x00005581446f4fc6 in dix_main (envp=<optimized out>, argv=<optimized out>, argc=<optimized out>) at ../dix/main.c:271
#16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../dix/stubmain.c:34
(gdb) f 9
#9  0x0000558144711563 in glamor_poly_segment_solid_gl (drawable=drawable@entry=0x55814665b0b0, gc=gc@entry=0x5581465591c0, nseg=nseg@entry=191, 
    segs=segs@entry=0x558146503fb0) at ../glamor/glamor_segs.c:81
81	            v[j++] = segs[i];
(gdb) disass
Dump of assembler code for function glamor_poly_segment_solid_gl:
Address range 0x5581447113f0 to 0x558144711755:
[…]
   0x0000558144711548 <+344>:	movslq %ebx,%rcx
   0x000055814471154b <+347>:	mov    -0x78(%rbp),%rdx
   0x000055814471154f <+351>:	lea    (%rax,%rcx,8),%rsi
   0x0000558144711553 <+355>:	nopl   0x0(%rax,%rax,1)
   0x0000558144711558 <+360>:	mov    (%rax),%rcx
   0x000055814471155b <+363>:	add    $0x8,%rax
   0x000055814471155f <+367>:	add    $0x10,%rdx
=> 0x0000558144711563 <+371>:	mov    %rcx,-0x10(%rdx)
   0x0000558144711567 <+375>:	movzwl -0x4(%rax),%ecx
   0x000055814471156b <+379>:	mov    %cx,-0x8(%rdx)
[…]
(gdb) info line 81
Line 81 of "../glamor/glamor_segs.c" starts at address 0x558144711558 <glamor_poly_segment_solid_gl+360>
   and ends at 0x55814471155b <glamor_poly_segment_solid_gl+363>.
(gdb) info registers
rax            0x558146503fb8      94013718806456
rbx            0xbf                191
rcx            0x73000400730004    32369639509131268
rdx            0x7f891cdd4010      140226871509008
rsi            0x5581465045a8      94013718807976
rdi            0x7f891deae010      140226889179152
rbp            0x7ffc65646980      0x7ffc65646980
rsp            0x7ffc65646900      0x7ffc65646900
r8             0x0                 0
r9             0x4                 4
r10            0xf                 15
r11            0x7f891deae010      140226889179152
r12            0x5581459137e0      94013706287072
r13            0x558145a68801      94013707683841
r14            0x55814665b110      94013720211728
r15            0x4                 4
rip            0x558144711563      0x558144711563 <glamor_poly_segment_solid_gl+371>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

So if I am not mistaken, `v` would be %rcx, i.e. 0x73000400730004 which looks bogus.

Comment 20 Albert Flügel 2023-07-21 12:40:20 UTC
Created attachment 1976890
journal around the time of the Xwayland crash

Please find attached some excerpts from the journal. It starts with a GPU lockup and looks similar to what I reported in
https://bugzilla.redhat.com/show_bug.cgi?id=2221367
No OOM in sight.
When I find the time, I'll build without -O* and catch it in gdb. I expect to find a byte-swapping client as the culprit, haaaaaahaaaaaaaa, scnr.
I just had an Xorg freeze when I wanted to take a screenshot of a splattered area that should just show black. That will probably be the next bug report. I guess X* was never as broken as in Fedora 38.

Comment 21 Albert Flügel 2023-07-21 15:44:13 UTC
Created attachment 1976961
Xorg log from the latest crash

It seems the same thing now happens with Xorg. Here is an old Xorg.1.log.

Comment 22 Albert Flügel 2023-07-22 13:42:00 UTC
I have now caught the SIGBUS in the debugger several times. This is what I see:

#0  0x00007f0069fcb3f9 in glamor_poly_segment_solid_gl (
    drawable=0x557c7b273c20, gc=0x557c7ad172b0, nseg=191, segs=0x557c7b0de1e0)
    at glamor_segs.c:81
81                  v[j++] = segs[i];
(gdb) print v
$1 = (xSegment *) 0x7f00612a5000
(gdb) print i
$2 = 0
(gdb) print j
$3 = 1
(gdb) print segs
$4 = (xSegment *) 0x557c7b0de1e0
(gdb) print segs[0]
$8 = {x1 = 4, y1 = 115, x2 = 4, y2 = 115}
(gdb) print nseg
$5 = 191
(gdb) print v[0]
Cannot access memory at address 0x7f00612a5000      <------- this is because gdb tries to access it, not Xorg
(gdb) print vbo_offset
$6 = 0x0
(gdb) print add_last
$7 = 1

However, v points to the first address of this mapping:

/proc/.../maps:
7f00612a5000-7f0061325000 rw-s 1033fd000 00:05 501                       /dev/dri/card0

That vbo_offset is 0 seems weird to me. When Xorg is working, this value is in the range 100000-400000.

A second crash looked a bit different:

Thread 1 "Xorg" received signal SIGBUS, Bus error.
__memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memmove-ssse3.S:152
Downloading source file /usr/src/debug/glibc-2.37-4.fc38.x86_64/string/../sysdeps/x86_64/multiarch/memmove-ssse3.S
152             movups  %xmm0, (%rdi)                                           

From info reg:
rdi            0x7f1eab37d000      139769698308096

Again, this is the first address of a mapping of the video device:
7f1eab37d000-7f1eab37e000 rw-s 10430c000 00:05 499                       /dev/dri/card0

Is it normal to access the first address? Shouldn't the vertex buffer have some offset > 0?

This is the stack, again somewhere under glamor:
#0  __memcpy_ssse3 () at ../sysdeps/x86_64/multiarch/memmove-ssse3.S:152
#1  0x00007f1ea9a6bf0a in memcpy
    (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>)
    at /usr/include/bits/string_fortified.h:29
#2  store_shader
    (ctx=ctx@entry=0x55812df50f00, shader=shader@entry=0x55812f4b6b10)
    at ../src/gallium/drivers/r600/r600_shader.c:160
#3  0x00007f1ea9a7733a in r600_pipe_shader_create
    (ctx=ctx@entry=0x55812df50f00, shader=shader@entry=0x55812f4b6b10, key=...)
    at ../src/gallium/drivers/r600/r600_shader.c:335
#4  0x00007f1ea99c93b3 in r600_shader_select
    (ctx=ctx@entry=0x55812df50f00, sel=sel@entry=0x55812ea93cf0, dirty=dirty@entry=0x7ffc67c6d797, precompile=precompile@entry=true)
    at ../src/gallium/drivers/r600/r600_state_common.c:967
#5  0x00007f1ea99c95a7 in r600_create_shader_state
    (ctx=0x55812df50f00, state=<optimized out>, pipe_shader_type=<optimized out>) at ../src/gallium/drivers/r600/r600_state_common.c:1071
#6  0x00007f1ea9388cc5 in st_create_nir_shader
    (st=st@entry=0x55812df651e0, state=state@entry=0x7ffc67c6d8b0)
    at ../src/mesa/state_tracker/st_program.c:542
#7  0x00007f1ea938907f in st_create_common_variant
    (st=st@entry=0x55812df651e0, prog=prog@entry=0x55812f2041a0, key=key@entry=0x7ffc67c6dc50) at ../src/mesa/state_tracker/st_program.c:761
#8  0x00007f1ea9389a1b in st_get_common_variant
    (st=st@entry=0x55812df651e0, prog=prog@entry=0x55812f2041a0, key=key@entry=0x7ffc67c6dc50) at ../src/mesa/state_tracker/st_program.c:814
#9  0x00007f1ea938a0af in st_precompile_shader_variant
    (prog=0x55812f2041a0, st=0x55812df651e0)
    at ../src/mesa/state_tracker/st_program.c:1288
#10 st_finalize_program
    (st=st@entry=0x55812df651e0, prog=prog@entry=0x55812f2041a0)
    at ../src/mesa/state_tracker/st_program.c:1365
#11 0x00007f1ea938b6c1 in st_deserialise_nir_program
    (ctx=ctx@entry=0x7f1ea9140010, shProg=shProg@entry=0x55812f520f00, prog=prog@entry=0x55812f2041a0) at ../src/mesa/state_tracker/st_shader_cache.c:199
#12 0x00007f1ea938b869 in st_load_nir_from_disk_cache
    (ctx=0x7f1ea9140010, prog=0x55812f520f00)
    at ../src/mesa/state_tracker/st_shader_cache.c:220
#13 0x00007f1ea95e5f30 in link_shader (prog=0x55812f520f00, ctx=0x7f1ea9140010)
    at ../src/mesa/state_tracker/st_glsl_to_ir.cpp:43
#14 st_link_shader(gl_context*, gl_shader_program*)
    (ctx=0x7f1ea9140010, prog=0x55812f520f00)
    at ../src/mesa/state_tracker/st_glsl_to_ir.cpp:106
#15 0x00007f1ea95c6798 in _mesa_glsl_link_shader(gl_context*, gl_shader_program*) (ctx=ctx@entry=0x7f1ea9140010, prog=prog@entry=0x55812f520f00)
    at ../src/mesa/program/link_program.cpp:91
#16 0x00007f1ea95a462f in link_program
    (no_error=<optimized out>, shProg=<optimized out>, ctx=<optimized out>)
    at ../src/mesa/main/shaderapi.c:1332
#17 link_program_error (ctx=0x7f1ea9140010, shProg=0x55812f520f00)
    at ../src/mesa/main/shaderapi.c:1443
#18 0x00007f1eab12f84b in glamor_link_glsl_prog
    (screen=0x55812de84e10, prog=43, format=0x7f1eab15a473 "%s_%s")
    at glamor_core.c:98
#19 0x00007f1eab143ffe in glamor_build_program
    (screen=0x55812de84e10, prog=0x55812df9ca20, prim=0x7f1eab164160 <glamor_facet_composite_glyphs_130>, fill=0x7f1eab1644c0 <glamor_source_solid>, combine=0x7f1eab15a578 "       gl_FragColor = source * mask.a;\n", defines=0x55812df16e70 "#define ATLAS_DIM_INV 0.000976562500000000\n") at glamor_program.c:351
#20 0x00007f1eab144743 in glamor_setup_one_program_render
    (screen=0x55812de84e10, prog=0x55812df9ca20, source_type=glamor_program_source_solid, alpha=glamor_program_alpha_normal, prim=0x7f1eab164160 <glamor_facet_composite_glyphs_130>, defines=0x55812df16e70 "#define ATLAS_DIM_INV 0.000976562500000000\n") at glamor_program.c:596
#21 0x00007f1eab14490b in glamor_setup_program_render
    (op=3 '\003', src=0x55812edbd460, mask=0x55812f4fa870, dst=0x55812eeaa070, program_render=0x55812df9ca20, prim=0x7f1eab164160 <glamor_facet_composite_glyphs_130>, defines=0x55812df16e70 "#define ATLAS_DIM_INV 0.000976562500000000\n")
    at glamor_program.c:658
#22 0x00007f1eab132709 in glamor_composite_glyphs
    (op=3 '\003', src=0x55812edbd460, dst=0x55812eeaa070, glyph_format=0x55812df9ae18, x_src=819, y_src=610, nlist=0, list=0x7ffc67c6ec10, glyphs=0x7ffc67c6e408) at glamor_composite_glyphs.c:443
#23 0x000055812c4dee5a in damageGlyphs
    (op=3 '\003', pSrc=0x55812edbd460, pDst=0x55812eeaa070, maskFormat=0x55812df9ae18, xSrc=819, ySrc=610, nlist=1, list=0x7ffc67c6ec00, glyphs=0x7ffc67c6e400)
    at damage.c:579
#24 0x000055812c4c2ac2 in CompositeGlyphs
    (op=3 '\003', pSrc=0x55812edbd460, pDst=0x55812eeaa070, maskFormat=0x55812df9ae18, xSrc=819, ySrc=610, nlist=1, lists=0x7ffc67c6ec00, glyphs=0x7ffc67c6e400) at glyph.c:558
#25 0x000055812c4cd828 in ProcRenderCompositeGlyphs (client=0x55812eaa9c20)
    at render.c:1377
#26 0x000055812c4cf397 in ProcRenderDispatch (client=0x55812eaa9c20)
    at render.c:1988
#27 0x000055812c397936 in Dispatch () at dispatch.c:479
#28 0x000055812c3a5c36 in dix_main
    (argc=17, argv=0x7ffc67c6f2a8, envp=0x7ffc67c6f338) at main.c:276
#29 0x000055812c387bc1 in main
    (argc=17, argv=0x7ffc67c6f2a8, envp=0x7ffc67c6f338) at stubmain.c:34

Could something be wrong at the kernel level with the Radeon video driver?
However, to verify that I do not have a memory error, I'll check my RAM now.

Comment 23 Albert Flügel 2023-07-22 15:27:59 UTC
The RAM in that box is OK.
An interesting phenomenon is that after Xorg has crashed 1-3 times, it runs quite stably and without artifacts. But until this state is reached, the system is almost unusable. When I click on "Activities" in GNOME or try to open the menu with the power button etc., the UI hangs and only the mouse pointer can be moved. Sometimes after 1-2 minutes the screen goes black, reappears, and I can do a few things, unless I click "Activities"...
Meanwhile, there is nothing helpful in the logs.

Comment 24 Albert Flügel 2023-07-23 10:40:56 UTC
I have now downgraded the following packages to last year's Fedora 37 versions, as shown:
libdrm-2.4.112-1.fc37.x86_64.rpm
libglvnd-1.5.0-1.fc37.x86_64.rpm
libglvnd-egl-1.5.0-1.fc37.x86_64.rpm
libglvnd-gles-1.5.0-1.fc37.x86_64.rpm
libglvnd-glx-1.5.0-1.fc37.x86_64.rpm
libglvnd-opengl-1.5.0-1.fc37.x86_64.rpm
mesa-dri-drivers-22.2.2-1.fc37.x86_64.rpm
mesa-libEGL-22.2.2-1.fc37.x86_64.rpm
mesa-libgbm-22.2.2-1.fc37.x86_64.rpm
mesa-libGL-22.2.2-1.fc37.x86_64.rpm
mesa-libglapi-22.2.2-1.fc37.x86_64.rpm
mesa-libOpenCL-22.2.2-1.fc37.x86_64.rpm
mesa-libOpenCL-devel-22.2.2-1.fc37.x86_64.rpm
mesa-libOSMesa-22.2.2-1.fc37.x86_64.rpm
mesa-va-drivers-22.2.2-1.fc37.x86_64.rpm
xorg-x11-drv-ati-19.1.0-8.fc37.x86_64.rpm
xorg-x11-server-common-1.20.14-8.fc37.x86_64.rpm
xorg-x11-server-Xorg-1.20.14-8.fc37.x86_64.rpm
leaving the rest of the installation on Fedora 38, and the issues are gone. X runs stably, and the artifacts reported in https://bugzilla.redhat.com/show_bug.cgi?id=2224601 are also gone.

Another thing I tried: with the current versions of the above packages, I booted kernel-6.0.7-301.fc37.x86_64, also from last year, which did NOT help. So the kernel is no longer a suspect, and this is clearly a software regression rather than a hardware fault, which I had also considered possible but can now rule out as well.

It is somewhat surprising to me that no one else is hitting this issue, but I'm not convinced that nobody is. This issue could be in the same area: https://bugzilla.redhat.com/show_bug.cgi?id=2220717

So what is the suggestion? Buy new hardware, because mine is niche?
Try to upgrade the above packages one by one and see when the trouble starts?

Comment 25 Michel Dänzer 2023-07-24 14:34:35 UTC
(In reply to Albert Flügel from comment #24)
> Now i downgraded the following packages to an old status of Fedora 37 from
> last year and to the shown versions:

xorg-x11-server-Xwayland isn't on the list. Did you actually not downgrade it, or did you just forget to list it?

> Try to upgrade the above package by package and see when the trouble starts ?

That seems like a good next step. If you did downgrade xorg-x11-server-Xwayland, you could try upgrading just that first.

Comment 26 Albert Flügel 2023-07-24 17:22:32 UTC
For Xorg I can currently say that the trouble is tied to this set of packages (all others are at their current versions), which must all be installed at the same version (otherwise neither Xorg nor Xwayland can start). With the current versions, Xorg crashes; with the old versions, everything seems fine:
 mesa-dri-drivers       x86_64       23.1.3-1.fc38          updates        19 M
 mesa-libEGL            x86_64       23.1.3-1.fc38          updates       131 k
 mesa-libGL             x86_64       23.1.3-1.fc38          updates       174 k
 mesa-libOSMesa         x86_64       23.1.3-1.fc38          updates       3.1 M
 mesa-libOpenCL         x86_64       23.1.3-1.fc38          updates       6.5 M
 mesa-libgbm            x86_64       23.1.3-1.fc38          updates        45 k
 mesa-libglapi          x86_64       23.1.3-1.fc38          updates        54 k

For Wayland, things are more complicated. I still have no real clue which combination works. This is all taking a lot of time, because to tell definitively I have to reboot after each package change. Sigh. I'm ending the experiments for today.

Comment 27 Albert Flügel 2023-07-25 18:20:50 UTC
Today I found the reason why the older Xwayland crashed. Before getting to that: it is the same libraries listed in the previous comment that cause the problems for Xwayland.

Why was it impossible to test the older version of Xwayland?
In the journal I find that Xwayland prints its usage and complains about the unknown argument -byteswappedclients.
Indeed, the old Xwayland did not yet know this option, because you and your colleagues introduced it later. The argument is hardcoded in a shared object of mutter, so I cannot test that. There you have hardcoded incompatibilities in binaries, and if I understand correctly, there are no plans to make this a real configuration option in Fedora 38 or later.
I'm somewhat surprised that the old version of Xorg does not complain.
Furthermore, I found that Xwayland dumped core because of the wrong usage:
                Stack trace of thread 4884:
                #0  0x00007f71d1eae844 __pthread_kill_implementation (libc.so.6 + 0x8e844)
                #1  0x00007f71d1e5dabe raise (libc.so.6 + 0x3dabe)
                #2  0x00007f71d1e4687f abort (libc.so.6 + 0x2687f)
                #3  0x0000559fa1c96c0c OsAbort (Xwayland + 0x175c0c)
                #4  0x0000559fa1c96ebf FatalError (Xwayland + 0x175ebf)
                #5  0x0000559fa1c9eac8 ProcessCommandLine (Xwayland + 0x17dac8)
                #6  0x0000559fa1b584a5 main (Xwayland + 0x374a5)
                #7  0x00007f71d1e47b4a __libc_start_call_main (libc.so.6 + 0x27b4a)
                #8  0x00007f71d1e47c0b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c0b)
                #9  0x0000559fa1b5b3f5 _start (Xwayland + 0x3a3f5)
                ELF object binary architecture: AMD x86-64
This is new to me. Until now I had seen this mainly as a bad habit of Java programs: the information that some argument is wrong, or some other trivial thing, has to be dug out of stack traces that are often hundreds of lines long.

Comment 28 Olivier Fourdan 2023-07-25 19:05:24 UTC
(In reply to Albert Flügel from comment #27)
> 
> Why was it impossible to test the older version of Xwayland ?
> In the journal i find, that Xwayland writes a usage and complains about the
> unknown argument -byteswappedclients .
> Indeed, the old Xwayland did not yet know this option, cause you and your
> colleagues introduced that later. The argument is hardcoded in a shared
> object of the mutter stuff. Thus i cannot test that. There you are with
> hardcoded incompatibilities in binaries and if i understand correctly it is
> not planned to make this a real configuration option in Fedora 38 or higher.
> I'm somewhat surprised, that the old version of Xorg does not complain.

Xwayland is spawned automatically by the Wayland compositor, i.e. mutter, which specifies the command-line options it passes to Xwayland at startup. Of course, newer versions of mutter may use options only available in newer versions of Xwayland.

Anyway, this is a regression in Mesa, as downgrading Mesa solves the issue, so I'll move this to Mesa.