Bug 2284116 - Xwayland regularly goes to 100% CPU, making all X windows unresponsive
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-server-Xwayland
Version: 40
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Olivier Fourdan
Reported: 2024-05-31 11:56 UTC by Jan Wielemaker
Modified: 2024-06-28 01:58 UTC (History)

Fixed In Version: xorg-x11-server-Xwayland-24.1.0-2.fc40
Last Closed: 2024-06-28 01:58:30 UTC

Links
System: freedesktop.org Gitlab
ID: xorg/xserver/issues/1680
Private: 0
Priority: None
Status: opened
Summary: Redirect surface window as needed for page flips causes Xwayland to hang sometimes
Last Updated: 2024-06-03 07:46:09 UTC

Description Jan Wielemaker 2024-05-31 11:56:20 UTC
Since the last upgrade to xorg-x11-server-Xwayland-24.1.0-1.fc40.x86_64, my desktop freezes a couple of times per day. While it is frozen, Xwayland is running at 100% CPU. Killing Xwayland with `kill -9` terminates the X11 applications, after which I can restart them.

After downgrading to xorg-x11-server-Xwayland-23.2.6-1.fc40.x86_64, the problem is gone.

Reproducible: Sometimes




AMD 3950X with GeForce RTX 3070 graphics. Fresh Fedora 40 installation.

Comment 1 Olivier Fourdan 2024-05-31 12:39:58 UTC
Which driver are you using?

Can you try to find out where in the code Xwayland is spinning? Install the debuginfo package, generate a few stack traces using "gstack", and post them here.

Comment 2 Olivier Fourdan 2024-05-31 12:41:38 UTC
And also which compositor?

Comment 3 Jan Wielemaker 2024-05-31 13:05:10 UTC
Thanks for the response. First, the driver info:

> lspci -n -n -k | grep -A 2 -e VGA -e 3D
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070] [10de:2484] (rev a1)
	Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:3903]
	Kernel driver in use: nouveau
> glxinfo | grep -e OpenGL.vendor -e OpenGL.renderer
OpenGL vendor string: Mesa
OpenGL renderer string: NV174
> switcherooctl list
Device: 0
  Name:        NVIDIA Corporation GA104 [GeForce RTX 3070]
  Default:     yes
  Environment: DRI_PRIME=pci-0000_09_00_0

The compositor:

> inxi -Gxx | grep compositor
    compositor: gnome-shell v: 46.2 driver: X: loaded: modesetting,nouveau

Now the more interesting part. Using a remote login, I attached gdb and got the info below. As you can see, the pDamage list is simply cyclic: pDamage->pNext points back to pDamage itself. No wonder it loops :)

(gdb) bt
#0  damageRegionProcessPending (pDrawable=pDrawable@entry=0x55eeb8e26b10)
    at ../miext/damage/damage.c:292
#1  0x000055eeb65f0732 in damageCopyArea (pSrc=<optimized out>, 
    pDst=0x55eeb8e26b10, pGC=0x55eeb893b8d0, srcx=0, srcy=<optimized out>, 
    width=<optimized out>, height=19, dstx=0, dsty=0)
    at ../miext/damage/damage.c:778
#2  0x000055eeb65cbd88 in compRestoreWindow (pWin=pWin@entry=0x55eeb8e26b10, 
    pPixmap=pPixmap@entry=0x55eeb906c230) at ../composite/compalloc.c:250
#3  0x000055eeb65d7136 in compCheckRedirect (pWin=pWin@entry=0x55eeb8e26b10)
    at ../composite/compwindow.c:181
#4  0x000055eeb65d7cfd in compUnrealizeWindow (pWin=0x55eeb8e26b10)
    at ../composite/compwindow.c:292
#5  0x000055eeb65d08ba in UnrealizeTree (pWin=0x55eeb8e26b10, fromConfigure=0)
    at ../dix/window.c:2805
#6  0x000055eeb66bc92b in UnmapWindow.isra.0 (pWin=0x55eeb8e26b10, 
    fromConfigure=fromConfigure@entry=0) at ../dix/window.c:2863
#7  0x000055eeb659d8e1 in ProcUnmapWindow (client=<optimized out>)
    at ../dix/dispatch.c:946
#8  0x000055eeb65a68f8 in Dispatch () at ../dix/dispatch.c:549
#9  0x000055eeb6527b16 in dix_main (argc=<optimized out>, 
    argv=<optimized out>, envp=<optimized out>) at ../dix/main.c:275
#10 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at ../dix/stubmain.c:34

Key values

(gdb) p pDamage
$4 = (DamagePtr) 0x55eeb9067dd0
(gdb) p pDamage->pNext
$5 = (DamagePtr) 0x55eeb9067dd0
(gdb) p *pDamage
$6 = {pNext = 0x55eeb9067dd0, pNextWin = 0x0, damage = {extents = {x1 = -1, 
      y1 = -1, x2 = -1, y2 = -1}, data = 0x55eeb6728440 <RegionEmptyData>}, 
  damageLevel = DamageReportNonEmpty, isInternal = 0, 
  closure = 0x55eeb8f8e300, isWindow = 1, pDrawable = 0x55eeb8e26b10, 
  damageReport = 0x55eeb6540bf0 <damage_report>, 
  damageDestroy = 0x55eeb6536a10 <damage_destroy>, reportAfter = 0, 
  pendingDamage = {extents = {x1 = 0, y1 = 0, x2 = 0, y2 = 0}, 
    data = 0x55eeb6728440 <RegionEmptyData>}, pScreen = 0x55eeb80131c0}
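
To illustrate why a node whose pNext points at itself is enough to pin a CPU, here is a minimal sketch in C. It assumes the code in frame #0 follows pNext until it reaches NULL, which this report does not show directly. `struct damage_node`, `list_has_cycle` and `bounded_walk` are hypothetical names made up for this illustration; they are not part of the X server.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a damage record; only the link matters here. */
struct damage_node {
    struct damage_node *next;   /* plays the role of pNext in the gdb output */
};

/*
 * Floyd's cycle check: advance one pointer by one step and another by two.
 * The two can only meet again if the chain loops back on itself.
 */
static bool list_has_cycle(const struct damage_node *head)
{
    const struct damage_node *slow = head;
    const struct damage_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}

/*
 * Walk the list the way a plain "for (p = head; p; p = p->next)" loop would,
 * but give up after max_steps. Returns the number of nodes visited, or -1 if
 * the limit was reached, which an unbounded loop would experience as a hang.
 */
static long bounded_walk(const struct damage_node *head, long max_steps)
{
    long visited = 0;

    for (const struct damage_node *p = head; p != NULL; p = p->next) {
        if (visited == max_steps)
            return -1;
        visited++;
    }
    return visited;
}
```

With a node set up so that `node.next == &node`, mirroring `$4` and `$5` above, `list_has_cycle` reports a cycle and `bounded_walk` hits its limit instead of terminating. A check of this kind is only a debugging aid; it does not say how the list became cyclic.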

Comment 4 Olivier Fourdan 2024-05-31 13:27:14 UTC
Would it be possible for you to run a bisection in git to find the first bad commit?

I wonder if this could be caused by https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1300, considering it touches the damage code.
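
For readers unfamiliar with bisection, the idea behind the request above can be sketched as a binary search over an ordered commit range. This is a simplified model in C, not how git implements it: `first_bad` and `bad_from` are hypothetical names, commits are reduced to indices, and the sketch assumes that every commit after the first bad one is also bad.

```c
#include <stddef.h>

/* Hypothetical test hook: returns nonzero if the build at `index` shows the hang. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/*
 * Return the index of the first bad commit, given that `good` is known to be
 * good and `bad` is known to be bad. Each round tests the midpoint and halves
 * the remaining range, so about log2(bad - good) builds need testing.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;      /* the regression is at mid or earlier */
        else
            good = mid;     /* the regression is after mid */
    }
    return bad;
}

/* Stand-in predicate for the illustration: everything from *ctx onward is bad. */
static int bad_from(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

Each call to the predicate stands for building that revision and using it long enough to decide. With a hang that shows up only sometimes, a revision wrongly judged good sends the search to the wrong half, so every step needs a long enough test run.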

Comment 5 Jan Wielemaker 2024-05-31 13:57:38 UTC
Is there a fairly mechanical way to set everything up for compiling Xwayland from the Git sources? If so, I'm happy to try a couple of things. As it takes either several hours of normal work or a while of aimlessly opening/closing/moving windows to get into this situation, it is a bit tedious ...

Comment 6 Olivier Fourdan 2024-05-31 14:21:43 UTC
> As it takes either several hours of normal work or a while of aimlessly opening/closing/moving windows to get into this situation, it is a bit tedious

Yeah good point…

Maybe it'd be easier/more efficient if I were to build a test package with xorg/xserver!1300 reverted.

There it is: https://koji.fedoraproject.org/koji/taskinfo?taskID=118344390

Important: This is a scratch build and will automatically be removed in a little while, so better grab and try it while it's there.
Also, this is a test based on an educated guess. It may break things or cause even more instability, or it may very well not fix anything! It is for the sole purpose of identifying the root cause of the issue; use it at your own risk…

Comment 7 Jan Wielemaker 2024-05-31 14:57:14 UTC
Your educated guess seems to be a good one. I have a very old legacy application that pops up small windows all over the place :) I've been clicking windows for about 15 minutes and all is still fine. With the current 24.1.0 release, about 5 minutes is usually plenty to "hang" it.

Thanks.   Let me know if you want more tests to be executed.

Comment 8 Olivier Fourdan 2024-06-04 15:39:39 UTC
(In reply to Jan Wielemaker from comment #7)
> Let me know if you want more tests to be executed.

Yup, actually, since you asked… :)

Mind giving this other test build a try?

https://koji.fedoraproject.org/koji/taskinfo?taskID=118562958

This one takes a different approach to the problem instead of reverting the change.

Comment 9 Jan Wielemaker 2024-06-04 15:57:36 UTC
(In reply to Olivier Fourdan from comment #8)

> Yup, actually, since you asked… :)

Downloaded.   Will give it a try tomorrow and let you know.

Comment 10 Jan Wielemaker 2024-06-05 06:51:27 UTC
It looks promising! I did a lot of clicking with the old application and it is still running fine. I'll keep using this version for my normal work, so if you do not hear anything by tomorrow you can be pretty sure the problem is fixed.

Thanks a lot!

Comment 11 Olivier Fourdan 2024-06-07 09:06:36 UTC
Still working, so far? ^_~

Comment 12 Jan Wielemaker 2024-06-07 09:09:45 UTC
(In reply to Olivier Fourdan from comment #11)
> Still working, so far? ^_~

Yes.   I think we can be fairly sure you fixed the problem.   Thanks again.

Comment 13 Fedora Update System 2024-06-26 09:17:07 UTC
FEDORA-2024-bd81b79a0b (xorg-x11-server-Xwayland-24.1.0-2.fc40) has been submitted as an update to Fedora 40.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-bd81b79a0b

Comment 14 Ben Engbers 2024-06-26 14:12:27 UTC
My Lenovo is not as sophisticated as Jan's machine (mine has this graphics card: VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)), but I also had problems with Wayland. After one of the last updates, "Thunderbird for Wayland" was suddenly gone.

I'm glad to learn that the problem has been fixed for the NVidia GeForce card. Does this also mean that it has been fixed for other cards?

Ben

Comment 15 Olivier Fourdan 2024-06-26 15:16:01 UTC
(In reply to Ben Engbers from comment #14)
> My Lenovo is not as sophisticated as Jan's machine (mine has this
> graphics card: VGA compatible controller [0300]: Intel Corporation HD
> Graphics 620 [8086:5916] (rev 02)), but I also had problems with Wayland.
> After one of the last updates, "Thunderbird for Wayland" was suddenly gone.
> 
> I'm glad to learn that the problem has been fixed for the NVidia GeForce
> card. Does this also mean that it has been fixed for other cards?

Your issue seems unrelated to this particular bug; you might want to file a bug against thunderbird.

Comment 16 Fedora Update System 2024-06-27 02:30:55 UTC
FEDORA-2024-bd81b79a0b has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-bd81b79a0b`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-bd81b79a0b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 17 Fedora Update System 2024-06-28 01:58:30 UTC
FEDORA-2024-bd81b79a0b (xorg-x11-server-Xwayland-24.1.0-2.fc40) has been pushed to the Fedora 40 stable repository.
If the problem still persists, please make note of it in this bug report.

