Description of problem:

When I run a VirtualBox (an X11 application, built with Qt) VM via Xwayland, the Xwayland process appears to leak fds whenever a substantial amount of the VM's virtual display is redrawn. Eventually, Xwayland hits the (default in Fedora) 1024 FD limit and, in due course, falls over. When running a GNOME-on-Wayland session, this crashes GNOME Shell and I lose my session. I can reproduce the problem under Weston: in that case, Weston itself survives.

The FDs that get (apparently) leaked are those created by os_create_anonymous_file(), based on correlating strace output with the contents of /proc/<pid>/fd. For example:

[pid 20626] open("/run/user/1000/xwayland-shared-Izz5lm", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
[pid 20626] fcntl(5, F_GETFD) = 0
[pid 20626] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
[pid 20626] unlink("/run/user/1000/xwayland-shared-Izz5lm") = 0
[pid 20626] fcntl(5, F_DUPFD_CLOEXEC, 512) = 943
[pid 20626] close(5) = 0

and fd 943 is leaked.

This could of course be a bug in VirtualBox. If that were the case, I would expect to see it also leaking FDs (which I do not see based on a script which monitors all the relevant /proc/<pid>/fd directories every second); and I'd rather my GNOME session didn't crash because of a buggy app :-)

Version-Release number of selected component (if applicable):

xorg-x11-server-Xwayland-1.18.4-4.fc24.x86_64
VirtualBox-5.1-5.1.4_110228_fedora24-1.x86_64

I believe this wasn't a problem when I had the following versions installed, but I haven't had a chance to try downgrading to test.

xorg-x11-server-Xwayland-1.18.4-1.fc24.x86_64
VirtualBox-5.1-5.1.2_108956_fedora24-1.x86_64

I have also not found another X11 application which triggers this behaviour, though I confess I haven't tried very many.

How reproducible: always

Steps to Reproduce:
1. Launch a VirtualBox VM under Xwayland
2. Cause redraws within the VM; open menus in the VirtualBox window; generally cause activity. Watch the number of open FDs in the Xwayland process increase
3. Eventually see the following output from Xwayland as it dies:

(EE) Fatal server error:
(EE) dup failed: Too many open files
(EE)
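The report mentions a script that polls the relevant /proc/<pid>/fd directories every second, but the script itself is not attached. A minimal sketch of such a monitor might look like the following; it assumes a Linux procfs, and the names `count_open_fds` and `watch_fds` are made up for illustration.

```python
import os
import time


def count_open_fds(pid):
    """Return the number of entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def watch_fds(pids, interval=1.0, rounds=None):
    """Print the fd count of each pid once per interval.

    rounds=None polls until interrupted. Processes that have exited, or
    that we lack permission to inspect, are reported and skipped.
    """
    done = 0
    while rounds is None or done < rounds:
        for pid in pids:
            try:
                print(f"pid {pid}: {count_open_fds(pid)} open fds")
            except (FileNotFoundError, PermissionError) as exc:
                print(f"pid {pid}: unavailable ({exc.strerror})")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Calling `watch_fds([xwayland_pid, client_pid])` with the two process IDs of interest would show whether the count grows in the server, the client, or both.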
(In reply to Will Thompson from comment #0)
> [...]
>
> The FDs that get (apparently) leaked are those created by
> os_create_anonymous_file(), based on correlating strace output with the
> contents of /proc/<pid>/fd. For example:
>
> [pid 20626] open("/run/user/1000/xwayland-shared-Izz5lm",
> O_RDWR|O_CREAT|O_EXCL, 0600) = 5
> [pid 20626] fcntl(5, F_GETFD) = 0
> [pid 20626] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
> [pid 20626] unlink("/run/user/1000/xwayland-shared-Izz5lm") = 0
> [pid 20626] fcntl(5, F_DUPFD_CLOEXEC, 512) = 943
> [pid 20626] close(5) = 0

That's create_tmpfile_cloexec() in xwayland-shm.c

> and fd 943 is leaked.

If it's leaked, then it means the close() in xwl_shm_destroy_pixmap() is not called, either because xwl_shm_destroy_pixmap() itself is not called, or because xwl_pixmap is NULL or pixmap->refcnt is not equal to 1.

If we end up in the xwl_shm_*_pixmap() code, it's because either we are not using glamor (unlikely) or it's the cursor (as xwl_realize_cursor() calls xwl_shm_create_pixmap() directly). The latter is more likely.
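To make the traced syscall sequence easier to follow, here is a rough Python rendering of it: create a uniquely named file, unlink it, duplicate the descriptor above a minimum number with close-on-exec set, and close the original. This is only a sketch of the pattern visible in the strace output, not the Xwayland source; `create_anonymous_fd` is a hypothetical name, and the default of 512 is simply the value seen in the trace (it needs a soft RLIMIT_NOFILE above 512 to succeed).

```python
import fcntl
import os
import tempfile


def create_anonymous_fd(directory=None, min_fd=512):
    """Mimic the traced sequence: create, unlink, dup above min_fd, close.

    Returns a close-on-exec fd for an unlinked temporary file. The caller
    owns it: until os.close() is called on it, it stays listed in
    /proc/<pid>/fd of the calling process.
    """
    # mkstemp opens the file with O_RDWR|O_CREAT|O_EXCL and mode 0600.
    fd, path = tempfile.mkstemp(prefix="xwayland-shared-", dir=directory)
    try:
        os.unlink(path)  # the file now exists only while an fd refers to it
        # Lowest free descriptor >= min_fd, with FD_CLOEXEC already set.
        return fcntl.fcntl(fd, fcntl.F_DUPFD_CLOEXEC, min_fd)
    finally:
        os.close(fd)  # the low-numbered original is always released
```

The descriptor returned here corresponds to fd 943 in the trace: nothing in this sequence closes it, so whoever holds it has to do so when the backing pixmap goes away.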
Now, it's also possible that the application might be leaking pixmaps.

Do you see an unusually high number of pixmaps, increasing in "xrestop" for that application?
(In reply to Olivier Fourdan from comment #2)
> Now, it's also possible that the application might be leaking pixmaps.
>
> Do you see an unusually high number of pixmaps, increasing in "xrestop" for
> that application?

Good question. In `xrestop` I don't see "Pxms" increasing but I do see "Misc" increasing whenever the FD count in Xwayland increases.

I suppose it would make sense for Xwayland (serving many clients) to run out of FDs before the one client does. I will try varying the version of VirtualBox – perhaps it's a regression there.
Note that creating an X resource does not create a file descriptor for the client: it's the X server that allocates resources on behalf of its clients and, in the case of Xwayland, also opens a file descriptor when using shm.
I am trying to reproduce this: I downloaded and installed VirtualBox, running on Wayland, but unfortunately I don't see the number of file descriptors increasing.

Could you please attach the output of "journalctl -b -0 -t org.gnome.Shell.desktop" (to see if Xwayland is using glamor) and the output of "lsof" as well?
I *can* reproduce the "dup failed: Too many open files" error, but it's not a "leak" in the sense that it's not a single app continuously allocating new pixmaps: I can reproduce it by simply running several X11-based apps that use several cursors each.

Xwayland will allocate the resources on behalf of its clients, so the more clients, the more resources for the single Xwayland process.

As the cursor code uses xwl_shm_*_pixmap(), which opens a new file descriptor each time, it doesn't take that many X11 apps to reach the limit of file descriptors.

But then, there's Rui's patch: https://patchwork.freedesktop.org/patch/72738/
(In reply to Olivier Fourdan from comment #5)
> I am trying to reproduce, downloaded and installed virtualbox, running on
> Wayland, but I don't see the number of file descriptors increasing
> unfortunately.

Your guess that it's probably the cursor is correct. When I mouse over the password field in the (Windows) VM login screen, for example, the VM cursor changes to an I-bar and I get 23 calls to xwl_shm_create_pixmap() as below and the open FD count on Xwayland increases by 23.

Thread 1 "Xwayland" hit Breakpoint 1, 0x0000000000426d00 in xwl_shm_create_pixmap ()
#0 0x0000000000426d00 in xwl_shm_create_pixmap ()
#1 0x00000000004268df in xwl_realize_cursor ()
#2 0x00000000004ee9ee in AnimCurRealizeCursor ()
#3 0x000000000054a28a in RealizeCursorAllScreens ()
#4 0x000000000054a718 in AllocARGBCursor ()
#5 0x00000000004eab1c in ProcRenderCreateCursor ()
#6 0x0000000000556c1f in Dispatch ()
#7 0x000000000055ac43 in dix_main ()
#8 0x00007fbd71c67731 in __libc_start_main () from /lib64/libc.so.6
#9 0x0000000000423919 in _start ()

> Could you please attach the output of "journalctl -b -0 -t
> org.gnome.Shell.desktop" (to see if Xwayland is using glamor)

Here's the important bit:

Sep 07 08:22:32 tensionsheet org.gnome.Shell.desktop[1971]: glamor: EGL version 1.4 (DRI2):

> and the output
> of "lsof" as well?

Coming up. The offending Xwayland process has pid 1978.

> I *can* reproduce the "dup failed: Too many open files" error, but it's
> not a "leak" in the sense that it's not a single app continuously allocating
> new pixmaps, I can reproduce by simply running several X11 based apps that
> use several cursors each.

You're more patient than me! I experimented with the gtk3-demo's cursor demo app, but couldn't get the same behaviour.
(In reply to Will Thompson from comment #7)
> Your guess that it's probably the cursor is correct. When I mouse over the
> password field in the (Windows) VM login screen, for example, the VM cursor
> changes to an I-bar and I get 23 calls to xwl_shm_create_pixmap() as below
> and the open FD count on Xwayland increases by 23.

That sounds like a leak from the client though: if the client calls XRenderCreateCursor() without ever freeing the cursors, this is a bug in the client.
So I think we might have two problems actually:

- An X client that is leaking cursors, which causes Xwayland to reach the maximum number of open file descriptors (because Xwayland opens a file descriptor for each new cursor). This should be seen in xrestop; cursors show up as "Misc" iirc.
- Xwayland keeping the file descriptors open for as long as the pixmap exists when using shm.

We can improve (or try to improve) the latter, but the former will inevitably cause trouble and should be fixed in the application (if the application is actually leaking cursors, of course).
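The interaction between the two problems can be illustrated with a toy model. The sketch below is not Xwayland code: `ToyServer`, `leaky_client` and `tidy_client` are invented names, and the only assumption carried over from the discussion is that the server holds one open file descriptor per live cursor.

```python
import os
import tempfile


class ToyServer:
    """Toy stand-in for a server that keeps one fd open per live cursor."""

    def __init__(self):
        self._next_id = 1
        self._cursor_fds = {}  # cursor id -> backing fd

    def create_cursor(self):
        fd, path = tempfile.mkstemp()
        os.unlink(path)  # anonymous backing file, kept alive by the fd
        cursor_id = self._next_id
        self._next_id += 1
        self._cursor_fds[cursor_id] = fd
        return cursor_id

    def free_cursor(self, cursor_id):
        os.close(self._cursor_fds.pop(cursor_id))

    def open_fds(self):
        return len(self._cursor_fds)

    def close_all(self):
        for cursor_id in list(self._cursor_fds):
            self.free_cursor(cursor_id)


def leaky_client(server, changes):
    """Creates a cursor per change and never frees any of them."""
    for _ in range(changes):
        server.create_cursor()


def tidy_client(server, changes):
    """Frees the previous cursor once its replacement exists."""
    current = None
    for _ in range(changes):
        new = server.create_cursor()
        if current is not None:
            server.free_cursor(current)
        current = new
    return current
```

With the leaky client the server's descriptor count grows by one per cursor change; with the tidy client it stays at one. Either way it is the server process that holds the descriptors, which is why the limit is hit there and not in the client.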
Created attachment 1198581
PyQt5 sample to reproduce this problem

Okay, I wrote a minimal PyQt5 app which exhibits the same problem under Xwayland when run with `WAYLAND_DISPLAY= ./cursors`, in an attempt to take VirtualBox out of the picture. Repeatedly click the button (which creates a new cursor from a pixmap, and sets it as the cursor for the button).

I don't see what more the application could do to clean up the old cursors (though I confess to not being a (Py)Qt expert!) and yet I see the same problem here: the app seems to leak the cursors.
(In reply to Will Thompson from comment #10)
> the app seems to leak the
> cursors.

Indeed, it calls xcb_render_create_cursor() but never xcb_free_cursor(). I guess this could be a Qt5 bug, a PyQt5 bug, or (most likely, of course!) a bug in this little app.
Same thing, no Python: https://bugzilla.redhat.com/show_bug.cgi?id=1373451
Apparently today's the day for copy-paste mistakes: https://github.com/wjt/cursors
And yet xcb_free_cursor() should be called from the destructor ~QXcbCursor() https://github.com/qt/qtbase/blob/dev/src/plugins/platforms/xcb/qxcbcursor.cpp#L334
Indeed. I can't spend any longer on this today, but will dig further later in the week…
Robin Burchell discovered that there is only one QXcbCursor instance per screen, so the cache lasts (essentially) the lifetime of the application (or until you hot-unplug the screen). He's working on a patch to Qt.
https://codereview.qt-project.org/#/c/170426/ makes Qt cache only one pixmap cursor at a time, rather than all of them forever. I've tested this patch with both my toy test app and the original VirtualBox VM, and it seems to do the trick. I haven't read the patch at https://patchwork.freedesktop.org/patch/72738/ closely but I agree with you in principle :-)
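As an illustration of the behaviour described for the patch (keep only one pixmap cursor cached at a time, instead of all of them forever), here is a small sketch. It is not the Qt change itself and does not reflect Qt's internals; `SinglePixmapCursorCache` and its `create`/`free` callbacks are hypothetical stand-ins for whatever allocates and releases the server-side cursor.

```python
class SinglePixmapCursorCache:
    """Hypothetical cache that holds at most one pixmap cursor."""

    def __init__(self, create, free):
        self._create = create  # key -> server-side cursor handle
        self._free = free      # releases a handle returned by create
        self._key = None
        self._handle = None

    def get(self, key):
        # Reuse the cached cursor if the same pixmap is requested again.
        if self._handle is not None and key == self._key:
            return self._handle
        handle = self._create(key)
        # Release the previous cursor only after its replacement exists.
        if self._handle is not None:
            self._free(self._handle)
        self._key, self._handle = key, handle
        return handle
```

With a cache shaped like this, switching between many distinct pixmap cursors leaves at most one server-side cursor alive, at the cost of re-creating a cursor when an earlier pixmap comes back.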
Brilliant, thanks for your follow up on the Qt side! That will be beneficial on X native as well...
The Qt patch has been merged, we can apparently expect that it will be included in 5.6.3 in a couple of months. (NB. the next release is 5.6.2.)
So there are two issues here (comment 9): one is a bug in Qt, and the other was a weakness in Xwayland that would keep the FD open for as long as the cursor pixmap is used.

Rui's patch for the cursor, which greatly mitigates the issue, has been merged upstream and is part of the latest F25 package xorg-x11-server-Xwayland-1.19.0-0.3.20161026.
Closing as current release, the Xwayland issue is fixed in 1.19.x.