Bug 1579067
Summary: | GNOME on Wayland hangs after using Firefox for a bit with XWayland 1.20 | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | xorg-x11-server | Assignee: | X/OpenGL Maintenance List <xgl-maint> |
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | rawhide | CC: | alexl, bskeggs, caillon+fedoraproject, gecko-bugs-nobody, jan.steffens, jglisse, jhorak, john.j5live, jsmith.fedora, kengert, kevin, kinodont, mikhail.v.gavrilov, ofourdan, pjasicek, pmenzel+bugzilla.redhat.com, rhughes, rstrode, sandmann, vondruch, xgl-maint |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-06-02 01:46:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Adam Williamson
2018-05-16 22:10:21 UTC
This actually seems to be in Xwayland. The bug can be reproduced with firefox-59.0.2-1.fc28 on Rawhide, but not with firefox-59.0.2 or firefox-60 on F28, so it's not anything that changed in Firefox. The bug does not happen with xorg-x11-server downgraded to 1.19.6-8.fc28 on Rawhide, so it's definitely XWayland at fault. So far the delta I have is that it broke somewhere between 1.19.6-8.fc28 and 1.20.0-1.fc29; I'm going to try the 1.19.99 builds now to get a more precise delta.

Well, 1.19.99.903-1 is affected. That was the first 1.20 RC that was actually built as a Fedora package, so the smallest delta I can get just from testing builds in Koji is: it broke somewhere between 1.19.6 and 1.19.99.903. I can narrow it down a bit further tomorrow by doing my own builds, if necessary, but it'd be great if someone could point at some suspect commits, or suggest some debugging steps, or just fix this magically... :)

Note, a more precise description of what happens when the bug occurs: Firefox is basically hung; I can move the cursor over its window, but can't make anything happen in it. I can click around an Evolution window open next to it and type things into text entry fields... but it seems that clicking on the 'File' menu triggers the complete desktop hang. Once that happens I can only get in via ssh.

I'm seeing the same thing in Rawhide with Firefox 60 and Wayland, and usually it seems to be triggered by screen redraws in Firefox, such as when opening a link in a new tab.

I experience the same issues :/ Opening a new tab makes Xwayland run at 100%. Now I am trying the f28 version of Xwayland:

~~~
$ sudo dnf downgrade --disablerepo=* --enablerepo=updates-testing --release 28 xorg-x11-server-Xwayland
~~~

It can't get any worse, right? ;)

3h later and FF still runs with the older Xwayland...

Vit: read up. I already narrowed it down to "it broke somewhere between 1.19.6 and 1.19.99.903".
Quick question: do other X11 clients still work, like xterm for example? Does the same occur with F28? (I've been using the Xserver release candidates from pre-1.20 and now 1.20 and never had such a problem on F28.)

Some more questions. Comment 0 states that “the pointer can be moved but clicking on anything doesn't work; then the pointer sticks”. Do you mean it won't budge at all, or only in X11 apps? If the former, then it might as well be a Wayland compositor issue. Does either FF, Xwayland or gnome-shell processes take an unusual amount of CPU or memory when this occurs? Anything suspicious from those in journalctl? (Meanwhile I switched back to firefox-59.0.2-1.fc28.x86_64, also using xorg-x11-server-Xwayland-1.20.0-1.fc28.x86_64, to see if I can reproduce.)

(In reply to Olivier Fourdan from comment #8)
> Does either FF, Xwayland or gnome-shell processes take an unusual amount of
> CPU

Yes, XWayland consumes 100% CPU.

(In reply to Vít Ondruch from comment #9)
> Yes, XWayland consumes 100% CPU.

Can you try to spot where it spins in the code?

Olivier: as I mentioned above, I tend to have an Evo window open next to the Firefox window. I can do some stuff in the Evo window after Firefox has gone non-responsive, like clicking in the search box above the message list and typing some stuff. But it seems that trying to open the File menu in the Evo window reliably triggers the complete UI freeze; after that point I can't move the pointer any more (but can still ssh into the system).

I've been running Firefox (various versions, whatever comes along with Fedora 28) with Xwayland from server 1.20 (and before that, every single release candidate of Xserver 1.19.99.90x) without ever having such an issue with Firefox. Yesterday, I downgraded to firefox-59.0.2-1.fc28.x86_64 to match the version mentioned in comment 1, and I've been running just fine on Fedora 28 with dozens of tabs open since then. So there is probably more to it than just Firefox and Xwayland.
Also, the fact that the entire session freezes makes me think that the Wayland compositor (mutter/gnome-shell) might play a role as well (Rawhide uses mutter 3.29.1, whereas F28 uses 3.28.2). One possibility, since Vít mentioned in comment 9 that Xwayland is taking 100% CPU, would be to find out what it's busy doing; a couple of “gstack $(pidof Xwayland)” runs (with debuginfo installed) might give a hint about where in the code it's looping.

Well, yes, I did report the bug as "GNOME on Wayland hangs", after all. :) I'll try to ssh in and get some info on the hung process later.

Same bug on Arch Linux (https://bugs.archlinux.org/task/58705). It was triggered by the upgrade of mesa from 18.0.4 to 18.1.0. I have added XWayland and Firefox backtraces to the Arch Linux bug tracker (https://bugs.archlinux.org/task/58705#comment169691), along with a patch that fixed the issue for me. So if this is indeed the same bug, the information there may be helpful to you.

Sounds plausible indeed, I'll see if that patch also 'works' here. Thanks a lot for the cross-reference!

That patch does seem to 'fix' the bug for me too. Like you, I don't know if it's the correct fix or not, but it certainly seems like at minimum we're hitting the same thing and you've identified the cause, so thanks very much for that. Olivier, can you take it from there?

Well... after running fine all afternoon with the patch, I just saw a rather similar hang, but I was using calibre (the ebook handling software) at the time the system hung, rather than Firefox. It seems I can start an ssh connection to the system this time (I get as far as the "Last login:" message), but it does not complete and never reaches the shell, so I can't look at where Xwayland is stuck (assuming that is what happened). I'll see if this happens again...

Using magic SysRq+R to reset the keyboard mode should allow you to switch away from the VT with the hanging compositor.
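The suggestion above to take a couple of gstack snapshots of the spinning Xwayland process can be scripted roughly as follows. This is a minimal sketch, not something posted in the bug: the helper name `sample_stacks` and its count/interval defaults are hypothetical, and it assumes gdb's `gstack` script and the Xwayland debuginfo are installed. Since the desktop itself is frozen, it would be run from an ssh session.

```shell
#!/bin/sh
# Sample the stack of a busy process a few times to see where it spins.
# Assumptions: gstack (shipped with gdb on Fedora) is installed, and
# debuginfo for the target is available, e.g. via
#   sudo dnf debuginfo-install xorg-x11-server-Xwayland
# Run it as root or as the user who owns the process (e.g. over ssh).

# sample_stacks NAME [COUNT] [INTERVAL]   (hypothetical helper name)
sample_stacks() {
    name=$1
    count=${2:-3}       # number of snapshots to take
    interval=${3:-2}    # seconds to wait between snapshots

    # pidof -s prints a single PID, or fails if no such process is running.
    pid=$(pidof -s "$name" 2>/dev/null) || {
        echo "no running process named $name" >&2
        return 1
    }

    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== $name (pid $pid): sample $i of $count ==="
        gstack "$pid"
        # Only wait between snapshots, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# Example: three snapshots of Xwayland, two seconds apart, saved to a file.
# sample_stacks Xwayland 3 2 > xwayland-stacks.txt
```

If most snapshots end up in the same function or loop, that is a reasonable hint about where Xwayland is spinning; a single snapshot can easily land somewhere unrepresentative.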
Can you please check if the following patches fix the issue: https://patchwork.freedesktop.org/series/43618/

Jan: all sysrq 'magic' besides sync is disabled on Fedora by default, so that wouldn't have helped. See https://fedoraproject.org/wiki/QA/Sysrq . I suppose I could enable them locally, but meh, I find they rarely actually do anything useful anyway. Olivier: will do, thanks.

OK, so it looks like the hang I hit when running calibre was probably unrelated: from the logs of that boot, the kernel hit a GPF on plug or unplug of my book reader, so that was probably the issue there. I'm now running with the patches from https://patchwork.freedesktop.org/series/43618/ and will report if anything happens. So far it's OK, and I've visited the sites that usually trigger the hang.

I've been running with https://patchwork.freedesktop.org/series/43618/ here since yesterday with no lockups.

Patches have landed in git upstream:
https://cgit.freedesktop.org/xorg/xserver/commit/?id=3da999a
https://cgit.freedesktop.org/xorg/xserver/commit/?id=4d5950c

Can you backport them to Rawhide, or do you mind if I do it? Or will there be a new release soon? Thanks!

I've done this now: https://koji.fedoraproject.org/koji/buildinfo?buildID=1088120