Description of problem:
When I boot the F36 Workstation Live ISO (Fedora-Workstation-Live-x86_64-36-20220310.n.0.iso) on a new default VM created in virt-manager, the Live session boots but appears frozen. The system doesn't respond to any mouse events. It also doesn't respond to keyboard events, except Escape, which closes the initial 'Welcome to Fedora' dialog. After that, I'm unable to perform any further action.

This only happens when the VM has Video: QXL (which is the default value). When I change it to Video: Virtio, the Live system works as expected.

This also affects gnome-boxes when you start the existing VM (created in virt-manager). However, if you create a new VM in gnome-boxes, it uses virtio by default, and therefore works OK.

When I try this with KDE, it works OK even with qxl, so this problem seems to be related to the GNOME stack (mutter?) rather than the qxl driver itself.

I also tested an older Workstation Live (Fedora-Workstation-Live-x86_64-36-20220228.n.0.iso) and that one works OK even with qxl. So the regression is quite recent.

Version-Release number of selected component (if applicable):
Fedora-Workstation-Live-x86_64-36-20220310.n.0.iso

Packages in the VM:
gnome-shell-42~beta-4.fc36.x86_64
mutter-42~beta-1.fc36.x86_64

Packages on the host:
virt-manager-3.2.0-4.fc35.noarch
qemu-kvm-6.1.0-14.fc35.x86_64
qemu-device-display-qxl-6.1.0-14.fc35.x86_64

How reproducible:
always

Steps to Reproduce:
1. create a new VM in virt-manager, confirm that it has the QXL video driver (the default)
2. boot Fedora-Workstation-Live-x86_64-36-20220310.n.0.iso
3. see the welcome screen frozen: neither mouse nor keyboard works, except the Esc key
Created attachment 1865437 [details] VM xml from 'virsh dumpxml'
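For anyone comparing setups, the video model can be read straight out of the `virsh dumpxml` output instead of opening the VM details in virt-manager. The following is only a minimal sketch: the sample XML is a hypothetical, heavily trimmed domain definition (not the attached file), and the helper name `video_models` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for `virsh dumpxml <domain>` output.
# A real domain XML contains many more elements and attributes.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <devices>
    <video>
      <model type='qxl' heads='1' primary='yes'/>
    </video>
  </devices>
</domain>
"""


def video_models(domain_xml):
    """Return the 'type' of every <video><model> element in a libvirt domain XML."""
    root = ET.fromstring(domain_xml.strip())
    return [model.get("type") for model in root.findall("./devices/video/model")]


if __name__ == "__main__":
    # A VM affected by this report would list 'qxl' here; the workaround uses 'virtio'.
    print(video_models(SAMPLE_DOMAIN_XML))
```

In practice the XML string would come from saving the `virsh dumpxml` output to a file and reading it in; that step is left out so the sketch stays self-contained.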
Proposed as a Blocker for 36-beta by Fedora user kparal using the blocker tracking app because: Proposing as a Beta blocker because: "The release must install and boot successfully as a virtual guest in a situation where the virtual host is running the current stable Fedora release." https://fedoraproject.org/wiki/Basic_Release_Criteria#Guest_on_current_stable_release
Quite interestingly, if I install Fedora-Workstation-Live-x86_64-36-20220310.n.0.iso using virtio (to work around the problem) and then switch the VM to qxl, I can no longer reproduce the problem with the installed system (I even tried enabling autologin, to match the Live scenario). Which means I don't know how to gather system logs, when it happens only on Live (and the system then doesn't respond to anything).
I can't reproduce this, on F36 at least. First of all, a newly-created virt-manager VM uses virtio for me, not qxl. If I create a new VM and change it to qxl before booting, I can boot fine and use the live system. I tested with the live image openQA built from the GNOME 42-rc megaupdate. Could you test with the Beta candidate, which also includes that update, and see if that works for you?
> I can't reproduce this, on F36 at least. First of all, a newly-created virt-manager VM uses virtio for me, not qxl.

qxl is the default display device prior to virt-manager 4.0.0 (source: https://listman.redhat.com/archives/virt-tools-list/2022-March/017511.html ).

I managed to reproduce this issue three times out of maybe 10-15 attempts. The clock shows the current time, so it's not completely frozen, but the VM doesn't respond to keyboard and mouse events, except ESC.

host: Fedora 36
disk image: Fedora-Workstation-Live-x86_64-36-20220311.n.0.iso

Maybe libguestfs-tools can be used to extract the journal, but I'm not sure if it's compatible with snapshots.
What I would maybe try is to first boot to runlevel 3, enable sshd, ssh in, then do 'systemctl isolate graphical.target'. Then, if the bug reproduces, you have a logged-in ssh session to grab the logs from. You could also try network logging, I guess.
I spent more time debugging this and have some interesting findings. First of all, all the testing is done on F35 (virt-manager 3.2, qxl by default), so please note that F36 results might be different.

Originally I had a 100% failure rate. However, now that Carl mentioned race conditions, I can confirm this is indeed a race. In my case, I still get an almost 100% failure rate, but only when the VM is started in *user session mode*. In *system mode*, I get a failure rate similar to what Carl described, e.g. 1 failure in 5-10 attempts.

All of this applies both to Fedora-Workstation-Live-x86_64-36-20220310.n.0.iso and Fedora-Workstation-Live-x86_64-36_Beta-1.1.iso. Fedora-Workstation-Live-x86_64-36-20220228.n.0.iso seems to always work fine (or maybe I'm just lucky because of the races).
I managed to log in to the stuck system using `console=ttyS0` on the boot cmdline and then connecting through `virsh console`. I can confirm that the clock on top keeps updating, so the system is not actually stuck; it's just that all or most input seems stuck. Also, I saw an "update available" popup appear and disappear (which shouldn't appear at all on a Live image), and the serial console works normally, so this is not a complete system freeze. I can try to debug something from the cmdline, if you tell me where to look.
Created attachment 1865840 [details] system journal while stuck
Created attachment 1865841 [details] system journal (notice priority and above) while stuck
Another interesting find - if you wait 5 minutes without interacting with the VM, so that the screensaver kicks in (even if the screen looks frozen), the keyboard input starts working and you can interact with the OS. Mouse input is still broken, though.

> I still get almost 100% failure rate, but only when the VM is started in *user session mode*

I was wrong about this, it's still a race. I see it much more often in user session mode, but not nearly 100%. There are times when it works several times in a row, and times when the opposite is true.
tl;dr: I tried to reproduce this on Fedora 36 and did not experience any problems.

I have been running Fedora 36 for some time already and did not notice exactly when the default driver switched from QXL to Virtio, but currently my default option is Virtio. I changed that to QXL in both sessions, system and user, and ran virtual machines based on F36 Workstation. I ran 20 attempts in each session and did not see a single freeze.

The current versions are:
libvirt-gconfig-4.0.0-4.fc36.x86_64
python3-libvirt-8.0.0-2.fc36.x86_64
libvirt-client-8.1.0-2.fc36.x86_64
qemu-common-6.2.0-5.fc36.x86_64
virt-manager-4.0.0-1.fc36.noarch

The ISO used was the 20220313 nightly build.
Folks, if possible, can you please test on F35 using the user session mode? Thanks.
Discussed during the 2022-03-14 blocker review meeting: [0] The decision to classify this bug as a "RejectedBlocker (Beta)" was made as this is pretty bad, but since it apparently doesn't happen consistently in the most common config (system virt session) and is easy to workaround (use virtio), we think it's not bad enough to block Beta. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-03-14/f36-blocker-review.2022-03-14-16.01.txt
Discussed during the 2022-03-14 blocker review meeting: [0] The decision to delay the classification of this as a blocker bug was made as we were not able to reach a clear decision at this time and with the information currently available. We'll aim to have more folks test and hopefully get feedback from the developers on what may be going on here. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-03-14/f36-blocker-review.2022-03-14-16.01.txt
Booting with multi-user.target and switching to graphical.target: this might be a different bug, because the VM is unresponsive for ~10s every minute or so.

localhost-live kernel: qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)
localhost-live kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO

(fedora 36, qemu:///system session)
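To check whether a saved journal contains the same qxl allocation failures, a small filter along these lines may help. This is only a sketch: the function name `qxl_kernel_errors` is hypothetical, the matching rule (a kernel line mentioning qxl plus "failed" or "*ERROR*") is an assumption based solely on the two messages quoted in this comment, and the third sample line is made-up filler.

```python
def qxl_kernel_errors(journal_lines):
    """Yield kernel messages that mention qxl and report a failure."""
    for line in journal_lines:
        # Only kernel messages from the qxl driver are of interest here.
        if "kernel:" not in line or "qxl" not in line:
            continue
        if "failed" in line or "*ERROR*" in line:
            yield line.strip()


SAMPLE = [
    # The first two lines are the messages quoted in this comment.
    "localhost-live kernel: qxl 0000:00:01.0: object_init failed for (3149824, 0x00000001)",
    "localhost-live kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO",
    # Made-up, unrelated line to show that non-matching entries are skipped.
    "localhost-live example[1]: unrelated message (made up for this sketch)",
]

if __name__ == "__main__":
    for hit in qxl_kernel_errors(SAMPLE):
        print(hit)
```

The same function could be fed the lines of an attached journal file to see whether the VRAM allocation errors also show up in the "frozen input" case, which would help tell the two symptoms apart.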
(In reply to Kamil Páral from comment #13)
> Folks, if possible, can you please test on F35 using the user session mode?
> Thanks.

Using Boxes (which uses user mode) on a fully updated F35 install, the F36 candidate beta [0] demonstrates the described behavior.

[0] - https://kojipkgs.fedoraproject.org/compose/36/Fedora-36-20220319.0/compose/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-36_Beta-1.2.iso
(In reply to Brandon Nielsen from comment #17)
> Using Boxes (which uses user mode) on a fully updated F35 install, the F36
> candidate beta demonstrates the described behavior.

Brandon, Boxes creates new VMs with the virtio driver, and so it shouldn't be affected by this bug, unless you created the VM in virt-manager (then it has qxl) and then started it in Boxes. Can you describe how you created that VM? Also, can you open virt-manager, display the details of the affected Boxes VM, and check whether it has the qxl or virtio video driver? Thanks.

If you can reproduce this bug with a VM with the virtio driver, that would be quite an important finding.
Discussed during the 2022-03-21 blocker review meeting: [0] The decision to delay the classification of this as a Final Blocker bug was made as it's still not clear how common this bug is, so we can't really make a Final blocker determination yet. We will try to test it further after the meeting. We do accept it as a Beta freeze exception, just in case the fix is in the client side and shows up soon. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-03-21/f36-blocker-review.2022-03-21-16.01.txt
(In reply to Kamil Páral from comment #18)
> (In reply to Brandon Nielsen from comment #17)
> > Using Boxes (which uses user mode) on a fully updated F35 install, the F36
> > candidate beta demonstrates the described behavior.
>
> Brandon, Boxes creates new VMs with the virtio driver, and so it shouldn't
> be affected by this bug. Unless you created the VM in virt-manager (then it
> has qxl), and then started it in Boxes. Can you describe how you created
> that VM? Also, can you open virt-manager and display details of the affected
> Boxes VM and check whether it has qxl or virtio video driver? Thanks.
>
> If you can reproduce this bug with a VM with the virtio driver, that would
> be quite an important finding.

Let's ignore that report. virtio graphics on that machine just seem all kinds of broken, no matter the guest.

I cannot reproduce this bug on a Fedora 35 host installing the beta 1.3 compose [0] as guest on either of the two machines I tested on. Using QXL and virt-manager, both system and user sessions work fine. The resulting installs work fine as well.

[0] - https://kojipkgs.fedoraproject.org/compose/36/Fedora-36-20220320.0/compose/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-36_Beta-1.3.iso
I figured I'd try what I currently had installed on my baremetal test machine before I attempted an install as Kamil did (F35 host).

I tested this on a host system running Fedora-Workstation-Live-x86_64-36-20220321.n.0.iso, attempting to install Fedora-Workstation-36_Beta-1.3 using virt-manager. I could not reproduce this bug under these conditions. My virt-manager defaults to using Virtio as the video driver, and not QXL. In order to replicate this test as it was originally run, I had to manually change the video driver to QXL.

Installed (virtualized) system:
Fedora-Workstation-Live-x86_64-36_Beta-1.3.iso
gnome-shell-42~rc-2.fc36.x86_64
mutter-42~rc-5.fc36.x86_64

Host system:
Fedora-Workstation-Live-x86_64-36-20220321.n.0.iso
virt-manager-4.0.0-1.fc36.noarch
qemu-kvm-6.2.0-5.fc36.x86_64
qemu-device-display-qxl-6.2.0-5.fc36.x86_64
Yeah, we know virt-manager on F36 defaults to virtio. On F35 it defaults to qxl (AIUI).
As another datapoint, I tested this on 2 additional PCs. On the first PC with Fedora 36, I couldn't reproduce the issue, even though I performed ~10 VM boots using the user session mode and ~10 VM boots using the system mode. All of this while having qxl graphics, of course. On the second PC with Fedora 35, I reproduced the issue easily. It happened 6 times out of 10 boots, both in the user session mode and in the system mode. So at least in my testing, it seems the issue is much more likely to occur on F35 than on F36.
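Since this is a race, it may help to put rough numbers on what "no freeze in N boots" actually tells us. The sketch below assumes every boot is an independent trial with the same freeze probability, which is a simplification and not something established in this bug; the function name `prob_no_freeze` is made up. The rates plugged in are the ones reported in earlier comments (about 1 in 5-10 in system mode, 6 in 10 on the second F35 PC), and 20 is the number of boots per session reported in one of the "cannot reproduce" runs.

```python
def prob_no_freeze(rate, boots):
    """Chance of seeing zero freezes in `boots` independent boots.

    `rate` is the assumed per-boot probability of hitting the freeze.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return (1.0 - rate) ** boots


if __name__ == "__main__":
    # Per-boot freeze rates taken from the reports in this bug.
    for rate in (0.1, 0.2, 0.6):
        chance = prob_no_freeze(rate, 20)
        print(f"freeze rate {rate:.0%}: P(no freeze in 20 boots) = {chance:.4f}")
```

Under that assumption, a clean run of 20 boots is still plausible if the real rate on a given host is around 10%, but very unlikely at the 60% rate seen on the F35 machine. That is consistent with the bug being much rarer on F36 hosts rather than necessarily absent.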
I actually saw something that looked rather like this in passing while working on https://bugzilla.redhat.com/show_bug.cgi?id=2066424 , with an *F35* guest image. Kamil, have you tried reproducing this using an F35 guest image, in the setups where you can easily reproduce it?
I tested with Fedora-Workstation-Live-x86_64-35-1.2.iso and saw no issues in 10 boots. I also re-checked Fedora-Workstation-Live-x86_64-36-20220228.n.0.iso, as mentioned in comment 0, and again saw no issues in 10 boots. So I'm quite certain the regression is at most one month old.
Thanks, I guess I must've hit something different. I'll try this again with F36 images later today.
BTW, another thing that might be interesting - does the bug happen with current Rawhide images?
Tested the Workstation beta 1.4 compose [0] on yet another machine with virt-manager and QXL. Could not reproduce the issue with either a user or system session.

[0] - https://kojipkgs.fedoraproject.org/compose/36/Fedora-36-20220322.0/compose/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-36_Beta-1.4.iso
Created attachment 1867693 [details] rpm diff between 0228 and 0307

Fedora-Workstation-Live-x86_64-36-20220228.n.0.iso <- works
Fedora-Workstation-Live-x86_64-36-20220307.n.0.iso <- broken

Unfortunately, images older than 0307 have already been cleaned up in Koji, so I can't test them. See the attached diff for an overview of changed packages between 0228 and 0307. Most notably, GNOME packages were upgraded from 42 alpha to 42 beta.

I also found out that the 'nomodeset' boot argument ("basic graphics mode") avoids this issue. So this is related either to graphics acceleration or to Wayland (because nomodeset starts X11).
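For reference, a package diff like the attached one can be produced from two package lists with a few lines of Python. This is only an illustrative sketch, not how the attachment was generated. The helper names are hypothetical, and the "old" versions in the sample are invented placeholders (the thread only says GNOME went from 42 alpha to 42 beta); only the two "new" versions come from comment 0.

```python
def split_nvra(pkg):
    """Split 'name-version-release.arch' into (name, 'version-release.arch')."""
    name, version, release_arch = pkg.rsplit("-", 2)
    return name, f"{version}-{release_arch}"


def changed_packages(old_pkgs, new_pkgs):
    """Map package name -> (old version or None, new version) for packages that differ."""
    old = dict(split_nvra(p) for p in old_pkgs)
    new = dict(split_nvra(p) for p in new_pkgs)
    return {
        name: (old.get(name), version)
        for name, version in new.items()
        if old.get(name) != version
    }


# Placeholder list for the working image: these alpha versions are invented.
OLD = ["gnome-shell-42~alpha-1.fc36.x86_64", "mutter-42~alpha-1.fc36.x86_64"]
# Versions reported for the broken 20220310 image in comment 0.
NEW = ["gnome-shell-42~beta-4.fc36.x86_64", "mutter-42~beta-1.fc36.x86_64"]

if __name__ == "__main__":
    for name, (before, after) in sorted(changed_packages(OLD, NEW).items()):
        print(f"{name}: {before} -> {after}")
```

Splitting from the right keeps package names that contain dashes (such as gnome-shell) intact, on the assumption that the version and release fields themselves contain no dash.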
(In reply to Adam Williamson from comment #27)
> BTW, another thing that might be interesting - does the bug happen with
> current Rawhide images?

Fedora-Workstation-Live-x86_64-Rawhide-20220308.n.0.iso -> broken
Fedora-Workstation-Live-x86_64-Rawhide-20220322.n.0.iso -> broken

Unfortunately, images older than 0308 have already been cleaned up by Koji.
Beta is signed off, so no point to the FE proposal any more.
Discussed during the 2022-03-28 blocker review meeting: [0] The decision to classify this bug as an "AcceptedBlocker (Final)" was made as it violates the following criterion: "A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop...", when running on a default F35 or earlier virt-manager VM and hitting the bug. [0] https://meetbot.fedoraproject.org/fedora-blocker-review/2022-03-28/f36-blocker-review.2022-03-28-16.00.txt
Upstream issue filed - https://gitlab.gnome.org/GNOME/mutter/-/issues/2201 . Kamil, please correct any errors and add anything I missed :)
As a follow up, see bug 2071226 (in particular bug 2071226 comment 8) - I believe this still happens and other people are hitting it, but it's partially obscured by another issue - an autologin problem.
virt-manager change to virtio was merged for f35: https://bodhi.fedoraproject.org/updates/FEDORA-2022-fec53b10e3
I wonder if it'd be possible to inject WAYLAND_DEBUG=1 somewhere in the environment (e.g. /etc/environment) and try to reproduce the bug again (e.g. try to click and type in the client). I am not sure what the root of this issue might be yet, so it would be good to see what events are actually reaching to the client.
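As a rough illustration of the /etc/environment suggestion, the helper below adds or updates a KEY=VALUE line in the text of such a file without duplicating it. It is a sketch only: `with_env_var` is a hypothetical name, it works on a string rather than touching the real file, and it assumes the simple one-assignment-per-line layout.

```python
def with_env_var(env_text, name, value):
    """Return env_text with `name=value` set exactly once, other lines untouched."""
    prefix = f"{name}="
    # Drop any existing assignment of the same variable, keep everything else.
    lines = [line for line in env_text.splitlines() if not line.startswith(prefix)]
    lines.append(f"{prefix}{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical existing file content.
    current = "LANG=en_US.UTF-8\n"
    print(with_env_var(current, "WAYLAND_DEBUG", "1"), end="")
```

The result would still have to be written to the file by hand with root privileges before the graphical session starts; that part is deliberately left out.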
Carlos, for me this only happens on the Live image and not the installed system, so I'm not sure how to modify the environment. I guess I'd have to build my own Live image. I can try to do that, but since it seems we'll work around this a bit by changing the virt-manager default to virtio, I need to put out some other blocker-related fires first, before working on this one :-/
Lifting AcceptedBlocker as workaround https://bodhi.fedoraproject.org/updates/FEDORA-2022-fec53b10e3 landed in F35.
If you can get access to a console as described in https://bugzilla.redhat.com/show_bug.cgi?id=2063156#c8 (didn't work here, it fails with "error: operation failed: Active console session exists for this domain") you could maybe install a few packages (debug symbols, gdb) and do some digging by attaching gdb to the gnome-shell process.
Discussed during the 2022-04-11 blocker review meeting: [1] The decision to classify this bug as an AcceptedFreezeException was made: "It is a noticeable issue that cannot be fixed with an update." [1] https://meetbot-raw.fedoraproject.org/fedora-blocker-review/2022-04-11/f36-blocker-review.2022-04-11-16.00.log.txt
Seems to be related to the "cover pane" used during startup in gnome-shell; changing component.
FEDORA-2022-d0c4cc0d54 has been submitted as an update to Fedora 36. https://bodhi.fedoraproject.org/updates/FEDORA-2022-d0c4cc0d54
(In reply to Fedora Update System from comment #42)
> FEDORA-2022-d0c4cc0d54 has been submitted as an update to Fedora 36.
> https://bodhi.fedoraproject.org/updates/FEDORA-2022-d0c4cc0d54

I created a custom Workstation Live ISO containing this update, and started it 10 times on F35 and 10 times on F36 (using qxl). There was no issue on any of the systems, so I believe this is now fixed.

If anyone else wants to test, here's the ISO:
https://fedorapeople.org/groups/qa/rhbz2063156.iso
FEDORA-2022-d0c4cc0d54 has been pushed to the Fedora 36 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2022-d0c4cc0d54` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-d0c4cc0d54 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2022-d0c4cc0d54 has been pushed to the Fedora 36 stable repository. If problem still persists, please make note of it in this bug report.