Created attachment 1206476: pstack from qemu VM process

I recently started seeing an old bad behavior again, after it had been fine for a while. My host and guest are now both Fedora 24.

Version-Release number of selected component (if applicable):
qemu-kvm-2.6.0-5.fc24.x86_64
Linux 4.6.4-301.fc24.x86_6

How reproducible:
Hard; usually the machine has to be suspended for several hours.

Attaching pstack.

+++ This bug was initially created as a clone of Bug #1221518 +++

Description of problem:
Sometimes my VMs hang with a strange CPU usage pattern after waking up from suspend. I'm running Fedora 20 and Fedora 21 VMs. Sometimes even the Virtual Machine Manager GUI becomes unresponsive when trying to work with such a VM. Strangely, though, the machine sometimes recovers by itself after a while. On this occasion both machines returned to normal after calling `pstack` on them.

<...>
Created attachment 1206477: qemu log for the Fedora 24 VM
Created attachment 1206478: the Fedora 24 VM XML dump
While I was filing the bug report, the VM recovered from the high CPU usage. The strange thing is that I don't see any difference in the pstack output or the QEMU log between the period of high usage and after the machine recovered. I also forgot to say earlier that network access to the VM is lost while this happens. If you have other ideas on how to debug the high CPU usage and the network loss, please let me know. Perhaps I should look at network stats during the high CPU usage period.
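One way to narrow down which part of the QEMU process is spinning, given that pstack looks the same during and after the spike, is to compare per-thread CPU time between two samples. The sketch below is only an illustration and not something taken from this report: the function names `cpu_ticks` and `thread_cpu_ticks` are made up, and it assumes the Linux `/proc/<pid>/task/<tid>/stat` layout, where fields 14 and 15 hold user and system time in clock ticks.

```shell
# Sum user + system CPU ticks from a file in /proc/<pid>/stat format.
# The comm field (2nd) is parenthesised and may contain spaces, so strip
# everything up to the last ") " before splitting on whitespace. After the
# strip, utime and stime (fields 14 and 15) become fields 12 and 13.
cpu_ticks() {
    sed 's/^.*) //' "$1" | awk '{ print $12 + $13 }'
}

# Print "<tid> <ticks>" for every thread of the given process.
thread_cpu_ticks() {
    for tct_stat in /proc/"$1"/task/*/stat; do
        [ -r "$tct_stat" ] || continue
        tct_tid=${tct_stat%/stat}
        tct_tid=${tct_tid##*/}
        printf '%s %s\n' "$tct_tid" "$(cpu_ticks "$tct_stat")"
    done
}
```

Running `thread_cpu_ticks` with the qemu PID twice, a few seconds apart, and comparing the numbers would show whether the vCPU threads or the main loop account for the usage.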
See bug 1352992 (duplicate?). It just happened again with no suspend involved; the host and guest were both fresh boots. In my case, since this regression appeared with F24 (it had been OK for a few months under F23), it almost always happens when I start up the work-related Rails stack in the guest. That stack has a rather CPU-heavy initialization workload, but this "legitimate" CPU peg is normally over after a few seconds.
In the minority of occurrences I also lose network access to the VM (usermode networking, qemu:///session user-run VM).
Still happens to me after upgrading to F25, qemu 2:2.7.0-8.fc25 on x86_64
Duping to 1352992 since it sounds like they probably have the same root issue. Let's follow up there.

*** This bug has been marked as a duplicate of bug 1352992 ***
The duped bug has a different reproducing pattern, so reopening this one to track VM spin after host suspend/resume.
*** Bug 1389226 has been marked as a duplicate of this bug. ***
*** Bug 1393352 has been marked as a duplicate of this bug. ***
*** Bug 1233568 has been marked as a duplicate of this bug. ***
*** Bug 1165352 has been marked as a duplicate of this bug. ***
*** Bug 1178533 has been marked as a duplicate of this bug. ***
*** Bug 1221518 has been marked as a duplicate of this bug. ***
Aleksander, can you try these config changes:

* Clear the <clock> XML. Fully stop the VM, then do:

    sudo virt-xml fedora_work --edit --confirm --clock clearxml=yes

* If the issue still reproduces, fully stop the VM, then clear the <cpu> XML:

    sudo virt-xml fedora_work --edit --confirm --cpu clearxml=yes

If the issue still reproduces, report here and we can try some more. Please try to eliminate any other variables, like other VMs running or any additional VM config changes.

Thanks for your patience, I realize this has been lingering for too long...
So, I had this (or something like it) 100% reproducible and narrowed it down to kvmclock. Then I updated my f25 guest and now it's not reproducing :( Went from kernel-4.8.6 to kernel-4.9.14 in the guest.

So, question for other users that are still hitting this: what host and guest are you reproducing this with? Can anyone reproduce with an up to date f25 guest?

===

My reproducing steps for posterity:

Setup:
* Up to date f25 host (kernel-4.9.14-200.fc25.x86_64)
* Out of date f25 guest (kernel-4.8.6-300.fc25.x86_64). VM installed via virt-manager but with the VM config pared down to only:

    /usr/bin/qemu-kvm \
        -no-user-config \
        -nodefaults \
        -cpu qemu64 \
        -m 4096 \
        -smp 4 \
        -drive file=/var/lib/libvirt/images/fedora25.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
        -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0 \
        -vga std -usb -usbdevice tablet \
        -monitor vc \
        -display sdl

Steps:
* Launch the VM, log into standard gnome-shell desktop
* Close the laptop lid
* Set a timer for 20 minutes (10 and 15 minutes _don't_ reproduce; 20, 30, 60, 120 all reproduce the issue)
* After timer is up, open laptop, log in to host.

VM is frozen and unresponsive to any UI interaction. top shows between 100% and 300% CPU spinning for qemu-system-x86. pstack is always completely uneventful, just showing CPU threads and main loop threads. The VM doesn't recover quickly: I waited 5 minutes once and it was still spinning before I gave up and killed it.

Config variations that made no difference: disabled s3/s4, -cpu host and -cpu Broadwell, default virt-manager timer settings, -rtc clock=guest, all the default virt-manager devices like spice, agent channels, network devices. It reproduces with the sdl, gtk and spice UIs.

The only thing that avoided the issue was -cpu qemu64,-kvmclock. I never managed to make it reproduce with that setup.
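Since the only variation that avoided the hang was disabling kvmclock, it can be useful to confirm which clocksource a guest is actually using. A minimal sketch, meant to be run inside the guest: it assumes the standard Linux sysfs clocksource directory and that the paravirtual clock is reported under the name `kvm-clock`. The function name `uses_kvmclock` is hypothetical.

```shell
# Standard sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR=/sys/devices/system/clocksource/clocksource0

# Succeeds if the active clocksource in the given directory is kvm-clock.
# The directory is a parameter so the check can be run against a copy.
uses_kvmclock() {
    [ "$(cat "${1:-$CLOCKSOURCE_DIR}/current_clocksource")" = "kvm-clock" ]
}
```

Calling `uses_kvmclock` with no argument checks the running system. The `available_clocksource` file in the same directory lists the alternatives the kernel could use instead.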
RHEL/Centos 7 VMs are affected as well, so I filed a bug for that: https://bugzilla.redhat.com/show_bug.cgi?id=1434566
Been silent for a couple months. So:

- Anyone still hitting this? If so, please report host/guest distro and kernel.
- Anyone that was previously hitting this _not_ hitting it anymore? A guest kernel update fixed it for me, but I want to be sure it's the same for others too.
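For anyone reporting back, the distro and kernel can be collected the same way on host and guest. A small sketch, assuming the system ships an os-release file that defines `NAME` and `VERSION_ID` (the latter is optional in the os-release format); the function name `report_env` is made up for illustration.

```shell
# Print "<distro name> <version>, kernel <release>" for the current system.
# The os-release path is a parameter; /etc/os-release is the usual location.
# The subshell keeps the sourced variables out of the caller's environment.
report_env() {
    (
        . "${1:-/etc/os-release}"
        printf '%s %s, kernel %s\n' "$NAME" "$VERSION_ID" "$(uname -r)"
    )
}
```

Running `report_env` once on the host and once inside the guest gives the two lines asked for above.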
I haven't hit this for a long while, definitely not after I implemented the workaround (not use usermode networking too much) for bug 1352992. I've been suspending with the Fedora guest running several times a day for months.
Okay, given the lack of confirmation that this is still an issue, I think it's safe to assume that the latest kernels fix this, so closing.
I'm experiencing the same problem. The guest hangs with high CPU usage after suspend. Open connections don't work anymore and it is impossible to open new ones.

HOST=Fedora 26 (Linux version 4.11.11-300.fc26.x86_64)
GUEST=RHEL 7.3-36 (Linux version 3.10.0-514.26.2.el7.x86_64)
I'm also still experiencing this.

HOST=Fedora 25 - 4.10.15-200.fc25.x86_64
GUEST=Centos 7 - 3.10.0-693.2.2.el7.x86_64

I'd say ~80% of the time I suspend, I get 100% guest CPU usage. It makes no noticeable difference whether I pause the VM before closing the laptop.
This is a guest kernel bug, so for centos/rhel guests you should follow https://bugzilla.redhat.com/show_bug.cgi?id=1434566

Reclosing since this is fixed for Fedora guests.