Bug 710378 - F15/F16 guest VMs hang with 100% CPU when host resumes from long suspend
Summary: F15/F16 guest VMs hang with 100% CPU when host resumes from long suspend
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Amit Shah
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 767498
Reported: 2011-06-03 09:08 UTC by Tomas Mraz
Modified: 2012-09-05 13:40 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 767498 947727
Environment:
Last Closed: 2012-09-05 13:40:30 UTC
Type: ---



Description Tomas Mraz 2011-06-03 09:08:31 UTC
I have multiple VMs running in qemu-kvm on an F13 x86_64 host machine. If the host is suspended and then resumed after a longer period, the F15 x86_64 guest VM always consumes the full CPU and responds to nothing except ping. The other VMs (F14, RHEL5 and RHEL6) run fine.

Comment 1 Jens Petersen 2011-06-10 01:30:58 UTC
Yes, this is rather annoying, and it has been happening since before F15 Beta, IIRC.

Comment 2 Jens Petersen 2011-06-10 01:34:31 UTC
I don't think the host OS matters much: I have seen it on
both F14 and F15 x86_64 hosts, but the problem only occurs with
F15 guests, as Tomas also mentioned (both i686 and x86_64).

Comment 3 Jens Petersen 2011-06-10 01:37:05 UTC
I have tried leaving a "top -b" process running as you once suggested,
but I haven't gotten any useful output from it yet.

Is there anything else we can try to get more info on this problem?

Comment 4 Jens Petersen 2011-07-07 07:39:52 UTC
Increasing severity and priority in the hope that this gets some attention.

This also happens with F16 rawhide guests.

Comment 5 Jens Petersen 2011-07-23 13:46:10 UTC
Oh, my rawhide guest resumed properly this time!
It is running kernel-3.0-0.rc7.git3.1.fc16.x86_64.

Comment 6 Tomas Mraz 2011-07-26 08:02:50 UTC
It did not for me with kernel-3.0.0-1.fc16.x86_64, so no change here. The hang was not complete, though: ping works, and ssh can even connect but cannot finish the login; gpm on the text console can move the mouse cursor, but getty does not react to the keyboard.

Comment 7 Jens Petersen 2011-10-18 07:19:30 UTC
Yeah, I think I was too optimistic too soon: I have still
seen the issue since then.

Comment 8 Jens Petersen 2011-10-21 00:02:29 UTC
It happened again for me today with a Fedora-16-Nightly-20111019.10-x86_64-Live-desktop guest.

Comment 10 Ilari Stenroth 2011-12-10 09:09:26 UTC
I wonder if this is related to guest steal time accounting. Linux 3.0-rc6 introduced steal time accounting for KVM. But if this also happens with kernels prior to 3.0, then it's probably not related to steal time accounting.
I don't have a hibernatable system with KVM guests, so I cannot try this out myself. Could you try booting the guest kernel with the parameter "no-steal-acc" and see if anything changes?

Comment 11 Tomas Mraz 2011-12-12 07:51:02 UTC
No, I was already seeing it with kernel-2.6.38. I'm not sure which version it started with, though.

Comment 12 Tomas Mraz 2011-12-12 07:54:12 UTC
I've also always suspected that some interaction between the kernel and systemd is the cause, because in F14 we had upstart as init. I do not want to say that systemd is the culprit here; I'd rather say that systemd does something with the kernel that upstart did not do, and that this triggers the bug in the kernel.

I'll try some distro with upstart as init and a recent kernel.

Comment 13 Tomas Mraz 2011-12-14 08:35:04 UTC
I tried Linux Mint 12 with the 3.0.0 kernel and upstart: it does not hang. A currently updated F16 still hangs.

A possible culprit might be the use of cgroups: systemd uses them, upstart does not.

Comment 14 Josh Boyer 2012-06-06 13:24:31 UTC
Moving this to F16 for now.

Is this still being seen in current F16/F17/rawhide?

Comment 15 Tomas Mraz 2012-06-06 13:32:30 UTC
I have not seen the hang recently, so if it happens at all, it must be much less frequent than before. However, I've also changed the host machine hardware, so that change might be related.

Comment 16 Josh Boyer 2012-09-05 13:40:30 UTC
Closing per comment #15. There were also recent changes to prevent libvirt from getting stuck on one CPU after resume.

