Bug 1434566

Summary: rhel7 kvm guest is unresponsive after host suspend/resume
Product: Red Hat Enterprise Linux 7 Reporter: Cole Robinson <crobinso>
Component: kernel Assignee: David Hildenbrand <dhildenb>
kernel sub component: KVM QA Contact: FuXiangChun <xfu>
Status: CLOSED WONTFIX Docs Contact:
Severity: unspecified    
Priority: unspecified CC: akostadi, amigo.elite, amit, areis, chayang, crobinso, hhuang, juzhang, michen, svanders, virt-maint
Version: 7.4 Keywords: Reopened
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1568487 (view as bug list) Environment:
Last Closed: 2018-07-20 10:48:07 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 923626, 1568487    

Description Cole Robinson 2017-03-21 18:42:37 UTC
TLDR: Run a RHEL7 VM on an F25 host, log in to gnome-shell in the guest, suspend the host for 20 minutes, then resume it. The VM is hung and unresponsive, and its CPU is spinning like mad. 100% reproducible for me. -cpu X,-kvmclock avoids the issue. It looks to be fixed in kernel 4.9, so maybe this just needs a backport for RHEL.


This bug seems to have been affecting Fedora hosts/guests since the F21 timeframe; see bug 1380893 and all the dupes for some context. bug 1380893#c19 in particular has all my testing. Reproducing command line:

/usr/bin/qemu-kvm \
  -no-user-config \
  -nodefaults \
  -cpu qemu64 \
  -m 4096 \
  -smp 4 \
  -drive file=/var/lib/libvirt/images/fedora25.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
  -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk0 \
  -vga std -usb -usbdevice tablet \
  -monitor vc \
  -display sdl

Steps:
* Launch the VM, log into standard gnome-shell desktop
* Close the laptop lid
* Set a timer for 20 minutes (10 and 15 minutes _don't_ reproduce; 20, 30, 60, and 120 all reproduce the issue)
* After timer is up, open laptop, log in to host.
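
For anyone who wants to script the suspend step instead of closing the lid, here is a minimal sketch. It is not what the reporter did: it assumes util-linux `rtcwake` is installed and that the host can wake from suspend-to-RAM via the RTC alarm. `suspend_cmd` is a hypothetical helper name, and it only prints the command so it can be reviewed before anything is suspended.

```shell
#!/bin/sh
# Sketch only: replaces "close the lid and wait" with an RTC-timed suspend.
# Assumes util-linux rtcwake and a host that supports suspend-to-RAM.

# 20 minutes is the shortest duration reported above to reproduce the hang.
SUSPEND_SECONDS=$((20 * 60))

# Hypothetical helper: print the suspend command instead of running it.
suspend_cmd() {
    # $1 = number of seconds to stay suspended
    echo "rtcwake -m mem -s $1"
}

suspend_cmd "$SUSPEND_SECONDS"
```

To actually suspend, the printed command would be run as root on the host while the guest is logged in to gnome-shell. Whether an RTC-timed suspend behaves the same as a lid close on a given laptop is an assumption.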

VM is frozen and unresponsive to any UI interaction. Top shows between 100% and 300% CPU spinning for qemu-system-x86. pstack is always completely uneventful, just showing CPU threads and main loop threads. The VM doesn't recover quickly, I waited 5 minutes once and it was still spinning before I gave up and killed it.

The issue can be avoided with -cpu qemu64,-kvmclock
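
A minimal sketch of applying and checking that workaround. The helper names `cpu_arg` and `guest_uses_kvmclock` are hypothetical, not part of QEMU or libvirt. The first builds the `-cpu` value to substitute into the reproducer above. The second is meant to be run inside the guest and reads the standard Linux sysfs clocksource file. It assumes a Linux guest, which reports `kvm-clock` there when kvmclock is in use.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical.

# Host side: build the -cpu value, optionally masking out kvmclock.
cpu_arg() {
    # $1 = CPU model (e.g. qemu64), $2 = "yes" to apply the workaround
    if [ "$2" = "yes" ]; then
        echo "$1,-kvmclock"
    else
        echo "$1"
    fi
}

# Guest side: succeed if the given clocksource file names kvm-clock.
guest_uses_kvmclock() {
    [ "$(cat "$1" 2>/dev/null)" = "kvm-clock" ]
}

# Prints the argument pair to use in place of "-cpu qemu64".
printf '%s\n' "-cpu $(cpu_arg qemu64 yes)"

# Standard sysfs location of the active clocksource on Linux.
CLOCKSOURCE_FILE=/sys/devices/system/clocksource/clocksource0/current_clocksource
if guest_uses_kvmclock "$CLOCKSOURCE_FILE"; then
    echo "clocksource: kvm-clock (workaround not in effect)"
else
    echo "clocksource: not kvm-clock"
fi
```

With the workaround applied, the guest would be expected to fall back to another clocksource, but which one it picks depends on the guest kernel and the CPU model.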

I tried both F25 qemu 2.7 and qemu.git as of yesterday; both reproduced the problem.

On Fedora, kernel-4.8.6 reproduced the problem, but kernel-4.9.14 appears to be fixed! So hopefully this just needs a backport.

Comment 2 David Hildenbrand 2017-04-06 08:36:55 UTC
Just a couple of questions to see if I got this right:

1. You're running Fedora with kernel-4.8.6 as hypervisor and can produce a guest hang?

2. If you switch to kernel-4.9.14 in Fedora (hypervisor), you cannot reproduce the guest hang?

3. This happens with a RHEL guest. Does this also happen with other guests (I assume yes, from the mentioned bugzillas)?

4. Nothing in the guest has to be changed to make it work. Only in the hypervisor - Fedora - kernel?


For now, this sounds like a bug in the Fedora kernel to me. Not something related to RHEL.

Comment 3 David Hildenbrand 2017-04-06 08:46:00 UTC
Looking at bz1380893, I assume:

Setup:
* Up to date f25 host (kernel-4.9.14-200.fc25.x86_64)
* Latest RHEL 7.4

Did you try with RHEL 7.3?

Comment 4 Chao Yang 2017-05-22 10:00:58 UTC
Hi Cole, 

Could you possibly reply to Comment 2 and 3?

Comment 5 Cole Robinson 2017-05-30 23:56:49 UTC
(In reply to David Hildenbrand from comment #3)
> Looking at bz1380893, I assume:
> 
> Setup:
> * Up to date f25 host (kernel-4.9.14-200.fc25.x86_64)
> * Latest RHEL 7.4

Yes, f25 host and latest (at the time) rhel7.4 guest, with kernel 3.10.0-612.el7. 

I've also just reproduced with f26 host (kernel-4.11.1-300.fc26) and rhel7 guest (kernel-3.10.0-666.el7)

> Did you try with RHEL 7.3?

No, neither on the host nor the guest. I will try a rhel7.3 guest. But I don't have any hosts running RHEL, so I would appreciate it if someone else could give that case a spin, if it's relevant.


Note, the only way I found to 'fix' this was to update the kernel in the fedora 25 VM (from kernel-4.8.6 to kernel-4.9.14), which made me think this is more about the guest kernel than the host kernel, hence the RHEL bug.

Comment 6 Cole Robinson 2017-05-31 01:42:25 UTC
Confirmed as well with f26 host + rhel 7.3 guest (kernel 3.10.0-514.el7)

Comment 20 Vladimir Stackov 2018-07-19 22:50:10 UTC
This bug is still present on an updated 7.5, so why WONTFIX?
If you need any information I can provide it.

Comment 21 David Hildenbrand 2018-07-20 10:48:07 UTC
As given in comment 18 by Ademar, this is not supported. Therefore "WONTFIX".