Bug 1446498

Summary: Guest freeze after live snapshot with data-plane
Product: Red Hat Enterprise Linux 7 Reporter: Qianqian Zhu <qizhu>
Component: qemu-kvm-rhev Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA QA Contact: Qianqian Zhu <qizhu>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: chayang, famz, juzhang, knoel, michen, pbonzini, qizhu, virt-maint, xfu
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-5.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:35:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Qianqian Zhu 2017-04-28 08:47:45 UTC
Description of problem:
Guest hangs after a live snapshot with data-plane.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-1.el7.x86_64
kernel-3.10.0-643.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch guest with data-plane:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -vga cirrus  \
    -object iothread,id=iothread0 \
    -device virtio-scsi-pci,iothread=iothread0,id=virtio_scsi_pci0,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel74-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:ac:ad:ae:af:b0,id=idqKTtyC,vectors=4,netdev=idBAxKKY,bus=pci.0,addr=04  \
    -netdev tap,id=idBAxKKY,vhost=on \
    -m 4096  \
    -smp 4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -monitor stdio \
    -qmp tcp::5555,server,nowait

2. Live snapshot:
{'execute': 'blockdev-snapshot-sync', 'arguments': {'device': 'drive_image1', 'snapshot-file': '/home/kvm_autotest_root/images/sn1', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'id': 'Wh5neQ4H'}
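
For anyone scripting step 2, the following is a minimal Python sketch of issuing the same command over the QMP TCP socket opened by `-qmp tcp::5555,server,nowait`. The helper names `build_snapshot_cmd` and `qmp_roundtrip` are hypothetical and not part of any QEMU tooling; the sketch assumes the QMP server is reachable on localhost and performs only the basic greeting and `qmp_capabilities` handshake before sending the command.

```python
import json
import socket


def build_snapshot_cmd(device, snapshot_file, cmd_id=None):
    """Build the blockdev-snapshot-sync command from step 2 as a dict."""
    cmd = {
        "execute": "blockdev-snapshot-sync",
        "arguments": {
            "device": device,
            "snapshot-file": snapshot_file,
            "mode": "absolute-paths",
            "format": "qcow2",
        },
    }
    if cmd_id is not None:
        cmd["id"] = cmd_id
    return cmd


def qmp_roundtrip(host, port, cmd, timeout=10.0):
    """Hypothetical helper: handshake, send one command, return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        f = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(f.readline())  # QMP greeting banner
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        json.loads(f.readline())  # reply to qmp_capabilities
        f.write(json.dumps(cmd) + "\n")
        f.flush()
        # Skip asynchronous events until the command's own reply arrives.
        while True:
            reply = json.loads(f.readline())
            if "return" in reply or "error" in reply:
                return reply


if __name__ == "__main__":
    # Values taken from step 2 of this report.
    cmd = build_snapshot_cmd(
        "drive_image1", "/home/kvm_autotest_root/images/sn1", "Wh5neQ4H"
    )
    print(qmp_roundtrip("localhost", 5555, cmd))
```

On an affected build the command itself is expected to return; the freeze is observed in the guest afterwards, as described under "Actual results".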

3. Check guest

Actual results:
Guest freezes and does not respond to mouse or keyboard. It still answers ping, but SSH login fails.

Expected results:
Guest works well.

Additional info:
qemu-kvm-rhev-2.6.0-27.el7.x86_64 works well, so this is a regression.

Comment 3 Fam Zheng 2017-05-04 08:49:08 UTC
Can you collect the backtrace of all threads when it hangs?
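
One possible way to gather such a backtrace is to attach gdb to the hung qemu-kvm process in batch mode. The sketch below is only an illustration: the helper names are hypothetical, and it assumes gdb (ideally with the matching debuginfo packages) is installed on the host and that the caller has permission to attach to the process.

```python
import subprocess


def gdb_backtrace_argv(pid):
    """Build a gdb command line that dumps every thread's backtrace for pid."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def collect_backtrace(pid):
    """Run gdb against the given process and return its output as text."""
    result = subprocess.run(
        gdb_backtrace_argv(pid), capture_output=True, text=True, check=False
    )
    return result.stdout
```

Calling `collect_backtrace(pid)` with the PID of the qemu-kvm process would return the combined backtraces, which can then be attached to the bug.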

Comment 4 Fam Zheng 2017-05-04 10:09:19 UTC
Stefan managed to reproduce this and we've located the root cause. A backtrace doesn't really help here; the issue is that the iothread was not woken up correctly by aio_enable_external(). Clearing the needinfo accordingly.

A fix will need to be worked on upstream.

Comment 5 Stefan Hajnoczi 2017-05-04 10:24:41 UTC
I've posted a patch upstream ("[PATCH] aio: add missing aio_notify() to aio_enable_external()") and will backport it.
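
To illustrate why a missing notification can leave the guest stuck, here is a toy Python model. It is not QEMU code: the class `ToyAioContext` and the function `guest_io_served_after_snapshot` are hypothetical, and the model rests on the assumption that the event loop decides which handlers to watch when it enters a blocking poll and does not revisit that decision until something wakes it.

```python
class ToyAioContext:
    """Toy event loop that can ignore 'external' (guest-facing) handlers.

    Assumption: the handler set is chosen on entry to a blocking poll and
    stays fixed until the loop is woken up.
    """

    def __init__(self):
        self.external_disable_cnt = 0
        self.watching_external = True

    def enter_poll(self):
        # The loop thread snapshots the handler set before it blocks.
        self.watching_external = self.external_disable_cnt == 0

    def notify(self):
        # Waking the loop makes it re-enter poll with a fresh handler set.
        self.enter_poll()

    def disable_external(self):
        self.external_disable_cnt += 1

    def enable_external(self, kick):
        self.external_disable_cnt -= 1
        assert self.external_disable_cnt >= 0
        if kick and self.external_disable_cnt == 0:
            self.notify()


def guest_io_served_after_snapshot(kick):
    """Return True if the loop watches guest handlers once the command ends."""
    ctx = ToyAioContext()
    ctx.disable_external()      # monitor command quiesces guest I/O
    ctx.enter_poll()            # loop blocks while external handlers are off
    ctx.enable_external(kick)   # monitor command finishes
    return ctx.watching_external
```

In this model, `guest_io_served_after_snapshot(False)` leaves the loop asleep with guest handlers still unwatched, which resembles the reported freeze, while `guest_io_served_after_snapshot(True)` wakes the loop so it picks them up again.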

Comment 6 Stefan Hajnoczi 2017-05-15 14:26:45 UTC
Patch posted downstream.

Comment 7 Miroslav Rezanina 2017-05-16 13:03:53 UTC
Fix included in qemu-kvm-rhev-2.9.0-5.el7

Comment 9 Qianqian Zhu 2017-05-17 02:35:26 UTC
Verified on:
qemu-kvm-rhev-2.9.0-5.el7.x86_64
kernel-3.10.0-640.el7.x86_64

Steps are the same as in comment 0, with a block stream test added afterwards.
Result:
As expected, guest works well after live snapshot and block stream.

Moving to VERIFIED.

Comment 11 errata-xmlrpc 2017-08-02 04:35:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392