Bug 1519721

Summary: Both qemu and guest hang when performing live snapshot transaction with data-plane
Product: Red Hat Enterprise Linux 7 Reporter: Qianqian Zhu <qizhu>
Component: qemu-kvm-rhev Assignee: Stefan Hajnoczi <stefanha>
Status: CLOSED ERRATA QA Contact: Qianqian Zhu <qizhu>
Severity: high Docs Contact:
Priority: high    
Version: 7.5 CC: chayang, juzhang, knoel, lmiksik, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.10.0-14.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-11 00:52:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Qianqian Zhu 2017-12-01 10:11:48 UTC
Description of problem:
Both qemu and the guest hang when performing a live snapshot transaction with data-plane.
The hang only occurs when the same iothread object is used for multiple block devices.
If each block device uses a separate iothread, the issue does not occur.


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-9.el7.x86_64
kernel-3.10.0-800.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch a guest with 3 block devices, all attached to the same iothread, "iothread1":

#  /usr/libexec/qemu-kvm \
     -name 'avocado-vt-vm1'  \
     -object iothread,id=iothread1 \
     -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 \
     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,iothread=iothread1,bus=pci.0,addr=0x3 \
     -drive id=drive_sn1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn1.qcow2 \
     -device virtio-blk-pci,id=sn1,drive=drive_sn1,bootindex=1,iothread=iothread1,bus=pci.0,addr=0x4 \
     -drive id=drive_sn2,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn2.qcow2 \
     -device virtio-blk-pci,id=sn2,drive=drive_sn2,bootindex=2,iothread=iothread1,bus=pci.0,addr=0x5 \
     -device virtio-net-pci,mac=9a:29:2a:2b:2c:2d,id=idoE4ENA,vectors=4,netdev=idgbsOIs,bus=pci.0,addr=0x6  \
     -netdev tap,id=idgbsOIs \
     -m 2048  \
     -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
     -cpu 'SandyBridge',+kvm_pv_unhalt \
     -vnc :0  \
     -qmp stdio

2. Take a live snapshot of all three block devices within a single transaction command:

{'execute': 'transaction', 'arguments': {'actions': [{'data': {'device':'drive_image1', 'snapshot-file': '/home/kvm_autotest_root/images/image1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}, {'data': {'device': 'drive_sn1', 'snapshot-file': '/home/kvm_autotest_root/images/sn1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}, {'data': {'device': 'drive_sn2', 'snapshot-file': '/home/kvm_autotest_root/images/sn2-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}]}, 'id': 'bYyCHqv5'}
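
For illustration only, the transaction in step 2 can also be generated rather than typed by hand. The following is a minimal sketch: the helper name build_snapshot_transaction and its parameters are hypothetical and not part of qemu or any test framework. It only builds the JSON payload; sending it to the QMP monitor is left out.

```python
import json


def build_snapshot_transaction(drives, image_dir, cmd_id=None):
    """Build a QMP 'transaction' command that snapshots every drive together.

    drives maps a drive id to the snapshot file name to create for it.
    (Hypothetical helper, written for this report only.)
    """
    actions = [
        {
            "type": "blockdev-snapshot-sync",
            "data": {
                "device": device,
                "snapshot-file": f"{image_dir}/{snap_name}",
                "mode": "absolute-paths",
                "format": "qcow2",
            },
        }
        for device, snap_name in drives.items()
    ]
    command = {"execute": "transaction", "arguments": {"actions": actions}}
    if cmd_id is not None:
        command["id"] = cmd_id
    return command


# Same drives and snapshot paths as in step 2.
command = build_snapshot_transaction(
    {"drive_image1": "image1-snap", "drive_sn1": "sn1-snap", "drive_sn2": "sn2-snap"},
    "/home/kvm_autotest_root/images",
    cmd_id="bYyCHqv5",
)
print(json.dumps(command))
```

The printed line is strict JSON (double quotes) and carries the same actions as the command above.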


Actual results:
Both qemu and the guest hang. QMP only shows the formatting of the first snapshot file, then hangs without returning anything further:

Formatting '/home/kvm_autotest_root/images/image1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16

Expected results:

Formatting '/home/kvm_autotest_root/images/image1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/sn1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn2-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/sn2.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}, "id": "bYyCHqv5"}


Additional info:
If a separate iothread object is used for each block device, as below, both qemu and the guest work well, and the information returned by QMP matches the expected results above.

/usr/libexec/qemu-kvm \
     -name 'avocado-vt-vm1'  \
     -object iothread,id=iothread1 \
     -object iothread,id=iothread2 \
     -object iothread,id=iothread3 \
     -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 \
     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,iothread=iothread1,bus=pci.0,addr=0x3 \
     -drive id=drive_sn1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn1.qcow2 \
     -device virtio-blk-pci,id=sn1,drive=drive_sn1,bootindex=1,iothread=iothread2,bus=pci.0,addr=0x4 \
     -drive id=drive_sn2,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn2.qcow2 \
     -device virtio-blk-pci,id=sn2,drive=drive_sn2,bootindex=2,iothread=iothread3,bus=pci.0,addr=0x5 \
     -device virtio-net-pci,mac=9a:29:2a:2b:2c:2d,id=idoE4ENA,vectors=4,netdev=idgbsOIs,bus=pci.0,addr=0x6  \
     -netdev tap,id=idgbsOIs \
     -m 2048  \
     -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
     -cpu 'SandyBridge',+kvm_pv_unhalt \
     -vnc :0  \
     -qmp stdio
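
The only difference between the hanging and the working command lines is how the -object iothread and iothread= options are assigned. A minimal sketch of that difference follows, assuming a hypothetical helper iothread_args (not part of qemu or avocado-vt) that generates only the block-related arguments shown in this report:

```python
def iothread_args(drives, shared=False):
    """Return the block-related qemu-kvm arguments for the given drives.

    drives is a list of (device_id, drive_id, image_path) tuples.
    shared=True attaches every device to iothread1 (the setup that hangs);
    shared=False gives each device its own iothread (the setup that works).
    (Hypothetical helper, written for this report only.)
    """
    args = []
    count = 1 if shared else len(drives)
    for n in range(1, count + 1):
        args += ["-object", f"iothread,id=iothread{n}"]
    for index, (device_id, drive_id, path) in enumerate(drives):
        iothread = "iothread1" if shared else f"iothread{index + 1}"
        args += [
            "-drive",
            f"id={drive_id},if=none,snapshot=off,aio=native,cache=none,"
            f"format=qcow2,file={path}",
            "-device",
            f"virtio-blk-pci,id={device_id},drive={drive_id},bootindex={index},"
            f"iothread={iothread},bus=pci.0,addr={hex(index + 3)}",
        ]
    return args


images = "/home/kvm_autotest_root/images"
drives = [
    ("image1", "drive_image1", f"{images}/rhel74-64-virtio.qcow2"),
    ("sn1", "drive_sn1", f"{images}/sn1.qcow2"),
    ("sn2", "drive_sn2", f"{images}/sn2.qcow2"),
]
separate = iothread_args(drives)            # one iothread per device
shared = iothread_args(drives, shared=True)  # all devices on iothread1
print(" ".join(separate))
```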

Comment 2 Ademar Reis 2017-12-01 23:35:26 UTC
May be related to bug 1508708

Comment 3 Stefan Hajnoczi 2017-12-05 10:48:00 UTC
Patches posted upstream:
https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00614.html

Comment 5 Miroslav Rezanina 2018-01-02 14:19:42 UTC
Fix included in qemu-kvm-rhev-2.10.0-14.el7

Comment 7 Qianqian Zhu 2018-01-03 05:50:02 UTC
Verified with qemu-kvm-rhev-2.10.0-14.el7.x86_64.

Steps and command line are the same as in comment 0.

Result:
Live snapshot with transaction succeeds:
{'execute': 'transaction', 'arguments': {'actions': [{'data': {'device':'drive_image1', 'snapshot-file': '/home/kvm_autotest_root/images/image1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}, {'data': {'device': 'drive_sn1', 'snapshot-file': '/home/kvm_autotest_root/images/sn1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}, {'data': {'device': 'drive_sn2', 'snapshot-file': '/home/kvm_autotest_root/images/sn2-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}]}, 'id': 'bYyCHqv5'}
Formatting '/home/kvm_autotest_root/images/image1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn1-snap', fmt=qcow2 size=2147483648 backing_file=/home/kvm_autotest_root/images/sn1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn2-snap', fmt=qcow2 size=2147483648 backing_file=/home/kvm_autotest_root/images/sn2.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}, "id": "bYyCHqv5"}

Moving to VERIFIED accordingly.

Comment 8 Stefan Hajnoczi 2018-01-04 10:12:50 UTC
*** Bug 1522752 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2018-04-11 00:52:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104