Red Hat Bugzilla – Bug 1519721
Both qemu and guest hang when performing live snapshot transaction with data-plane
Last modified: 2018-04-10 20:54:29 EDT
Description of problem:
Both qemu and the guest hang when performing a live snapshot transaction with data-plane. The hang occurs only when a single iothread object is shared by multiple block devices. If each block device uses its own iothread, the issue does not occur.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-9.el7.x86_64
kernel-3.10.0-800.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch a guest with 3 block devices, all attached to the same iothread "iothread1":

# /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -object iothread,id=iothread1 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,iothread=iothread1,bus=pci.0,addr=0x3 \
    -drive id=drive_sn1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn1.qcow2 \
    -device virtio-blk-pci,id=sn1,drive=drive_sn1,bootindex=1,iothread=iothread1,bus=pci.0,addr=0x4 \
    -drive id=drive_sn2,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn2.qcow2 \
    -device virtio-blk-pci,id=sn2,drive=drive_sn2,bootindex=2,iothread=iothread1,bus=pci.0,addr=0x5 \
    -device virtio-net-pci,mac=9a:29:2a:2b:2c:2d,id=idoE4ENA,vectors=4,netdev=idgbsOIs,bus=pci.0,addr=0x6 \
    -netdev tap,id=idgbsOIs \
    -m 2048 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -vnc :0 \
    -qmp stdio

2. Take a live snapshot of all three block devices in a single transaction command:

{'execute': 'transaction', 'arguments': {'actions': [
    {'data': {'device':'drive_image1', 'snapshot-file': '/home/kvm_autotest_root/images/image1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'},
    {'data': {'device': 'drive_sn1', 'snapshot-file': '/home/kvm_autotest_root/images/sn1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'},
    {'data': {'device': 'drive_sn2', 'snapshot-file': '/home/kvm_autotest_root/images/sn2-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}]},
 'id': 'bYyCHqv5'}

Actual results:
Both qemu and the guest hang. QMP reports formatting only the first snapshot file and then returns nothing further:

Formatting '/home/kvm_autotest_root/images/image1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16

Expected results:

Formatting '/home/kvm_autotest_root/images/image1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/sn1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn2-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/sn2.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}, "id": "bYyCHqv5"}

Additional info:
If three different iothread objects are used, one per block device, as below, both qemu and the guest work well, and the output returned over QMP matches the expected results above.

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -object iothread,id=iothread1 \
    -object iothread,id=iothread2 \
    -object iothread,id=iothread3 \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,iothread=iothread1,bus=pci.0,addr=0x3 \
    -drive id=drive_sn1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn1.qcow2 \
    -device virtio-blk-pci,id=sn1,drive=drive_sn1,bootindex=1,iothread=iothread2,bus=pci.0,addr=0x4 \
    -drive id=drive_sn2,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/sn2.qcow2 \
    -device virtio-blk-pci,id=sn2,drive=drive_sn2,bootindex=2,iothread=iothread3,bus=pci.0,addr=0x5 \
    -device virtio-net-pci,mac=9a:29:2a:2b:2c:2d,id=idoE4ENA,vectors=4,netdev=idgbsOIs,bus=pci.0,addr=0x6 \
    -netdev tap,id=idgbsOIs \
    -m 2048 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -vnc :0 \
    -qmp stdio
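
For anyone scripting the reproducer, the transaction from step 2 could be generated rather than typed by hand. The following is a minimal sketch, not part of the original report: the helper name build_snapshot_transaction and its parameters are hypothetical, while the command name, action type, and argument keys are copied from the transaction shown in step 2.

```python
import json


def build_snapshot_transaction(drives, snapshot_dir, cmd_id=None):
    """Build a QMP 'transaction' command with one blockdev-snapshot-sync
    action per drive.

    drives: mapping of drive id -> snapshot file name (hypothetical helper
    input; the ids and names below are the ones used in this report).
    """
    actions = []
    for device, snap_name in drives.items():
        actions.append({
            "type": "blockdev-snapshot-sync",
            "data": {
                "device": device,
                "snapshot-file": f"{snapshot_dir}/{snap_name}",
                "mode": "absolute-paths",
                "format": "qcow2",
            },
        })
    cmd = {"execute": "transaction", "arguments": {"actions": actions}}
    if cmd_id is not None:
        cmd["id"] = cmd_id
    return cmd


if __name__ == "__main__":
    cmd = build_snapshot_transaction(
        {
            "drive_image1": "image1-snap",
            "drive_sn1": "sn1-snap",
            "drive_sn2": "sn2-snap",
        },
        "/home/kvm_autotest_root/images",
        cmd_id="bYyCHqv5",
    )
    # One JSON object per line, ready to paste into the QMP monitor.
    print(json.dumps(cmd))
```

The printed line can be pasted into the "-qmp stdio" session. Note that a QMP session normally has to send the qmp_capabilities command before any other command is accepted; the report does not show that step.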
May be related to bug 1508708
Patches posted upstream: https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00614.html
Fix included in qemu-kvm-rhev-2.10.0-14.el7
Verified with qemu-kvm-rhev-2.10.0-14.el7.x86_64. Steps and command line are the same as in comment 0.

Result: the live snapshot transaction succeeds:

{'execute': 'transaction', 'arguments': {'actions': [
    {'data': {'device':'drive_image1', 'snapshot-file': '/home/kvm_autotest_root/images/image1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'},
    {'data': {'device': 'drive_sn1', 'snapshot-file': '/home/kvm_autotest_root/images/sn1-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'},
    {'data': {'device': 'drive_sn2', 'snapshot-file': '/home/kvm_autotest_root/images/sn2-snap', 'mode': 'absolute-paths', 'format': 'qcow2'}, 'type': 'blockdev-snapshot-sync'}]},
 'id': 'bYyCHqv5'}
Formatting '/home/kvm_autotest_root/images/image1-snap', fmt=qcow2 size=21474836480 backing_file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn1-snap', fmt=qcow2 size=2147483648 backing_file=/home/kvm_autotest_root/images/sn1.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/home/kvm_autotest_root/images/sn2-snap', fmt=qcow2 size=2147483648 backing_file=/home/kvm_autotest_root/images/sn2.qcow2 backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
{"return": {}, "id": "bYyCHqv5"}

Moving to VERIFIED accordingly.
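
The pass/fail criterion used in this verification (one "Formatting" line per snapshot file, followed by a return message carrying the command id) could be checked automatically against captured monitor output. This is a rough sketch under that assumption: the function name check_snapshot_output is hypothetical, and the parsing relies only on the output format quoted in this report.

```python
import json
import re

# Matches lines such as: Formatting '/path/to/snap', fmt=qcow2 size=...
FORMAT_RE = re.compile(r"^Formatting '([^']+)', fmt=(\S+)")


def check_snapshot_output(output, expected_files, cmd_id):
    """Return (ok, missing) for captured QMP/stdout text.

    ok is True only if every expected snapshot file was formatted and a
    {"return": ...} message with the given id was seen. missing lists the
    snapshot files for which no "Formatting" line appeared.
    """
    formatted = []
    got_return = False
    for raw in output.splitlines():
        line = raw.strip()
        match = FORMAT_RE.match(line)
        if match:
            formatted.append(match.group(1))
            continue
        if line.startswith("{"):
            try:
                msg = json.loads(line)
            except ValueError:
                # Not strict JSON (for example the single-quoted command echo).
                continue
            if msg.get("id") == cmd_id and "return" in msg:
                got_return = True
    missing = [f for f in expected_files if f not in formatted]
    return (got_return and not missing), missing
```

With the output from the hang described in comment 0, the check would report the two sn*-snap files as missing and no return message; with the output above it would pass.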
*** Bug 1522752 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:1104