Description of problem: Blockcopy --wait doesn't seem to wait for the mirroring phase when using qemu 2.2.x which causes scripts and other functions such as blockcopy --pivot to fail. Version-Release number of selected component (if applicable): qemu-2.2.1 libvirt-1.2.14 python 2.7.5 compiled on slackware 14.1 How reproducible: Always: Steps to Reproduce: 1. install qemu-2.2.1, libvirt-1.2.14 root@wasvirt2:~# virsh blockcopy test vda /datastore/test.qcow2 --verbose --wait Now in mirroring phase Immediately reports it's in mirroring phase without showing the block copy progress. You can check the block copy progress like this: root@wasvirt2:~# virsh blockjob test vda --info Block Copy: [ 58 %] We should never see this given we passed a --wait. Expected results: virsh blockcopy test vda /datastore/test.qcow2 --verbose --wait Block Copy: [100 %] Then virsh unblocks, and we can do stuff with the mirrored blockstore. Additional info: This breaks --pivot, because libvirt doesn't wait for the mirror to complete before trying to pivot: root@wasvirt2:~# virsh blockcopy test vda /glustervol1/vm/test/test.qcow2 --verbose --wait --pivot error: failed to pivot job for disk vda error: internal error: unable to execute QEMU command 'block-job-complete': The active block job for device 'drive-virtio-disk0' cannot be completed If I revert back to qemu 2.1.x then this works fine. If I do the block copy without --wait, and then call --pivot it works fine: root@wasvirt2:~# virsh blockcopy test vda /datastore/test.qcow2 --verbose --wait Now in mirroring phase root@wasvirt2:~# virsh blockjob test vda --info Block Copy: [ 96 %] root@wasvirt2:~# virsh blockjob test vda --info Block Copy: [100 %] root@wasvirt2:~# virsh blockjob test vda --pivot root@wasvirt2:~# virsh blockjob test vda --info No current block job for vda Someone else referenced it here: https://www.redhat.com/archives/libvirt-users/2015-January/msg00010.html
I can confirm that this issue exists while using qemu 2.3.0 as well. I did a little more debugging today running these commands: root@wasvirt2:/# date; virsh blockcopy test vda /datastore/test.qcow2 --wait --verbose Tue Apr 28 15:46:06 AKDT 2015 Now in mirroring phase root@wasvirt2:/# date Tue Apr 28 15:46:09 AKDT 2015 root@wasvirt2:/# virsh blockjob test vda Block Copy: [100 %] root@wasvirt2:/# date; virsh blockjob test vda --pivot Tue Apr 28 15:46:59 AKDT 2015 You can see that libvirt announces that it's in mirroring phase without waiting. Attached is the debug with timestamps showing that it didn't actually get to mirrored phase until 2015-04-28 23:46:25. I can provide backtraces if requested, just need to know where to break. Hopefully this debug will help.
Created attachment 1019911 [details] debug output with timestamps
This sounds a lot like bug 1197592; it may be the same data race. Immediately after creating a job, qemu reports job stats as 0 complete of 0 total; only after a little time has elapsed does it start reporting 0 complete of non-zero total. However, libvirt is treating complete == total as done, even though that only makes sense for a non-zero size (or for a no-op job that really has nothing to do). I'm still trying to figure out if the fix should be in qemu, libvirt, or both.
The RHEL bug Eric linked to mentions a few commits that should fix this, starting with: commit 86c4df83b913dd73b79caeed2038291374384dc5 Author: Michael Chapman <mike.org> Date: Wed Jan 27 13:24:54 2016 +1100 virsh: improve waiting for block job readiness So closing as CURRENTRELEASE