Cloning to RHEL. Libvirt would really like qemu to error out if the destination is not big enough. It may be possible for libvirt to do sanity checks itself if qemu is unpatched (at which point we would want to reassign this bug to libvirt), although it feels better to fix this at the bottom of the stack.
Also, the less-than-ideal error reporting highlights a design issue - if libvirt misses the BLOCK_JOB_ERROR event (such as across a libvirtd restart), the job just silently disappears, and libvirt has no idea if it succeeded or failed. I've raised some of these concerns on the upstream list:
https://lists.gnu.org/archive/html/qemu-devel/2014-07/msg00268.html
where it was suggested that libvirt may need to start using rerror= and werror= settings to make sure the job sticks around rather than disappearing after errors, so we may need fixes in both programs after all.
+++ This bug was initially created as a clone of Bug #1114793 +++
Description of problem:
https://lists.gnu.org/archive/html/qemu-devel/2014-06/msg07377.html
I tested on F20 with fedora-virt-preview, but suspect RHEL/RHEV may benefit from cloning this bug. It would be nice when doing a blockcopy into an existing file if qemu would automatically resize the destination to be large enough, or at a bare minimum fail up front if the size is wrong. But the current behavior is to silently and successfully start the job, then fail when the destination runs out of space; if management misses the 'BLOCK_JOB_COMPLETED with error' event, there is NO indication that the job failed or why.
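For illustration only, here is roughly what a "fail up front" check could look like if done from a management script rather than inside qemu. This is a sketch under assumptions, not what qemu or libvirt actually does: the function names file_size and check_copy_target are made up, the destination is assumed to be a raw file (so its length must be at least the guest-visible size of the source), and the caller is assumed to already know the source's virtual size in bytes (for example from the Capacity line of virsh domblkinfo).

```shell
#!/bin/sh
# Sketch only: refuse to start a copy into a raw destination file that is
# too small. The function names are hypothetical, not part of qemu or libvirt.

# Print the size of a file in bytes (wc -c is POSIX; tr strips any padding).
file_size() {
    wc -c < "$1" | tr -d ' '
}

# check_copy_target SRC_VIRTUAL_BYTES DEST_FILE
# Returns 0 if DEST_FILE can hold SRC_VIRTUAL_BYTES, 1 if it is too small,
# 2 if DEST_FILE cannot be read.
check_copy_target() {
    src_bytes=$1
    dest=$2
    [ -r "$dest" ] || { echo "cannot read $dest" >&2; return 2; }
    dest_bytes=$(file_size "$dest")
    if [ "$dest_bytes" -lt "$src_bytes" ]; then
        echo "$dest is $dest_bytes bytes, need at least $src_bytes" >&2
        return 1
    fi
    return 0
}
```

With the reproducer below, check_copy_target 10485760 /tmp/copy.img should fail for the empty file created by touch, and pass once copy.img has been populated from the 10M base.img.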
Version-Release number of selected component (if applicable):
qemu-kvm-2.0.0-7.fc20.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run the following script:
#!/bin/sh
cd /tmp
rm -f base.img snap1.img snap2.img copy.img
virsh destroy testvm1 2>/dev/null
# base.img <- snap1.img <- snap2.img
qemu-img create -f raw base.img 10M
qemu-img create -f qcow2 -b base.img -o backing_fmt=raw snap1.img
qemu-img create -f qcow2 -b snap1.img -o backing_fmt=qcow2 snap2.img
# set up blank space to hold the copy
touch copy.img
# cp base.img copy.img # uncomment this to see expected results
virsh create /dev/stdin <<EOF
<domain type='kvm'>
<name>testvm1</name>
<memory unit='MiB'>256</memory>
<vcpu>1</vcpu>
<os>
<type arch='x86_64'>hvm</type>
</os>
<devices>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='$PWD/snap2.img'/>
<target dev='vda' bus='virtio'/>
</disk>
<graphics type='vnc'/>
</devices>
</domain>
EOF
# check for events
virsh event testvm1 block-job --loop --timeout 10 &
pid=$!
sleep 1
# run the blockcopy
virsh blockcopy testvm1 vda --wait --verbose --raw /tmp/copy.img --reuse-external
echo job started
sleep 5
virsh blockjob testvm1 vda --abort
wait $pid
Actual results:
Block Copy: [ 0 %]event 'block-job' for domain testvm1: Block Copy for /tmp/snap2.img failed
Now in mirroring phase
job started
event loop timed out
events received: 1
error: Requested operation is not valid: No active operation on device: drive-virtio-disk0
Expected results:
Block Copy: [ 0 %]event 'block-job' for domain testvm1: Block Copy for /tmp/snap2.img ready
Block Copy: [100 %]
Now in mirroring phase
job started
event 'block-job' for domain testvm1: Block Copy for /tmp/snap2.img completed
event loop timed out
events received: 2
Additional info:
--- Additional comment from Cole Robinson on 2014-07-02 08:40:19 MDT ---
Fedora qemu bugs have much less visibility than those filed against RHEL. Since your mention of this issue on the mailing list didn't get a response yet, I'd suggest cloning or fully moving this issue to RHEL where resources are more likely to be allocated.