Description of problem:

When both the system and the block configuration support copy offloading, the backup job will use it. When the following happens:

(1) the guest writes some area A to the source device, and
(2) the guest then writes to a larger area B that encompasses A, but in such a way that A is not at the beginning of B,

the backup job will copy area A twice (once in (1), then again in (2)). Notably, (1) has modified its contents, so the backup job must only copy it before or during (1), not afterwards.

The result is that in area A, the target image will read the data as modified by (1), even though it should contain the data as it was before (1).

Version-Release number of selected component (if applicable):

qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3

(FWIW, also qemu-kvm-3.1.0-30.module+el8.0.1+3755+6782b0ed. Non-AV and RHV 7 are not affected, because they have disabled copy offloading.)

How reproducible:

Always.

Steps to Reproduce:

$ ./qemu-img create -f qcow2 src.qcow2 2M
Formatting 'src.qcow2', fmt=qcow2 size=2097152 cluster_size=65536 lazy_refcounts=off refcount_bits=16

$ ./qemu-io -c 'write -P 42 0 2M' src.qcow2
wrote 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0282 sec (70.857 MiB/sec and 35.4283 ops/sec)

$ cp src.qcow2 ref.qcow2

$ (echo '{"execute":"qmp_capabilities"}
         {"execute":"drive-backup",
          "arguments":{
              "device":"src","job-id":"backup",
              "target":"tgt.qcow2","format":"qcow2",
              "sync":"full","speed":1048576 } }
         {"execute":"human-monitor-command",
          "arguments":{"command-line":
              "qemu-io src \"write -P 23 1088k 64k\"" } }
         {"execute":"human-monitor-command",
          "arguments":{"command-line":
              "qemu-io src \"write -P 66 1024k 1024k\"" } }';
   sleep 5;
   echo '{"execute":"quit"}') \
  | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
      -blockdev file,node-name=src-file,filename=src.qcow2 \
      -blockdev qcow2,node-name=src,file=src-file
[QMP output]

Actual results:

$ ./qemu-img compare tgt.qcow2 ref.qcow2
Content mismatch at offset 1114112!

Expected results:

$ ./qemu-img compare tgt.qcow2 ref.qcow2
Images are identical.

Additional info:

Upstream, this has been broken since 3.0.

I’ve sent a fix upstream, but it hasn’t appeared in the archives yet. In the meantime, here’s the Patchew link:

https://patchew.org/QEMU/20190801173900.23851-1-mreitz@redhat.com/
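
For readers who want to see why the mismatch lands at offset 1114112, the following is a toy model of the failure, not QEMU code. It assumes one value per 64 KiB cluster, a per-cluster dirty list standing in for the copy bitmap, and an offloaded copy-before-write that tests only the first cluster of a run. The names `copy_before_write`, `guest_write` and `run_scenario` are invented for this sketch.

```python
CLUSTER = 64 * 1024                        # cluster_size=65536 from the report
NUM_CLUSTERS = (2 * 1024 * 1024) // CLUSTER  # 2M image -> 32 clusters


def copy_before_write(source, target, dirty, first, count):
    """Toy model of the offloaded path: only the first cluster of a run
    is tested for dirtiness, then the rest of the range is copied too."""
    end = first + count
    cur = first
    while cur < end:
        if not dirty[cur]:
            cur += 1
            continue
        # Assumed buggy behaviour: copy up to `end` without checking the
        # dirty state of the clusters after `cur`.
        for c in range(cur, end):
            target[c] = source[c]
            dirty[c] = False
        cur = end


def guest_write(source, target, dirty, offset, length, pattern):
    """Back up the old data of the written range, then apply the write."""
    first, count = offset // CLUSTER, length // CLUSTER
    copy_before_write(source, target, dirty, first, count)
    for c in range(first, first + count):
        source[c] = pattern


def run_scenario():
    source = [42] * NUM_CLUSTERS           # write -P 42 0 2M
    reference = list(source)               # cp src.qcow2 ref.qcow2
    target = [None] * NUM_CLUSTERS
    dirty = [True] * NUM_CLUSTERS          # sync=full: everything needs copying

    guest_write(source, target, dirty, 1088 * 1024, 64 * 1024, 23)
    guest_write(source, target, dirty, 1024 * 1024, 1024 * 1024, 66)

    # The background job eventually copies whatever is still dirty.
    for c in range(NUM_CLUSTERS):
        if dirty[c]:
            target[c] = source[c]
            dirty[c] = False

    return [c * CLUSTER for c in range(NUM_CLUSTERS) if target[c] != reference[c]]


print(run_scenario())   # [1114112]
```

In this model the first write backs up cluster 17 (offset 1088k = 17 * 65536 = 1114112) correctly. The second write starts at cluster 16, which is still dirty, so the whole range 16..31 is copied again, including cluster 17 with its new contents.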

commit 4a5b91ca024fc6fd87021c54655af76a35f2ef1e
Author: Max Reitz <mreitz>
Date:   Thu Aug 1 19:38:59 2019 +0200

    backup: Copy only dirty areas

    The backup job must only copy areas that the copy_bitmap reports as
    dirty.  This is always the case when using traditional non-offloading
    backup, because it copies each cluster separately.  When offloading the
    copy operation, we sometimes copy more than one cluster at a time, but
    we only check whether the first one is dirty.

    Therefore, whenever copy offloading is possible, the backup job
    currently produces wrong output when the guest writes to an area of
    which an inner part has already been backed up, because that inner part
    will be re-copied.

    Fixes: 9ded4a0114968e98b41494fc035ba14f84cdf700
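
The idea in the commit message, copying only what the bitmap reports as dirty, can be illustrated with a small sketch. This is not the upstream patch, only a model under the assumption that the bitmap is a plain list with one boolean per cluster. The helper name `dirty_runs` is hypothetical.

```python
def dirty_runs(dirty, first, count):
    """Yield (start, length) for each maximal run of dirty clusters
    inside [first, first + count). Clean clusters are never included."""
    end = first + count
    cur = first
    while cur < end:
        if not dirty[cur]:
            cur += 1
            continue
        run_end = cur
        while run_end < end and dirty[run_end]:
            run_end += 1
        yield cur, run_end - cur
        cur = run_end


# 32 clusters of 64 KiB; cluster 17 was already backed up by the first
# guest write (1088k, 64k), so it is no longer dirty.
dirty = [True] * 32
dirty[17] = False

# Second guest write from the report: 1024k..2048k, i.e. clusters 16..31.
print(list(dirty_runs(dirty, 16, 16)))   # [(16, 1), (18, 14)]
```

A copy loop that issues one offloaded copy per yielded run would skip cluster 17 and leave its earlier backup intact, while still copying several clusters at once where that is safe.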

Tested on qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57.x86_64; the bug has been fixed, so I am setting the status to "Verified".

Test steps:

1. Create the src image
# qemu-img create -f qcow2 src.qcow2 2M
Formatting 'src.qcow2', fmt=qcow2 size=2097152 cluster_size=65536 lazy_refcounts=off refcount_bits=16

2. Write 2M of data to src.qcow2
# qemu-io -c 'write -P 42 0 2M' src.qcow2
wrote 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0282 sec (70.857 MiB/sec and 35.4283 ops/sec)

3. Back up src.qcow2 via cp
# cp src.qcow2 ref.qcow2

4. Start the guest with src.qcow2
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -sandbox off \
    -machine pc \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2 \
    -m 2048 \
    -smp 10,maxcpus=10,cores=5,threads=1,sockets=2 \
    -cpu SandyBridge \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -blockdev file,node-name=src-file,filename=src.qcow2 \
    -blockdev qcow2,node-name=src,file=src-file \
    -qmp tcp:0:3000,server,nowait

5. Do a full backup of src.qcow2 with speed 1048576 (1 MiB/s)
# telnet localhost 3000
{"execute":"qmp_capabilities"}
{"execute":"drive-backup","arguments":{"device":"src","job-id":"backup","target":"tgt.qcow2","format":"qcow2","sync":"full","speed":1048576} }

6. During the backup, write new data to src.qcow2
{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io src \"write -P 23 1088k 64k\""} }
{"execute":"human-monitor-command","arguments":{"command-line":"qemu-io src \"write -P 66 1024k 1024k\""} }

7. Set the block job speed to 0
{ "execute": "block-job-set-speed", "arguments": { "device": "backup", "speed":0}}

8. Quit the VM
{"execute":"quit"}

9. Compare tgt.qcow2 with ref.qcow2
# qemu-img compare tgt.qcow2 ref.qcow2
Images are identical.
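
The nested quoting in the human-monitor-command lines is easy to get wrong when typed by hand. As a convenience sketch only, the QMP messages from the steps above can be generated with Python's json module, which handles the escaping. The helper names (`qmp`, `hmp_qemu_io`, `backup_test_commands`) are made up for this example; the commands and arguments are exactly the ones listed in the test steps. Sending them to the QMP socket is left out.

```python
import json


def qmp(execute, **arguments):
    """Serialize one QMP command as a single JSON line."""
    msg = {"execute": execute}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


def hmp_qemu_io(node, command):
    """Wrap a qemu-io command for `node` in human-monitor-command."""
    return qmp("human-monitor-command",
               **{"command-line": f'qemu-io {node} "{command}"'})


def backup_test_commands():
    """QMP messages for steps 5 to 8 of the verification, in order."""
    return [
        qmp("qmp_capabilities"),
        qmp("drive-backup", **{
            "device": "src", "job-id": "backup",
            "target": "tgt.qcow2", "format": "qcow2",
            "sync": "full", "speed": 1048576,
        }),
        hmp_qemu_io("src", "write -P 23 1088k 64k"),
        hmp_qemu_io("src", "write -P 66 1024k 1024k"),
        qmp("block-job-set-speed", device="backup", speed=0),
        qmp("quit"),
    ]


for line in backup_test_commands():
    print(line)
```

The printed lines can be pasted into the telnet session from step 5. The two guest writes need to be issued while the backup job is still running, as in step 6.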
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723