Bug 1736554 - Backup block job produces wrong output with copy offloading
Summary: Backup block job produces wrong output with copy offloading
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Hanna Czenczek
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-01 18:53 UTC by Hanna Czenczek
Modified: 2019-11-06 07:18 UTC (History)

Fixed In Version: qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-06 07:18:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3723 0 None None None 2019-11-06 07:18:45 UTC

Description Hanna Czenczek 2019-08-01 18:53:08 UTC
Description of problem:

When both the system and the block configuration support copy offloading, the backup job will use it.  When the following happens:

(1) the guest writes some area A to the source device, and
(2) the guest then writes to a larger area B that encompasses A, but in such a way that A is not at the beginning of B,

the backup job will copy area A twice: once before the write in (1), and again as part of (2).  But (1) has modified A's contents, so the backup job must copy A only before or during (1), never afterwards.

As a result, area A in the target image contains the data as modified by (1), even though it should contain the data as it was before (1).
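
To make the sequence concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not QEMU code: each cluster is reduced to a single pattern byte, the dirty bitmap is a plain list, and all function names are hypothetical. The `offload` flag only mimics "copy the rest of the request at once after checking the first cluster".

```python
CLUSTER = 64 * 1024   # cluster size used in this report
N_CLUSTERS = 32       # 2 MiB image


def copy_before_write(src, tgt, dirty, first, last, offload):
    """Back up clusters first..last (inclusive) before a guest write lands."""
    c = first
    while c <= last:
        if not dirty[c]:
            c += 1
            continue
        # Toy model of offloading: only the first cluster is checked, then
        # everything up to the end of the request is copied in one go.
        stop = last if offload else c
        for i in range(c, stop + 1):
            tgt[i] = src[i]
            dirty[i] = False
        c = stop + 1


def guest_write(src, tgt, dirty, pattern, offset, length, offload):
    first = offset // CLUSTER
    last = (offset + length - 1) // CLUSTER
    copy_before_write(src, tgt, dirty, first, last, offload)
    for i in range(first, last + 1):
        src[i] = pattern


def run_backup(offload):
    src = [42] * N_CLUSTERS      # image content before the backup starts
    tgt = [None] * N_CLUSTERS
    dirty = [True] * N_CLUSTERS  # full backup: every cluster still to copy
    guest_write(src, tgt, dirty, 23, 1088 * 1024, 64 * 1024, offload)    # area A
    guest_write(src, tgt, dirty, 66, 1024 * 1024, 1024 * 1024, offload)  # area B
    for i in range(N_CLUSTERS):  # background job copies what is still dirty
        if dirty[i]:
            tgt[i] = src[i]
            dirty[i] = False
    return tgt


def first_mismatch(tgt):
    """Byte offset of the first cluster that differs from the original 42s."""
    for i, value in enumerate(tgt):
        if value != 42:
            return i * CLUSTER
    return None
```

In this model, `first_mismatch(run_backup(True))` points at the start of area A, while `first_mismatch(run_backup(False))` finds no difference.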


Version-Release number of selected component (if applicable):

qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3

(FWIW, also qemu-kvm-3.1.0-30.module+el8.0.1+3755+6782b0ed.  Non-AV and RHV 7 are not affected, because they have disabled copy offloading.)


How reproducible:

Always.


Steps to Reproduce:

$ ./qemu-img create -f qcow2 src.qcow2 2M
Formatting 'src.qcow2', fmt=qcow2 size=2097152 cluster_size=65536 lazy_refcounts=off refcount_bits=16

$ ./qemu-io -c 'write -P 42 0 2M' src.qcow2
wrote 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0282 sec (70.857 MiB/sec and 35.4283 ops/sec)

$ cp src.qcow2 ref.qcow2

$ (echo '{"execute":"qmp_capabilities"}
         {"execute":"drive-backup",
          "arguments":{
              "device":"src","job-id":"backup",
              "target":"tgt.qcow2","format":"qcow2",
              "sync":"full","speed":1048576
          } }
          {"execute":"human-monitor-command",
           "arguments":{"command-line":
               "qemu-io src \"write -P 23 1088k 64k\""
           } }
          {"execute":"human-monitor-command",
           "arguments":{"command-line":
               "qemu-io src \"write -P 66 1024k 1024k\""
          } }';
   sleep 5;
   echo '{"execute":"quit"}') \
  | x86_64-softmmu/qemu-system-x86_64 -qmp stdio \
      -blockdev file,node-name=src-file,filename=src.qcow2 \
      -blockdev qcow2,node-name=src,file=src-file

[QMP output]


Actual results:

$ ./qemu-img compare tgt.qcow2 ref.qcow2
Content mismatch at offset 1114112!
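
As a quick sanity check, the reported offset can be related to the writes in the reproducer. The snippet below assumes only what the steps above show: a cluster size of 65536 bytes and qemu-io's `k` suffix meaning KiB.

```python
CLUSTER_SIZE = 65536          # cluster_size printed by qemu-img create
mismatch = 1114112            # offset reported by qemu-img compare
area_a_start = 1088 * 1024    # start of "write -P 23 1088k 64k"

cluster_index = mismatch // CLUSTER_SIZE
print(mismatch == area_a_start, cluster_index, mismatch % CLUSTER_SIZE)
# prints: True 17 0
```

So the first differing byte is the first byte of area A, which is cluster 17 of the image.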


Expected results:

$ ./qemu-img compare tgt.qcow2 ref.qcow2
Images are identical.


Additional info:

Upstream, this has been broken since 3.0.

I’ve sent a fix upstream, but it hasn’t appeared in the archives yet.  In the meantime, here’s the Patchew link: https://patchew.org/QEMU/20190801173900.23851-1-mreitz@redhat.com/

Comment 2 Ademar Reis 2019-08-19 17:38:58 UTC
commit 4a5b91ca024fc6fd87021c54655af76a35f2ef1e
Author: Max Reitz <mreitz>
Date:   Thu Aug 1 19:38:59 2019 +0200

    backup: Copy only dirty areas
    
    The backup job must only copy areas that the copy_bitmap reports as
    dirty.  This is always the case when using traditional non-offloading
    backup, because it copies each cluster separately.  When offloading the
    copy operation, we sometimes copy more than one cluster at a time, but
    we only check whether the first one is dirty.
    
    Therefore, whenever copy offloading is possible, the backup job
    currently produces wrong output when the guest writes to an area of
    which an inner part has already been backed up, because that inner part
    will be re-copied.
    
    Fixes: 9ded4a0114968e98b41494fc035ba14f84cdf700
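
The idea in the commit message can be sketched in a few lines of Python. This is an illustration of "copy only contiguous dirty runs", not the actual patch: the bitmap is a plain list of booleans, clusters are list elements, and both function names are hypothetical.

```python
def next_clean(dirty, start, end):
    """Index of the first clean cluster in [start, end), or end if none."""
    for i in range(start, end):
        if not dirty[i]:
            return i
    return end


def copy_dirty_runs(src, tgt, dirty, start, end):
    """Copy clusters in [start, end), one contiguous dirty run at a time."""
    copied = []
    c = start
    while c < end:
        if not dirty[c]:
            c += 1
            continue
        # Limit each (possibly multi-cluster) copy to the current dirty run,
        # so clusters that were already backed up are never copied again.
        run_end = next_clean(dirty, c, end)
        tgt[c:run_end] = src[c:run_end]
        for i in range(c, run_end):
            dirty[i] = False
        copied.append((c, run_end))
        c = run_end
    return copied
```

With clusters 16 to 31 requested and cluster 17 already clean, this yields two copies, `(16, 17)` and `(18, 32)`, and leaves cluster 17 of the target untouched.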

Comment 4 aihua liang 2019-09-19 02:26:51 UTC
Tested on qemu-kvm-4.1.0-10.module+el8.1.0+4234+33aa4f57.x86_64; the bug has been fixed. Setting status to "Verified".

Test steps:
 1.Create src image
  #qemu-img create -f qcow2 src.qcow2 2M
Formatting 'src.qcow2', fmt=qcow2 size=2097152 cluster_size=65536 lazy_refcounts=off refcount_bits=16

 2.Write 2M data to src.qcow2
  #qemu-io -c 'write -P 42 0 2M' src.qcow2
wrote 2097152/2097152 bytes at offset 0
2 MiB, 1 ops; 0.0282 sec (70.857 MiB/sec and 35.4283 ops/sec)

 3.Make a reference copy of src.qcow2 via cp.
  #cp src.qcow2 ref.qcow2

 4.Start guest with src.qcow2
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -m 2048  \
    -smp 10,maxcpus=10,cores=5,threads=1,sockets=2  \
    -cpu SandyBridge \
    -vnc :0  \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -blockdev file,node-name=src-file,filename=src.qcow2 \
    -blockdev qcow2,node-name=src,file=src-file \
    -qmp tcp:0:3000,server,nowait

  5.Do full backup on src.qcow2 with speed 1048576 (bytes per second)
    #telnet localhost 3000
    {"execute":"qmp_capabilities"}
    {"execute":"drive-backup","arguments":{"device":"src","job-id":"backup","target":"tgt.qcow2","format":"qcow2","sync":"full","speed":1048576} }

  6.During backup, write new data to src.qcow2
     {"execute":"human-monitor-command","arguments":{"command-line":"qemu-io src \"write -P 23 1088k 64k\""} }
     {"execute":"human-monitor-command","arguments":{"command-line":"qemu-io src \"write -P 66 1024k 1024k\""} }

  7.Set block job speed to 0 (unlimited)
     { "execute": "block-job-set-speed", "arguments": { "device": "backup", "speed":0}}

  8.Quit vm
     {"execute":"quit"}

  9.Image compare between tgt.qcow2 and ref.qcow2
     #qemu-img compare tgt.qcow2 ref.qcow2
Images are identical.
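
For repeated runs, steps 5 to 8 could be scripted instead of typed into telnet. The following Python sketch sends the same QMP commands over the TCP socket from step 4. The helper names are hypothetical, the 5 second pause before "quit" is an assumption borrowed from the `sleep 5` in the original reproducer, and error handling is minimal.

```python
import json
import socket
import time


def build_commands():
    """QMP commands from test steps 5 to 8, in order."""
    def hmp(line):
        return {"execute": "human-monitor-command",
                "arguments": {"command-line": line}}
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "drive-backup",
         "arguments": {"device": "src", "job-id": "backup",
                       "target": "tgt.qcow2", "format": "qcow2",
                       "sync": "full", "speed": 1048576}},
        hmp('qemu-io src "write -P 23 1088k 64k"'),
        hmp('qemu-io src "write -P 66 1024k 1024k"'),
        {"execute": "block-job-set-speed",
         "arguments": {"device": "backup", "speed": 0}},
        {"execute": "quit"},
    ]


def execute(rfile, wfile, command):
    """Send one command and return its reply, skipping asynchronous events."""
    wfile.write(json.dumps(command) + "\n")
    wfile.flush()
    while True:
        line = rfile.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        reply = json.loads(line)
        if "return" in reply or "error" in reply:
            return reply


def main(host="localhost", port=3000):
    *before_quit, quit_cmd = build_commands()
    with socket.create_connection((host, port)) as sock:
        rfile = sock.makefile("r", encoding="utf-8")
        wfile = sock.makefile("w", encoding="utf-8")
        rfile.readline()  # discard the QMP greeting banner
        for command in before_quit:
            print(execute(rfile, wfile, command))
        time.sleep(5)  # assumption: give the job time to finish, as in "sleep 5"
        print(execute(rfile, wfile, quit_cmd))
```

Calling `main()` while the guest from step 4 is running should leave tgt.qcow2 ready for the comparison in step 9.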

Comment 6 errata-xmlrpc 2019-11-06 07:18:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723

