Bug 1147358

Summary: qemu-img compare error after drive-mirror with 'sync=full'
Product: Red Hat Enterprise Linux 7
Reporter: ShupingCui <scui>
Component: qemu-kvm-rhev
Assignee: Jeff Cody <jcody>
Status: CLOSED CURRENTRELEASE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.1
CC: chayang, coli, hhuang, jcody, juzhang, meyang, michen, ngu, qizhu, scui, shuang, virt-maint, xuhan, xutian
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1200350 (view as bug list)
Environment:
Last Closed: 2016-03-15 20:10:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1200350

Description ShupingCui 2014-09-29 06:08:46 UTC
Description of problem:
qemu-img compare reports a content mismatch after a drive-mirror with 'sync=full'

Version-Release number of selected component (if applicable):
kernel-3.10.0-170.el7.x86_64
qemu-kvm-rhev-2.1.0-5.el7.x86_64

How reproducible:
80%

Steps to Reproduce:
1. Boot the guest:
/bin/qemu-kvm \
    -name 'virt-tests-vm1'  \
    -sandbox off  \
    -M pc  \
    -nodefaults  \
    -vga qxl  \
    -global qxl-vga.vram_size=33554432 \
    -device intel-hda,bus=pci.0,addr=03 \
    -device hda-duplex  \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/root/tests/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.1-64-virtio.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:55:56:57:58:59,id=idfzSf2h,vectors=4,netdev=idA38EjA,bus=pci.0,addr=07  \
    -netdev tap,id=idA38EjA,vhost=on,vhostfd=23,fd=22  \
    -m 4096  \
    -smp 4,cores=2,threads=1,sockets=2  \
    -cpu 'SandyBridge',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3000,password=123456,addr=0,image-compression=auto_glz,zlib-glz-wan-compression=auto,streaming-video=all,agent-mouse=on,playback-compression=on,ipv4  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off  \
    -no-kvm-pit-reinjection \
    -enable-kvm

2. Do a full block mirror with 'sync=full':
{'execute': 'drive-mirror', 'arguments': {'device': u'drive_image1', 'mode': 'absolute-paths', 'format': 'qcow2', 'target': '/mnt/nfs/target1.qcow2', 'sync': 'full'}, 'id': '8oMy4jv0'}

3. Wait for the "BLOCK_JOB_READY" event, then pause the guest:
{'execute': 'stop', 'id': '4DQ7uJLa'}

4. Run sync on the host:
# sync

5. Compare the original image file with the target image file:
# qemu-img compare /root/tests/staf-kvm-devel/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-7.1-64-virtio.qcow2 /mnt/nfs/target1.qcow2
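For scripting this reproduction, the QMP command in step 2 can be built programmatically. A minimal sketch; the helper itself is illustrative and not part of any qemu tooling, though the field names match the QMP wire format shown above:

```python
import json

def drive_mirror_cmd(device, target, fmt="qcow2", sync="full", req_id=None):
    """Build the QMP drive-mirror command used in step 2.

    Illustrative helper only: the keys mirror the JSON sent in the
    report, but this function is a sketch, not a qemu API."""
    cmd = {"execute": "drive-mirror",
           "arguments": {"device": device, "mode": "absolute-paths",
                         "format": fmt, "target": target, "sync": sync}}
    if req_id is not None:
        cmd["id"] = req_id
    return json.dumps(cmd)
```

For example, drive_mirror_cmd("drive_image1", "/mnt/nfs/target1.qcow2", req_id="8oMy4jv0") reproduces the JSON line sent in step 2.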

Actual results:
Content mismatch at offset 1117696!
and the guest cannot boot from /mnt/nfs/target1.qcow

Expected results:
No mismatch reported when comparing the images.

Additional info:

Comment 4 Xu Tian 2015-01-16 06:38:23 UTC
Hi Jeff,

This test always fails in our qemu-kvm-rhev acceptance testing; can you help look into it ASAP?

Thanks very much!!
Xu

Comment 5 Shaolong Hu 2015-01-19 08:46:10 UTC
Hi Shuping,

I tried to reproduce this with qemu-kvm-rhev-2.1.2-18.el7.x86_64, but failed to reproduce it in 5 attempts:

/usr/libexec/qemu-kvm -enable-kvm -M pc-i440fx-rhel7.0.0 -smp 4 -m 4G -name rhel6.3-64 -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 -drive file=/root/RHEL-Server-7.1-64-virtio.qcow2,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=qcow2,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=device-virtio-disk0,bootindex=1 -boot order=cd -monitor stdio -readconfig nfs/ich9-ehci-uhci.cfg -device usb-tablet,id=input0 -chardev socket,id=s1,path=/tmp/s1,server,nowait -device isa-serial,chardev=s1 -monitor tcp::1235,server,nowait -vga qxl -global qxl-vga.revision=3 -spice port=5930,disable-ticketing -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -vnc :10 -qmp tcp:0:5556,server,nowait -sandbox on -cpu host -netdev tap,script=/etc/qemu-ifup,id=netdev0 -device virtio-net-pci,netdev=netdev0,id=device-net0,mac=aa:54:00:11:22:33


steps:

1. start mirroring:
drive_mirror drive-virtio-disk0 /root/sn1 qcow2

2. after steady state, on host:

sync
echo 3 > /proc/sys/vm/drop_caches
sync

# qemu-img compare RHEL-Server-7.1-64-virtio.qcow2 sn1 -p
Images are identical.



Could you try with the latest qemu to see whether this issue can still be reproduced?


Bests,
Shaolong

Comment 6 Jeff Cody 2015-01-27 18:48:28 UTC
(In reply to xu from comment #4)
> Hi Jeff,
> 
> This test always fails in our qemu-kvm-rhev acceptance testing; can you
> help look into it ASAP?
> 
> Thanks very much!!
> Xu

Looking at the description, it appears that the disks are being compared after the BLOCK_JOB_READY response has been received.  However, the user/tester did not enter a BLOCK_JOB_COMPLETE after the BLOCK_JOB_READY response.  If there is any guest i/o to that drive after the BLOCK_JOB_READY, then the images may differ at that point.  In order to finish the mirror, a BLOCK_JOB_COMPLETE must be issued.

Can you confirm that after issuing a BLOCK_JOB_COMPLETE, there is no issue?
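The sequence described here (wait for BLOCK_JOB_READY, issue block-job-complete, and only trust the images after BLOCK_JOB_COMPLETED) can be sketched as a small loop. `send` and the `events` iterable below are stand-ins for a real QMP transport and are assumptions of this sketch:

```python
def complete_mirror(device, events, send):
    """Issue block-job-complete only after BLOCK_JOB_READY for `device`.

    `events` yields decoded QMP event dicts; `send` transmits a QMP
    command dict. Returns True once BLOCK_JOB_COMPLETED is seen, i.e.
    once it is safe to compare source and target images."""
    for ev in events:
        if ev.get("event") == "BLOCK_JOB_READY" and ev["data"]["device"] == device:
            # The job is in the steady state; request the pivot.
            send({"execute": "block-job-complete",
                  "arguments": {"device": device}})
        elif ev.get("event") == "BLOCK_JOB_COMPLETED" and ev["data"]["device"] == device:
            return True
    return False
```

Comparing the images anywhere before this function returns True is exactly the situation the report describes: guest writes after BLOCK_JOB_READY can still make the images differ.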

Comment 7 Xu Tian 2015-01-28 03:06:11 UTC
(In reply to Jeff Cody from comment #6)
> (In reply to xu from comment #4)
> > Hi Jeff,
> > 
> > This test always fails in our qemu-kvm-rhev acceptance testing; can you
> > help look into it ASAP?
> > 
> > Thanks very much!!
> > Xu
> 
> Looking at the description, it appears that the disks are being compared
> after the BLOCK_JOB_READY response has been received.  However, the
> user/tester did not enter a BLOCK_JOB_COMPLETE after the BLOCK_JOB_READY
> response.  If there is any guest i/o to that drive after the
> BLOCK_JOB_READY, then the images may differ at that point.  In order to
> finish the mirror, a BLOCK_JOB_COMPLETE must be issued.
> 
> Can you confirm that after issuing a BLOCK_JOB_COMPLETE, there is no issue?

Hi Jeff,

As I understand it, after block-job-complete is issued the target image is reopened by qemu as the active image, and qemu no longer touches the source image. So I think that if there is any guest I/O to the drive, the source and target images will differ.

But I will give it a try according to your comment.

Thanks,
Xu

Comment 8 Shaolong Hu 2015-01-28 03:26:17 UTC
Hi all,

Here is a heads-up:

1. This bug is captured by autotest.
2. The correct test steps are: after reaching the steady state, stop the guest, do a sync, then compare the source and target images.
3. At first the autotest steps differed slightly from the manual test case; I debugged this with reporter ShupingCui, our autotest colleague. After fixing the autotest steps, qemu-img compare still failed.

In the end, I used the same steps as in comment 5, on the same host the autotest scripts run on, and with the same guest image, testing manually with ShupingCui present, and we could not hit the problem. It's very weird.

Comment 9 Shaolong Hu 2015-01-28 03:28:16 UTC
btw, "echo 3 > /proc/sys/vm/drop_caches" is not necessary.

Comment 12 Jeff Cody 2016-03-15 20:10:13 UTC
Closing this as NOTABUG.  The mirror is not complete until the BLOCK_JOB_COMPLETE command is issued; up until then, there may be discrepancies due to guest filesystem caching, writes, etc.
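When automating this comparison, qemu-img compare signals its result through the exit status, not just its output. A hypothetical helper for interpreting it, with the code meanings as documented in the qemu-img(1) man page (worth re-checking against the installed version):

```python
# Exit-code meanings per the qemu-img(1) man page; this mapping and the
# helper are illustrative, not part of qemu itself.
_COMPARE_CODES = {
    0: "images are identical",
    1: "images differ",
    2: "error opening an image",
    3: "error checking a sector allocation",
    4: "error reading data",
}

def describe_compare_exit(code):
    """Translate a qemu-img compare exit status into a short description."""
    return _COMPARE_CODES.get(code, "unknown exit code %d" % code)
```

Distinguishing code 1 (a genuine content mismatch, as in this bug) from codes 2-4 (I/O or allocation errors) avoids misreading an NFS hiccup as a data-integrity failure.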

Comment 13 Yang Meng 2016-05-20 06:49:41 UTC
kernel-3.10.0-229.35.1.el7.x86_64
qemu-kvm-rhev-2.1.2-23.el7_1.12.x86_64


I hit this problem on the versions above.

steps:
1. Boot up the guest:
MALLOC_PERTURB_=1  /usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga qxl \
    -device intel-hda,bus=pci.0,addr=03 \
    -device hda-duplex  \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/monitor-qmp1-20160519-043259-jrcQKSGq,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20160519-043259-jrcQKSGq,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id4Vcvh8  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20160519-043259-jrcQKSGq,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04  \
    -chardev socket,id=devvs,path=/var/tmp/virtio_port-vs-20160519-043259-jrcQKSGq,server,nowait \
    -device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0  \
    -chardev socket,id=seabioslog_id_20160519-043259-jrcQKSGq,path=/var/tmp/seabios-20160519-043259-jrcQKSGq,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20160519-043259-jrcQKSGq,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=05 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=06 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,format=qcow2,file=/usr/share/avocado/data/avocado-vt/images/win10-64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:9e:9f:a0:a1:a2,id=idukLEkU,vectors=4,netdev=idLhNMOQ,bus=pci.0,addr=07  \
    -netdev tap,id=idLhNMOQ \
    -m 16384  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2  \
    -cpu 'Opteron_G3',+kvm_pv_unhalt,hv_spinlocks=0x1fff,hv_vapic,hv_time \
    -drive id=drive_cd1,if=none,cache=none,snapshot=off,aio=native,media=cdrom,file=/usr/share/avocado/data/avocado-vt/isos/windows/winutils.iso \
    -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -spice port=3000,password=123456 \
    -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \

2. Continue the guest.

3. {'execute': 'drive-mirror', 'arguments': {'device': u'drive_image1', 'mode': 'absolute-paths', 'format': 'qcow2', 'target': '/usr/share/avocado/data/avocado-vt/images/target1.qcow2', 'sync': 'full'}, 'id': '46JUBq5D'}

4. Check the status until offset reaches len:
{'execute': 'query-block-jobs', 'id': 'ZU9k0BEr'}
{"return": [{"io-status": "ok", "device": "drive_image1", "busy": false, "len": 32212254720, "offset": 32212254720, "paused": false, "speed": 0, "type": "mirror"}], "id": "ZU9k0BEr"}

5.{'execute': 'block-job-complete', 'arguments': {'device': 'drive_image1'}, 'id': 'B5EBtNzs'}
{"return": {}, "id": "B5EBtNzs"}
{"timestamp": {"seconds": 1463726380, "microseconds": 249193}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_image1", "len": 32212254720, "offset": 32212254720, "speed": 0, "type": "mirror"}}
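Step 4's polling can be sketched as follows; `query_jobs` is a stand-in for issuing query-block-jobs over QMP and unwrapping its "return" field:

```python
import time

def wait_until_synced(query_jobs, device, poll=1.0, timeout=600.0):
    """Poll the block-job list until the mirror job for `device` has
    offset == len, i.e. has reached the steady state.

    `query_jobs` is a hypothetical callable returning the decoded
    'return' list of a query-block-jobs reply."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        for job in query_jobs():
            if job["device"] == device and job["offset"] == job["len"]:
                return True
        time.sleep(poll)
    return False
```

In practice, waiting for the BLOCK_JOB_READY event (as the original report does) avoids polling entirely; the offset/len check above simply mirrors what step 4 does by hand.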

6. Compare:

[root@intel-e52650-16-4 ~]# /bin/qemu-img compare /usr/share/avocado/data/avocado-vt/images/win10-64-virtio-scsi.qcow2 /usr/share/avocado/data/avocado-vt/images/target2.qcow2
Content mismatch at offset 317201920!

7. I tried several times and always got the mismatch. Do you have any suggestions? Thanks.

Comment 14 Gu Nini 2016-05-20 07:21:45 UTC
(In reply to Yang Meng from comment #13)
> kernel-3.10.0-229.35.1.el7.x86_64
> qemu-kvm-rhev-2.1.2-23.el7_1.12.x86_64
> 
> 
> i met this problem on the version above
> 

Jeff,

After adding the block-job-complete step, we still hit the problem both manually and in automation on the latest rhel7.1z qemu-kvm-rhev version, with a 100% reproduction rate; however, we failed to reproduce the bug on the latest qemu-kvm-rhev versions of both rhel7.2z and rhel7.3.

Comment 15 Yang Meng 2016-05-25 02:31:17 UTC
tried on 
qemu-kvm-rhev-2.1.2-23.el7.x86_64
libvirt-daemon-driver-qemu-1.2.8-16.el7_1.5.x86_64
qemu-img-rhev-2.1.2-23.el7.x86_64
qemu-kvm-rhev-debuginfo-2.1.2-23.el7.x86_64
qemu-kvm-common-rhev-2.1.2-23.el7.x86_64
qemu-kvm-tools-rhev-2.1.2-23.el7.x86_64

Also hit the problem; could you help check? Thanks.

Comment 16 Ademar Reis 2016-05-25 20:05:01 UTC
(In reply to Gu Nini from comment #14)
> 
> After adding the block-job-complete step, we still hit the problem both
> manually and in automation on the latest rhel7.1z qemu-kvm-rhev version,
> with a 100% reproduction rate; however, we failed to reproduce the bug on
> the latest qemu-kvm-rhev versions of both rhel7.2z and rhel7.3.

So given 7.2.z and 7.3 work, I'm changing this BZ to CURRENTRELEASE. AFAIK there are no plans or requests to release a fix to 7.1.z.