Bug 1109715
Field | Value
---|---
Summary | live incremental migration of vm with common shared base, size(disk) > size(base) transfers unallocated sectors, explodes disk on dest
Product | Red Hat Enterprise Linux 6
Component | qemu-kvm
Version | 6.5
Status | CLOSED ERRATA
Severity | high
Priority | high
Reporter | Libor Miksik <lmiksik>
Assignee | Kevin Wolf <kwolf>
QA Contact | Virtualization Bugs <virt-bugs>
CC | acathrow, areis, brandon_nolte, bsarathy, cbuben, chayang, coli, dornelas, jamills, jen, jhunsaker, jkurik, juzhang, knoel, kwolf, lyarwood, michen, mkenneth, mrezanin, pbonzini, pm-eus, qzhang, scui, shu, stefanha, virt-maint, xuhan
Target Milestone | rc
Keywords | Regression, ZStream
Hardware | Unspecified
OS | Unspecified
Fixed In Version | qemu-kvm-0.12.1.2-2.415.el6_5.14
Doc Type | Bug Fix
Clone Of | 1092117
Last Closed | 2014-08-19 09:12:59 UTC
Bug Depends On | 1092117
Description
Libor Miksik
2014-06-16 08:24:09 UTC
Fix included in qemu-kvm-0.12.1.2-2.428.el6

Fix included in qemu-kvm-0.12.1.2-2.415.el6_5.11

Reproduced this bug on qemu-kvm-0.12.1.2-2.415.el6_5.10.x86_64:

Steps:

1. Create a backing.qcow2 image and install a guest in it.

    # qemu-img info backing.qcow2
    image: backing.qcow2
    file format: qcow2
    virtual size: 4.0G (4294967296 bytes)
    disk size: 2.0G
    cluster_size: 65536

2. Create 10G source.qcow2 and dest.qcow2 images, both with backing.qcow2 as their backing file.

    [root@localhost home]# qemu-img create -f qcow2 -o backing_file=backing.qcow2 source.qcow2 10G
    Formatting 'source.qcow2', fmt=qcow2 size=10737418240 backing_file='backing.qcow2' encryption=off cluster_size=65536
    [root@localhost home]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: backing.qcow2
    [root@localhost home]# qemu-img create -f qcow2 -o backing_file=backing.qcow2 dest.qcow2 10G
    Formatting 'dest.qcow2', fmt=qcow2 size=10737418240 backing_file='backing.qcow2' encryption=off cluster_size=65536
    [root@localhost home]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: backing.qcow2

3. Boot source.qcow2 on a RHEL 6 host, then boot the dest.qcow2 image in listening mode with "-incoming tcp:0:5800". Then do the live incremental migration.
    # /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu SandyBridge -m 2G \
        -smp 2,sockets=1,cores=2,threads=1,maxcpus=16 -enable-kvm \
        -name win7-32 -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
        -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
        -k en-us -rtc base=localtime,clock=host,driftfix=slew -nodefaults \
        -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on \
        -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
        -monitor unix:/tmp/monitor-unix,nowait,server \
        -drive file=/home/test/source.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
        -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 \
        -netdev tap,id=hostnet0,vhost=on \
        -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:1c,bus=pci.0,addr=0x4 \
        -vga std -vnc :10 -usb -device usb-tablet
    QEMU 0.12.1 monitor - type 'help' for more information
    (qemu) migrate_set_speed 100M
    (qemu) migrate -d -i tcp:0:5800

Result:

After migration, the dest.qcow2 image disk size is 6G, even larger than the base image (4G).

    [root@localhost home]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 8.2M
    cluster_size: 65536
    backing file: backing.qcow2
    [root@localhost home]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 6.0G
    cluster_size: 65536
    backing file: backing.qcow2

============================

Verified pass on qemu-kvm-0.12.1.2-2.415.el6_5.11.x86_64, after migration, the dest.qcow2 image disk size is 40M.
    [root@localhost home]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 9.1M
    cluster_size: 65536
    backing file: backing.qcow2
    [root@localhost home]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 40M
    cluster_size: 65536

Based on the above, the issue is fixed.

Hi Kevin,

Besides comment 8, are there any other regression tests that need to be executed?

Thanks~
Qunfang

Description of problem:

Create a mirror image using the QMP command:

    {"execute": "__com.redhat_drive-mirror", "arguments": { "device": "drive_image1", "format": "qcow2", "full": true, "target": "/root/autotest-devel/client/tests/virt/shared/data/images/target1.qcow2", "mode": "absolute-paths"}}

The mirror image was corrupted after the drive-mirror operation finished.

Mirror image:
-------------
    # qemu-img info target1.qcow2
    image: target1.qcow2
    file format: qcow2
    virtual size: 20G (21474836480 bytes)
    disk size: 6.7M
    cluster_size: 65536
    # qemu-img check target1.qcow2
    No errors were found on the image.
    Image end offset: 7143424

Base image:
-----------
    # qemu-img info RHEL-Server-6.5-64-virtio.qcow2
    image: RHEL-Server-6.5-64-virtio.qcow2
    file format: qcow2
    virtual size: 20G (21474836480 bytes)
    disk size: 16G
    cluster_size: 65536

This issue did not happen with qemu-kvm-rhev-0.12.1.2-2.415.el6_5.10.

Mirror image:
-------------
    # qemu-img info target1.qcow2
    image: target1.qcow2
    file format: qcow2
    virtual size: 20G (21474836480 bytes)
    disk size: 16G
    cluster_size: 65536
    # qemu-img check target1.qcow2
    No errors were found on the image.
    Image end offset: 16717840384

So, it is a regression.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-0.12.1.2-2.415.el6_5.11

How reproducible:
100%

Steps to Reproduce:

1. Create a mirror image.
    {"execute": "__com.redhat_drive-mirror", "arguments": {"device": "drive_image1", "format": "qcow2", "full": true, "target": "/root/autotest-devel/client/tests/virt/shared/data/images/target1.qcow2", "mode": "absolute-paths"}}

2. Query block jobs.

    {"execute": "query-block-jobs"}

3. Reopen with the mirror image.

    {"execute": "__com.redhat_drive-reopen", "arguments": {"device": "drive_image1", "new-image-file": "/root/autotest-devel/client/tests/virt/shared/data/images/target1.qcow2", "format": "qcow2"}}

Actual results:

After step 2,
-------------
    {"return": [{"device": "drive_image1", "len": 21474836480, "offset": 21474836480, "speed": 0, "type": "mirror"}]}

After step 3,
-------------
The guest failed because the mirror image was corrupted.

Additional info:

QEMU command line:
------------------
    /usr/bin/qemu-kvm \
    -name 'virt-tests-vm1' \
    -M rhel6.5.0 \
    -nodefaults \
    -vga qxl \
    -global qxl-vga.vram_size=33554432 \
    -device intel-hda,bus=pci.0,addr=03 \
    -device hda-duplex \
    -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20140618-205437-jq0cRIdT,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20140618-205437-jq0cRIdT,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=04 \
    -chardev socket,id=devvs,path=/tmp/virtio_port-vs-20140618-205437-jq0cRIdT,server,nowait \
    -device virtserialport,chardev=devvs,name=vs,id=vs,bus=virtio_serial_pci0.0 \
    -chardev socket,id=seabioslog_id_20140618-205437-jq0cRIdT,path=/tmp/seabios-20140618-205437-jq0cRIdT,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20140618-205437-jq0cRIdT,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=05 \
    -drive id=drive_image1,if=none,cache=none,snapshot=off,aio=native,file=/root/autotest-devel/client/tests/virt/shared/data/images/RHEL-Server-6.5-64-virtio.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=06 \
    -device virtio-net-pci,mac=9a:36:37:38:39:3a,id=idvcQ0IX,vectors=4,netdev=id0s2JYw,bus=pci.0,addr=07 \
    -netdev tap,id=id0s2JYw,vhost=on \
    -m 2048 \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2 \
    -cpu 'SandyBridge' \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -spice port=3000,disable-ticketing \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off \
    -no-kvm-pit-reinjection \
    -enable-kvm \
    -monitor stdio

Hi Kevin,

Any news about the issue in comment 10?

Thanks.

Created attachment 916352
Test script

There are two separate bugs at work here, one of which is a regression introduced by my patch, and the other one preexisting.

1. Missing !! in bdrv_is_allocated(). This is the regression. It exists upstream as well, though it seems to be rather harmless there. In RHEL 6 it means that mirroring can skip sectors that it should copy. Sent a patch to qemu-devel to fix this:

   [PATCH for-2.1] block: Fix bdrv_is_allocated() return value
   https://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg01163.html

2. When mirroring creates a new image for the target, it uses the size of the backing file of the source instead of the size of the source itself. This means that the copy can be too small and the last part of the source image can be missing from the copy.

   This was fixed in upstream commit 86899072. Note that a backport to RHEL 6 also affects live snapshots, so I advise QE to include live snapshots when testing the fix.

I'm attaching a shell script that is a much simpler and quicker test case, but should reproduce the same bug as was reported here. On buggy versions the two reads at the very end fail (in the way commented in the test script), whereas in a fixed version they succeed.
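The two failure modes described above can be illustrated with a small shell sketch. It is not QEMU code: the flag value and the helper names (`is_allocated_raw`, `is_allocated_bool`, `read_fits`) are hypothetical and chosen only for illustration. Shell arithmetic uses C-like `&` and `!` operators, so it can show how a masked flag differs from a value normalized with `!!`, and why a copy sized like the 64 MiB backing file cannot serve a read at offset 134217728. The sizes and the offset are the ones that appear in the test output in this report.

```shell
#!/bin/sh
# Hypothetical flag bit, for illustration only (not QEMU's actual constant).
ALLOCATED_FLAG=16

# Shape of the bug: return the raw masked bits, i.e. 0 or 16.
# A caller that expects exactly 1 for "allocated" misreads 16.
is_allocated_raw() {
    echo $(( $1 & ALLOCATED_FLAG ))
}

# Shape of the fix: "!!" collapses any nonzero value to exactly 1.
is_allocated_bool() {
    echo $(( !!($1 & ALLOCATED_FLAG) ))
}

# Succeeds if a read of $3 bytes at offset $2 fits in an image of $1 bytes.
read_fits() {
    [ $(( $2 + $3 )) -le "$1" ]
}

status=17   # hypothetical status word with the flag bit set
echo "raw:  $(is_allocated_raw "$status")"
echo "bool: $(is_allocated_bool "$status")"

backing_size=67108864     # 64 MiB backing file
source_size=1073741824    # 1 GiB source image
offset=134217728          # offset of the second 64 KiB write and read
len=65536

# Compare a copy sized like the backing file with one sized like the source.
for size in "$backing_size" "$source_size"; do
    if read_fits "$size" "$offset" "$len"; then
        echo "copy of $size bytes: read at $offset fits"
    else
        echo "copy of $size bytes: read at $offset is out of range"
    fi
done
```

In this sketch only the copy sized like the source image can hold the data written at the second offset, which matches the report's observation that the last part of the source can be missing from an undersized copy.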
(In reply to Kevin Wolf from comment #14)
> Created attachment 916352
> Test script
>
> There are two separate bugs at work here, one of which is a regression
> introduced by my patch, and the other one preexisting.
>
> 1. Missing !! in bdrv_is_allocated(). This is the regression. It exists
> upstream as well, though it seems to be rather harmless there. In RHEL 6 it
> means that mirroring can skip sectors that it should copy. Sent a patch to
> qemu-devel to fix this:
>
> [PATCH for-2.1] block: Fix bdrv_is_allocated() return value
> https://lists.nongnu.org/archive/html/qemu-devel/2014-07/msg01163.html
>
> 2. When mirroring creates a new image for the target, it uses the size of the
> backing file of the source instead of the size of the source itself. This
> means that the copy can be too small and the last part of the source image
> can be missing from the copy.
>
> This was fixed in upstream commit 86899072. Note that a backport to RHEL 6
> also affects live snapshots, so I advise QE to include live snapshots when
> testing the fix.

Okay, we will run some regression tests on the live block copy and live snapshot features.

> I'm attaching a shell script that is a much simpler and quicker test case,
> but should reproduce the same bug as was reported here. On buggy versions
> the two reads at the very end fail (in the way commented in the test
> script), whereas in a fixed version they succeed.
Below is the test result on qemu-kvm-rhev-0.12.1.2-2.415.el6_5.12.x86_64:

    # sh /home/rhel6-test.sh
    Formatting '/tmp/backing.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536
    Formatting '/tmp/test.qcow2', fmt=qcow2 size=1073741824 backing_file='/tmp/backing.qcow2' encryption=off cluster_size=65536
    wrote 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (552.257 KiB/sec and 8.6290 ops/sec)
    wrote 65536/65536 bytes at offset 134217728
    64 KiB, 1 ops; 0.0000 sec (1.165 MiB/sec and 18.6449 ops/sec)
    VNC server running on `::1:5900'
    _QEMU 0.12.1 monitor - type 'help' for more information
    (qemu) __com.redhat_drive-mirror ide0-hd0 /tmp/copy.qcow2
    Formatting '/tmp/copy.qcow2', fmt=qcow2 size=67108864 backing_file='/tmp/backing.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536
    (qemu) __com.redhat_drive-reopen ide0-hd0 /tmp/copy.qcow2
    (qemu) quit
    Pattern verification failed at offset 0, 65536 bytes
    read 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (6.782 GiB/sec and 111111.1111 ops/sec)
    read failed: Input/output error

Hi Kevin,

Do you know what the progress on the fix is?

Thanks!

The patches are posted on rhvirt-devel, waiting for review.

Fix included in qemu-kvm-0.12.1.2-2.415.el6_5.13

Fix NOT included in qemu-kvm-0.12.1.2-2.415.el6_5.13 due to broken build. Clearing "Fixed in Version" field and changing BZ status to POST.

Fix included in qemu-kvm-0.12.1.2-2.415.el6_5.14

Verified this bug on qemu-kvm-rhev-0.12.1.2-2.415.el6_5.14.x86_64:

1. Test case of the original problem; the steps are the same as in comment 8.
Result:

    [root@localhost test]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 7.7M
    cluster_size: 65536
    backing file: backing.qcow2
    [root@localhost test]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 36M
    cluster_size: 65536
    backing file: backing.qcow2

The result is as expected.

2. Test script provided in comment 14 by Kevin:

    [root@localhost test]# sh rhel6-test.sh
    Formatting '/tmp/backing.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536
    Formatting '/tmp/test.qcow2', fmt=qcow2 size=1073741824 backing_file='/tmp/backing.qcow2' encryption=off cluster_size=65536
    wrote 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (578.845 KiB/sec and 9.0445 ops/sec)
    wrote 65536/65536 bytes at offset 134217728
    64 KiB, 1 ops; 0.0000 sec (1.162 MiB/sec and 18.5881 ops/sec)
    VNC server running on `::1:5900'
    _QEMU 0.12.1 monitor - type 'help' for more information
    (qemu) __com.redhat_drive-mirror ide0-hd0 /tmp/copy.qcow2
    Formatting '/tmp/copy.qcow2', fmt=qcow2 size=1073741824 backing_file='/tmp/backing.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536
    (qemu) __com.redhat_drive-reopen ide0-hd0 /tmp/copy.qcow2
    (qemu) quit
    read 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (744.048 MiB/sec and 11904.7619 ops/sec)
    read 65536/65536 bytes at offset 134217728
    64 KiB, 1 ops; 0.0000 sec (801.282 MiB/sec and 12820.5128 ops/sec)

Based on the above, the issue no longer occurs. QE will also arrange a KVM acceptance test, a live snapshot test, and a live block copy (block mirroring) function test for additional regression purposes. We will update the results once they finish.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1075.html