Bug 1092117
Summary: | live incremental migration of vm with common shared base, size(disk) > size(base) transfers unallocated sectors, explodes disk on dest | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Chris Buben <cbuben> | |
Component: | qemu-kvm | Assignee: | Kevin Wolf <kwolf> | |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 6.5 | CC: | areis, brandon_nolte, bsarathy, cbuben, chayang, dornelas, jamills, jen, jhunsaker, juzhang, knoel, kwolf, lmiksik, lyarwood, mazhang, michen, mkenneth, mrezanin, pbonzini, qzhang, rbalakri, shu, virt-maint | |
Target Milestone: | rc | Keywords: | Regression, ZStream | |
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | qemu-kvm-0.12.1.2-2.431.el6 | Doc Type: | Bug Fix | |
Doc Text: |
In certain scenarios, when performing live incremental migration, the destination disk image could grow considerably due to the transfer of unallocated sectors past the end of the base image. With this update, the bdrv_is_allocated() function has been fixed to no longer return "True" for unallocated sectors, and the destination disk image no longer grows unexpectedly after performing live incremental migration.
|
Story Points: | --- | |
Clone Of: | ||||
Clones: | 1109715 1110681 1130582 | Environment: | |
Last Closed: | 2014-10-14 06:58:26 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1109715, 1110681 |
Description
Chris Buben
2014-04-28 19:00:24 UTC
Clarification of "Actual results": the actual disk size of the destination image is roughly equal to virtual_size(top) - virtual_size(base). In the example above, it grows to ~6GB (10GB - 4GB), apparently because the unallocated sectors past the end of the base image are transferred to dest.

Reproduced this bug on qemu-kvm-0.12.1.2-2.424.el6.x86_64; the issue does not exist on RHEL 6.4.z (qemu-kvm-0.12.1.2-2.355.el6_4.9.x86_64).

Steps:
1. Create a backing.qcow2 image and install a guest in it.

    # qemu-img info backing.qcow2
    image: backing.qcow2
    file format: qcow2
    virtual size: 4.0G (4294967296 bytes)
    disk size: 2.0G
    cluster_size: 65536

2. Create 10G source.qcow2 and dest.qcow2 images, both using backing.qcow2 as their backing file.

    [root@t1 test]# qemu-img create -f qcow2 -o backing_file=backing.qcow2 source.qcow2 10G
    Formatting 'source.qcow2', fmt=qcow2 size=10737418240 backing_file='backing.qcow2' encryption=off cluster_size=65536
    [root@t1 test]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: backing.qcow2
    [root@t1 test]# qemu-img create -f qcow2 -o backing_file=backing.qcow2 dest.qcow2 10G
    Formatting 'dest.qcow2', fmt=qcow2 size=10737418240 backing_file='backing.qcow2' encryption=off cluster_size=65536
    [root@t1 test]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: backing.qcow2

3. Boot source.qcow2 on a RHEL 6 host, then boot dest.qcow2 in listening mode with "-incoming tcp:0:5800". Then start the live incremental migration.
    # /usr/libexec/qemu-kvm -M rhel6.5.0 -cpu SandyBridge -m 2G -smp 2,sockets=1,cores=2,threads=1,maxcpus=16 -enable-kvm -name win7-32 -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=localtime,clock=host,driftfix=slew -nodefaults -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor unix:/tmp/monitor-unix,nowait,server -drive file=/home/test/source.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:1c,bus=pci.0,addr=0x4 -vga std -vnc :10 -usb -device usb-tablet
    QEMU 0.12.1 monitor - type 'help' for more information
    (qemu) migrate_set_speed 100M
    (qemu) migrate -d -i tcp:0:5800

Result: after migration, the disk size of the dest.qcow2 image is 6G, even larger than the base image (4G).

    [root@t1 test]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 4.6M
    cluster_size: 65536
    backing file: backing.qcow2
    [root@t1 test]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 6.0G
    cluster_size: 65536
    backing file: backing.qcow2

==================
Re-tested on the RHEL 6.4 host (qemu-kvm-0.12.1.2-2.355.el6_4.9.x86_64): the issue does not exist there.
After migration:

    # qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 32M
    cluster_size: 65536
    backing file: backing.qcow2
    # qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 4.4M
    cluster_size: 65536
    backing file: backing.qcow2

Chris, thanks for taking the time to enter a bug report with us. We appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, we're not able to guarantee the timeliness or suitability of a resolution for issues entered here because this is not a mechanism for requesting support.

If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain it receives the proper attention and prioritization to assure a timely resolution.

For information on how to contact the Red Hat production support team, please visit:
https://www.redhat.com/support/process/production/#howto

Thanks Ademar. I figure this is the standard disclaimer given to any reporter whose e-mail address doesn't end with @redhat.com?

Thanks to the RH team for quick response and repro. Our team will raise this issue (and reference this bz) via our support contract as well.

(In reply to Chris Buben from comment #8)
> Thanks Ademar. I figure this is the standard disclaimer given to any
> reporter whose e-mail address doesn't end with @redhat.com?
>
> Thanks to the RH team for quick response and repro. Our team will raise
> this issue (and reference this bz) via our support contract as well.

Yes, Chris, it's a standard response. Having an actual customer case open in the customer portal helps us prioritize the bugs. Thanks for escalating it.

I can confirm the bug; it reproduced on the first attempt.
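The Clarification in the description says the destination grows by roughly virtual_size(top) - virtual_size(base). The sketch below is a toy model of that arithmetic in Python, and it rests on two assumptions. First, it assumes incremental block migration (`migrate -i`) with a shared base sends exactly the clusters of the top image that an allocation check reports as allocated. Second, it assumes the regressed check counts a cluster that merely reads as zeros past the end of the backing file as allocated. Every name here (`block_status`, `DATA`, `ZERO`, `is_allocated_buggy`, `is_allocated_fixed`, `bytes_transferred`) is hypothetical. It is not QEMU's code or API.

```python
# Toy model of incremental block migration over a shared base image.
# ASSUMPTION: all names are hypothetical; this does not mirror QEMU internals.
CLUSTER = 64 * 1024  # cluster_size: 65536, as in the reproducer
MiB = 1024 ** 2
GiB = 1024 ** 3

DATA = 1  # cluster is allocated in the top image itself
ZERO = 2  # cluster reads as zeros


def block_status(cluster, top_alloc, backing_clusters):
    """Status flags for one cluster of the top image."""
    if cluster in top_alloc:
        return DATA
    if cluster >= backing_clusters:
        # Past the end of the backing file, an unallocated cluster reads as zeros.
        return ZERO
    return 0  # unallocated: reads fall through to the backing file


def is_allocated_buggy(status):
    # Regressed behaviour (assumed): "reads as zero" is mistaken for "allocated".
    return bool(status & (DATA | ZERO))


def is_allocated_fixed(status):
    # Fixed behaviour: only clusters the top image itself allocates count.
    return bool(status & DATA)


def bytes_transferred(top_size, backing_size, top_alloc, is_allocated):
    """Bytes sent if migration copies exactly the clusters reported allocated."""
    backing_clusters = backing_size // CLUSTER
    sent = sum(
        1
        for c in range(top_size // CLUSTER)
        if is_allocated(block_status(c, top_alloc, backing_clusters))
    )
    return sent * CLUSTER


if __name__ == "__main__":
    # 10G top image over a 4G base, as in the reproducer. The guest is assumed
    # to have written 64 clusters (4 MiB) inside the base range.
    written = set(range(64))
    for name, check in (("buggy", is_allocated_buggy), ("fixed", is_allocated_fixed)):
        n = bytes_transferred(10 * GiB, 4 * GiB, written, check)
        print(f"{name} check: {n // MiB} MiB transferred")
```

Under these assumptions, the buggy check sends the entire 6 GiB tail beyond the 4G base on top of the guest's writes. That is consistent with the 6.0G destination image reported above. The fixed check sends only the clusters the guest actually wrote.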
Bisecting the problem led to the patch "block: return BDRV_BLOCK_ZERO past end of backing file" (upstream commit f0ad5712, RHEL 6 commit 2a217cc0). The problem is that the block allocation status past the end of the backing file is wrong: all blocks there are reported as allocated. The same bug can be triggered with the following commands:

    $ ./qemu-img create -f qcow2 /tmp/backing.qcow2 1G
    Formatting '/tmp/backing.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off
    $ ./qemu-img create -f qcow2 -b /tmp/backing.qcow2 /tmp/overlay.qcow2 2G
    Formatting '/tmp/overlay.qcow2', fmt=qcow2 size=2147483648 backing_file='/tmp/backing.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off
    $ ./qemu-io -c 'alloc 1G 64k' /tmp/overlay.qcow2
    65536/65536 sectors allocated at offset 1 GiB

Nothing has been written to overlay.qcow2, yet the range starting at 1 GiB (the end of the 1G backing file) is reported as fully allocated.

The next step is to work out the best fix without breaking other use cases.

Great troubleshooting, and thank you for your assistance on this issue. Has any thought been given to how this fix will be implemented, or to a time frame for when we can expect it?

Regards,
Brandon Nolte

Fix included in qemu-kvm-0.12.1.2-2.428.el6

Fix included in qemu-kvm-0.12.1.2-2.431.el6

*** Bug 1118185 has been marked as a duplicate of this bug. ***

Verified this bug on qemu-kvm-rhev-0.12.1.2-2.436.el6.x86_64:

1. Create a backing.qcow2 image and install a guest in it.

    # qemu-img info backing.qcow2
    image: backing.qcow2
    file format: qcow2
    virtual size: 4.0G (4294967296 bytes)
    disk size: 2.0G
    cluster_size: 65536

2. Create source.qcow2 and dest.qcow2 images of 10G each.
Both images use backing.qcow2 as their backing file.

    [root@localhost test]# qemu-img create -f qcow2 -o backing_file=backing.qcow2 source.qcow2 10G
    Formatting 'source.qcow2', fmt=qcow2 size=10737418240 backing_file='backing.qcow2' encryption=off cluster_size=65536
    [root@localhost test]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: backing.qcow2
    [root@localhost test]# qemu-img create -f qcow2 -o backing_file=backing.qcow2 dest.qcow2 10G
    Formatting 'dest.qcow2', fmt=qcow2 size=10737418240 backing_file='backing.qcow2' encryption=off cluster_size=65536
    [root@localhost test]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: backing.qcow2

3. Boot source.qcow2 on a RHEL 6 host, then boot dest.qcow2 in listening mode with "-incoming tcp:0:5800". Then start the live incremental migration.
    [root@localhost test]# /usr/libexec/qemu-kvm -M rhel6.6.0 -cpu SandyBridge -m 2G -smp 2,sockets=1,cores=2,threads=1,maxcpus=16 -enable-kvm -name rhel6.6 -uuid 990ea161-6b67-47b2-b803-19fb01d30d12 -smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 -k en-us -rtc base=localtime,clock=host,driftfix=slew -nodefaults -monitor stdio -qmp tcp:0:6666,server,nowait -boot menu=on -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor unix:/tmp/monitor-unix,nowait,server -drive file=/root/test/source.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:2e:28:1c,bus=pci.0,addr=0x4 -vga std -vnc :10 -usb -device usb-tablet
    QEMU 0.12.1 monitor - type 'help' for more information
    (qemu) migrate_set_speed 100M
    (qemu) migrate -d -i tcp:0:5800
    (qemu) info migrate

Result: after migration, check the sizes of the source.qcow2 and dest.qcow2 images:

    [root@localhost test]# qemu-img info source.qcow2
    image: source.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 9.1M
    cluster_size: 65536
    backing file: backing.qcow2
    [root@localhost test]# qemu-img info dest.qcow2
    image: dest.qcow2
    file format: qcow2
    virtual size: 10G (10737418240 bytes)
    disk size: 36M
    cluster_size: 65536
    backing file: backing.qcow2

The dest.qcow2 image is now 36M and is no longer larger than the 4G base image.
Also tested Kevin's script provided in bug 1109715:

(1) On the old qemu-kvm-rhev-0.12.1.2-2.430.el6.x86_64:

    [root@localhost test]# sh rhel6-test.sh
    Formatting '/tmp/backing.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536
    Formatting '/tmp/test.qcow2', fmt=qcow2 size=1073741824 backing_file='/tmp/backing.qcow2' encryption=off cluster_size=65536
    wrote 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (636.708 KiB/sec and 9.9486 ops/sec)
    wrote 65536/65536 bytes at offset 134217728
    64 KiB, 1 ops; 0.0000 sec (1.378 MiB/sec and 22.0415 ops/sec)
    VNC server running on `::1:5900'
    QEMU 0.12.1 monitor - type 'help' for more information
    (qemu) __com.redhat_drive-mirror ide0-hd0 /tmp/copy.qcow2
    Formatting '/tmp/copy.qcow2', fmt=qcow2 size=67108864 backing_file='/tmp/backing.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536
    (qemu) __com.redhat_drive-reopen ide0-hd0 /tmp/copy.qcow2
    (qemu) quit
    Pattern verification failed at offset 0, 65536 bytes
    read 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (5.549 GiB/sec and 90909.0909 ops/sec)
    read failed: Input/output error

(2) On the latest qemu-kvm-rhev-0.12.1.2-2.436.el6.x86_64:

    [root@localhost test]# sh rhel6-test.sh
    Formatting '/tmp/backing.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=65536
    Formatting '/tmp/test.qcow2', fmt=qcow2 size=1073741824 backing_file='/tmp/backing.qcow2' encryption=off cluster_size=65536
    wrote 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (579.322 KiB/sec and 9.0519 ops/sec)
    wrote 65536/65536 bytes at offset 134217728
    64 KiB, 1 ops; 0.0000 sec (1.172 MiB/sec and 18.7491 ops/sec)
    VNC server running on `::1:5900'
    QEMU 0.12.1 monitor - type 'help' for more information
    (qemu) __com.redhat_drive-mirror ide0-hd0 /tmp/copy.qcow2
    Formatting '/tmp/copy.qcow2', fmt=qcow2 size=1073741824 backing_file='/tmp/backing.qcow2' backing_fmt='qcow2' encryption=off cluster_size=65536
    (qemu) __com.redhat_drive-reopen ide0-hd0 /tmp/copy.qcow2
    (qemu) quit
    read 65536/65536 bytes at offset 0
    64 KiB, 1 ops; 0.0000 sec (657.895 MiB/sec and 10526.3158 ops/sec)
    read 65536/65536 bytes at offset 134217728
    64 KiB, 1 ops; 0.0000 sec (753.012 MiB/sec and 12048.1928 ops/sec)

Based on the above, the separate issues seen with the old build (mentioned in bug 1109715 comment 14) no longer occur.

Hi Kevin,

According to comment 31 and comment 32, this bug passes verification with both the original test case and your script. Since you previously suggested that we run some functional tests for live snapshot and block mirroring, I want to confirm the following with you:

(1) We are currently running a round of live snapshot and block mirroring functional tests *manually* on the latest RHEL 6.5.z build for bug 1109715.
(2) If (1) passes without any new regressions, could we run only some *autotest* storage VM migration tests for RHEL 6.6 instead of a full round of manual tests? The difference is that autotest covers only some basic test cases for live snapshot, block mirroring, and image streaming, which are a subset of the manual cases; these features are not 100% automated.

Thanks,
Qunfang

If 6.5.z passes the manual testing, I think it is reasonable to run some relaxed automated testing for 6.6. They are similar enough that I think the 6.5.z result gives us some confidence for 6.6 as well.

(In reply to Kevin Wolf from comment #34)
> If 6.5.z passes the manual testing, I think it is reasonable to run some
> relaxed automated testing for 6.6. They are similar enough that I think the
> 6.5.z result gives us some confidence for 6.6 as well.

Okay, thank you for the feedback!

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1490.html