Description of problem:
When moving a preallocated disk between storage domains (tested from file to file and from block to file), the disk becomes thin-provisioned once the move completes.

Version-Release number of selected component (if applicable):
vdsm-4.18.999-1184.git090267e.el7.centos.x86_64
ovirt-engine-4.1.0-0.2.master.20161216212250.gitc040969.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a new preallocated disk on block or file storage (doesn't matter)
2. Move it to a different file storage domain

Actual results:
Disk becomes thin-provisioned

Expected results:
Disk should remain preallocated on the new storage domain

Additional info:
engine.log:

2016-12-18 18:52:07,866+02 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-10) [5268b91a-83fe-4b1e-b1a1-4c250cc6ca76] Running command: MoveDisksCommand internal: false. Entities affected : ID: 15869f80-58cf-4e3f-9792-661745c9cefe Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2016-12-18 18:52:08,185+02 INFO [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [5268b91a-83fe-4b1e-b1a1-4c250cc6ca76] Lock Acquired to object 'EngineLock:{exclusiveLocks='[15869f80-58cf-4e3f-9792-661745c9cefe=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName move-test2>]', sharedLocks='[7406e343-a0ea-4f15-96db-8793f4aa83fa=<VM, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName move-test2>]'}'
Created attachment 1233182 [details]
logs zip (engine.log, vdsm.log)
Discussed with Liron; this behavior was intentional. The copy/move flow to a file domain was broken into two actions:

1. Create the file container
2. Copy the data

Since creating a preallocated file container means filling it with zeros and then overwriting that same range with the bytes copied from the image, the preallocation was pointless, so the volume is created thin-provisioned before the copy. After the copy succeeds, the image metadata should be updated accordingly to reflect whether it's preallocated or not.

Liron, please use this bug for this issue. We also need to check that the data is copied correctly and that we are not missing some file system optimizations (dd?), because in my reproduction I noticed that the original image was indeed allocated:

# ll -sltr
total 1049608
1048580 -rw-rw----. 1 vdsm kvm 1073741824 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511
      4 -rw-r--r--. 1 vdsm kvm        268 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.meta
   1024 -rw-rw----. 1 vdsm kvm    1048576 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.lease

While the copy was not:

# ll -sltr
total 1028
   0 -rw-rw----. 1 vdsm kvm 1073741824 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511
   4 -rw-r--r--. 1 vdsm kvm        268 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.meta
1024 -rw-rw----. 1 vdsm kvm    1048576 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.lease
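For reference, besides ls -s, a quick way to compare a volume's virtual size with what is actually allocated on disk (the volume file name below is just reused from the listing above as a placeholder):

# apparent size (%s) vs. allocated blocks (%b, of %B bytes each)
stat --format 'apparent=%s allocated-blocks=%b block-size=%B' 58bda9f2-93d9-4214-9251-49aaa3d98511

# qemu-img reports the same distinction as "virtual size" vs. "disk size"
qemu-img info 58bda9f2-93d9-4214-9251-49aaa3d98511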
*** Bug 1357962 has been marked as a duplicate of this bug. ***
This issue has existed since vdsm was changed in 2014 to perform the copy with qemu-img convert instead of dd (see https://gerrit.ovirt.org/#/c/26921/ ) - it was just hidden by the image metadata (which had "PREALLOCATED" written in it). When qemu-img convert performs the copy, it automatically makes the target sparse by ignoring zeroes, unless it is executed with the -S flag (from the man page: indicates the consecutive number of bytes that must contain only zeros for qemu-img to create a sparse image during conversion).

[root@dhcp-1-226 tmp]# qemu-img convert a.txt b.txt
[root@dhcp-1-226 tmp]# ls -s --block-size 1
total 258048
258048 a.txt
     0 b.txt
[root@dhcp-1-226 tmp]# qemu-img convert a.txt c.txt -S 0
[root@dhcp-1-226 tmp]# ls -s --block-size 1
total 516096
258048 a.txt
     0 b.txt
258048 c.txt
[root@dhcp-1-226 tmp]#

In 4.1 the move disk flow was changed as part of the SPDM feature: the engine first creates the target image using the SPM and then copies the data to it using any host in the DC. To avoid preallocating the image on creation only to override it with the data copy that follows, we create the target as sparse.

The current situation seems better than before: the engine reports the disk as thin-provisioned, as it actually is, instead of reporting it as preallocated (because of its metadata, not because it is actually preallocated) after qemu-img converted it to sparse.

To get fast preallocation we'll use fallocate (see BZ 1391859), so we'll be able to create the volume as preallocated on file domains as well - we need to decide whether we want to target that to 4.1. For older versions I assume we'll leave it as is (after consulting with nsoffer as well).

Nir/Tal, for what version do we want to handle the preallocation issue in the current flows?
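To illustrate the -S 0 behavior on a preallocated raw source, here is a minimal sketch (assuming util-linux fallocate and a file system that supports preallocation; exact zero-detection behavior can vary a bit between qemu versions):

# create a fully allocated 1G raw image; it contains only zeroes,
# but all its blocks are allocated on disk
fallocate --length 1G src.raw

# default convert detects the zero runs and produces a sparse target
qemu-img convert -f raw -O raw src.raw dst-sparse.raw

# -S 0 disables zero detection, so the target stays fully allocated
qemu-img convert -f raw -O raw -S 0 src.raw dst-full.raw

ls -s --block-size 1 src.raw dst-sparse.raw dst-full.raw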
Depends on how complicated and risky the fix is - do you have an estimate?

*** This bug has been marked as a duplicate of bug 1403183 ***
Nir, as you handled the code review of the patch fixing the issue - what's your take on that?
This should be fixed by:
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:bug1391859

Once these patches land in vdsm, the engine can create preallocated disks during the move/copy disk flows.

Preallocation with qemu-img on NFS >= 4.2 and glusterfs takes no time. Preallocation on NFS < 4.2 is faster using qemu-img.

To save the preallocation time for the part of the disk being copied, we can first copy the data and then preallocate the rest of the image (see the sketch below):

1. create a raw volume at some minimal size
2. copy the data from the source volume
3. check the actual file size
4. use the new fallocate helper to preallocate the rest of the image:
   /usr/libexec/vdsm/fallocate --offset <actual-size> <virtual-size> filename

Since image preallocation during copy image has been broken since 3.5 or so (since we moved to copying with qemu-img), I don't see any reason to do this in 4.1.
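A rough shell sketch of steps 1-4, assuming a raw file volume; the paths and sizes are placeholders, and the helper interface is the one proposed in the patches above, so details may differ in the final version:

SRC=/path/to/source/volume    # placeholder
DST=/path/to/target/volume    # placeholder, created as a small raw file
VIRTUAL_SIZE=1073741824       # the disk's virtual size in bytes

# steps 1-2: copy the data; qemu-img skips zero runs, so the
# target stays sparse where the source holds no data
qemu-img convert -f raw -O raw "$SRC" "$DST"

# step 3: compute the actual (allocated) size of the target,
# roughly the way vdsm derives actual size for file volumes
ACTUAL_SIZE=$(( $(stat --format %b "$DST") * $(stat --format %B "$DST") ))

# step 4: preallocate only the remaining part with the new helper
/usr/libexec/vdsm/fallocate --offset "$ACTUAL_SIZE" "$VIRTUAL_SIZE" "$DST"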