Description of problem:
When moving a preallocated disk between storage domains (tested from file to file and from block to file), the disk becomes thin-provisioned once the move completes.

Version-Release number of selected component (if applicable):
vdsm-4.18.999-1184.git090267e.el7.centos.x86_64
ovirt-engine-4.1.0-0.2.master.20161216212250.gitc040969.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a new preallocated disk on block or file storage (doesn't matter)
2. Move it to a different file storage domain

Actual results:
Disk becomes thin-provisioned

Expected results:
Disk should remain preallocated on the new storage domain

Additional info:
engine.log:

2016-12-18 18:52:07,866+02 INFO [org.ovirt.engine.core.bll.storage.disk.MoveDisksCommand] (default task-10) [5268b91a-83fe-4b1e-b1a1-4c250cc6ca76] Running command: MoveDisksCommand internal: false. Entities affected : ID: 15869f80-58cf-4e3f-9792-661745c9cefe Type: DiskAction group CONFIGURE_DISK_STORAGE with role type USER
2016-12-18 18:52:08,185+02 INFO [org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand] (default task-10) [5268b91a-83fe-4b1e-b1a1-4c250cc6ca76] Lock Acquired to object 'EngineLock:{exclusiveLocks='[15869f80-58cf-4e3f-9792-661745c9cefe=<DISK, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName move-test2>]', sharedLocks='[7406e343-a0ea-4f15-96db-8793f4aa83fa=<VM, ACTION_TYPE_FAILED_DISK_IS_BEING_MIGRATED$DiskName move-test2>]'}'
Created attachment 1233182 [details]
logs zip (engine.log, vdsm.log)
Discussed with Liron; this behavior was intentional. The copy/move flow to a file domain was broken into two actions:

1. Create the file container
2. Copy the data

Since creating a preallocated file container means filling it with zeros and then overwriting that same range with the bytes copied from the image, the preallocation was pointless, so the volume is created thin-provisioned before the copy. After the copy succeeds, the image metadata should be updated accordingly to reflect whether it's preallocated or not.

Liron, please use this bug for this issue. We also need to check that the data is copied correctly and that we are not missing some file system optimizations (dd?), because in my reproduction I noticed that the original image was indeed allocated:

# ll -sltr
total 1049608
1048580 -rw-rw----. 1 vdsm kvm 1073741824 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511
      4 -rw-r--r--. 1 vdsm kvm        268 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.meta
   1024 -rw-rw----. 1 vdsm kvm    1048576 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.lease

While the copy was not:

# ll -sltr
total 1028
   0 -rw-rw----. 1 vdsm kvm 1073741824 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511
   4 -rw-r--r--. 1 vdsm kvm        268 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.meta
1024 -rw-rw----. 1 vdsm kvm    1048576 Dec 20 17:23 58bda9f2-93d9-4214-9251-49aaa3d98511.lease
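For reference, besides ls -s, a quick way to compare a volume's virtual size with what is actually allocated on disk (the volume file name below is just reused from the listing above as a placeholder):

# apparent size (%s) vs. allocated blocks (%b, of %B bytes each)
stat --format 'apparent=%s allocated-blocks=%b block-size=%B' 58bda9f2-93d9-4214-9251-49aaa3d98511

# qemu-img reports the same distinction as "virtual size" vs. "disk size"
qemu-img info 58bda9f2-93d9-4214-9251-49aaa3d98511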
*** Bug 1357962 has been marked as a duplicate of this bug. ***
This issue has existed since vdsm was changed in 2014 to perform the copy with qemu-img convert instead of dd (see https://gerrit.ovirt.org/#/c/26921/ ) - it was just hidden by the image metadata (which had "PREALLOCATED" written in it). When qemu-img convert performs the copy, it automatically makes the target sparse by ignoring zeroes, unless it is executed with the -S flag (from the man page: indicates the consecutive number of bytes that must contain only zeros for qemu-img to create a sparse image during conversion).

[root@dhcp-1-226 tmp]# qemu-img convert a.txt b.txt
[root@dhcp-1-226 tmp]# ls -s --block-size 1
total 258048
258048 a.txt
     0 b.txt
[root@dhcp-1-226 tmp]# qemu-img convert a.txt c.txt -S 0
[root@dhcp-1-226 tmp]# ls -s --block-size 1
total 516096
258048 a.txt
     0 b.txt
258048 c.txt
[root@dhcp-1-226 tmp]#

In 4.1 the move disk flow was changed as part of the SPDM feature: the engine first creates the target image using the SPM and then copies the data to it using any host in the DC. To avoid preallocating the image on creation only to override it with the data copy that follows, we create the target as sparse.

The current situation seems better than before: the engine reports the disk as thin-provisioned, as it actually is, instead of reporting it as preallocated (because of its metadata, not because it is actually preallocated) after qemu-img converted it to sparse.

To get fast preallocation we'll use fallocate (see BZ 1391859), so we'll be able to create the volume as preallocated on file domains as well - we need to decide whether we want to target that to 4.1. For older versions I assume we'll leave it as is (after consulting with nsoffer as well).

Nir/Tal, for what version do we want to handle the preallocation issue in the current flows?
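To illustrate the -S 0 behavior on a preallocated raw source, here is a minimal sketch (assuming util-linux fallocate and a file system that supports preallocation; exact zero-detection behavior can vary a bit between qemu versions):

# create a fully allocated 1G raw image; it contains only zeroes,
# but all its blocks are allocated on disk
fallocate --length 1G src.raw

# default convert detects the zero runs and produces a sparse target
qemu-img convert -f raw -O raw src.raw dst-sparse.raw

# -S 0 disables zero detection, so the target stays fully allocated
qemu-img convert -f raw -O raw -S 0 src.raw dst-full.raw

ls -s --block-size 1 src.raw dst-sparse.raw dst-full.raw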
Depends on how complicated and risky the fix is - do you have an estimate?

*** This bug has been marked as a duplicate of bug 1403183 ***
Nir, as you handled the code review of the patch fixing the issue - what's your take on that?
This should be fixed by:
https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:bug1391859

Once these patches land in vdsm, the engine can create preallocated disks during the move/copy disk flows.

Preallocation with qemu-img on NFS >= 4.2 and glusterfs takes no time. Preallocation on NFS < 4.2 is faster using qemu-img.

To save the preallocation time for the part of the disk being copied, we can first copy the data and then preallocate the rest of the image (see the sketch below):

1. create a raw volume at some minimal size
2. copy the data from the source volume
3. check the actual file size
4. use the new fallocate helper to preallocate the rest of the image:
   /usr/libexec/vdsm/fallocate --offset <actual-size> <virtual-size> filename

Since image preallocation during copy image has been broken since 3.5 or so (since we moved to copying with qemu-img), I don't see any reason to do this in 4.1.
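A rough shell sketch of steps 1-4, assuming a raw file volume; the paths and sizes are placeholders, and the helper interface is the one proposed in the patches above, so details may differ in the final version:

SRC=/path/to/source/volume    # placeholder
DST=/path/to/target/volume    # placeholder, created as a small raw file
VIRTUAL_SIZE=1073741824       # the disk's virtual size in bytes

# steps 1-2: copy the data; qemu-img skips zero runs, so the
# target stays sparse where the source holds no data
qemu-img convert -f raw -O raw "$SRC" "$DST"

# step 3: compute the actual (allocated) size of the target,
# roughly the way vdsm derives actual size for file volumes
ACTUAL_SIZE=$(( $(stat --format %b "$DST") * $(stat --format %B "$DST") ))

# step 4: preallocate only the remaining part with the new helper
/usr/libexec/vdsm/fallocate --offset "$ACTUAL_SIZE" "$VIRTUAL_SIZE" "$DST"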