Description of problem:

When a VM created on an NFS data domain with a thin provisioned disk is exported and then imported back to a block storage domain, the disk still shows as thin provisioned, but the virtual and actual sizes of the disk become the same. If the VM had a virtual size of 10GB and an actual size of 2GB, after the import both the virtual and actual size become 10GB.

Version-Release number of selected component (if applicable):
Rhevm: rhevm-3.5.8-0.1.el6ev
Host: RHEL 6.7
libvirt-0.10.2-54.el6_7.3
vdsm-4.16.37-1.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a thin provisioned disk on an NFS data domain. For example, the VM has a 10GB virtual size and a 2GB actual size after the OS installation.
2. Export the VM.
3. Import the VM to a block storage domain.

Actual results:
The disk remains thin provisioned, but both the actual and virtual size change to 10GB.

Expected results:
The disk should be thin provisioned with a 10GB virtual size and a 2GB actual size.
please attach logs
@Tal: do we have such a limitation that when importing a cow volume from nfs to block storage it becomes raw?
It looks like the request from the engine is valid (COW volume format), but VDSM reloads the LV and extends it to 10GB (see [1]).

[1]
18570cde-1e14-4c32-9a1e-be2ea9ce3067::INFO::2016-07-22 11:42:56,168::blockVolume::143::Storage.Volume::(_create) Request to create COW volume /rhev/data-center/dca77209-1947-4b48-9eec-8a64e56e7a38/082d19c4-b186-4639-b3a6-be34cab85d06/images/2122455e-cc55-4e34-919f-b63ce20b3e63/213d6ae1-ec56-45c6-a4bf-ab94a16a3922 with size = 20480 sectors
....
18570cde-1e14-4c32-9a1e-be2ea9ce3067::INFO::2016-07-22 11:42:56,798::blockVolume::276::Storage.Volume::(extend) Request to extend LV 213d6ae1-ec56-45c6-a4bf-ab94a16a3922 of image 2122455e-cc55-4e34-919f-b63ce20b3e63 in VG 082d19c4-b186-4639-b3a6-be34cab85d06 with size = 20971520
18570cde-1e14-4c32-9a1e-be2ea9ce3067::DEBUG::2016-07-22 11:42:56,798::lvm::291::Storage.Misc.excCmd::(cmd) /usr/bin/sudo -n /sbin/lvm lvextend --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 obtain_device_list_from_udev=0 filter = [ '\''a|/dev/mapper/1IET_00010001|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --autobackup n --size 10240m 082d19c4-b186-4639-b3a6-be34cab85d06/213d6ae1-ec56-45c6-a4bf-ab94a16a3922 (cwd None)
Why is it extended to the full capacity then? We need to investigate why this happens.
(In reply to Tal Nisan from comment #9)
> Why is it extended to the full capacity then? We need to investigate why
> this happens.

Nir, do you have any insights about this?
(In reply to Koutuk Shukla from comment #0)
> Steps to Reproduce:
>
> 1. Create a VM with a thin provisioned disk on an NFS data domain. For
> example, the VM has a 10GB virtual size and a 2GB actual size after the OS
> installation.

This creates a raw sparse file on NFS storage.

> 2. Export the VM.

This copies the raw sparse file to the export domain.

> 3. Import the VM to a block storage domain.
>
> Actual results:
>
> The disk remains thin provisioned, but both the actual and virtual size
> change to 10GB.

It looks like the raw sparse file was converted to raw format on block storage, requiring preallocation.

> Expected results:
>
> The disk should be thin provisioned with a 10GB virtual size and a 2GB
> actual size.

This requires converting the raw format from the export domain to COW format.

I think we have the same issue in live storage migration - we even have a warning about this in the UI, that the disk will be converted to preallocated.

Maor, can you check the logs and confirm that this is the case?
Created attachment 1205331 [details] engine_log
Created attachment 1205332 [details] vdsm log
@Nir It looks like a bug in blockVolume#calculate_volume_alloc_size. The engine seems fine from the logs and passes the correct arguments. I've also managed to reproduce this on my env (see attached logs).
From the VDSM log it seems that calculate_volume_alloc_size is being called with preallocation = 2 (sparse) and initialSize = None, so this looks valid.

The issue seems to be the same one I mentioned in comment 8: for some reason the reload of the LV extends it to 10GB.

25150811-da2d-4f33-9750-47f62b4cf3c2::WARNING::2016-09-28 01:14:25,726::utils::108::root::(rmFile) File: /rhev/data-center/f843c805-44f7-49e9-bea1-c2032710a53a/5e358460-9892-4766-87b4-226cc46aec7e/images/eeddd389-e35a-467f-b37e-63c6dc5b9257/68cd54b1-7239-4870-af6f-fd5c687c19a0 already removed
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:25,727::blockVolume::476::storage.Volume::(_create) Request to create COW volume /rhev/data-center/f843c805-44f7-49e9-bea1-c2032710a53a/5e358460-9892-4766-87b4-226cc46aec7e/images/eeddd389-e35a-467f-b37e-63c6dc5b9257/68cd54b1-7239-4870-af6f-fd5c687c19a0 with size = 20480 sectors
25150811-da2d-4f33-9750-47f62b4cf3c2::WARNING::2016-09-28 01:14:26,202::blockSD::786::storage.StorageDomainManifest::(_getOccupiedMetadataSlots) Could not find mapping for lv 5e358460-9892-4766-87b4-226cc46aec7e/68cd54b1-7239-4870-af6f-fd5c687c19a0
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:26,497::lvm::1238::storage.LVM::(deactivateLVs) Deactivating lvs: vg=5e358460-9892-4766-87b4-226cc46aec7e lvs=[u'68cd54b1-7239-4870-af6f-fd5c687c19a0']
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:26,726::lvm::1226::storage.LVM::(activateLVs) Refreshing lvs: vg=5e358460-9892-4766-87b4-226cc46aec7e lvs=['metadata']
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:27,238::lvm::1226::storage.LVM::(activateLVs) Refreshing lvs: vg=5e358460-9892-4766-87b4-226cc46aec7e lvs=['ids']
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:27,443::lvm::1226::storage.LVM::(activateLVs) Refreshing lvs: vg=5e358460-9892-4766-87b4-226cc46aec7e lvs=['leases']
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:27,655::lvm::1226::storage.LVM::(activateLVs) Refreshing lvs: vg=5e358460-9892-4766-87b4-226cc46aec7e lvs=['leases']
jsonrpc/0::INFO::2016-09-28 01:14:27,801::logUtils::49::dispatcher::(wrapper) Run and protect: getAllTasksStatuses(spUUID=None, options=None)
jsonrpc/0::INFO::2016-09-28 01:14:27,802::logUtils::52::dispatcher::(wrapper) Run and protect: getAllTasksStatuses, Return response: {'allTasksStatus': {'25150811-da2d-4f33-9750-47f62b4cf3c2': {'code': 0, 'message': 'running job 1 of 1', 'taskState': 'running', 'taskResult': '', 'taskID': '25150811-da2d-4f33-9750-47f62b4cf3c2'}}}
jsonrpc/0::INFO::2016-09-28 01:14:27,802::__init__::515::jsonrpc.JsonRpcServer::(_serveRequest) RPC call Host.getAllTasksStatuses succeeded in 0.00 seconds
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:28,298::blockVolume::629::storage.Volume::(extend) Request to extend LV 68cd54b1-7239-4870-af6f-fd5c687c19a0 of image eeddd389-e35a-467f-b37e-63c6dc5b9257 in VG 5e358460-9892-4766-87b4-226cc46aec7e with size = 20971520
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:28,657::volume::458::storage.VolumeManifest::(prepare) Volume: preparing volume 9a7a9545-ad6e-480c-be33-03342fcd59ae/68cd54b1-7239-4870-af6f-fd5c687c19a0
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:28,666::volume::458::storage.VolumeManifest::(prepare) Volume: preparing volume 5e358460-9892-4766-87b4-226cc46aec7e/68cd54b1-7239-4870-af6f-fd5c687c19a0
25150811-da2d-4f33-9750-47f62b4cf3c2::INFO::2016-09-28 01:14:28,711::lvm::1230::storage.LVM::(activateLVs) Activating lvs: vg=5e358460-9892-4766-87b4-226cc46aec7e lvs=[u'68cd54b1-7239-4870-af6f-fd5c687c19a0']

This is the only engine request that was issued in that time frame:

2016-09-28 01:14:24,636 INFO  [org.ovirt.engine.core.bll.storage.disk.image.CopyImageGroupCommand] (org.ovirt.thread.pool-6-thread-12) [1027415d] Running command: CopyImageGroupCommand internal: true. Entities affected : ID: 5e358460-9892-4766-87b4-226cc46aec7e Type: Storage
2016-09-28 01:14:24,779 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (org.ovirt.thread.pool-6-thread-12) [1027415d] START, CopyImageVDSCommand( CopyImageVDSCommandParameters:{runAsync='true', storagePoolId='f843c805-44f7-49e9-bea1-c2032710a53a', ignoreFailoverLimit='false', storageDomainId='9a7a9545-ad6e-480c-be33-03342fcd59ae', imageGroupId='eeddd389-e35a-467f-b37e-63c6dc5b9257', imageId='68cd54b1-7239-4870-af6f-fd5c687c19a0', dstImageGroupId='eeddd389-e35a-467f-b37e-63c6dc5b9257', vmId='66aeb27f-5686-499d-9ccd-676ba771ae74', dstImageId='68cd54b1-7239-4870-af6f-fd5c687c19a0', imageDescription='', dstStorageDomainId='5e358460-9892-4766-87b4-226cc46aec7e', copyVolumeType='LeafVol', volumeFormat='COW', preallocate='Sparse', postZero='false', force='true'}), log id: 61389b13

There is no call for LV extend from the engine.
extend is called during the copyCollapsed flow - we create a temporary volume (10M), and then extend it to the virtual size (raw format) or apparent size (cow format). A sketch of this decision follows below.

Here we see the request parameters:

04191f47-c011-4787-91dd-0e1040f1ceac::INFO::2016-09-28 00:02:13,137::image::764::storage.Image::(copyCollapsed) sdUUID=9a7a9545-ad6e-480c-be33-03342fcd59ae vmUUID= srcImgUUID=eeddd389-e35a-467f-b37e-63c6dc5b9257 srcVolUUID=68cd54b1-7239-4870-af6f-fd5c687c19a0 dstImgUUID=eeddd389-e35a-467f-b37e-63c6dc5b9257 dstVolUUID=68cd54b1-7239-4870-af6f-fd5c687c19a0 dstSdUUID=5e358460-9892-4766-87b4-226cc46aec7e volType=8 volFormat=COW preallocate=SPARSE force=True postZero=False

and the parameters of the source volume:

04191f47-c011-4787-91dd-0e1040f1ceac::INFO::2016-09-28 00:02:13,160::image::814::storage.Image::(copyCollapsed) copy source 9a7a9545-ad6e-480c-be33-03342fcd59ae:eeddd389-e35a-467f-b37e-63c6dc5b9257:68cd54b1-7239-4870-af6f-fd5c687c19a0 vol size 20971520 destination 5e358460-9892-4766-87b4-226cc46aec7e:eeddd389-e35a-467f-b37e-63c6dc5b9257:68cd54b1-7239-4870-af6f-fd5c687c19a0 apparentsize 20971520

size = 20971520
apparentsize = 20971520

Both values are in blocks, so the source volume virtual size and apparent size are 10G. In this case it is expected that the volume will be extended to 10G.

Please check the actual size of the exported disk before the operation.
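A minimal sketch of that size decision (illustrative names, not the literal VDSM source):

    def initial_extend_size(dst_format, virtual_size, src_apparent_size):
        # The temporary destination volume is created small (10M) and then
        # extended: to the virtual size for a raw destination, or to the
        # source's apparent size for a cow destination. Here the raw sparse
        # source reports apparentsize == virtual size (10G), so the cow
        # destination is extended to the full 10G.
        if dst_format == "RAW":
            return virtual_size
        return src_apparent_size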
qemu-img info 68cd54b1-7239-4870-af6f-fd5c687c19a0
image: 68cd54b1-7239-4870-af6f-fd5c687c19a0
file format: raw
virtual size: 10G (10737418240 bytes)
disk size: 0

du -h *
0       68cd54b1-7239-4870-af6f-fd5c687c19a0
4.0K    68cd54b1-7239-4870-af6f-fd5c687c19a0.meta
This is also the output from vdsClient getVolumeInfo (truesize = 0):

vdsClient -s 0 getVolumeInfo 9a7a9545-ad6e-480c-be33-03342fcd59ae f843c805-44f7-49e9-bea1-c2032710a53a eeddd389-e35a-467f-b37e-63c6dc5b9257 68cd54b1-7239-4870-af6f-fd5c687c19a0
        status = OK
        lease = UNSUPPORTED
        domain = 9a7a9545-ad6e-480c-be33-03342fcd59ae
        capacity = 10737418240
        voltype = LEAF
        description = {"DiskAlias":"VM_Disk1","DiskDescription":""}
        parent = 00000000-0000-0000-0000-000000000000
        format = RAW
        image = eeddd389-e35a-467f-b37e-63c6dc5b9257
        uuid = 68cd54b1-7239-4870-af6f-fd5c687c19a0
        disktype = 2
        legality = LEGAL
        mtime = 0
        apparentsize = 10737418240
        truesize = 0
        type = SPARSE
        children = []
        pool =
        ctime = 1475008740
(In reply to Maor from comment #18)
> This is also the output from vdsClient getVolumeInfo (truesize = 0):
>
> vdsClient -s 0 getVolumeInfo 9a7a9545-ad6e-480c-be33-03342fcd59ae
> f843c805-44f7-49e9-bea1-c2032710a53a eeddd389-e35a-467f-b37e-63c6dc5b9257
> 68cd54b1-7239-4870-af6f-fd5c687c19a0
...
> apparentsize = 10737418240

The raw file size is 10G; this is the value used when creating the destination volume.

Typically, a raw sparse file size should be the same as the truesize.

Please validate the file size with stat /path/to/volume.

Then check the original file size, and check why the file was exported like this.
(In reply to Nir Soffer from comment #19)
> (In reply to Maor from comment #18)
> > This is also the output from vdsClient getVolumeInfo (truesize = 0):
> >
> > vdsClient -s 0 getVolumeInfo 9a7a9545-ad6e-480c-be33-03342fcd59ae
> > f843c805-44f7-49e9-bea1-c2032710a53a eeddd389-e35a-467f-b37e-63c6dc5b9257
> > 68cd54b1-7239-4870-af6f-fd5c687c19a0
> ...
> > apparentsize = 10737418240
>
> The raw file size is 10G; this is the value used when creating the
> destination volume.
>
> Typically, a raw sparse file size should be the same as the truesize.
>
> Please validate the file size with stat /path/to/volume.
>
> Then check the original file size, and check why the file was exported like
> this.

/rhev/data-center/mnt/10.35.16.43:_export_data__ovirt6/9a7a9545-ad6e-480c-be33-03342fcd59ae/images/eeddd389-e35a-467f-b37e-63c6dc5b9257

-sh-4.2$ stat /rhev/data-center/mnt/10.35.16.43:_export_data__ovirt6/9a7a9545-ad6e-480c-be33-03342fcd59ae/images/eeddd389-e35a-467f-b37e-63c6dc5b9257/68cd54b1-7239-4870-af6f-fd5c687c19a0
  File: ‘/rhev/data-center/mnt/10.35.16.43:_export_data__ovirt6/9a7a9545-ad6e-480c-be33-03342fcd59ae/images/eeddd389-e35a-467f-b37e-63c6dc5b9257/68cd54b1-7239-4870-af6f-fd5c687c19a0’
  Size: 10737418240 Blocks: 0          IO Block: 1048576 regular file
Device: 2dh/45d Inode: 1045765     Links: 1
Access: (0660/-rw-rw----)  Uid: (65534/nfsnobody)   Gid: (65534/nfsnobody)
Context: system_u:object_r:nfs_t:s0
Access: 2016-09-27 23:39:01.032656485 +0300
Modify: 2016-09-27 23:39:01.026656491 +0300
Change: 2016-09-27 23:39:01.026656491 +0300
 Birth: -
Looking at the code creating a RAW SPARSE volume, we truncate the file to the virtual size. This is equivalent to:

    truncate -s 10G /path/to/volume

When exporting this volume using qemu-img convert, the file is copied as is:

    qemu-img convert -f raw -O raw /path/to/orig /path/to/export

When importing the exported volume to block storage using qemu-img convert, we create a full size volume since we cannot predict the size of the qcow2 file after the copy:

    qemu-img convert -f raw -O qcow2 -o compat=0.10 /path/to/export /path/to/new

I don't think there is anything new about this behavior, so this should be considered expected behavior.

We can try to optimize this flow in a future version.

The first way is to predict the needed size after the copy, before doing the copy:

    qemu-header size + (used blocks * block size * estimated qemu overhead)

Kevin, is there a better way to do this?

The second way is to shrink the destination volume after the copy. We can use qemu-img check to get the image end offset, and reduce the LV. We do this in the cold merge flow.
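A hedged sketch of the second approach (assuming qemu-img check's JSON output, which includes an "image-end-offset" field, and the 128 MiB extent size of block domains; illustrative, not the shipped implementation):

    import json
    import subprocess

    EXTENT_SIZE = 128 * 1024 ** 2  # block storage domain extent size

    def optimal_lv_size(qcow2_path):
        # qemu-img check in JSON mode reports "image-end-offset", the offset
        # after the last allocated byte of the qcow2 file.
        out = subprocess.check_output(
            ["qemu-img", "check", "--output", "json", qcow2_path])
        end_offset = json.loads(out)["image-end-offset"]
        # Round up to the next extent boundary; the LV could then be
        # reduced (e.g. with lvreduce) to this size.
        return (end_offset + EXTENT_SIZE - 1) // EXTENT_SIZE * EXTENT_SIZE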
Kevin, I forgot to mention that after estimating the needed size, we are going to round up to a 128MiB extent size. We would also like to leave about 1GiB of empty space for future automatic extension when a VM is running, so we don't need a very precise estimate.
(In reply to Nir Soffer from comment #21)
> Looking at the code creating a RAW SPARSE volume, we truncate the file to
> the virtual size.
> ...
> I don't think there is anything new about this behavior, so this should be
> considered expected behavior.

If that is so, why not block importing thin provisioned disks to block storage in the GUI (at least until this is solved in qemu)?
(In reply to Maor from comment #23)
> (In reply to Nir Soffer from comment #21)
> If that is so, why not block importing thin provisioned disks to block
> storage in the GUI (at least until this is solved in qemu)?

There is nothing to fix in qemu; this is a known oVirt issue. Blocking a useful operation because it can be optimized does not make sense.
(In reply to Nir Soffer from comment #24)
> (In reply to Maor from comment #23)
> > (In reply to Nir Soffer from comment #21)
> > If that is so, why not block importing thin provisioned disks to block
> > storage in the GUI (at least until this is solved in qemu)?
>
> There is nothing to fix in qemu; this is a known oVirt issue. Blocking a
> useful operation because it can be optimized does not make sense.

But as you mentioned, the appropriate fix will probably land in a future version. Until then, wouldn't it be helpful to add a warning or an audit log entry that at least indicates this behavior, until it is discussed and fixed properly?
We can add the same warning we have today during live storage migration, about volumes becoming preallocated; hopefully we can use the same text/translation. Then we can close this bug and open an RFE for smarter allocation/shrinking.
(In reply to Nir Soffer from comment #21)
> The first way is to predict the needed size after the copy, before doing
> the copy:
>
> qemu-header size + (used blocks * block size * estimated qemu overhead)
>
> Kevin, is there a better way to do this?

Given that you don't need a precise estimation, I would just do this:

    used_clusters * (cluster_size + 8 + 2) + cluster_size

For each cluster, we have the guest data (cluster_size), a 64 bit L2 table entry (that's the 8) and a 16 bit refcount (the 2). We also have the image header (cluster_size), top level mapping/refcount structures and refcounts for metadata.

The L2 table/refcount needs aren't completely correct, because these aren't allocated entry by entry - a whole table is allocated at once. Depending on how you determine the used clusters, you might be able to count used L2 tables separately instead of trying to calculate something. Assuming 64k clusters, each 512 MB chunk that has a used cluster in it needs an L2 table (which is 64k in size).

Also, the sizes of the latter parts are a bit tricky to calculate, but they are small, so I wouldn't bother in your case. The above formula aims a bit too low. If you'd rather aim a bit too high, just add another byte for each cluster and you should be good.

Of course, whenever you estimate the space needed for a qcow2 image, you need to look out for future changes in qemu that might cause additional data to be written. The only one I'm currently aware of is persistent bitmaps, but those won't be created by conversion from a raw image, so that's okay.
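A minimal Python sketch of this estimate (illustrative names, not VDSM code); the extra +1 byte per cluster is the "aim a bit too high" variant suggested above:

    CLUSTER_SIZE = 64 * 1024  # qcow2 default cluster size

    def estimate_qcow2_size(used_clusters, cluster_size=CLUSTER_SIZE):
        # Per used cluster: guest data + 8-byte L2 entry + 2-byte refcount,
        # plus 1 byte to aim slightly high; plus one cluster for the header.
        return used_clusters * (cluster_size + 8 + 2 + 1) + cluster_size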
After discussion with Kevin, this should work for estimating the qcow2 file size when converting from raw to qcow2:

    cluster_size = 65536
    total_size = (used_blocks * block_size) + (virtual_size / cluster_size * 8)

So for the image in the description (10G virtual size, 2.1G used), we would allocate:

    used_blocks = 4404020
    block_size = 512
    virtual_size = 10737418240
    min_allocation = (4404020 * 512) + (10737418240 / 65536 * 8) = 2256168960

We probably want 1G of empty space to avoid an instant extend when starting to use this disk in a VM, so we would allocate 3329910784 bytes.

We may need to add some extra space to mitigate future qemu features adding more metadata per cluster.
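Combining this with the rounding and headroom from comment 22, a hedged sketch (assumed constants and names, not the shipped code):

    GiB = 1024 ** 3
    EXTENT_SIZE = 128 * 1024 ** 2  # block domain extent size
    CLUSTER_SIZE = 64 * 1024

    def alloc_size(used_blocks, virtual_size, block_size=512):
        # Estimated qcow2 size: guest data plus an 8-byte L2 entry per cluster.
        estimate = used_blocks * block_size + virtual_size // CLUSTER_SIZE * 8
        # Leave ~1 GiB free to avoid an instant extend when the VM starts.
        wanted = estimate + GiB
        # Round up to the 128 MiB extent size.
        return (wanted + EXTENT_SIZE - 1) // EXTENT_SIZE * EXTENT_SIZE

    # Disk from the description: 10G virtual, ~2.1G used.
    print(alloc_size(used_blocks=4404020, virtual_size=10737418240))
    # -> 3355443200 (3329910784 rounded up to a 128 MiB extent boundary)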
Kevin, I've been trying to use the estimated size for qcow2, but I encountered a problem when I tried to convert an empty raw image to qcow2.

Based on the calculation of:

    (used_blocks * block_size) + (virtual_size / cluster_size * 64 bit L2 table entry)

the size that should be allocated is 0, although the qcow2 volume size is 196608 (see [1]).

Any suggestion how the calculation should be changed?

[1]
-sh-4.2$ qemu-img create -f raw /var/tmp/test 0
Formatting '/var/tmp/test', fmt=raw size=0
-sh-4.2$ stat /var/tmp/test
  File: ‘/var/tmp/test’
  Size: 0           Blocks: 0          IO Block: 4096   regular empty file
Device: fd00h/64768d    Inode: 391993      Links: 1
Access: (0644/-rw-r--r--)  Uid: (15809/mlipchuk)   Gid: (15809/mlipchuk)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2016-12-25 00:43:32.347549225 +0200
Modify: 2016-12-26 14:27:45.928819236 +0200
Change: 2016-12-26 14:27:45.928819236 +0200
-sh-4.2$ qemu-img convert -p -t none -T none /var/tmp/test -O qcow2 -o compat=0.10 /var/tmp/test_convert
    (100.00/100%)
-sh-4.2$ stat /var/tmp/test_convert
  File: ‘/var/tmp/test_convert’
  Size: 196608      Blocks: 384        IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 392854      Links: 1
Access: (0644/-rw-r--r--)  Uid: (15809/mlipchuk)   Gid: (15809/mlipchuk)
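One possible adjustment (my assumption, an illustration only - not a confirmed answer): always add a small constant floor for the fixed qcow2 metadata, which for the empty compat=0.10 image above is exactly the 196608 bytes (3 clusters: header, refcount table, refcount block) seen in [1]:

    CLUSTER_SIZE = 64 * 1024
    # Fixed overhead of an empty qcow2 image: header + refcount table +
    # refcount block = 3 clusters = 196608 bytes (matches [1] above).
    METADATA_FLOOR = 3 * CLUSTER_SIZE

    def estimate_with_floor(used_blocks, virtual_size, block_size=512):
        estimate = used_blocks * block_size + virtual_size // CLUSTER_SIZE * 8
        return estimate + METADATA_FLOOR  # never below the fixed overhead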
After discussing with the relevant stakeholders, the proper way to solve this will be to use a QEMU API that allows us to estimate the target size based on the actual used size in the image, thus targeting 4.2.
The fix for this bug depends on qemu-img map support for SEEK_HOLE and SEEK_DATA, which allows map to detect sparseness. That means we will support this fix only for NFS v4.2; NFS versions lower than 4.2 will still use preallocation based on the virtual size of the volume.

This fix will be used in create template from VM, create VM from template (on DC 4.0), import of a VM/Template from an export domain, and also many trivial use cases that include copy or move.
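A hedged sketch of measuring the allocated size with qemu-img map (its JSON output reports per-extent "data" flags; names here are illustrative, not the exact VDSM code):

    import json
    import subprocess

    def used_bytes(path):
        # Each JSON entry describes an extent of the image; "data": true
        # means the extent is allocated (detected via SEEK_DATA/SEEK_HOLE,
        # hence the NFS 4.2 requirement).
        out = subprocess.check_output(
            ["qemu-img", "map", "--output", "json", path])
        return sum(e["length"] for e in json.loads(out) if e["data"])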
We should open a new bug for optimizing copy of thin provisioned disks from NFS < 4.2, and for copying preallocated disks from file/block storage. In both cases we cannot estimate the size of the qcow2 file before the copy, but we can reduce the image to the optimal size after the copy, using the infrastructure added for shrink after cold merge. See https://gerrit.ovirt.org/#/q/topic:coldmerge-shrink-volume
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [No relevant external trackers attached] For more info please contact: rhv-devops
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Project 'vdsm'/Component 'ovirt-engine' mismatch] For more info please contact: rhv-devops
Verified with ovirt-engine-4.2.0-0.0.master.20171112130303.git8bc889c.el7.centos.noarch.
The imported disk is now thin provisioned with a 10GB virtual size and a 2GB actual size.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1488