Description of problem:
When cloning a VM from a QCOW2-based template on a block-based storage domain, the system creates a VM whose disk ends up with actual size equal to virtual size. While the VM is still being added, the VM's disk appears to have the same actual and virtual sizes as the template it was cloned from; only after the system finishes adding the VM does the disk's actual size get updated to match the virtual size. This behavior wastes a significant amount of storage space.

Version-Release number of selected component (if applicable):
ovirt-engine-4.4.10-0.17.el8ev.noarch

Steps to Reproduce:
1. Create a template of a QCOW2 disk on block-based storage (iSCSI).
2. Validate that the virtual size of the template disk is larger than its actual size.
3. Create a VM from the template via the UI (in the Resource Allocation window choose: format -> QCOW2, target -> iSCSI, disk profile -> iSCSI).
4. After approving the creation of the VM, the UI shows that the VM disk's virtual and actual sizes are identical to the template's (virtual size > actual size). Wait until the VM creation completes; the actual and virtual sizes then become equal.

Actual results:
The actual and virtual sizes of the VM disk created from the template on a block-based storage domain using QCOW2 format are the same.

Expected results:
On this type of storage domain with this disk format, the virtual size of the disk should be larger than the actual size (similar to the template's actual and virtual sizes).

Additional info:
Info regarding the template disk I used:

<name>latest-rhel-guest-image-7.9-infra</name>
<description>latest-rhel-guest-image-7.9-infra (b200508)</description>
<link href="/ovirt-engine/api/disks/9321e99e-ddd2-4be2-a41c-b35d5c7b5ff5/disksnapshots" rel="disksnapshots"/>
<link href="/ovirt-engine/api/disks/9321e99e-ddd2-4be2-a41c-b35d5c7b5ff5/permissions" rel="permissions"/>
<link href="/ovirt-engine/api/disks/9321e99e-ddd2-4be2-a41c-b35d5c7b5ff5/statistics" rel="statistics"/>
<actual_size>2482417664</actual_size>
<alias>latest-rhel-guest-image-7.9-infra</alias>
<backup>none</backup>
<content_type>data</content_type>
<format>cow</format>
<image_id>7bd24e3e-7959-4bb5-b45a-02d361521104</image_id>
<propagate_errors>false</propagate_errors>
<provisioned_size>10737418240</provisioned_size>
<qcow_version>qcow2_v3</qcow_version>
<shareable>false</shareable>
<sparse>true</sparse>
<status>ok</status>
<storage_type>image</storage_type>
<total_size>2482417664</total_size>
<wipe_after_delete>false</wipe_after_delete>
<disk_profile href="/ovirt-engine/api/diskprofiles/969fb64c-e4ab-4ae2-a8e2-772438eb519e" id="969fb64c-e4ab-4ae2-a8e2-772438eb519e"/>
<quota href="/ovirt-engine/api/datacenters/3a5f14bd-cf8c-42df-8e84-12a3ba0920dc/quotas/ff9114d3-ca94-428d-b15b-ac000759c0e0" id="ff9114d3-ca94-428d-b15b-ac000759c0e0"/>
<storage_domains>
  <storage_domain href="/ovirt-engine/api/storagedomains/4bdc8de8-73fc-4787-982d-e1cc181d3afc" id="4bdc8de8-73fc-4787-982d-e1cc181d3afc"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/a52e665a-1472-4e21-a1bb-8e17817305bc" id="a52e665a-1472-4e21-a1bb-8e17817305bc"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/4276d749-c178-4428-bee1-1e5fe4b642bc" id="4276d749-c178-4428-bee1-1e5fe4b642bc"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/94fb6c00-a543-4003-9099-c016634b7e28" id="94fb6c00-a543-4003-9099-c016634b7e28"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/e6ad2044-eb89-4d33-b326-dbeadc120b52" id="e6ad2044-eb89-4d33-b326-dbeadc120b52"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/80fb91a1-ee40-48fb-9703-37d485a38783" id="80fb91a1-ee40-48fb-9703-37d485a38783"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/af38f95c-14d5-46bc-8de2-84c4f35f7825" id="af38f95c-14d5-46bc-8de2-84c4f35f7825"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/3dac53ad-0502-411e-803b-028d5cd02a86" id="3dac53ad-0502-411e-803b-028d5cd02a86"/>
  <storage_domain href="/ovirt-engine/api/storagedomains/fb11a4fb-cff4-441f-8597-6c25d285204b" id="fb11a4fb-cff4-441f-8597-6c25d285204b"/>
</storage_domains>
</disk>

Info regarding the disk of the VM that I created using the template:

<name>latest-rhel-guest-image-7.9-infra</name>
<description>latest-rhel-guest-image-7.9-infra (b200508)</description>
<link href="/ovirt-engine/api/disks/e6745f45-6d14-46a7-b0ba-e628160949a1/disksnapshots" rel="disksnapshots"/>
<link href="/ovirt-engine/api/disks/e6745f45-6d14-46a7-b0ba-e628160949a1/permissions" rel="permissions"/>
<link href="/ovirt-engine/api/disks/e6745f45-6d14-46a7-b0ba-e628160949a1/statistics" rel="statistics"/>
<actual_size>10871635968</actual_size>
<alias>latest-rhel-guest-image-7.9-infra</alias>
<backup>none</backup>
<content_type>data</content_type>
<format>cow</format>
<image_id>5d0fbc89-6494-4c6d-9e99-21b23d64b848</image_id>
<propagate_errors>false</propagate_errors>
<provisioned_size>10737418240</provisioned_size>
<qcow_version>qcow2_v3</qcow_version>
<shareable>false</shareable>
<sparse>true</sparse>
<status>ok</status>
<storage_type>image</storage_type>
<total_size>10871635968</total_size>
<wipe_after_delete>false</wipe_after_delete>
<disk_profile href="/ovirt-engine/api/diskprofiles/79369169-2c78-482e-b529-9ec6f588a64b" id="79369169-2c78-482e-b529-9ec6f588a64b"/>
<quota href="/ovirt-engine/api/datacenters/3a5f14bd-cf8c-42df-8e84-12a3ba0920dc/quotas/ff9114d3-ca94-428d-b15b-ac000759c0e0" id="ff9114d3-ca94-428d-b15b-ac000759c0e0"/>
<storage_domains>
  <storage_domain href="/ovirt-engine/api/storagedomains/94fb6c00-a543-4003-9099-c016634b7e28" id="94fb6c00-a543-4003-9099-c016634b7e28"/>
</storage_domains>
</disk>
*** Bug 1932794 has been marked as a duplicate of this bug. ***
*** Bug 2025585 has been marked as a duplicate of this bug. ***
*** Bug 1459455 has been marked as a duplicate of this bug. ***
(In reply to Amit Sharir from comment #0)

Are you sure you cloned from a qcow2 template? I could not reproduce this on my setup. When I clone a qcow2 template I get the expected disk, similar to the template disk.

There are no logs attached to this bug, so we really cannot do anything with it.

Please add engine and vdsm logs showing how you created the template and how you cloned the VM.

Also please provide the output of "qemu-img measure" for both the template disk and the cloned disk:

    qemu-img measure -O qcow2 /dev/{storage-domain-id}/{lv-name}
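For reference, a sketch of what the requested output looks like; the path placeholder is unchanged from the command above and the numbers here are only illustrative (borrowed from a measure result quoted later in this bug), not from the affected disks:

    # qemu-img measure -O qcow2 /dev/{storage-domain-id}/{lv-name}
    required size: 2473590784
    fully allocated size: 10739318784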
Amit, see https://bugzilla.redhat.com/2025585#c8 - it shows how cloning a cow template creates a small logical volume, as expected.
(In reply to Nir Soffer from comment #4)
> (In reply to Amit Sharir from comment #0)
>
> Are you sure you cloned from a qcow2 template? I could not reproduce this
> on my setup. When I clone a qcow2 template I get the expected disk,
> similar to the template disk.
>
> There are no logs attached to this bug, so we really cannot do anything
> with it.
>
> Please add engine and vdsm logs showing how you created the template and
> how you cloned the VM.
>
> Also please provide the output of "qemu-img measure" for both the template
> disk and the cloned disk:
>
>     qemu-img measure -O qcow2 /dev/{storage-domain-id}/{lv-name}

Yes, I am sure. You can also see the info regarding the template disk I used in the description of the problem.

In addition, this is probably not a new issue, since I saw old bugs that refer to the same problem (bugs 1932794 and 1459455, which Ilan and Eyal opened).

I will add the credentials of my environment in a private comment so you can see and reproduce the issue. I will also add the relevant logs and info you requested.
The attached logs do not include the entire flow.

Searching for new disk API calls:

$ grep dca6948d-f3f3-40e6-b931-9ef858d7c28d *.log | grep START
vdsm1.log:2021-12-22 14:59:07,981+0200 INFO (jsonrpc/6) [vdsm.api] START getVolumeInfo(sdUUID='e6ad2044-eb89-4d33-b326-dbeadc120b52', spUUID='3a5f14bd-cf8c-42df-8e84-12a3ba0920dc', imgUUID='293fd177-edd1-44e7-96fc-00dd8f5177a5', volUUID='dca6948d-f3f3-40e6-b931-9ef858d7c28d') from=::ffff:10.46.12.152,60714, flow_id=75093178-091a-422f-9846-8d0138a2c458, task_id=602ac622-9dcb-49a3-b137-de432e89ea07 (api:48)
vdsm1.log:2021-12-22 14:59:08,303+0200 INFO (jsonrpc/0) [vdsm.api] START sdm_copy_data(job_id='e6626162-6f9e-4e09-9283-c3a53202c573', source={'endpoint_type': 'div', 'prepared': False, 'sd_id': 'e6ad2044-eb89-4d33-b326-dbeadc120b52', 'img_id': '9321e99e-ddd2-4be2-a41c-b35d5c7b5ff5', 'vol_id': '7bd24e3e-7959-4bb5-b45a-02d361521104'}, destination={'generation': 0, 'endpoint_type': 'div', 'prepared': False, 'sd_id': 'e6ad2044-eb89-4d33-b326-dbeadc120b52', 'img_id': '293fd177-edd1-44e7-96fc-00dd8f5177a5', 'vol_id': 'dca6948d-f3f3-40e6-b931-9ef858d7c28d'}, copy_bitmaps=False) from=::ffff:10.46.12.152,60714, flow_id=75093178-091a-422f-9846-8d0138a2c458, task_id=61c29d55-57a4-403b-a6ea-faa6235175ca (api:48)

Searching for template API calls:

$ grep 7bd24e3e-7959-4bb5-b45a-02d361521104 *.log | grep START
vdsm1.log:2021-12-22 14:59:08,303+0200 INFO (jsonrpc/0) [vdsm.api] START sdm_copy_data(job_id='e6626162-6f9e-4e09-9283-c3a53202c573', source={'endpoint_type': 'div', 'prepared': False, 'sd_id': 'e6ad2044-eb89-4d33-b326-dbeadc120b52', 'img_id': '9321e99e-ddd2-4be2-a41c-b35d5c7b5ff5', 'vol_id': '7bd24e3e-7959-4bb5-b45a-02d361521104'}, destination={'generation': 0, 'endpoint_type': 'div', 'prepared': False, 'sd_id': 'e6ad2044-eb89-4d33-b326-dbeadc120b52', 'img_id': '293fd177-edd1-44e7-96fc-00dd8f5177a5', 'vol_id': 'dca6948d-f3f3-40e6-b931-9ef858d7c28d'}, copy_bitmaps=False) from=::ffff:10.46.12.152,60714, flow_id=75093178-091a-422f-9846-8d0138a2c458, task_id=61c29d55-57a4-403b-a6ea-faa6235175ca (api:48)

Interesting, we see one measure log:

$ grep measure *.log
vdsm2.log:2021-12-22 12:05:40,318+0200 INFO (jsonrpc/7) [vdsm.api] START measure(sdUUID='f956805b-af47-471e-a17e-6d6435dfecb4', imgUUID='551dadfa-8834-45f6-98fd-fb6818399efe', volUUID='f8e164a5-9eb5-4af4-a407-ebdc11c8d5fd', dest_format=4, backing=True) from=::ffff:10.46.12.140,42934, flow_id=bc8ebace-2934-46fd-9671-a74de32787be, task_id=75231874-37c0-404e-9b87-ae884398d620 (api:48)
vdsm2.log:2021-12-22 12:05:40,456+0200 INFO (jsonrpc/7) [vdsm.api] FINISH measure return={'result': {'bitmaps': 0, 'required': 3393847296, 'fully-allocated': 10739318784}} from=::ffff:10.46.12.140,42934, flow_id=bc8ebace-2934-46fd-9671-a74de32787be, task_id=75231874-37c0-404e-9b87-ae884398d620 (api:54)

But this is for another volume on another storage domain, and it was done 3 hours before the copy_data job.
The 3 logs start at:

$ head -1 *.log
==> vdsm1.log <==
2021-12-22 13:01:03,292+0200 DEBUG (check/loop) [storage.check] START check '/rhev/data-center/mnt/mantis-nfs-lif2.lab.eng.tlv2.redhat.com:_nas01_ge__storage5__nfs__1/4bdc8de8-73fc-4787-982d-e1cc181d3afc/dom_md/metadata' (delay=0.01) (check:289)

==> vdsm2.log <==
2021-12-22 12:01:02,773+0200 INFO (jsonrpc/7) [api.host] START getAllVmStats() from=::1,35614 (api:48)

==> vdsm3.log <==
2021-12-22 14:01:01,620+0200 DEBUG (check/loop) [storage.check] START check '/rhev/data-center/mnt/mantis-nfs-lif2.lab.eng.tlv2.redhat.com:_nas01_ge__storage2__nfs__1/22bba973-e222-4e33-aca8-d78cd53cb260/dom_md/metadata' (delay=0.01) (check:289)

And end at:

$ tail -n1 *.log
==> vdsm1.log <==
2021-12-22 15:02:40,596+0200 DEBUG (periodic/0) [storage.TaskManager.Task] (Task='d762362f-dd3a-4191-946d-9abe3170639f') ref 0 aborting False (task:1000)

==> vdsm2.log <==
2021-12-22 15:27:22,107+0200 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174)

==> vdsm3.log <==
2021-12-22 15:31:40,543+0200 INFO (ioprocess/1942204) [IOProcessClient] (glusterSD/gluster01.lab.eng.tlv2.redhat.com:_GE__storage2__volume02) ioprocess was terminated by signal 9 (__init__:200)

Based on this, the engine did not measure the template before the copy, which explains why we create a new volume with the wrong size. But I don't see the createVolume log, which must happen a few seconds before copy_data, and this does not make any sense.

Amit, are these complete unmodified logs from the 3 hosts in your env?
After debugging on Amit's environment, the issue is this:

If we create a new template and it exists only on one block storage domain, cloning from the template creates a new disk with the correct size:

# lvs -o vg_name,lv_name,size,tags | grep bd99b34b-3bfc-4053-86ea-03dbdd854902
  678d1452-27f0-4497-9a75-aa55837061be 7aa3d821-2b32-4451-8f16-9c6ae01850c5 2.62g IU_bd99b34b-3bfc-4053-86ea-03dbdd854902,MD_9,PU_00000000-0000-0000-0000-000000000000

In vdsm log, we see:

# grep measure vdsm-new-temlate.log
2021-12-22 19:50:09,752+0200 INFO (jsonrpc/3) [vdsm.api] START measure(sdUUID='678d1452-27f0-4497-9a75-aa55837061be', imgUUID='5b752f2d-e246-4020-855b-7cd69d35d717', volUUID='2d4868b1-2566-4b07-a462-519baa5a4723', dest_format=4, backing=True) from=::ffff:10.46.12.152,50592, flow_id=bfb58300-f349-4144-92c6-5a195e6317fa, task_id=532fa56c-ee8d-4ed7-8364-1f81c8c919ed (api:48)
2021-12-22 19:50:09,939+0200 INFO (jsonrpc/3) [vdsm.api] FINISH measure return={'result': {'bitmaps': 0, 'required': 2473590784, 'fully-allocated': 10739318784}} from=::ffff:10.46.12.152,50592, flow_id=bfb58300-f349-4144-92c6-5a195e6317fa, task_id=532fa56c-ee8d-4ed7-8364-1f81c8c919ed (api:54)

And the engine asks to create the destination volume with the expected initial_size:

# grep 'START createVolume' vdsm-new-temlate.log
2021-12-22 19:50:09,976+0200 INFO (jsonrpc/1) [vdsm.api] START createVolume(sdUUID='678d1452-27f0-4497-9a75-aa55837061be', spUUID='2ece5cff-fa68-4310-b779-0fbf831e09e3', imgUUID='bd99b34b-3bfc-4053-86ea-03dbdd854902', size='10737418240', volFormat=4, preallocate=2, diskType='DATA', volUUID='7aa3d821-2b32-4451-8f16-9c6ae01850c5', desc='{"DiskAlias":"","DiskDescription":""}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize='2473590784', addBitmaps=False) from=::ffff:10.46.12.152,51370, flow_id=bfb58300-f349-4144-92c6-5a195e6317fa, task_id=6113c0a9-4fef-4df3-8f4f-4b2d9e3bc28b (api:48)

But if we use latest-rhel-guest-image-7.9-infra, which has disks on 9 storage domains (iscsi_0, iscsi_1, iscsi_2, nfs_0, nfs_1, nfs_2, test_glsuter_0, test_gluster_1, test_gluster_2), we don't run measure during the copy flow:

# grep measure vdsm-old-temlate.log
(nothing)

And the engine asks to create the destination with an initial size of 9.0G:

# grep 'START createVolume' vdsm-old-temlate.log
2021-12-22 20:17:24,712+0200 INFO (jsonrpc/5) [vdsm.api] START createVolume(sdUUID='678d1452-27f0-4497-9a75-aa55837061be', spUUID='2ece5cff-fa68-4310-b779-0fbf831e09e3', imgUUID='df14019b-69fd-4e5c-ad45-0a989a731448', size='10737418240', volFormat=4, preallocate=2, diskType='DATA', volUUID='21c23eff-1ca0-4c87-afba-52ee36be9d1f', desc='{"DiskAlias":"","DiskDescription":""}', srcImgUUID='00000000-0000-0000-0000-000000000000', srcVolUUID='00000000-0000-0000-0000-000000000000', initialSize='9761289310', addBitmaps=False) from=::ffff:10.46.12.152,51370, flow_id=60490c7d-2aa5-4bd9-8206-708f976d50cf, task_id=8537e87f-6ef8-47cd-a826-d88b61eb856a (api:48)

Vdsm will create a volume of 9761289310 * 1.1 bytes, which gives a volume size of 10.12g.

Why does the engine not call measure?
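A rough check of that 10.12g figure (my arithmetic; the 10% margin is stated above, while rounding up to 128 MiB LVM extents on block domains is my assumption):

    9761289310 bytes * 1.1 = 10737418241 bytes, just over 10 GiB (10737418240 bytes)
    rounded up to the next 128 MiB extent: 10368 MiB = 10.125 GiB, which lvs shows as 10.12g
    (this also matches the 10871635968-byte actual_size reported for the cloned disk in the description)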
Looking in CopyImageGroupWithDataCommand.java:

    private Long determineTotalImageInitialSize(DiskImage sourceImage,
            VolumeFormat destFormat,
            Guid srcDomain) {
        // Check if we have a host in the DC capable of running the measure volume verb,
        // otherwise fallback to the legacy method
        Guid hostId = imagesHandler.getHostForMeasurement(sourceImage.getStoragePoolId(), sourceImage.getId());
        // We are collapsing the chain, so we want to measure the leaf to get the size
        // of the entire chain

When copying a template image, there are no chains - a template has a single image.

This search returns all the images with the template disk id from multiple storage domains, not all the snapshots of a single disk:

        List<DiskImage> images = diskImageDao.getAllSnapshotsForImageGroup(sourceImage.getId());

This sort is probably meaningless, since unrelated images from different storage domains have no order:

        imagesHandler.sortImageList(images);

This assumes that the last image is the leaf - but since we have images from multiple storage domains, we just get a random image from the list, from a random storage domain:

        DiskImage leaf = images.get(images.size() - 1);

Since we have a random image from a random domain, and we have 3 block domains and 6 file domains, there is a good chance to get an image on a file domain:

        if (hostId == null || (leaf.getActive() && !leaf.getStorageTypes().get(0).isBlockDomain())) {

So we get into this legacy code, which uses crappy heuristics to determine the size:

            return imagesHandler.determineTotalImageInitialSize(getDiskImage(),
                    getParameters().getDestinationFormat(),
                    getParameters().getSrcDomain(),
                    getParameters().getDestDomain());
        } else {

instead of this code, which uses qemu-img measure to get a good estimate:

            MeasureVolumeParameters parameters = new MeasureVolumeParameters(leaf.getStoragePoolId(),
                    srcDomain, leaf.getId(), leaf.getImageId(), destFormat.getValue());
            parameters.setParentCommand(getActionType());
            parameters.setEndProcedure(EndProcedure.PARENT_MANAGED);
            parameters.setVdsRunningOn(hostId);
            parameters.setCorrelationId(getCorrelationId());
            ActionReturnValue actionReturnValue = runInternalAction(ActionType.MeasureVolume, parameters,
                    ExecutionHandler.createDefaultContextForTasks(getContext()));

            if (!actionReturnValue.getSucceeded()) {
                throw new RuntimeException("Could not measure volume");
            }

            return actionReturnValue.getActionReturnValue();
        }
    }

I think the fix is (a simplified sketch of this selection logic follows below):

- choose which storage domain we want to copy from
- check if the source image is a template
- for template, measure the single image
- otherwise get the snapshots for the image group id for the selected storage domain and sort them to get the leaf

Benny, what do you think?
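To make the proposed selection logic concrete, here is a minimal, self-contained Java sketch. This is not the engine code: DiskImage, the DAO lookup and the command class are replaced by a plain record and hypothetical names (ImageOnDomain, pickVolumeToMeasure). It only illustrates restricting the disk's images to the chosen source storage domain before looking for the leaf, and treating a template's single image as its own leaf; in the real command the selected volume would then be measured via MeasureVolume instead of the legacy size heuristic.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.UUID;

    // Simplified stand-in for the engine's DiskImage: one volume of a disk on one storage domain.
    record ImageOnDomain(UUID storageDomainId, UUID imageId, UUID parentId) {}

    public class MeasureTargetPicker {

        // Nil UUID marks "no parent", i.e. the base volume of a chain (a template has only this one).
        static final UUID NO_PARENT = new UUID(0, 0);

        // Pick the volume to measure when copying a disk from srcDomain: restrict the disk's
        // images to that domain first, then take the leaf, i.e. the image that no other image
        // on the same domain uses as its parent.
        static ImageOnDomain pickVolumeToMeasure(List<ImageOnDomain> allImagesOfDisk, UUID srcDomain) {
            List<ImageOnDomain> onDomain = allImagesOfDisk.stream()
                    .filter(img -> img.storageDomainId().equals(srcDomain))
                    .toList();
            if (onDomain.isEmpty()) {
                throw new IllegalArgumentException("disk has no image on domain " + srcDomain);
            }
            // A template has exactly one image per domain, so it is trivially its own leaf.
            if (onDomain.size() == 1) {
                return onDomain.get(0);
            }
            // For a snapshot chain, the leaf is the image that is nobody's parent.
            Set<UUID> parents = new HashSet<>();
            for (ImageOnDomain img : onDomain) {
                parents.add(img.parentId());
            }
            return onDomain.stream()
                    .filter(img -> !parents.contains(img.imageId()))
                    .findFirst()
                    .orElseThrow(() -> new IllegalStateException("no leaf found - broken chain?"));
        }

        public static void main(String[] args) {
            UUID iscsiDomain = UUID.randomUUID();
            UUID nfsDomain = UUID.randomUUID();
            UUID templateVolume = UUID.randomUUID();
            // The same template image exists on two domains; only the copy on the source
            // domain is considered, never a random image from another domain.
            List<ImageOnDomain> images = List.of(
                    new ImageOnDomain(iscsiDomain, templateVolume, NO_PARENT),
                    new ImageOnDomain(nfsDomain, templateVolume, NO_PARENT));
            System.out.println(pickVolumeToMeasure(images, iscsiDomain));
        }
    }

Filtering by storage domain first makes the template case trivial and keeps the leaf search meaningful for real snapshot chains.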
Removing the needinfo, since Nir finished the investigation in comment #13.
(In reply to Nir Soffer from comment #13)
> I think the fix is:
>
> - choose which storage domain we want to copy from
> - check if the source image is a template
> - for template, measure the single image
> - otherwise get the snapshots for the image group id for the selected
>   storage domain and sort them to get the leaf
>
> Benny, what do you think?

Sounds right, posted proposed patch https://gerrit.ovirt.org/c/ovirt-engine/+/118153
Benny, would you post https://gerrit.ovirt.org/c/ovirt-engine/+/118266 on GitHub?
(In reply to Arik from comment #18)
> Benny, would you post https://gerrit.ovirt.org/c/ovirt-engine/+/118266 on
> GitHub?

As discussed, this patch is an attempt at an optimization and is not directly related to this bug, so I'm moving it to MODIFIED.
Verified. The virtual size of the disk is bigger than the actual size after reproducing all the steps.

Versions:
ovirt-engine-4.5.0.2-0.7.el8ev
vdsm-4.50.0.12-1.el8ev.x86_64
This bugzilla is included in the oVirt 4.5.0 release, published on April 20th 2022. Since the problem described in this bug report should be resolved in the oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.