Created attachment 1501362 [details] engine and vdsm logs Description of problem: Creating a template from VM with qcow disk increase the allocated size by the original VM's disk size (in my case 2GB) instead of the expected 1GB. The issue occurs mainly with this 2 VM's flow as seen below. The issue appeared for the first time in this build in automation TestCase11541 and not in prior releases. Issue reproduced also manually but not all the time . Version-Release number of selected component (if applicable): ovirt-engine 4.2.7.4-0.1.el7ev VDSM -4.20.43-1 How reproducible: ~ 50% Steps to Reproduce: Use RESTAPI get to storage domain to get the commited (allocated) size. (https://storage-ge-04.scl.lab.tlv.redhat.com/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae) -Start test : <name>iscsi_0</name> <committed>24696061952</committed> => 23G 1.Create VM1 , VM2 - same committed size as expected. <name>iscsi_0</name> <committed>24696061952</committed> => 23G 2.Add 2G qcow sparse disk to VM1 and activate it - committed size 25G( +2G) as expected. RestAPI: <name>iscsi_0</name> <committed>26843545600</committed>=> 25G 3.Add 2G raw prealloc disk to VM2 and activate it- committed size 27G( +2G) as expected. RestAPI: <name>iscsi_0</name> <committed>28991029248</committed> = > 27G 4. Create a template t1 from VM1 with same disks params (2G qcow sparse) Actual results: Expected committed size: 28G( +1G) as expected . <committed>30064c771072</committed> Expected results: RestAPI actual: 29G( +2G) <name>iscsi_0</name> <committed>31138512896</committed> Additional info:
This bug was seen only on block SD. This bug was not seen prior to 4.2.7.4-0.1.el7ev via automation - looks like a regression, priority updated to high. More info: There are 2 scenario's for this bug - maybe 2 bugs ? PLEASE ADVISE IF SO. Scenario 1 - SD allocation size after successful VM template creation: Step 4(create a template from VM with QCOW disk) succeeds but committed (allocated) size of the SD increase by 2GB instead of 1GB . Scenario 2 - and more severe(which the logs of this bug suggest) : Step 4(create a template from VM with QCOW disk) FAILED (due to Bug 1588100) BUT AFTER THE TEMPLATE IS REMOVED STILL THE SD SIZE REMAINS THE SAME meaning the size of the SD increase by 2GB. Scenario2 indicate lack of clean up in the SD(allocated storage size leak) allocated size which can lead to filling up the SD allocated size although it's actually free. Rasing this bug severity to high. Currently attached logs are matching scenario2. I will reproduce scenario1 and attach those logs as well.
This bug was also reproduced manually running the scenario via UI in both scenarios ( successful VM template creation and failure to create VM template) No easy to reproduce. About ~30% chance to hit the issue with successful VM template creation(AKA scenario 1) About ~10% chance to hit the issue with failure VM template creation (AKA scenario 2)
Created attachment 1501705 [details] scenario1 - SD allocation size after successful VM template creation- 4.3 logs Scenario 1 - SD allocation size after successful VM template creation is wrong is seen also at 4.3. Engine - 4.3.0-0.0.master.20181016132820.gite60d148) VDSM - 4.30.1-25.gitce9e416 4.3 Logs are attached. Full details : -Start test : REST URL used : https://storage-ge-08.scl.lab.tlv.redhat.com/ovirt-engine/api/storagedomains/02b98d9e-82d7-468c-9456-d8fcb00159e6 used ISCSI storage domain: name = iscsi_0 ID = "02b98d9e-82d7-468c-9456-d8fcb00159e6" <name>iscsi_0</name> <committed>24696061952</committed> => 23G 1.Create VM1 , VM2 - same committed size as expected. 2018-11-05 11:06:24,803+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-180) [8dfe214] EVENT_ID: USER_ADD_VM(34), VM vm1 was created by admin@internal-authz. 2018-11-05 11:06:30,714+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-183) [5f140639] EVENT_ID: USER_ADD_VM(34), VM vm2 was created by admin@internal-authz. <name>iscsi_0</name> <committed>24696061952</committed> => 23G 2.Add 2G qcow sparse disk to VM1 and activate it - committed size 25G( +2G) as expected. 2018-11-05 11:06:53,290+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [] EVENT_ID: USER_ADD_DISK_TO_VM_FINISHED_SUCCESS(97), The disk vm1_Disk1 was successfully added to VM vm1. RestAPI: <name>iscsi_0</name> <committed>26843545600</committed>=> 25G 3.Add 2G raw prealloc disk to VM2 and activate it- committed size 27G( +2G) as expected. 2018-11-05 11:07:29,562+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-100) [] EVENT_ID: USER_ADD_DISK_TO_VM_FINISHED_SUCCESS(97), The disk vm2_Disk1 was successfully added to VM vm2. RestAPI: <name>iscsi_0</name> <committed>28991029248</committed> = > 27G 4. Create a template t1 from VM1 with same disks params (2G qcow sparse) 2018-11-05 11:09:25,849+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-180) [81aa6e60-38eb-4abd-bba7-8dc7a7d1b8ea] EVENT_ID: USER_ADD_VM_TEMPLATE(48), Creation of Template t1 from VM vm1 was initiated by admin@internal-authz. 2018-11-05 11:10:39,836+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-82) [81aa6e60-38eb-4abd-bba7-8dc7a7d1b8ea] EVENT_ID: USER_ADD_VM_TEMPLATE_FINISHED_SUCCESS(51), Creation of Template t1 from VM vm1 has been completed. Actual results: <name>iscsi_0</name> <committed>31138512896</committed> => 29G (+2G) Expected committed size: 28G( +1G) as expected . <committed>30064c771072</committed> => 28G(+1G)
Eyal, can you check if it's related to your latest work on preallocation?
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
No, it doesn't. 1) My work was on file-based domains only. 2) My patches merge long before the specified version
Created attachment 1502005 [details] scenario1 - SD allocation size after failed VM template creation- 4.2.7.4 logs 4.2.7.4 reproduction of scenario1 - SD allocation size after failed VM template creation. Timeline via engine log EVENT_ID : 2018-11-05 17:47:13,289+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-4) [58e1912c] EVENT_ID: USER_ADD_VM(34), VM vm_TestCase11541_0517471210 was created by admin@internal-authz. 2018-11-05 17:47:15,371+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-22) [2385cd58] EVENT_ID: NETWORK_ACTIVATE_VM_INTERFACE_SUCCESS(1,012), Network Interface nic1 (VirtIO) was plugged to VM vm_TestCase11541_0517471210. (User: admin@internal-authz) 2018-11-05 17:47:15,388+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-22) [2385cd58] EVENT_ID: NETWORK_ADD_VM_INTERFACE(932), Interface nic1 (VirtIO) was added to VM vm_TestCase11541_0517471210. (User: admin@internal-authz) 2018-11-05 17:47:16,827+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-9) [diskattachments_create_18e3bf5a-4af8] EVENT_ID: USER_ADD_DISK_TO_VM(78), Add-Disk operation of vm_TestCase11541_0517471210_Disk1 was initiated on VM vm_TestCase11541_0517471210 by admin@internal-authz. 2018-11-05 17:47:35,974+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-75) [] EVENT_ID: USER_ADD_DISK_TO_VM_FINISHED_SUCCESS(97), The disk vm_TestCase11541_0517471210_Disk1 was successfully added to VM vm_TestCase11541_0517471210. 2018-11-05 17:47:38,222+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-29) [7742a801] EVENT_ID: USER_ADD_VM(34), VM vm_TestCase11541_0517473742 was created by admin@internal-authz. 2018-11-05 17:47:39,693+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-14) [1505096] EVENT_ID: NETWORK_ACTIVATE_VM_INTERFACE_SUCCESS(1,012), Network Interface nic1 (VirtIO) was plugged to VM vm_TestCase11541_0517473742. (User: admin@internal-authz) 2018-11-05 17:47:39,712+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-14) [1505096] EVENT_ID: NETWORK_ADD_VM_INTERFACE(932), Interface nic1 (VirtIO) was added to VM vm_TestCase11541_0517473742. (User: admin@internal-authz) 2018-11-05 17:47:41,043+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-18) [diskattachments_create_80a75a90-e653] EVENT_ID: USER_ADD_DISK_TO_VM(78), Add-Disk operation of vm_TestCase11541_0517473742_Disk1 was initiated on VM vm_TestCase11541_0517473742 by admin@internal-authz. 2018-11-05 17:47:48,434+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-89) [] EVENT_ID: USER_ADD_DISK_TO_VM_FINISHED_SUCCESS(97), The disk vm_TestCase11541_0517473742_Disk1 was successfully added to VM vm_TestCase11541_0517473742. 2018-11-05 17:47:54,368+02 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-12) [templates_create_431a52a6-a408-46ba] EVENT_ID: USER_ADD_VM_TEMPLATE(48), Creation of Template templ_type_0517473742 from VM vm_TestCase11541_0517471210 was initiated by admin@internal-authz. 2018-11-05 17:48:59,563+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-51) [templates_create_431a52a6-a408-46ba] EVENT_ID: USER_ADD_VM_TEMPLATE_CREATE_TEMPLATE_FAILURE(1,322), Failed to create Template templ_type_0517473742 or its disks from VM vm_TestCase11541_0517471210. AFTER TEMPLATE DISK IS DELELTED the REST get of iscsi_0 SD looking at committed(allocated) size, the wrong value appears 31138512896 (29G - meaning +6G from starting point which is 23G) instead of the expected 28991029248 (27G +4G from starting point which is 23G) REST get response: <name>iscsi_0</name> <description/> <comment/> <link href="/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae/permissions" rel="permissions"/> <link href="/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae/templates" rel="templates"/> <link href="/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae/vms" rel="vms"/> <link href="/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae/storageconnections" rel="storageconnections"/> <link href="/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae/diskprofiles" rel="diskprofiles"/> <link href="/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae/disks" rel="disks"/> <link href="/ovirt-engine/api/storagedomains/c1f85d4d-9397-4990-bcb8-74c53bac4fae/disksnapshots" rel="disksnapshots"/> <available>144955146240</available> <backup>false</backup> <committed>31138512896</committed>
This bug is VERY HARD TO reproduce. I ran multiple automation tests and only after 21 iterations in 4.2.7.4 did this issue appear ! In 4.2.7.3 I ran 50 iterations and the bug did not appear, but still, I am not sure this is a regression bug, maybe just a hard to reproduce. Removing "regression" keyword for now.
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Hi Avihai, The committed size value seems to be the result of duplicating the sum of all images' 'actual_size' value with 0.000000000931322574615478515625. (According to create_views.sql#CREATE OR REPLACE VIEW storage_domains_image_sizes) Therefore, due to rounding the result, it's likely to get an up rounding for some of the cases, so I think it's not a bug. What do you say?
(In reply to shani from comment #10) > Hi Avihai, > The committed size value seems to be the result of duplicating the sum of > all images' 'actual_size' value with 0.000000000931322574615478515625. > (According to create_views.sql#CREATE OR REPLACE VIEW > storage_domains_image_sizes) > > Therefore, due to rounding the result, it's likely to get an up rounding for > some of the cases, so I think it's not a bug. > > What do you say? Hi Shani, We have 2 scenario's that are hard to reproduce but does occur and the image size is the committed (allocated) and not the actual size. Scenario 1 - SD allocation size after successful VM template creation: Step 4(create a template from VM with QCOW disk) succeeds but committed (allocated) size of the SD increase by 2GB instead of 1GB . Scenario 2 - and more severe(which the logs of this bug suggest) : Step 4(create a template from VM with QCOW disk) FAILED (due to Bug 1588100) BUT AFTER THE TEMPLATE IS REMOVED STILL THE SD SIZE REMAINS THE SAME meaning the size of the SD increase by 2GB. About ~30% chance to hit the issue with successful VM template creation(AKA scenario 1) About ~10% chance to hit the issue with failure VM template creation (AKA scenario 2) If this is by design we should see the issue all of the time, right? The fact that you sum up all the storage_domains_image_sizes does not mean that the table is correct or reflecting the current images. Maybe we have a zombie/leftover image? We have 1G-> 2G difference between expected committed size in both scenarios so IMHO the questions should be: 1) Are the storage_domains_image_sizes in the DB that we use to sum the committed size are reflecting the current images or have a zombie/leftover image? THIS IS MY #1 SUSPECT! 2) Is the sum of the committed image sizes made correctly? This I assume is true as you checked it. 3) Is the round up made of the sum of committed image sizes in that SD correct?
> 1) Are the storage_domains_image_sizes in the DB that we use to sum the > committed size are reflecting the current images or have a zombie/leftover > image? THIS IS MY #1 SUSPECT! > 2) Is the sum of the committed image sizes made correctly? This I assume is > true as you checked it. > 3) Is the round up made of the sum of committed image sizes in that SD > correct? In order to verify that, and verify the calculation is being made as well, can you please supply DB dumps of both failed and successful scenarios? I want to make sure that the images (on images table) took into account while calculating the committed size are all part of the test and there're no leftovers/zombies. + can you please add the test case so I'll be able to follow the steps correctly? The round is down for 0-0.5 values and up for 0.5-1 values. I want to make sure there's no leftover disk (ovf for example) that isn't being deleted on time and affects the calculation.
(In reply to shani from comment #12) > > 1) Are the storage_domains_image_sizes in the DB that we use to sum the > > committed size are reflecting the current images or have a zombie/leftover > > image? THIS IS MY #1 SUSPECT! > > 2) Is the sum of the committed image sizes made correctly? This I assume is > > true as you checked it. > > 3) Is the round up made of the sum of committed image sizes in that SD > > correct? > > In order to verify that, and verify the calculation is being made as well, > can you please supply DB dumps of both failed and successful scenarios? This is hard to reproduce but I will try :) What is the best way I can provide you with DB dumps? > I want to make sure that the images (on images table) took into account > while calculating the committed size are all part of the test and there're > no leftovers/zombies. > + can you please add the test case so I'll be able to follow the steps > correctly? I added the exact test case scenario in the original description, Is there anything missing there? > The round is down for 0-0.5 values and up for 0.5-1 values. > I want to make sure there's no leftover disk (ovf for example) that isn't > being deleted on time and affects the calculation.
> This is hard to reproduce but I will try :) > What is the best way I can provide you with DB dumps? For a start, I would like to examine the data under images, image_storage_domain_map, and storage_domains_image_sizes. + please provide the relevant storage domain's id. I would like to calculate the images' size and by that determine is the reported committed size is correct.
See Bug 1147246 and commit https://gerrit.ovirt.org/#/c/33251 for the implementation of: "template images which are active but only their actual size should be taken into consideration since they are read only" (taken from Bug 1147246 description) This means any template disk created from VM should be calculated at the SD as 1GB as this is the block storage minimum actual size.
Created attachment 1544088 [details] Scenario1 4.3.2 logs - Wrong SD allocation size after successful VM template creation
Another reproduction on ovirt-engine 4.3.3.3-0.1.el7 (vdsm-4.30.13-1.el7ev.x86_64). ENV(storage-ge-08.scl.lab.tlv.redhat.com) IS FROZEN FOR YOUR INSPECTION -> PLEASE EXTRACT WHATEVER DATA YOU NEED AND PING ME WHEN DONE. user: admin pass: qum5net Reminding that the Issue is : Creating a template from VM with qcow disk increase the block storage domain allocated/committed size by the original VM's disk size (in my case 2GB) instead of the expected 1GB. Details requested: SD ID = ' 746f66f0-52a1-4d1b-bb98-5233e2b47a26' SD name = 'iscsi_2' Template disk id = 'a241adb0-58a2-46df-bb52-d2b1968fd1d2' Test steps via engine log EVENT_ID: -Start test : RestAPI get SD 'iscsi_2' committed(allocated) size: <name>iscsi_2</name> <committed>3221225472</committed> => 3G 1.Create VM1 , VM2 - same committed size as expected. 2019-04-21 12:52:50,508+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-63) [ab5fdce] EVENT_ID: USER_ADD_VM(34), VM vm1 was created by admin@internal-authz. 2019-04-21 12:52:58,747+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-68) [3a6f8838] EVENT_ID: USER_ADD_VM(34), VM vm2 was created by admin@internal-authz. RestAPI get SD 'iscsi_2' committed(allocated) size: <name>iscsi_2</name> <committed>3221225472</committed> => 3G 2.Add 2G qcow sparse disk to VM1 and activate it - committed size 5G( +2G) as expected. 2019-04-21 12:56:38,225+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-63) [6ae413ae-e227-43f4-961f-13cd5e5cc577] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{storagePoolId='126908b3-4afb-4492-acdf-5dad110cae40', ignoreFailoverLimit='false', storageDomainId='746f66f0-52a1-4d1b-bb98-5233e2b47a26', imageGroupId='9de19d77-702c-4be9-b433-50a576c01c8e', imageSizeInBytes='2147483648', volumeFormat='COW', newImageId='2ba43287-d5ef-4db2-a587-dcd65fe82853', imageType='Sparse', newImageDescription='{"DiskAlias":"vm1_Disk1","DiskDescription":""}', imageInitialSizeInBytes='0'}), log id: 6cc07be0 2019-04-21 12:56:57,322+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [] EVENT_ID: USER_ADD_DISK_TO_VM_FINISHED_SUCCESS(97), The disk vm1_Disk1 was successfully added to VM vm1. RestAPI get SD 'iscsi_2' committed(allocated) size: <name>iscsi_2</name> <committed>5368709120</committed> => 5G 3.Add 2G raw prealloc disk to VM2 and activate it- committed size 7G( +2G) as expected. 2019-04-21 13:00:32,568+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateImageVDSCommand] (default task-63) [3989894d-bd84-43c2-a3f9-461ea66e0069] START, CreateImageVDSCommand( CreateImageVDSCommandParameters:{storagePoolId='126908b3-4afb-4492-acdf-5dad110cae40', ignoreFailoverLimit='false', storageDomainId='746f66f0-52a1-4d1b-bb98-5233e2b47a26', imageGroupId='080eae83-09f3-4b11-b853-5a7ed9a78fce', imageSizeInBytes='2147483648', volumeFormat='RAW', newImageId='1f15830d-29e8-4dfe-aaa2-67cb042f4c7a', imageType='Preallocated', newImageDescription='{"DiskAlias":"vm2_Disk1","DiskDescription":""}', imageInitialSizeInBytes='0'}), log id: 7332c8a2 2019-04-21 13:00:47,666+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-93) [] EVENT_ID: USER_ADD_DISK_TO_VM_FINISHED_SUCCESS(97), The disk vm2_Disk1 was successfully added to VM vm2. RestAPI get SD 'iscsi_2' committed(allocated) size: <name>iscsi_2</name> <committed>7516192768</committed> = > 7G 4. Create a template 'temp_vm1' from VM1 with same disks params (2G qcow sparse) 2019-04-21 13:04:28,448+03 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (default task-63) [35aaf835-9e0c-467e-80f6-53273cfaec9e] START, CopyImageVDSCommand( CopyImageVDSCommandParameters:{storagePoolId='126908b3-4afb-4492-acdf-5dad110cae40', ignoreFailoverLimit='false', storageDomainId='746f66f0-52a1-4d1b-bb98-5233e2b47a26', imageGroupId='9de19d77-702c-4be9-b433-50a576c01c8e', imageId='2ba43287-d5ef-4db2-a587-dcd65fe82853', dstImageGroupId='df854785-a5c5-494f-84f8-748cb733dd26', vmId='262a1d05-1410-4d19-a728-c5517b35d1bd', dstImageId='3466c63e-8c6e-4354-bf1c-280d051f653c', imageDescription='{"DiskAlias":"vm1_Disk1","DiskDescription":""}', dstStorageDomainId='746f66f0-52a1-4d1b-bb98-5233e2b47a26', copyVolumeType='SharedVol', volumeFormat='COW', preallocate='Sparse', postZero='false', discard='false', force='false'}), log id: 2d81ebfd Actual results: - committed size 9G( +2G) NOT as expected we expect 8G(+1G ) RestAPI get SD 'iscsi_2' committed(allocated) size: <name>iscsi_2</name> <committed>9663676416</committed> = > 9GB Expected results: Expected committed size: RestAPI get SD 'iscsi_2' committed(allocated) size THAT IS EXPECTED: 8G( +1G) as expected . <committed>8589934592</committed> => 8G
Created attachment 1556895 [details] 4.3.3.3_reproduction_logs_manual
Comment on attachment 1556895 [details] 4.3.3.3_reproduction_logs_manual Just to clarify This is Scenario 1 (successfull template creation)
Created attachment 1556896 [details] print screen of the issue - committed size of iscsi_2 SD is 9G