Description of problem:
When trying to LSM a disk to a target SD that would still have an extra 5G to spare after the disk is moved, the operation fails due to "Critical, Low disk space".

Version-Release number of selected component (if applicable):
ovirt-engine-4.5.2.1-0.1.el8ev
vdsm-4.50.2.2-1.el8ev

How reproducible:
100%

Steps to Reproduce:
1. Have 2 iSCSI SDs, for example:
   SD1 - has a 55G disk and 5G of free space
   SD2 - has 60G of free space
2. Move the 55G disk to SD2

Actual results:
"Error while executing action: Cannot move Virtual Disk. Low disk space on Storage Domain SD2."

Expected results:
The disk is moved and SD2 still has 5G of free space left.

Additional info:
Also reproduces without a VM
All logs can be found in bug 2102149, which is related to this newly found issue.
Pavel, it is important to understand whether this scenario ever worked.
In bz 1980075 LSM failed because the free space on the source storage domain was smaller than the disk's virtual size, and we decided not to handle that but to keep it as a requirement.
So if that is the reason we fail here, it is unrelated to the initial size of the snapshot that recently changed, and there is no good reason to try to fix it now.
Can you please check if that is the case here?
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Benny will check whether the same validation logic from bz 1980075 was applied to the destination storage domain.
This is not related to https://bugzilla.redhat.com/show_bug.cgi?id=2102149#c11.

Snapshots now consume extra space when copied (8 GiB by default since https://bugzilla.redhat.com/1958032), which means a 14 GiB disk requires ~22 GiB for the duration of LSM. So the problem in bug 2102149 was that the operation was allowed to start at all, not the blocking validation.
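For reference, a minimal sketch of the arithmetic above (the constant and function names are illustrative only, not the actual engine/vdsm code; the 8 GiB figure is the default from bug 1958032):

GiB = 1024**3

# Default extra allocation for the snapshot created during LSM
# (8 GiB by default since bug 1958032).
INITIAL_SNAPSHOT_SIZE = 8 * GiB

def required_dest_space_during_lsm(disk_virtual_size):
    """Space the destination domain must hold while LSM is in flight:
    the copied disk plus the temporary snapshot allocation."""
    return disk_virtual_size + INITIAL_SNAPSHOT_SIZE

# A 14 GiB disk needs ~22 GiB on the destination for the duration of LSM.
print(required_dest_space_during_lsm(14 * GiB) / GiB)  # -> 22.0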
(In reply to Benny Zlotnik from comment #5)
> This is not related to
> https://bugzilla.redhat.com/show_bug.cgi?id=2102149#c11
>
> Snapshots now consume extra space when copied (8gb by default since
> https://bugzilla.redhat.com/1958032),
> which would mean a 14gb disk would require ~22gb for the duration of LSM, so
> the problem in bug 2102149 was that the operation was allowed to start, and
> not the blocking validation

Still, the cleanup fix IMHO was also correct, since we can theoretically hit the "Volume group "<ID>" has insufficient free space..." error during the "LiveDiskMigrateStage.CLONE_IMAGE_STRUCTURE" phase because of operations running in parallel, something that is possible in real life.
I.e., with the help of a debugger I scheduled an LSM and then added a disk right after the LSM started and had already performed the SD available-size validations...
This scenario works on rhv-4.4.10-8.

Instead of failing the action, there is just a warning about low space on the target SD:
"Warning, Low disk space. SD2 domain has 5 GB of free space."

Versions:
engine-4.4.10.7-0.4.el8ev
vdsm-4.40.100.2-1.el8ev
(In reply to sshmulev from comment #7)
> This scenario works on rhv-4.4.10-8
>
> Instead of failing this action there is just a warning there about low space
> on the target SD:
> "Warning, Low disk space. SD2 domain has 5 GB of free space."

Yes, this seems to align with comment 5 - we now allocate more, so on the one hand it is less likely that we will reach the limit during the migration, but on the other hand, because we allocate more, we require more free space on the destination.
(In reply to Pavel Bar from comment #6)
> Still the cleanup fix IMHO was also correct, since we can theoretically
> reach the "Volume group "<ID>" has insufficient free space..." error during
> "LiveDiskMigrateStage.CLONE_IMAGE_STRUCTURE" phase by operations running in
> parallel, something that is possible in real life.

Not only that, the copy might fail for other reasons (e.g., when there is a connectivity issue with the source). One can argue that most of the time, if we fail to copy to the destination, we will also fail to do the cleanup there, but we should at least try to cover the cases where we can.
(In reply to sshmulev from comment #7)
> This scenario works on rhv-4.4.10-8
>
> Instead of failing this action there is just a warning there about low space
> on the target SD:
> "Warning, Low disk space. SD2 domain has 5 GB of free space."
>
> Versions:
> engine-4.4.10.7-0.4.el8ev
> vdsm-4.40.100.2-1.el8ev

The "Low disk space" warning above is a monitoring message that is triggered when a Storage Domain is above the 5 GiB (default) threshold of the critical space blocker. It blocks future operations, but doesn't fail the currently running LSM.

The actual failure looks like this:

VDSM log:
2022-08-08 22:02:58,906+0300 ERROR (tasks/8) [storage.volume] Failed to create volume /rhev/data-center/mnt/blockSD/1721f247-94c9-43cb-907f-15ec2044d316/images/1b87472a-4c00-4a41-983d-ce2022927497/1852a039-7923-4ffd-affb-e0ee2e27c12d: Cannot create Logical Volume: 'cmd=[\'/sbin/lvm\', \'lvcreate\', \'--devices\', \'/dev/mapper/3600a098038314648593f517773636442,/dev/mapper/3600a098038314648593f517773636443,/dev/mapper/3600a098038314648593f517773636444,/dev/mapper/3600a098038314648593f517773636445\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 hints="none" obtain_device_list_from_udev=0 } global { prioritise_write_locks=1 wait_for_locks=1 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--autobackup\', \'n\', \'--contiguous\', \'n\', \'--size\', \'8448m\', \'--wipesignatures\', \'n\', \'--addtag\', \'OVIRT_VOL_INITIALIZING\', \'--name\', \'1852a039-7923-4ffd-affb-e0ee2e27c12d\', \'1721f247-94c9-43cb-907f-15ec2044d316\'] rc=5 out=[] err=[\' Volume group "1721f247-94c9-43cb-907f-15ec2044d316" has insufficient free space (31 extents): 66 required.\']' (volume:1296)

Engine log:
2022-08-08 22:03:08,020+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-30) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM Host1 command HSMGetAllTasksStatusesVDS failed: value=Cannot create Logical Volume: 'cmd=[\'/sbin/lvm\', \'lvcreate\', \'--devices\', \'/dev/mapper/3600a098038314648593f517773636442,/dev/mapper/3600a098038314648593f517773636443,/dev/mapper/3600a098038314648593f517773636444,/dev/mapper/3600a098038314648593f517773636445\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 hints="none" obtain_device_list_from_udev=0 } global { prioritise_write_locks=1 wait_for_locks=1 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--autobackup\', \'n\', \'--contiguous\', \'n\', \'--size\', \'8448m\', \'--wipesignatures\', \'n\', \'--addtag\', \'OVIRT_VOL_INITIALIZING\', \'--name\', \'1852a039-7923-4ffd-affb-e0ee2e27c12d\', \'1721f247-94c9-43cb-907f-15ec2044d316\'] rc=5 out=[] err=[\' Volume group "1721f247-94c9-43cb-907f-15ec2044d316" has insufficient free space (31 extents): 66 required.\']' abortedcode=550

The solution that we discussed yesterday with Benny is to fail the command during the validation phase (i.e., require more space from the beginning).
Though it might be a little confusing for the user to see (for example) 17 GiB available and still receive the "insufficient free space" error when trying to LSM a 14 GiB disk.
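The lvcreate numbers above can be sanity-checked with some extent arithmetic; the sketch below assumes the usual 128 MiB LVM extent size of oVirt block storage domains, which matches the figures in the error message:

EXTENT_SIZE_MIB = 128                        # typical extent size on oVirt block SDs

requested_mib = 8448                         # '--size 8448m' from the lvcreate command
required_extents = requested_mib // EXTENT_SIZE_MIB   # -> 66, as reported by lvm
free_extents = 31                            # from the error message
free_mib = free_extents * EXTENT_SIZE_MIB    # -> 3968 MiB, i.e. ~3.9 GiB actually left in the VG

print(required_extents, free_mib)            # 66 3968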
(In reply to Pavel Bar from comment #10)
> The solution that we discussed yesterday with Benny is to fail the command
> during validation phase (require more space from the beginning).
> Though it might be a little confusing for the user to see (for example) 17
> GiB available and still receive the "insufficient free space" error when
> trying to LSM the 14 GiB disk.

Right, but it's the same case as getting (qcow2) disks with an actual size greater than the virtual size: without understanding the low-level details some things can look odd, but as long as we can explain the reason, it's fine.
QE instructions:
The solution that was implemented is to fail the command during the validation phase, that is, to require more space from the beginning and not to start an LSM operation that will eventually fail, run cleanup, etc.
An additional 7.5 GiB is required. For example, for the LSM operation to start, a 14 GiB disk requires 14+8==22 GiB of free space on the destination Storage Domain.
If that space is not available, the operation will result in "Cannot move Virtual Disk. Low disk space on Storage Domain ${storageName}."
Verified on ovirt-engine-4.5.3.1

Raw required space: disk size + min(disk size, 7.5 GiB)
Cow required space: disk size + min(disk size, 7.5 GiB) + 10% qcow overhead

If there is not enough space, a related message pops up and blocks the operation.

Moving to 'Verified'.
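A minimal sketch of the verified formulas above (the names are illustrative and the exact base of the 10% qcow overhead is an assumption, not the engine's actual validation code):

GiB = 1024**3
EXTRA_CAP = int(7.5 * GiB)   # the min(disk size, 7.5 GiB) term
QCOW_OVERHEAD = 1.1          # +10% qcow2 overhead (assumed to apply to the whole sum)

def required_space_raw(disk_size):
    return disk_size + min(disk_size, EXTRA_CAP)

def required_space_cow(disk_size):
    return (disk_size + min(disk_size, EXTRA_CAP)) * QCOW_OVERHEAD

print(required_space_raw(14 * GiB) / GiB)   # -> 21.5
print(required_space_cow(14 * GiB) / GiB)   # -> ~23.65

For a 14 GiB raw disk this gives 21.5 GiB, which is in line with the ~22 GiB requirement mentioned in comment 5 and in the QE instructions.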