Description of problem:
The issue was found in an automation test: first we check the negative flow, where we expect the LSM to fail because the target SD doesn't have enough space. After that attempt, we extend the LUN so there is enough space for the disk to migrate to the target SD. The disk fails to migrate even after this, although there is now enough space on the target SD.

2022-06-29 11:47:22,424+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: value=Cannot create Logical Volume: 'cmd=[\'/sbin/lvm\', \'lvcreate\', \'--devices\', \'/dev/mapper/3600a09803830447a4f244c4657612f68,/dev/mapper/3600a09803830447a4f244c4657612f69,/dev/mapper/3600a09803830447a4f244c4657612f6a,/dev/mapper/3600a09803830447a4f244c4657612f6b,/dev/mapper/3600a09803830447a4f244c4657612f6c,/dev/mapper/3600a09803830447a4f244c4657612f6d,/dev/mapper/3600a09803830447a4f244c4657623647\', \'--config\', \'devices { preferred_names=["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 hints="none" obtain_device_list_from_udev=0 } global { prioritise_write_locks=1 wait_for_locks=1 use_lvmpolld=1 } backup { retain_min=50 retain_days=0 }\', \'--autobackup\', \'n\', \'--contiguous\', \'n\', \'--size\', \'8448m\', \'--wipesignatures\', \'n\', \'--addtag\', \'OVIRT_VOL_INITIALIZING\', \'--name\', \'0caead62-1995-409f-8086-5cf343d3a528\', \'090d90bf-e690-46a1-87d6-68ebe50179c5\'] rc=5 out=[] err=[\' Volume group "090d90bf-e690-46a1-87d6-68ebe50179c5" has insufficient free space (53 extents): 66 required.\']' abortedcode=550

2022-06-29 11:47:22,434+03 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-50) [] BaseAsyncTask::logEndTaskFailure: Task 'fac97aa3-abc9-49bb-b752-5031e90c3c3a' (Parent Command 'CreateVolumeContainer', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:

2022-06-29 11:47:22,445+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (EE-ManagedThreadFactory-engine-Thread-92712) [disks_syncAction_69ee0cd6-4776-458a] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand' with failure.

2022-06-29 11:47:29,618+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [disks_syncAction_69ee0cd6-4776-458a] Ending command 'org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand' with failure.

2022-06-29 11:47:30,699+03 ERROR [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [disks_syncAction_69ee0cd6-4776-458a] Ending command 'org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand' with failure.

2022-06-29 11:47:30,915+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-36) [] EVENT_ID: USER_MOVED_DISK_FINISHED_FAILURE(2,011), User admin@internal-authz has failed to move disk disk_TestCase10144_2911461916 to domain sd_TestCase10144_2911405538.

Version-Release number of selected component (if applicable):
ovirt-engine-4.5.1.2-0.11.el8ev.noarch
vdsm-4.50.1.4-1.el8ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an iSCSI SD with 12G of free space.
2. Create a VM from a template, attach a new preallocated 14G disk to it (on some other iSCSI SD), and run the VM.
3. LSM the newly created 14G disk of the VM to the new target SD (12G) - the action should fail as expected because there is no space on that SD for 14G.
4. Extend that SD so it has 20G of free space.
5. LSM the same disk (14G) again to the extended iSCSI SD.

Actual results:
LSM fails due to insufficient free space on the target SD. The target SD is left with 6G of free space, although the disk failed to migrate to it.

Expected results:
LSM of the 14G disk should succeed after extending the target SD.
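The "66 required" figure in the lvcreate error is consistent with the extent arithmetic of the VG. A quick back-of-the-envelope check (a sketch only; 128 MiB is the usual physical extent size for oVirt block storage domains, but verify on your setup with `vgs -o vg_extent_size,vg_free_count`):

```python
import math

def required_extents(lv_size_mib: int, extent_size_mib: int = 128) -> int:
    """Number of VG physical extents lvcreate needs for an LV of the given size."""
    return math.ceil(lv_size_mib / extent_size_mib)

# The failing lvcreate asked for --size 8448m:
print(required_extents(8448))   # 66, matching "66 required" in the error

# Only 53 extents were free on the target VG, i.e. roughly the 6G
# the target SD is left with after the failed attempt:
print(53 * 128 / 1024)          # 6.625 (GiB)
```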
This issue was not seen in RHV 4.4; adding the Regression keyword.
(In reply to sshmulev from comment #2) > This issue was not seen in RHV-4.4, adding regression keyword Does it reproduce if you change the test to have let's say not 20G free but 23G free at step 4?
(In reply to Arik from comment #3) > (In reply to sshmulev from comment #2) > > This issue was not seen in RHV-4.4, adding regression keyword > > Does it reproduce if you change the test to have let's say not 20G free but > 23G free at step 4? OK, I asked that because the primary suspect was the allocation of 3 chunks on the destination. But that's probably not the right question to ask - we need to take a better look at the logs. Leaving this to Pavel.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Cannot mark blocker "-" -> not a blocker bug.
The fix is likely to remove this check [1]; at this point the disk has already been created on the target. [1] https://github.com/oVirt/ovirt-engine/blob/03cab35b639f5d2b1a4722ccd31eb46c85dbc798/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/storage/lsm/LiveMigrateDiskCommand.java#L767
The root cause is that the disk is not removed from the target SD after a failure.
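Comments #8 and #9 describe the same flow: the destination volume is created first, a later step fails, and the failure path never removes the volume, so it keeps consuming space on the target SD. A minimal sketch of the cleanup-on-failure pattern the fix needs (hypothetical names and a toy StorageDomain model; the engine's actual logic lives in the Java CreateVolumeContainer/LiveMigrateDisk commands):

```python
class StorageDomain:
    """Toy stand-in for a target SD; tracks free space and allocated volumes."""

    def __init__(self, free_gib: float):
        self.free_gib = free_gib
        self.volumes: dict[str, float] = {}

    def create_volume(self, name: str, size_gib: float) -> None:
        if size_gib > self.free_gib:
            raise RuntimeError("insufficient free space")
        self.free_gib -= size_gib
        self.volumes[name] = size_gib

    def remove_volume(self, name: str) -> None:
        self.free_gib += self.volumes.pop(name)


def live_migrate_disk(target: StorageDomain, name: str, size_gib: float,
                      copy_data) -> None:
    # Step 1: allocate the destination volume.
    target.create_volume(name, size_gib)
    try:
        # Step 2: copy the data; this may fail for unrelated reasons.
        copy_data()
    except Exception:
        # The missing piece in the reported bug: without this cleanup the
        # half-created volume keeps consuming space on the target SD.
        target.remove_volume(name)
        raise


def failing_copy():
    raise RuntimeError("copy failed")


sd2 = StorageDomain(free_gib=20)
try:
    live_migrate_disk(sd2, "disk1", 14, failing_copy)
except RuntimeError:
    pass
print(sd2.free_gib)  # 20 - space is released after the failed attempt
```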
QE testing instructions:

Option 1: Use the automatic script you already have - it performs additional steps, most of which are not important here.

Option 2:
1) Create a disk of size X GiB on iSCSI Storage Domain #1.
2) Have an iSCSI Storage Domain #2 with X+3 GiB of available space.
3) LSM the above disk from SD1 to SD2.

Result:
Expected: the operation should succeed (SD2 has 3 GiB more space than the operation requires).
In practice: the operation fails (that's another bug, bug #2116309), and in addition no cleanup is performed: SD2 is shown with 3 GiB of available space instead of X+3 GiB, as one would expect since the LSM failed.

Note: this bug handles the cleanup part only.
Verified. Cleanup is performed after the failed migration attempt: the SD is left as it was before the migration.
Versions:
ovirt-engine-4.5.2.2-0.1.el8ev
vdsm-4.50.2.2-1.el8ev
This bugzilla is included in the oVirt 4.5.2 release, published on August 10th 2022. Since the problem described in this bug report should be resolved in the oVirt 4.5.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.