Created attachment 1899034 [details]
engine.log

Description of problem:
While verifying bug 2107985, ovirt-engine service restarts successfully but LSM gets stuck and disk remains locked.

Version-Release number of selected component (if applicable):
ovirt-engine-4.5.2-0.3.el8ev

How reproducible:
Always

Steps to Reproduce:
Restart ovirt-engine service during LSM operation

Actual results:
LSM gets stuck and disk remains locked

Expected results:
LSM should finish successfully and disk should not be locked

Additional info:
Attaching engine.log
It looks like the root cause is that the lock is reacquired after restart by LiveMigrateDiskCommand. This can probably be resolved either by overriding reacquireLocks and not locking again if snapshot creation has already started, or by removing the command locks entirely, since their acquisition is handled in MoveDiskCommand.
We suspect this is not a new issue, but the timing of restarting the engine during create-snapshot, which is a fairly quick operation compared to copying the disk, made us miss it before.
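To picture the suspected failure mode, here is a toy sketch in Java. It is not engine code: `ToyLockManager`, `ReacquireDemo`, the `disk-1` key and the owner names are all hypothetical, and it assumes a non-reentrant exclusive lock keyed by disk ID. It only shows why a lock taken again on restart would make a later acquire for the same disk fail, and why dropping that extra acquisition avoids the problem.

```java
import java.util.HashMap;
import java.util.Map;

// Toy, non-reentrant exclusive lock table keyed by disk ID.
// Hypothetical stand-in; the engine's real lock manager is not modeled here.
final class ToyLockManager {
    private final Map<String, String> owners = new HashMap<>();

    // Returns true only if nobody holds the lock for this disk yet.
    boolean acquire(String diskId, String owner) {
        return owners.putIfAbsent(diskId, owner) == null;
    }
}

final class ReacquireDemo {
    // Simulates the state after a restart: the in-memory lock table is empty,
    // the command optionally re-takes its lock, then a later phase tries to
    // acquire the same disk lock.
    static boolean laterAcquireSucceeds(boolean reacquireOnRestart) {
        ToyLockManager locks = new ToyLockManager();
        if (reacquireOnRestart) {
            locks.acquire("disk-1", "restart-reacquire");
        }
        return locks.acquire("disk-1", "later-phase");
    }

    public static void main(String[] args) {
        System.out.println("with reacquire on restart: " + laterAcquireSucceeds(true));
        System.out.println("without reacquire on restart: " + laterAcquireSucceeds(false));
    }
}
```

In this model the later acquire fails only when the lock was re-taken on restart, which would match a retry loop that never makes progress.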
I also suggest updating the log message. The current "Failed to acquire VM lock, will retry on the next polling cycle" is a little confusing. For example, in this case the actual failure is acquiring the disk lock (exclusive), not the VM lock (shared). I suggest logging the exact failure, which can be retrieved from the "LockingResult" returned by the "acquireLock()" call in the "LiveDiskMigrateStage.LIVE_MIGRATE_DISK_EXEC_COMPLETED" phase. Just a thought :)
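A rough sketch of what such a message could look like, in Java. `LockAttempt` and `LockFailureLog` are hypothetical stand-ins written for this comment; the real "LockingResult" may expose its failure details differently, so the field names here are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the result of an acquireLock() call.
// The real LockingResult API may differ.
final class LockAttempt {
    final boolean acquired;
    final List<String> failureDetails;

    LockAttempt(boolean acquired, List<String> failureDetails) {
        this.acquired = acquired;
        this.failureDetails = failureDetails;
    }
}

final class LockFailureLog {
    // Builds a message that names what actually failed instead of
    // always blaming the VM lock.
    static String describe(LockAttempt attempt) {
        if (attempt.acquired) {
            return "Lock acquired";
        }
        String details = attempt.failureDetails.isEmpty()
                ? "no details reported"
                : String.join(", ", attempt.failureDetails);
        return "Failed to acquire lock (" + details
                + "), will retry on the next polling cycle";
    }

    public static void main(String[] args) {
        LockAttempt failed = new LockAttempt(false,
                Arrays.asList("DISK exclusive lock already held"));
        System.out.println(describe(failed));
    }
}
```

With a message like this, the log for this bug would have pointed at the disk's exclusive lock rather than the VM's shared lock.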
This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align relevant severity, flags and keywords to raise PM_Score or use one of the Bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above verification threshold (1000).
(In reply to Benny Zlotnik from comment #1)
> It looks like the root cause is the lock is reacquired after restart by
> LiveMigrateDiskCommand, this can probably be resolved by either overriding
> reacquireLocks and not locking again if snapshot create has already started.
> Or by removing the command locks entirely since their acquisition is handled
> in MoveDiskCommand

right, we chose to go with the latter
This bug has low overall severity and passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.