Created attachment 1332363
engine, vdsm log

Description of problem:
Started live storage migration. It reaches the point where it should delete the snapshot 'Auto-generated for Live Storage Migration', but it gets stuck: the snapshot remains in 'locked' state and the migration does not complete. It got stuck in the MergeStatusCommand part, with an error on DestroyImageCheckCommand; see the engine log below.

Version-Release number of selected component (if applicable):
Engine: ovirt-engine-4.2.0-0.0.master.20170927183005.git49790b2.el7.centos.noarch

How reproducible:
2/2 so far.

Steps to Reproduce:
1. Create VM + disk + snapshot
2. Start VM
3. Move disk to another SD (I moved it between 2 iSCSI SDs)

Actual results:
Live storage migration starts and copies the disk, but when it reaches the point where it should delete the snapshot 'Auto-generated for Live Storage Migration', it gets stuck: the snapshot remains in 'locked' state and the migration does not complete.

Expected results:

Additional info:
The stuck stage was initiated Sep 29, 2017, 1:43:02 PM.
Event: Snapshot 'Auto-generated for Live Storage Migration' deletion for VM 'vm1' was initiated by admin@internal-authz.
Engine:
2017-09-29 13:43:05,799+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [4942909d-ef50-44db-8180-9255b774c4c1] Executing Live Merge command step 'MERGE'
2017-09-29 13:43:21,296+03 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [4942909d-ef50-44db-8180-9255b774c4c1] Merge command (jobId = 32482192-3a69-4f5b-88ef-c6837a92cdd3) has completed for images '0397f760-c42c-4106-b9ab-f4323653d621'..'ec708a28-555d-40ff-a386-aae613363a5a'
2017-09-29 13:43:23,376+03 INFO [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [4942909d-ef50-44db-8180-9255b774c4c1] Running command: MergeStatusCommand internal: true. Entities affected : ID: 05058bf9-8e89-488d-8a7d-a35ed423d585 Type: Storage
2017-09-29 13:43:27,143+03 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [4942909d-ef50-44db-8180-9255b774c4c1] The following images were not removed: [ec708a28-555d-40ff-a386-aae613363a5a]
2017-09-29 13:43:27,510+03 INFO [org.ovirt.engin
2017-09-29 14:14:34,857+03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [4942909d-ef50-44db-8180-9255b774c4c1] Command 'RemoveSnapshot' (id: '8147212c-9e9d-4251-a841-f8b32d41aa8f') waiting on child command id: '63517efb-bcdc-40ee-9dab-73d8d81476bb' type:'RemoveSnapshotSingleDiskLive' to complete
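For anyone triaging a similar hang, a small script can pull every engine.log entry that belongs to one flow so the ERROR stands out. This is only a rough sketch: the line layout in the regex is inferred from the excerpt above rather than from any documented log format, and the names `LINE_RE`, `SAMPLE`, and `trace_flow` are made up for illustration.

```python
import re

# Assumed layout, inferred from the excerpt in this report:
# <timestamp> <LEVEL> [<logger class>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)

# Three lines taken from the log in this report (the last one is truncated there too).
SAMPLE = [
    "2017-09-29 13:43:05,799+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] "
    "(EE-ManagedThreadFactory-engineScheduled-Thread-63) [4942909d-ef50-44db-8180-9255b774c4c1] "
    "Executing Live Merge command step 'MERGE'",
    "2017-09-29 13:43:27,143+03 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] "
    "(EE-ManagedThreadFactory-commandCoordinator-Thread-6) [4942909d-ef50-44db-8180-9255b774c4c1] "
    "The following images were not removed: [ec708a28-555d-40ff-a386-aae613363a5a]",
    "2017-09-29 13:43:27,510+03 INFO [org.ovirt.engin",
]


def trace_flow(lines, correlation_id):
    """Return (timestamp, level, short logger name, message) tuples for one flow."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("corr") != correlation_id:
            continue  # skip truncated lines and entries from other flows
        short_logger = match.group("logger").rsplit(".", 1)[-1]
        entries.append(
            (match.group("ts"), match.group("level"), short_logger, match.group("msg"))
        )
    return entries


for ts, level, logger, msg in trace_flow(SAMPLE, "4942909d-ef50-44db-8180-9255b774c4c1"):
    print(ts, level, logger, msg)
```

On a real engine.log, the list of lines would come from reading the file, and filtering the result on `level == "ERROR"` narrows it down to the DestroyImageCheckCommand failure seen here.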
This is a regression: about 2-3 weeks ago the same test ran without issues.
Engine build on which the same test passed: ovirt-engine-4.2.0-0.0.master.20170907100709.git14accac.el7.centos.noarch
Link to the run that passed: https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/535
Ala, Benny - in the last couple of weeks we've merged changes both to live merge itself and to LSM, which calls live merge at its end. I'm assigning this to Ala as offhand it seems like a Live Merge bug, but please both be aware of it. Once this is solved on master, we need to see if the patch that caused the regression was backported to the 4.1.z branch, and if so, backport the fix(es) too.
Avihai,

Can you please retry this on the latest build?
(In reply to Ala Hino from comment #3)
> Avihai,
>
> Can you please retry this on the latest build?

Retested on the latest engine and the test passed.
Engine: ovirt-engine-4.2.0-0.0.master.20171025204923.git6f4cbc5.el7.centos.noarch
VDSM: 4.20.3-224.gitef2ce48
As noted in the previous comment by Avihai, the test passes on the latest build. This bug was probably resolved as a result of https://gerrit.ovirt.org/#/c/82528/. Moving to ON_QA. Avihai, feel free to re-test or just move it to VERIFIED.
Verified on:
Engine: ovirt-engine-4.2.0-0.0.master.20171025204923.git6f4cbc5.el7.centos.noarch
VDSM: 4.20.3-224.gitef2ce48
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.