Created attachment 1332363
engine, vdsm log
Description of problem:
Started live storage migration. It reaches the point where it should delete the snapshot 'Auto-generated for Live Storage Migration', but it gets stuck: the snapshot remains in 'locked' state and the migration does not complete.
I see that it got stuck in the MergeStatusCommand part, with an error on DestroyImageCheckCommand; see the engine log below.
Version-Release number of selected component (if applicable):
2/2 so far.
Steps to Reproduce:
1. Create VM + disk + snapshot
3. Move disk to another SD (I moved between 2 iSCSI SDs)
Live storage migration starts and copies the disk, but when it reaches the point where it should delete the snapshot 'Auto-generated for Live Storage Migration', it gets stuck: the snapshot remains in 'locked' state and the migration does not complete.
The stuck stage was initiated on Sep 29, 2017, 1:43:02 PM.
I see that it got stuck in the MergeStatusCommand part, with an error on DestroyImageCheckCommand; see the engine log below.
Snapshot 'Auto-generated for Live Storage Migration' deletion for VM 'vm1' was initiated by admin@internal-authz.)
2017-09-29 13:43:05,799+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [4942909d-ef50-44db-8180-9255b774c4c1] Executing Live Merge command step 'MERGE'
2017-09-29 13:43:21,296+03 INFO [org.ovirt.engine.core.bll.MergeCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [4942909d-ef50-44db-8180-9255b774c4c1] Merge command (jobId = 32482192-3a69-4f5b-88ef-c6837a92cdd3) has completed for images '0397f760-c42c-4106-b9ab-f4323653d621'..'ec708a28-555d-40ff-a386-aae613363a5a'
2017-09-29 13:43:23,376+03 INFO [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [4942909d-ef50-44db-8180-9255b774c4c1] Running command: MergeStatusCommand internal: true. Entities affected : ID: 05058bf9-8e89-488d-8a7d-a35ed423d585 Type: Storage
2017-09-29 13:43:27,143+03 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [4942909d-ef50-44db-8180-9255b774c4c1] The following images were not re
2017-09-29 13:43:27,510+03 INFO [org.ovirt.engin
2017-09-29 14:14:34,857+03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [4942909d-ef50-44db-8180-9255b774c4c1] Command 'RemoveSnapshot' (id: '8147212c-9e9d-4251-a841-f8b32d41aa8f') waiting on child command id: '63517efb-bcdc-40ee-9dab-73d8d81476bb' type:'RemoveSnapshotSingleDiskLive' to complete
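To make the stall easier to spot in a long engine.log, the entries can be filtered by the correlation ID of the flow and the longest silence between consecutive entries reported. The following is only a minimal sketch: the line layout is an assumption inferred from the excerpt above, and the helper names (parse_line, trace_flow, largest_gap) are hypothetical, not part of any oVirt tooling. The sample lines are copied from the excerpt, including the truncated one.

```python
import re
from datetime import datetime

# Assumed layout of an engine.log line, inferred from the excerpt above:
# <timestamp><tz> <LEVEL> [<logger>] (<thread>) [<correlation id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[+-]\d{2}\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+\[(?P<corr>[^\]]*)\]\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # truncated or continuation line
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S,%f")
    # Short command name, e.g. 'MergeStatusCommand'
    rec["command"] = rec["logger"].rsplit(".", 1)[-1]
    return rec


def trace_flow(lines, correlation_id):
    """Return the parsed records that belong to one correlation ID, in file order."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r and r["corr"] == correlation_id]


def largest_gap(records):
    """Return (seconds, earlier, later) for the longest silence, or None."""
    best = None
    for earlier, later in zip(records, records[1:]):
        gap = (later["ts"] - earlier["ts"]).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, earlier, later)
    return best


CORRELATION_ID = "4942909d-ef50-44db-8180-9255b774c4c1"

# Lines copied from the excerpt above (the second one is truncated in the report).
SAMPLE = [
    "2017-09-29 13:43:05,799+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [4942909d-ef50-44db-8180-9255b774c4c1] Executing Live Merge command step 'MERGE'",
    "2017-09-29 13:43:27,143+03 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [4942909d-ef50-44db-8180-9255b774c4c1] The following images were not re",
    "2017-09-29 13:43:27,510+03 INFO [org.ovirt.engin",
    "2017-09-29 14:14:34,857+03 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [4942909d-ef50-44db-8180-9255b774c4c1] Command 'RemoveSnapshot' (id: '8147212c-9e9d-4251-a841-f8b32d41aa8f') waiting on child command id: '63517efb-bcdc-40ee-9dab-73d8d81476bb' type:'RemoveSnapshotSingleDiskLive' to complete",
]

if __name__ == "__main__":
    flow = trace_flow(SAMPLE, CORRELATION_ID)
    for rec in flow:
        print(rec["ts"], rec["level"], rec["command"])
    gap = largest_gap(flow)
    if gap is not None:
        print("longest silence: %.0f s after %s" % (gap[0], gap[1]["command"]))
```

On a real log, the lines would come from the attached engine.log instead of SAMPLE; lines that do not match the assumed layout are simply skipped.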
This is a regression: the same test ran without issues about 2-3 weeks ago.
Engine build on which the same test passed:
Link to the run that passed:
Ala, Benny - in the last couple of weeks we've merged changes both to live merge itself and to LSM, which calls live merge at its end.
I'm assigning this to Ala as offhand it seems like a Live Merge bug, but please both be aware of it.
Once this is solved on master, we need to see if the patch that caused the regression was backported to the 4.1.z branch, and if so, backport the fix(es) too.
Can you please retry this on a latest build?
(In reply to Ala Hino from comment #3)
> Can you please retry this on a latest build?
Retested on the latest engine and the test passed.
As noted in the previous comment by Avihai, the test passes on the latest build.
This bug was probably resolved as a result of https://gerrit.ovirt.org/#/c/82528/.
Moving to ON_QA.
Avihai, feel free to re-test or just move it to VERIFIED.
Verified on:
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.
Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.
If the solution does not work for you, please open a new bug report.