Bug 1497170 - Live storage migration does not complete - Snapshot 'Auto-generated for Live Storage Migration' stuck in the MergeStatusCommand
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Ala Hino
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks: 1497117
 
Reported: 2017-09-29 11:23 UTC by Avihai
Modified: 2017-12-20 11:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-20 11:24:10 UTC
oVirt Team: Storage
rule-engine: ovirt-4.2+
rule-engine: blocker+


Attachments
engine , vdsm log (1.40 MB, application/x-gzip)
2017-09-29 11:23 UTC, Avihai


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 82528 | 0 | None | MERGED | core: Retry failed live merge commands | 2020-09-04 19:58:51 UTC

Description Avihai 2017-09-29 11:23:04 UTC
Created attachment 1332363
engine , vdsm  log

Description of problem:
Started a live storage migration (LSM). It reaches the point where it should delete the snapshot 'Auto-generated for Live Storage Migration', but gets stuck there: the snapshot remains in the 'locked' state and the migration does not complete.


It appears to be stuck in the MergeStatusCommand step, and there is an error from DestroyImageCheckCommand; see the engine log below.

Version-Release number of selected component (if applicable):
Engine:
ovirt-engine-4.2.0-0.0.master.20170927183005.git49790b2.el7.centos.noarch

How reproducible:
2/2 so far.

Steps to Reproduce:
1. Create a VM with a disk and a snapshot
2. Start the VM
3. Move the disk to another storage domain (I moved it between two iSCSI storage domains)

Actual results:
Live storage migration starts and copies the disk, but when it reaches the point where it should delete the snapshot 'Auto-generated for Live Storage Migration', it gets stuck: the snapshot remains in the 'locked' state and the migration does not complete.


Expected results:


Additional info:
The stuck stage was initiated on Sep 29, 2017, 1:43:02 PM.

It appears to be stuck in the MergeStatusCommand step, and there is an error from DestroyImageCheckCommand; see the engine log below.


Event:
Snapshot 'Auto-generated for Live Storage Migration' deletion for VM 'vm1' was initiated by admin@internal-authz.) 

Engine:
2017-09-29 13:43:05,799+03 INFO  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-63) [4942909d-ef50-44db-8180-9255b774c4c1] Executing Live Merge command step 'MERGE'
2017-09-29 13:43:21,296+03 INFO  [org.ovirt.engine.core.bll.MergeCommandCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-64) [4942909d-ef50-44db-8180-9255b774c4c1] Merge command (jobId = 32482192-3a69-4f5b-88ef-c6837a92cdd3) has completed for images '0397f760-c42c-4106-b9ab-f4323653d621'..'ec708a28-555d-40ff-a386-aae613363a5a'

2017-09-29 13:43:23,376+03 INFO  [org.ovirt.engine.core.bll.MergeStatusCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-4) [4942909d-ef50-44db-8180-9255b774c4c1] Running command: MergeStatusCommand internal: true. Entities affected :  ID: 05058bf9-8e89-488d-8a7d-a35ed423d585 Type: Storage



2017-09-29 13:43:27,143+03 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) [4942909d-ef50-44db-8180-9255b774c4c1] The following images were not removed: [ec708a28-555d-40ff-a386-aae613363a5a]
2017-09-29 13:43:27,510+03 INFO  [org.ovirt.engin

2017-09-29 14:14:34,857+03 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-56) [4942909d-ef50-44db-8180-9255b774c4c1] Command 'RemoveSnapshot' (id: '8147212c-9e9d-4251-a841-f8b32d41aa8f') waiting on child command id: '63517efb-bcdc-40ee-9dab-73d8d81476bb' type:'RemoveSnapshotSingleDiskLive' to complete
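
When triaging an excerpt like the one above, it can help to group engine log lines by their correlation ID and pull out the ERROR entries for the stuck flow. The following is only a rough sketch, not an oVirt tool: the line layout in `LINE_RE` is inferred from the lines quoted in this report, and the names `group_by_correlation`, `errors_for`, and `SAMPLE` are hypothetical helpers introduced for illustration.

```python
import re
from collections import defaultdict

# Assumed layout of an engine.log line, inferred from the excerpt in this report:
# <date> <time>,<ms>+<tz> <LEVEL> [<logger>] (<thread>) [<correlation-id>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)


def group_by_correlation(lines):
    """Return {correlation_id: [parsed entries]} for lines matching LINE_RE."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:  # lines that do not fit the assumed layout are skipped
            groups[match.group("corr")].append(match.groupdict())
    return dict(groups)


def errors_for(groups, corr_id):
    """Return only the ERROR-level entries logged under one correlation ID."""
    return [entry for entry in groups.get(corr_id, []) if entry["level"] == "ERROR"]


# Two lines taken from the excerpt in this report (the wrapped ERROR line is joined).
SAMPLE = [
    "2017-09-29 13:43:05,799+03 INFO  [org.ovirt.engine.core.bll.snapshots."
    "RemoveSnapshotSingleDiskLiveCommand] (EE-ManagedThreadFactory-engineScheduled-"
    "Thread-63) [4942909d-ef50-44db-8180-9255b774c4c1] Executing Live Merge command "
    "step 'MERGE'",
    "2017-09-29 13:43:27,143+03 ERROR [org.ovirt.engine.core.bll."
    "DestroyImageCheckCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-6) "
    "[4942909d-ef50-44db-8180-9255b774c4c1] The following images were not removed: "
    "[ec708a28-555d-40ff-a386-aae613363a5a]",
]
```

Calling `errors_for(group_by_correlation(SAMPLE), "4942909d-ef50-44db-8180-9255b774c4c1")` on these two lines would return the single DestroyImageCheckCommand entry; on a full engine.log the same call narrows the output to the errors belonging to one flow.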

Comment 1 Avihai 2017-09-30 10:42:41 UTC
This is a regression: the same test ran without issues about 2-3 weeks ago.

Engine build on which the same test passed:
ovirt-engine-4.2.0-0.0.master.20170907100709.git14accac.el7.centos.noarch

Link to the run that passed:
https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-runner-storage/535

Comment 2 Allon Mureinik 2017-10-01 09:21:28 UTC
Ala, Benny - in the last couple of weeks we've merged changes both to live merge itself and to LSM, which calls live merge at its end.

I'm assigning this to Ala as offhand it seems like a Live Merge bug, but please both be aware of it.
Once this is solved on master, we need to see if the patch that caused the regression was backported to the 4.1.z branch, and if so, backport the fix(es) too.

Comment 3 Ala Hino 2017-10-25 21:38:54 UTC
Avihai,

Can you please retry this on the latest build?

Comment 4 Avihai 2017-10-26 09:07:14 UTC
(In reply to Ala Hino from comment #3)
> Avihai,
> 
> Can you please retry this on the latest build?

Retested on the latest engine and the test passed.

Engine:
ovirt-engine-4.2.0-0.0.master.20171025204923.git6f4cbc5.el7.centos.noarch 

VDSM:
4.20.3-224.gitef2ce48

Comment 5 Ala Hino 2017-10-26 09:15:02 UTC
As noted in the previous comment by Avihai, the test passes on the latest build.
This bug was probably resolved as a result of https://gerrit.ovirt.org/#/c/82528/.

Moving to ON_QA.
Avihai, feel free to re-test or just move it to VERIFIED.

Comment 6 Avihai 2017-10-26 09:25:34 UTC
Verified on:

Engine:
ovirt-engine-4.2.0-0.0.master.20171025204923.git6f4cbc5.el7.centos.noarch 

VDSM:
4.20.3-224.gitef2ce48

Comment 7 Sandro Bonazzola 2017-12-20 11:24:10 UTC
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be
resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

