Bug 1497355 - Live Storage Migration continued on after snapshot creation hung and timed out
Summary: Live Storage Migration continued on after snapshot creation hung and timed out
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.1.6
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: high
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: Benny Zlotnik
QA Contact: Yosi Ben Shimon
URL:
Whiteboard:
Depends On:
Blocks: 1585039
Reported: 2017-09-29 21:53 UTC by Gordon Watson
Modified: 2022-03-13 14:27 UTC

Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: If docs needed, set a value
Doc Text:
This release ensures the live storage migration process completes properly after creating a snapshot.
Clone Of:
: 1585039 (view as bug list)
Environment:
Last Closed: 2019-05-08 12:36:48 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3194802 | 0 | None | None | None | 2017-09-29 22:37:37 UTC
Red Hat Product Errata | RHEA-2019:1085 | 0 | None | None | None | 2019-05-08 12:37:11 UTC
oVirt gerrit | 83836 | 0 | 'None' | ABANDONED | WIP core: introduce CreateAllSnapshotsFromVmCommandCallback | 2020-11-18 09:49:55 UTC
oVirt gerrit | 87671 | 0 | 'None' | MERGED | core: introduce CreateSnapshotForVm | 2020-11-18 09:49:55 UTC
oVirt gerrit | 87805 | 0 | 'None' | MERGED | core: when snapshot creation fails, do not cleanup target | 2020-11-18 09:49:56 UTC
oVirt gerrit | 89309 | 0 | 'None' | MERGED | core: when snapshot creation fails, do not cleanup target | 2020-11-18 09:50:18 UTC
oVirt gerrit | 89773 | 0 | 'None' | MERGED | core: fix error handling in CreateSnapshotForVmCommand | 2020-11-18 09:50:18 UTC
oVirt gerrit | 90670 | 0 | 'None' | MERGED | core: introduce CreateSnapshotForVm | 2020-11-18 09:49:56 UTC
oVirt gerrit | 90835 | 0 | 'None' | MERGED | core: fix error handling in CreateSnapshotForVmCommand | 2020-11-18 09:49:57 UTC
oVirt gerrit | 91359 | 0 | 'None' | MERGED | core: remove image from storage after failed snapshot | 2020-11-18 09:49:57 UTC
oVirt gerrit | 91658 | 0 | 'None' | MERGED | core: remove image from storage after failed snapshot | 2020-11-18 09:49:58 UTC

Description Gordon Watson 2017-09-29 21:53:56 UTC
Description of problem:

The snapshot creation of a Live Storage Migration hung and timed out on the engine side. However, the engine then continued on with CloneImageGroupStructureVDSCommand and VmReplicateDiskStartVDSCommand, etc.

The result was that the LSM effectively failed, with the disk still residing in the source storage domain. However, volumes were created in the target storage domain, which caused a subsequent LSM to fail.
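
For illustration only, the expected control flow can be sketched as follows: each LSM step runs only if the previous one succeeded, and a timeout counts as a failure. This is a minimal sketch under stated assumptions, not the ovirt-engine implementation. `run_lsm`, the step callables and the "CreateSnapshot" label are hypothetical; only the two VDS command names come from the description above.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_lsm(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first failure or timeout.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed: List[str] = []
    for name, step in steps:
        try:
            ok = step()
        except TimeoutError:
            ok = False  # a timed-out step is treated as failed
        if not ok:
            return completed, name
        completed.append(name)
    return completed, None


calls: List[str] = []  # records which hypothetical steps were actually invoked


def snapshot_times_out() -> bool:
    """Stand-in for a snapshot step that hangs and times out."""
    calls.append("CreateSnapshot")
    raise TimeoutError("snapshot creation timed out")


def record(name: str) -> Callable[[], bool]:
    """Build a stand-in step that only records that it ran."""
    def step() -> bool:
        calls.append(name)
        return True
    return step


completed, failed = run_lsm([
    ("CreateSnapshot", snapshot_times_out),
    ("CloneImageGroupStructureVDSCommand",
     record("CloneImageGroupStructureVDSCommand")),
    ("VmReplicateDiskStartVDSCommand",
     record("VmReplicateDiskStartVDSCommand")),
])
print(completed, failed, calls)
```

With the snapshot step timing out, neither of the later steps is invoked, so nothing would be created in the target storage domain.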


Version-Release number of selected component (if applicable):

RHV 4.1.6
RHVH 4.1.6
  vdsm-4.19.31-1.el7ev
  

How reproducible:

Not reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Allon Mureinik 2017-10-01 09:28:45 UTC
Benny, can you take a look please?

Tentatively targeting for 4.2.
If there's something safe enough to backport to 4.1.z, we should do that, but I'm not committing to such a fix until we see what the upstream fix contains.

Comment 12 Elad 2018-08-21 12:34:48 UTC
Verify according to https://bugzilla.redhat.com/show_bug.cgi?id=1585039#c15

Comment 13 Yosi Ben Shimon 2018-10-16 14:57:08 UTC
Verified using:
ovirt-engine-4.3.0-0.0.master.20181012165724.gitd25f971.el7.noarch
vdsm-4.30.0-640.git6fd8327.el7.x86_64

I blocked the connection between the host (host_mixed_3) and the destination storage domain (iscsi_2).

*** Engine log:


2018-10-16 15:07:46,561+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-104) [14760f4a-6ff7-4c64-b713-0655f017afd0] EVENT_ID: USER_CREATE_SNAPSHOT(45), Snapshot 'yosi_test_Disk1 Auto-generated for Live Storage Migration' creation for VM 'yosi_test' was initiated by admin@internal-authz.

*** vdsm log:


2018-10-16 15:08:00,341+0300 INFO  (jsonrpc/3) [api.virt] START snapshot(snapDrives=[{u'baseVolumeID': u'6b5e067d-2520-4fcf-8e48-b42bec82c8ed', u'domainID': u'ca51be15-f214-4955-b9db-c7772c900104', u'volumeID': u'15149008-1995-42fc-b809-c0044f6f43aa', u'imageID': u'cd9af767-fe3e-4e91-b23d-b049ade7df23'}], snapMemory=None, frozen=True) from=::ffff:10.35.162.7,49886, flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:48)

2018-10-16 15:08:35,470+0300 INFO  (jsonrpc/3) [api.virt] FINISH snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} from=::ffff:10.35.162.7,49886, flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:54)
2018-10-16 15:08:35,471+0300 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.snapshot failed (error 48) in 35.13 seconds (__init__:312)
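
As a cross-check of the vdsm excerpt above, the gap between the START and FINISH timestamps can be computed directly. This is a small sketch using only the Python standard library. `parse_vdsm_ts` is a helper name introduced here, and the two timestamps are copied from the log lines above.

```python
from datetime import datetime


def parse_vdsm_ts(ts: str) -> datetime:
    """Parse a vdsm-style timestamp such as '2018-10-16 15:08:00,341+0300'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f%z")


start = parse_vdsm_ts("2018-10-16 15:08:00,341+0300")   # START snapshot
finish = parse_vdsm_ts("2018-10-16 15:08:35,470+0300")  # FINISH snapshot

elapsed = (finish - start).total_seconds()
print(f"{elapsed:.3f}")  # prints 35.129
```

The result, 35.129 seconds, is consistent with the 35.13 seconds reported for the failed VM.snapshot RPC call.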


*** Engine log (end command):


2018-10-16 15:08:36,625+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [14760f4a-6ff7-4c64-b713-0655f017afd0] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand' with failure.


Verified upstream.
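
The engine and vdsm excerpts above can be tied together through the shared flow ID (14760f4a-6ff7-4c64-b713-0655f017afd0). Here is a minimal sketch of that correlation, assuming a plain substring match is sufficient. `lines_for_flow` is a hypothetical helper, and the sample lines are abbreviated copies of the log lines in this comment, with "..." marking the omitted parts.

```python
from typing import Iterable, List

FLOW_ID = "14760f4a-6ff7-4c64-b713-0655f017afd0"


def lines_for_flow(lines: Iterable[str], flow_id: str) -> List[str]:
    """Return the log lines that mention the given flow (correlation) ID."""
    return [line for line in lines if flow_id in line]


# Abbreviated copies of the engine and vdsm lines quoted in this comment.
sample = [
    "2018-10-16 15:07:46,561+03 INFO  [...AuditLogDirector] (default task-104) "
    "[14760f4a-6ff7-4c64-b713-0655f017afd0] EVENT_ID: USER_CREATE_SNAPSHOT(45), ...",
    "2018-10-16 15:08:35,470+0300 INFO  (jsonrpc/3) [api.virt] FINISH snapshot ... "
    "flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, ... (api:54)",
    "2018-10-16 15:08:35,471+0300 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] "
    "RPC call VM.snapshot failed (error 48) in 35.13 seconds (__init__:312)",
    "2018-10-16 15:08:36,625+03 ERROR [...CreateSnapshotForVmCommand] (...) "
    "[14760f4a-6ff7-4c64-b713-0655f017afd0] Ending command ... with failure.",
]

matched = lines_for_flow(sample, FLOW_ID)
for line in matched:
    print(line)
```

The JsonRpcServer line carries no flow ID, so it is not matched. It has to be associated with the flow by thread name (jsonrpc/3) and timestamp instead.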

Comment 15 errata-xmlrpc 2019-05-08 12:36:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

Comment 16 Daniel Gur 2019-08-28 13:13:06 UTC
sync2jira

Comment 17 Daniel Gur 2019-08-28 13:17:18 UTC
sync2jira

