1497355 – Live Storage Migration continued on after snapshot creation hung and timed out

Summary: Live Storage Migration continued on after snapshot creation hung and timed out

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	ovirt-engine
Sub Component:
Version:	4.1.6
Hardware:	Unspecified
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	ovirt-4.3.0
Target Release:	4.3.0
Assignee:	Benny Zlotnik
QA Contact:	Yosi Ben Shimon
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1585039

Reported:	2017-09-29 21:53 UTC by Gordon Watson
Modified:	2022-03-13 14:27 UTC
CC List:	12 users
Fixed In Version:	ovirt-engine-4.3.0_alpha
Doc Type:	If docs needed, set a value
Doc Text:	This release ensures the live storage migration process completes properly after creating a snapshot.
Clone Of:
Clones:	1585039
Environment:
Last Closed:	2019-05-08 12:36:48 UTC
oVirt Team:	Storage
Target Upstream Version:
Embargoed:
Flags:	lsvaty: testing_plan_complete-

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3194802	None	None	None	2017-09-29 22:37:37 UTC
Red Hat Product Errata	RHEA-2019:1085	None	None	None	2019-05-08 12:37:11 UTC
oVirt gerrit	83836	'None'	ABANDONED	WIP core: introduce CreateAllSnapshotsFromVmCommandCallback	2020-11-18 09:49:55 UTC
oVirt gerrit	87671	'None'	MERGED	core: introduce CreateSnapshotForVm	2020-11-18 09:49:55 UTC
oVirt gerrit	87805	'None'	MERGED	core: when snapshot creation fails, do not cleanup target	2020-11-18 09:49:56 UTC
oVirt gerrit	89309	'None'	MERGED	core: when snapshot creation fails, do not cleanup target	2020-11-18 09:50:18 UTC
oVirt gerrit	89773	'None'	MERGED	core: fix error handling in CreateSnapshotForVmCommand	2020-11-18 09:50:18 UTC
oVirt gerrit	90670	'None'	MERGED	core: introduce CreateSnapshotForVm	2020-11-18 09:49:56 UTC
oVirt gerrit	90835	'None'	MERGED	core: fix error handling in CreateSnapshotForVmCommand	2020-11-18 09:49:57 UTC
oVirt gerrit	91359	'None'	MERGED	core: remove image from storage after failed snapshot	2020-11-18 09:49:57 UTC
oVirt gerrit	91658	'None'	MERGED	core: remove image from storage after failed snapshot	2020-11-18 09:49:58 UTC

Description Gordon Watson 2017-09-29 21:53:56 UTC

Description of problem:

The snapshot creation of a Live Storage Migration hung and timed out on the engine side. However, the engine then continued on with CloneImageGroupStructureVDSCommand and VmReplicateDiskStartVDSCommand, etc.

The result was that the LSM effectively failed, with the disk still residing in the source storage domain. However, volumes were created in the target storage domain, which caused a subsequent LSM to fail.
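
The expected control flow can be sketched as follows. This is a minimal illustration only, not the engine's code (ovirt-engine is written in Java) and not the merged fix. The function run_migration and the step names are hypothetical, loosely modelled on the commands named above. The point it shows is that a step that fails or times out should stop the sequence, so nothing is created in the target storage domain afterwards.

```python
def run_migration(steps):
    """Run (name, callable) steps in order and stop at the first failure.

    Returns (completed, failed): the names of the steps that finished, and
    the name of the step that failed or timed out (None if all succeeded).
    """
    completed = []
    for name, step in steps:
        try:
            ok = step()
        except TimeoutError:
            # Treat a timeout like any other failure of the step.
            ok = False
        if not ok:
            # Do not fall through to the remaining steps.
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical stand-ins for the real commands; each records that it ran.
calls = []


def create_snapshot():
    calls.append("create_snapshot")
    raise TimeoutError("snapshot creation timed out")


def clone_image_structure():
    calls.append("clone_image_structure")
    return True


def replicate_disk_start():
    calls.append("replicate_disk_start")
    return True


completed, failed = run_migration([
    ("create_snapshot", create_snapshot),
    ("clone_image_structure", clone_image_structure),
    ("replicate_disk_start", replicate_disk_start),
])
print(completed, failed, calls)
```

In this sketch the clone and replication steps never run once the snapshot step times out, which is the opposite of the behaviour reported here.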


Version-Release number of selected component (if applicable):

RHV 4.1.6
RHVH 4.1.6
  vdsm-4.19.31-1.el7ev
  

How reproducible:

Not.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 4 Allon Mureinik 2017-10-01 09:28:45 UTC

Benny, can you take a look please?

Tentatively targeting for 4.2.
If there's something safe enough to backport to 4.1.z, we should do that, but I'm not committing to such a fix until we see what the upstream fix contains.

Comment 12 Elad 2018-08-21 12:34:48 UTC

Verify according to https://bugzilla.redhat.com/show_bug.cgi?id=1585039#c15

Comment 13 Yosi Ben Shimon 2018-10-16 14:57:08 UTC

Verified using:
ovirt-engine-4.3.0-0.0.master.20181012165724.gitd25f971.el7.noarch
vdsm-4.30.0-640.git6fd8327.el7.x86_64

I blocked the connection between the host (host_mixed_3) and the destination storage domain (iscsi_2).

*** Engine log:


2018-10-16 15:07:46,561+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-104) [14760f4a-6ff7-4c64-b713-0655f017afd0] EVENT_ID: USER_CREATE_SNAPSHOT(45), Snapshot 'yosi_test_Disk1 Auto-generated for Live Storage Migration' creation for VM 'yosi_test' was initiated by admin@internal-authz.

*** vdsm log:


2018-10-16 15:08:00,341+0300 INFO  (jsonrpc/3) [api.virt] START snapshot(snapDrives=[{u'baseVolumeID': u'6b5e067d-2520-4fcf-8e48-b42bec82c8ed', u'domainID': u'ca51be15-f214-4955-b9db-c7772c900104', u'volumeID': u'15149008-1995-42fc-b809-c0044f6f43aa', u'imageID': u'cd9af767-fe3e-4e91-b23d-b049ade7df23'}], snapMemory=None, frozen=True) from=::ffff:10.35.162.7,49886, flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:48)

2018-10-16 15:08:35,470+0300 INFO  (jsonrpc/3) [api.virt] FINISH snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} from=::ffff:10.35.162.7,49886, flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:54)
2018-10-16 15:08:35,471+0300 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.snapshot failed (error 48) in 35.13 seconds (__init__:312)


*** Engine log (end command):


2018-10-16 15:08:36,625+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [14760f4a-6ff7-4c64-b713-0655f017afd0] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand' with failure.


Verified upstream.
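
For anyone repeating this check, the vdsm lines above can be matched to the engine flow by their flow_id. The following is a rough sketch, assuming the vdsm log line layout shown in this comment. The helper snapshot_result is hypothetical, and the START line's arguments are shortened to "..." for readability. It extracts the returned status and the time between START and FINISH.

```python
import re
from datetime import datetime

# Timestamp layout used by the vdsm lines above, e.g. 2018-10-16 15:08:00,341+0300
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f%z"
STATUS_RE = re.compile(
    r"return=\{'status': \{'message': '([^']*)', 'code': (\d+)\}\}"
)

FLOW_ID = "14760f4a-6ff7-4c64-b713-0655f017afd0"

# Lines taken from the vdsm log above; START arguments shortened to "...".
VDSM_LINES = [
    "2018-10-16 15:08:00,341+0300 INFO  (jsonrpc/3) [api.virt] START "
    "snapshot(...) from=::ffff:10.35.162.7,49886, "
    "flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, "
    "vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:48)",
    "2018-10-16 15:08:35,470+0300 INFO  (jsonrpc/3) [api.virt] FINISH "
    "snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} "
    "from=::ffff:10.35.162.7,49886, "
    "flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, "
    "vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:54)",
]


def snapshot_result(lines, flow_id):
    """Return message, code and elapsed seconds for one snapshot call."""
    start = finish = None
    message = code = None
    for line in lines:
        if f"flow_id={flow_id}" not in line:
            continue  # belongs to a different flow
        # The first two whitespace-separated fields form the timestamp.
        stamp = datetime.strptime(" ".join(line.split()[:2]), TS_FORMAT)
        if "START snapshot(" in line:
            start = stamp
        elif "FINISH snapshot" in line:
            finish = stamp
            match = STATUS_RE.search(line)
            if match:
                message, code = match.group(1), int(match.group(2))
    seconds = None
    if start is not None and finish is not None:
        seconds = (finish - start).total_seconds()
    return {"message": message, "code": code, "seconds": seconds}


result = snapshot_result(VDSM_LINES, FLOW_ID)
print(result)
```

The same flow_id appears in the engine log lines, so it can be used to pair the engine-side failure of CreateSnapshotForVmCommand with the vdsm-side "Snapshot failed" response.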

Comment 15 errata-xmlrpc 2019-05-08 12:36:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

Comment 16 Daniel Gur 2019-08-28 13:13:06 UTC

sync2jira

Comment 17 Daniel Gur 2019-08-28 13:17:18 UTC

sync2jira
