Description of problem:
The snapshot creation step of a Live Storage Migration (LSM) hung and timed out on the engine side. The engine nevertheless continued with CloneImageGroupStructureVDSCommand, VmReplicateDiskStartVDSCommand, etc. As a result, the LSM effectively failed, with the disk still residing in the source storage domain. However, volumes had been created in the target storage domain, which caused a subsequent LSM to fail.

Version-Release number of selected component (if applicable):
RHV 4.1.6
RHVH 4.1.6
vdsm-4.19.31-1.el7ev

How reproducible:
Not reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Benny, can you take a look please? Tentatively targeting 4.2. If there's something safe enough to backport to 4.1.z, we should do that, but I'm not committing to such a fix until we see what the upstream fix contains.
Verify according to https://bugzilla.redhat.com/show_bug.cgi?id=1585039#c15
Verified using:
ovirt-engine-4.3.0-0.0.master.20181012165724.gitd25f971.el7.noarch
vdsm-4.30.0-640.git6fd8327.el7.x86_64

I blocked the connection between the host (host_mixed_3) and the destination storage domain (iscsi_2).

*** Engine log:
2018-10-16 15:07:46,561+03 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-104) [14760f4a-6ff7-4c64-b713-0655f017afd0] EVENT_ID: USER_CREATE_SNAPSHOT(45), Snapshot 'yosi_test_Disk1 Auto-generated for Live Storage Migration' creation for VM 'yosi_test' was initiated by admin@internal-authz.

*** vdsm log:
2018-10-16 15:08:00,341+0300 INFO  (jsonrpc/3) [api.virt] START snapshot(snapDrives=[{u'baseVolumeID': u'6b5e067d-2520-4fcf-8e48-b42bec82c8ed', u'domainID': u'ca51be15-f214-4955-b9db-c7772c900104', u'volumeID': u'15149008-1995-42fc-b809-c0044f6f43aa', u'imageID': u'cd9af767-fe3e-4e91-b23d-b049ade7df23'}], snapMemory=None, frozen=True) from=::ffff:10.35.162.7,49886, flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:48)
2018-10-16 15:08:35,470+0300 INFO  (jsonrpc/3) [api.virt] FINISH snapshot return={'status': {'message': 'Snapshot failed', 'code': 48}} from=::ffff:10.35.162.7,49886, flow_id=14760f4a-6ff7-4c64-b713-0655f017afd0, vmId=be44bd67-a80b-4f4c-8c58-4bc7554b2005 (api:54)
2018-10-16 15:08:35,471+0300 INFO  (jsonrpc/3) [jsonrpc.JsonRpcServer] RPC call VM.snapshot failed (error 48) in 35.13 seconds (__init__:312)

*** Engine log (end command):
2018-10-16 15:08:36,625+03 ERROR [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [14760f4a-6ff7-4c64-b713-0655f017afd0] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand' with failure.

Verified upstream.
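For reference, the connection-blocking step above is typically done with a host-side firewall rule. A minimal sketch, assuming iptables is available on the host and that the iscsi_2 storage server is reached over the standard iSCSI port (tcp/3260); the IP below is a placeholder, not the actual address from this verification:

```shell
# Placeholder for the iscsi_2 storage server address (assumption, not from the logs)
STORAGE_IP=10.0.0.5

# The actual rules (require root); shown commented out here:
# iptables -A OUTPUT -d "$STORAGE_IP" -p tcp --dport 3260 -j DROP   # block the domain
# iptables -D OUTPUT -d "$STORAGE_IP" -p tcp --dport 3260 -j DROP   # restore afterwards

echo "would block tcp/3260 to $STORAGE_IP"
```

Removing the DROP rule after the LSM fails restores connectivity so the cleanup path can be observed.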
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1085