Bug 1459216

Summary: [downstream clone - 4.1.3] Live Storage Migration sequence did not complete, SyncImage task failed on the SPM
Product: Red Hat Enterprise Virtualization Manager
Reporter: rhev-integ
Component: vdsm
Assignee: Daniel Erez <derez>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Eyal Shenitzky <eshenitz>
Severity: high
Docs Contact:
Priority: high
Version: 3.6.9
CC: amureini, bazulay, bcholler, derez, gveitmic, kmashalk, lsurette, mkalinin, ratamir, srevivo, tnisan, ycui, ykaul, ylavi
Target Milestone: ovirt-4.1.3
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1443137
Environment:
Last Closed: 2017-06-07 05:00:48 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1443137
Bug Blocks:

Description rhev-integ 2017-06-06 14:41:44 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1443137 +++
======================================================================

Description of problem:

A Live Storage Migration 'hung'. The task handling the internal (base) volume copy on the SPM encountered an error, yet the copy itself (qemu-img convert) appeared to have completed successfully. A subsequent "UnknownTask" error occurred while trying to stop/clear the task.

On the engine, the SyncImageGroupDataVDSCommand completed without 'error', yet the LSM sequence just stopped, and so the VmReplicateDiskFinishVDSCommand was never executed. 

As a result, the base volume copy appeared to have completed, but the 'block copy' job for the active volume was still running.



Version-Release number of selected component (if applicable):

RHEV 3.6.9
RHEL 7.2 host:
   vdsm-4.17.35-1.el7
   libvirt-1.2.17-13.el7_2.5
   qemu-kvm-rhev-2.3.0-31.el7_2.21


How reproducible:

Not reproducible.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

(Originally by Gordon Watson)

Comment 12 rhev-integ 2017-06-06 14:42:58 UTC
The LSM implementation in the engine changed drastically in 4.0 (from SEAT to CoCo infra). I couldn't reproduce the issue on the latest build. Moving to ON_QA for verification.

(Originally by Daniel Erez)

Comment 13 rhev-integ 2017-06-06 14:43:04 UTC
Can you please add steps to reproduce?

(Originally by Eyal Shenitzky)

Comment 14 rhev-integ 2017-06-06 14:43:10 UTC
(In reply to Eyal Shenitzky from comment #12)
> Can you pls add steps to reproduce?

There are actually no exact steps to reproduce. It seems to be a race that happens occasionally. If you haven't encountered this issue since 4.0, we can close the bug as INSUFFICIENT_DATA and reopen it if the issue is reproduced.

(Originally by Daniel Erez)

Comment 15 Eyal Shenitzky 2017-06-07 05:00:48 UTC
I haven't encountered this issue. Closing the bug; it will be reopened if the issue is reproduced.

Closing as INSUFFICIENT_DATA, as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1443137#c13