Bug 1443137 - Live Storage Migration sequence did not complete, SyncImage task failed on the SPM
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.9
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Daniel Erez
QA Contact: Eyal Shenitzky
URL:
Whiteboard:
Depends On:
Blocks: 1459216
 
Reported: 2017-04-18 14:52 UTC by Gordon Watson
Modified: 2020-12-14 09:17 UTC
CC List: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1459216
Environment:
Last Closed: 2017-07-20 13:24:17 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:




Links
System                              ID       Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)   3005271  0        None      None    None     2017-04-18 16:35:56 UTC

Description Gordon Watson 2017-04-18 14:52:09 UTC
Description of problem:

A Live Storage Migration (LSM) 'hung'. The task handling the internal (base) volume copy on the SPM encountered an error, yet the 'copy' (qemu-img convert) appeared to have completed successfully. A subsequent "UnknownTask" error occurred while trying to stop/clear the task.

On the engine, the SyncImageGroupDataVDSCommand completed without 'error', yet the LSM sequence stopped there, so the VmReplicateDiskFinishVDSCommand was never executed.

As a result, the base volume copy appeared to have completed, but the active volume 'block copy' job was still running.
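As a rough illustration of the ordering described above, here is a minimal Python sketch. It is not the engine's implementation: the names `LsmStep`, `next_step` and `block_copy_should_be_running` are hypothetical, and the model covers only the two commands named in this report, omitting the rest of the LSM flow. It shows that once the sync step has completed, the finish step is the next one due, and that the active-volume block copy stays open until that step runs.

```python
from __future__ import annotations

from enum import Enum


class LsmStep(Enum):
    # Hypothetical model: only the two engine commands named in this report.
    # The real LSM flow contains additional steps that are not shown here.
    SYNC_IMAGE_GROUP_DATA = "SyncImageGroupDataVDSCommand"        # base volume copy on the SPM
    VM_REPLICATE_DISK_FINISH = "VmReplicateDiskFinishVDSCommand"  # ends the active-volume block copy


# Order in which the engine is expected to issue the two steps.
EXPECTED_ORDER = (LsmStep.SYNC_IMAGE_GROUP_DATA, LsmStep.VM_REPLICATE_DISK_FINISH)


def next_step(completed: set[LsmStep]) -> LsmStep | None:
    """Return the first step in EXPECTED_ORDER that has not completed, or None when all have."""
    for step in EXPECTED_ORDER:
        if step not in completed:
            return step
    return None


def block_copy_should_be_running(completed: set[LsmStep]) -> bool:
    """In this model the active-volume block copy stays open until the finish step has run."""
    return LsmStep.VM_REPLICATE_DISK_FINISH not in completed


# State observed in this report: the sync step returned without error,
# but the engine never issued the finish step.
observed = {LsmStep.SYNC_IMAGE_GROUP_DATA}
print(next_step(observed).value)               # VmReplicateDiskFinishVDSCommand
print(block_copy_should_be_running(observed))  # True
```

In the reported case the sequence sat permanently in the `observed` state: the step returned by `next_step` was never issued, so the block copy job never ended.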



Version-Release number of selected component (if applicable):

RHEV 3.6.9
RHEL 7.2 host:
   vdsm-4.17.35-1.el7
   libvirt-1.2.17-13.el7_2.5
   qemu-kvm-rhev-2.3.0-31.el7_2.21


How reproducible:

Not reproducible.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 11 Daniel Erez 2017-06-06 10:59:37 UTC
The LSM implementation in the engine changed drastically in 4.0 (from SEAT to CoCo infra). I couldn't reproduce the issue on the latest build. Moving to ON_QA for verification.

Comment 12 Eyal Shenitzky 2017-06-06 11:43:32 UTC
Can you pls add steps to reproduce?

Comment 13 Daniel Erez 2017-06-06 13:35:41 UTC
(In reply to Eyal Shenitzky from comment #12)
> Can you pls add steps to reproduce?

There are actually no exact steps to reproduce. It seems to be a race condition that happens occasionally. If you haven't encountered this issue since 4.0, we can close the bug as INSUFFICIENT_DATA and reopen it if the issue is reproduced.

Comment 15 Eyal Shenitzky 2017-06-07 04:53:37 UTC
I didn't encounter that issue.
I'm closing the bug and will reopen it if the issue is reproduced.

