Bug 1161261

Summary:

VmReplicateDiskFinishVDSCommand is not executed when Live Storage Migration (LSM) is initiated, leaving an unfinished job

Product:

Red Hat Enterprise Virtualization Manager

Reporter:

Bimal Chollera <bcholler>

Component:

ovirt-engine

Assignee:

Daniel Erez <derez>

Status:

CLOSED CURRENTRELEASE

QA Contact:

lkuchlan <lkuchlan>

Severity:

high

Docs Contact:

Priority:

high

Version:

3.4.3

CC:

amureini, ecohen, gklein, gwatson, iheim, lpeer, lsurette, mkalinin, rbalakri, Rhev-m-bugs, scohen, tnisan, yeylon

Target Milestone:

---

Target Release:

3.5.0

Hardware:

All

OS:

Linux

Whiteboard:

storage

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-02-16 19:09:01 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

Storage

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
images	none

Description Bimal Chollera 2014-11-06 19:12:36 UTC

Description of problem:

After a Live Storage Migration (LSM), a disk remains in "locked" status and the LSM sequence doesn't complete on the engine side. VmReplicateDiskFinishVDSCommand and DeleteImageGroupVDSCommand are never executed.

The associated job table entry in the RHEV database remains "STARTED".

In the Admin Portal, the Disks menu will report the disk attached to the VM in "locked" status.

The engine sequence below completes, but VmReplicateDiskFinishVDSCommand and DeleteImageGroupVDSCommand, which should follow SyncImageGroupDataVDSCommand, are never executed.

CloneImageGroupStructureVDSCommand
VmReplicateDiskStartVDSCommand
SyncImageGroupDataVDSCommand
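
A minimal sketch of how the missing steps could be spotted in a set of engine log lines. It assumes only that each command name appears verbatim in the log text; the function name, the helper constant, and the relative order of the last two commands are illustrative assumptions, not taken from the engine source.

```python
# Command names as listed in this report. The position of the last two
# entries relative to each other is an assumption made for illustration.
EXPECTED_LSM_COMMANDS = [
    "CloneImageGroupStructureVDSCommand",
    "VmReplicateDiskStartVDSCommand",
    "SyncImageGroupDataVDSCommand",
    "VmReplicateDiskFinishVDSCommand",
    "DeleteImageGroupVDSCommand",
]


def missing_lsm_commands(log_lines):
    """Return the expected commands that never appear in log_lines.

    Matching is a plain substring test, so no particular log format
    is assumed. The result keeps the order of EXPECTED_LSM_COMMANDS.
    """
    seen = set()
    for line in log_lines:
        for name in EXPECTED_LSM_COMMANDS:
            if name in line:
                seen.add(name)
    return [name for name in EXPECTED_LSM_COMMANDS if name not in seen]
```

Fed the log lines for the affected disk, this would be expected to return the two commands named above, while a disk whose LSM completed would yield an empty list.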

I have been able to recreate this several times by live migrating 20 (arbitrary number) disks at once.

Version-Release number of selected component (if applicable):

Test environment:

- RHEV-M 3.4.3
- Single host with (essentially) vdsm-4.14.13-2

How reproducible:

Every time I have tried to live migrate 20 disks at once, I have encountered this problem.

Steps to Reproduce:

1. In my specific case I created a pool of 20 VMs based off a template in an NFS data domain.
2. I then started all 20 VMs.
3. I then copied the template to a second NFS domain.
4. I then live migrated all 20 disks to the second NFS domain.

Actual results:

One of the 20 LSMs failed as described above.

Expected results:

All of the LSMs should complete and the associated jobs in the database be marked as "FINISHED".
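
A minimal sketch of the check implied by the expected result, assuming the job entries have already been fetched as (job_id, status) pairs. The pair layout and the function name are hypothetical and do not describe the actual database schema.

```python
def unfinished_jobs(rows):
    """Return the ids of jobs whose status is not "FINISHED".

    rows is an iterable of (job_id, status) pairs; this shape is an
    assumption made for illustration only.
    """
    return [job_id for job_id, status in rows if status != "FINISHED"]
```

With the behaviour reported here, the job for the stuck disk would be returned because its status stays "STARTED"; after a successful run the list would be empty.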

Comment 8 Daniel Erez 2014-11-18 14:44:31 UTC

Should be fixed in 3.5 build.

Comment 9 lkuchlan 2014-12-01 15:20:29 UTC

Created attachment 963328 [details]
images

Tested using RHEVM 3.5 vt11.
All of the LSMs completed and the associated jobs in the database were marked as "FINISHED".

Comment 10 Allon Mureinik 2015-02-16 19:09:01 UTC

RHEV-M 3.5.0 has been released, closing this bug.