Bug 1161261 - VmReplicateDiskFinishVDSCommand is not executed when Live Storage Migration (LSM) is initiated, leaving an unfinished job
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Daniel Erez
QA Contact: lkuchlan
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2014-11-06 19:12 UTC by Bimal Chollera
Modified: 2016-03-16 18:57 UTC
CC List: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-16 19:09:01 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments
images (345.44 KB, application/x-gzip), 2014-12-01 15:20 UTC, lkuchlan


Links
Red Hat Knowledge Base (Solution) 1298893 (last updated 2016-03-16 18:57:16 UTC)

Description Bimal Chollera 2014-11-06 19:12:36 UTC
Description of problem:

After a Live Storage Migration (LSM), a disk remains in "locked" status and the LSM sequence doesn't complete on the engine side. VmReplicateDiskFinishVDSCommand and DeleteImageGroupVDSCommand are never executed.

The associated job table entry in the RHEV database remains "STARTED".

 correlation_id |                job_id                |   action_type   |                     description                     | status  
----------------+--------------------------------------+-----------------+-----------------------------------------------------+---------
 edf0edf        | de4e97cf-e747-4657-bd07-e60aa278c0f1 | LiveMigrateDisk | Migrating Disk lsm-vm_Disk1 from LSM_GFW to NFS_GFW | STARTED
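A stuck LSM shows up as a LiveMigrateDisk row whose status never leaves STARTED. The sketch below is a hypothetical Python helper, not part of RHEV or oVirt. It assumes the job table has already been dumped in psql's default aligned format, as pasted above, and it simply parses that text and lists the rows still marked STARTED. The names `parse_psql_aligned` and `unfinished_jobs` are illustrative only.

```python
from typing import Dict, List


def parse_psql_aligned(text: str) -> List[Dict[str, str]]:
    """Parse psql's default aligned output into one dict per data row.

    The first line containing '|' is treated as the header; lines without
    '|' (the '----+----' separator, a '(N rows)' footer) are skipped.
    """
    lines = [ln for ln in text.splitlines() if "|" in ln]
    if not lines:
        return []
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        values = [val.strip() for val in line.split("|")]
        rows.append(dict(zip(header, values)))
    return rows


def unfinished_jobs(rows: List[Dict[str, str]],
                    action_type: str = "LiveMigrateDisk") -> List[Dict[str, str]]:
    """Return rows of the given action type whose status is still STARTED."""
    return [r for r in rows
            if r.get("action_type") == action_type and r.get("status") == "STARTED"]


# The row from this report, in the same layout as the psql output above.
SAMPLE = """\
 correlation_id |                job_id                |   action_type   |                     description                     | status
----------------+--------------------------------------+-----------------+-----------------------------------------------------+---------
 edf0edf        | de4e97cf-e747-4657-bd07-e60aa278c0f1 | LiveMigrateDisk | Migrating Disk lsm-vm_Disk1 from LSM_GFW to NFS_GFW | STARTED
"""

for job in unfinished_jobs(parse_psql_aligned(SAMPLE)):
    print(job["job_id"], job["action_type"], job["status"])
# prints: de4e97cf-e747-4657-bd07-e60aa278c0f1 LiveMigrateDisk STARTED
```

A real check would more likely query the engine database directly; parsing pasted output just keeps the sketch self-contained.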

In the Admin Portal, the Disks menu will report the disk attached to the VM in "locked" status.

The engine sequence below completes but VmReplicateDiskFinishVDSCommand and DeleteImageGroupVDSCommand are never executed after the SyncImageGroupDataVDSCommand.

CloneImageGroupStructureVDSCommand
VmReplicateDiskStartVDSCommand
SyncImageGroupDataVDSCommand
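The engine-side sequence can be treated as an ordered checklist. Assuming the VDS command names for one LSM have already been extracted from the engine log in execution order (that extraction is not shown), the hypothetical helper below reports which expected steps were never reached. The expected order is taken from this report: the three observed commands, followed by VmReplicateDiskFinishVDSCommand and DeleteImageGroupVDSCommand. The real engine flow may include further steps not listed here.

```python
from typing import List, Sequence

# Order as described in this report; the actual engine flow may contain more steps.
EXPECTED_LSM_SEQUENCE = (
    "CloneImageGroupStructureVDSCommand",
    "VmReplicateDiskStartVDSCommand",
    "SyncImageGroupDataVDSCommand",
    "VmReplicateDiskFinishVDSCommand",
    "DeleteImageGroupVDSCommand",
)


def missing_lsm_steps(executed: Sequence[str],
                      expected: Sequence[str] = EXPECTED_LSM_SEQUENCE) -> List[str]:
    """Return the expected steps from the first one not reached onward.

    Walks `executed` once, advancing through `expected` whenever the next
    expected command is seen; unrelated commands in between are ignored.
    """
    idx = 0
    for name in executed:
        if idx < len(expected) and name == expected[idx]:
            idx += 1
    return list(expected[idx:])


# The commands observed for the failed migration in this report.
observed = [
    "CloneImageGroupStructureVDSCommand",
    "VmReplicateDiskStartVDSCommand",
    "SyncImageGroupDataVDSCommand",
]
print(missing_lsm_steps(observed))
# prints: ['VmReplicateDiskFinishVDSCommand', 'DeleteImageGroupVDSCommand']
```

For the failed migration described here, only the final two steps are reported as missing, which matches the symptom of the disk staying locked after the data sync.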

I have been able to recreate this several times by live migrating 20 disks (an arbitrary number) at once.

Version-Release number of selected component (if applicable):

Test environment:

   - RHEV-M 3.4.3 
   - Single host with (essentially) vdsm-4.14.13-2

How reproducible:

Every time I have tried to live migrate 20 disks at once, I have encountered this problem.

Steps to Reproduce:

1. In my specific case, I created a pool of 20 VMs based on a template in an NFS data domain.
2. I then started all 20 VMs. 
3. I then copied the template to a second NFS domain.
4. I then live migrated all 20 disks to the second NFS domain.

Actual results:

One of the 20 failed as described above.

Expected results:

All of the LSMs should complete and the associated jobs in the database be marked as "FINISHED".

Comment 8 Daniel Erez 2014-11-18 14:44:31 UTC
Should be fixed in 3.5 build.

Comment 9 lkuchlan 2014-12-01 15:20:29 UTC
Created attachment 963328
images

Tested using RHEVM 3.5 vt11
All of the LSMs completed and the associated jobs in the database were marked as "FINISHED".

Comment 10 Allon Mureinik 2015-02-16 19:09:01 UTC
RHEV-M 3.5.0 has been released, closing this bug.

