Bug 922823
| Field | Value |
|---|---|
| Summary: | live-migration: Cannot move disk after different disk movement has been finished |
| Product: | Red Hat Enterprise Virtualization Manager |
| Component: | ovirt-engine |
| Version: | 3.2.0 |
| Hardware: | All |
| OS: | Linux |
| Status: | CLOSED WONTFIX |
| Severity: | high |
| Priority: | unspecified |
| Reporter: | Jakub Libosvar <jlibosva> |
| Assignee: | Ayal Baron <abaron> |
| QA Contact: | Dafna Ron <dron> |
| CC: | acathrow, amureini, derez, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul |
| Target Milestone: | --- |
| Target Release: | 3.2.0 |
| Whiteboard: | storage |
| Doc Type: | Bug Fix |
| Type: | Bug |
| oVirt Team: | Storage |
| Last Closed: | 2013-03-19 21:46:32 UTC |
Daniel, have you taken a look at this?

Jakub, on the second attempt to move a disk (step 4), did the operation appear to have completed (in the events log / tasks tab)? According to the log, the second attempt to move a disk came right after the live snapshot step finished, at which point the disk is still locked in memory for the rest of the live migration process.

I asked about this, and the flow is the following:
1) The user starts live migration; live snapshots of all the VM's disks are created. This leaves the disks locked.
2) After the snapshots are created, the disks return to "ok" status and the live migration process starts.
3) The migrated disk goes to "locked" status again.
4) After the process finishes, the disk returns to "ok" and operations on it succeed.
I didn't know the disk is locked twice, so the error in the log is correct. From the user's point of view, though: why are two locks needed? (I understand the disk flow: snapshot, then migration.)
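The timing window described above can be illustrated with a toy model (all class and attribute names here are illustrative, not real ovirt-engine internals): the disk status is persisted and visible through the REST API, while the exclusive VM-level EngineLock lives only in engine memory, so a client polling disk status can see "ok" while the lock is still held.

```python
class EngineLockError(Exception):
    """Stand-in for the engine's 'Failed to Acquire Lock' failure."""

class ToyEngine:
    """Minimal sketch of the two locks described in the flow above."""

    def __init__(self):
        self.disk_status = {}   # API-visible: disk id -> "ok" | "locked"
        self.vm_locks = set()   # in-memory exclusive locks, invisible to the API

    def start_live_migration(self, vm_id, disk_id):
        if vm_id in self.vm_locks:               # canDoAction-style check
            raise EngineLockError("ACTION_TYPE_FAILED_OBJECT_LOCKED")
        self.vm_locks.add(vm_id)
        self.disk_status[disk_id] = "locked"     # step 1: live snapshot

    def finish_snapshot(self, disk_id):
        self.disk_status[disk_id] = "ok"         # step 2: status back to "ok",
                                                 # but the VM lock is still held

    def finish_migration(self, vm_id, disk_id):
        self.disk_status[disk_id] = "ok"
        self.vm_locks.discard(vm_id)             # step 4: lock finally released

engine = ToyEngine()
engine.start_live_migration("vm1", "disk1")
engine.finish_snapshot("disk1")
print(engine.disk_status["disk1"])      # ok -- the API reports the disk as movable
try:
    engine.start_live_migration("vm1", "disk2")  # the second move from the bug report
except EngineLockError as e:
    print(e)                            # ACTION_TYPE_FAILED_OBJECT_LOCKED
```

The second move fails even though both disks report "ok", which matches what the reporter observed via the API.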
(In reply to comment #3)
> Jakub, on the second attempt to move a disk (step 4),
> did the operation appear to be completed (in events log / tasks tab)?
It was probably still in progress, since I hit the window between snapshot creation completing and the migration process starting.
There are two ways of making the flow "simpler":
1. have single disk-level snapshots
2. have an infrastructure that enables keeping the lock across operations.

Currently neither exists. We will implement option 1 in a future version, and this bug will no longer be relevant then.

(In reply to comment #5)
> There are 2 ways of making the flow "simpler" -
> 1. have single disk level snapshots
> 2. have an infrastructure that enables keeping the lock across operations.
> Currently neither exists.
> We will implement 1 above in a future version and this bug will not be
> relevant then.

But it's relevant now, so why close the bug? How am I supposed to track these changes for future use? I suggest re-opening the bug and setting a future milestone.
Created attachment 712066 [details]
engine log

Description of problem:
I have a running VM with two disks. I start live migration of disk1, and both images get locked. After the disk returns to the "ok" state, I start live migration of the second disk, but it fails in canDoAction with a misleading message: Cannot move Virtual Machine Disk:

```
2013-03-18 14:19:52,933 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-2) [2ba6e2f2] Failed to Acquire Lock to object EngineLock [exclusiveLocks= key: 126d352b-da34-406e-b59c-5c2cf5424e6f value: VM , sharedLocks= ]
2013-03-18 14:19:52,933 WARN  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-2) [2ba6e2f2] CanDoAction of action LiveMigrateVmDisks failed. Reasons:VAR__ACTION__MOVE,VAR__TYPE__VM_DISK,ACTION_TYPE_FAILED_OBJECT_LOCKED
2013-03-18 14:19:52,941 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-2) Operation Failed: [Cannot move Virtual Machine Disk. Related operation is currently in progress. Please try again later.]
```

But before starting the new live migration, I obtained the states of both disks via the API:

```xml
<disks>
    <disk href="/api/vms/126d352b-da34-406e-b59c-5c2cf5424e6f/disks/19d872b3-52c4-4bed-8245-f35699698e1f" id="19d872b3-52c4-4bed-8245-f35699698e1f">
        <name>vm_virtio_cow_hsm_2_Disk1</name>
        ...
        <alias>vm_virtio_cow_hsm_2_Disk1</alias>
        <image_id>af2a6435-0f4b-42ea-b894-39b36b5da69a</image_id>
        ...
        <status>
            <state>ok</state>
        </status>
        ...
    </disk>
    <disk href="/api/vms/126d352b-da34-406e-b59c-5c2cf5424e6f/disks/5698acd3-688b-44d7-8e84-470bb4e2871f" id="5698acd3-688b-44d7-8e84-470bb4e2871f">
        ...
        <name>vm_virtio_cow_hsm_2_Disk2</name>
        ...
        <alias>vm_virtio_cow_hsm_2_Disk2</alias>
        <image_id>659a516f-f4eb-44dc-b05f-aca2df78fdf3</image_id>
        ...
        <status>
            <state>ok</state>
        </status>
        ...
    </disk>
</disks>
```

Version-Release number of selected component (if applicable):
rhevm-3.2.0-10.14.beta1.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have a running VM with two disks
2. Via the API, start live migration of the first disk
3. Wait until the disks get unlocked
4. Start live migration of the second disk

Actual results:
CanDoAction fails because the disks are locked

Expected results:
It shouldn't fail (the disks aren't reported as locked), or the disks should be reported as being in the locked state

Additional info:
I also tried waiting 5 seconds after the disks got unlocked, and it still fails.
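Since the engine's final error message itself suggests retrying ("Please try again later"), and polling disk status is not sufficient (the blocking lock is in-memory only), a client-side workaround is to retry the move operation rather than poll for "ok". A minimal sketch, assuming only that a rejected move raises an exception; `move_disk_with_retry` and its parameters are hypothetical names, not part of the RHEV API:

```python
import time

def move_disk_with_retry(move_disk, attempts=10, delay=5.0):
    """Retry a disk-move callable until it succeeds or attempts run out.

    `move_disk` is any zero-argument callable that raises when the engine
    rejects the move (e.g. with ACTION_TYPE_FAILED_OBJECT_LOCKED). Retrying
    the operation works where status polling does not, because the VM-level
    lock is held in engine memory and never shows up in the disk's status.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return move_disk()
        except Exception as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error

# Demonstration with a fake move that fails twice with the lock error:
calls = {"n": 0}
def fake_move():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("ACTION_TYPE_FAILED_OBJECT_LOCKED")
    return "migration started"

print(move_disk_with_retry(fake_move, attempts=5, delay=0.0))  # migration started
```

This is only a mitigation for the race the reporter hit; it does not address the underlying question of why the lock is not reflected in the API-visible disk state.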