Bug 922823
| Field | Value |
|---|---|
| Summary: | live-migration: Cannot move disk after different disk movement has been finished |
| Product: | Red Hat Enterprise Virtualization Manager |
| Component: | ovirt-engine |
| Version: | 3.2.0 |
| Hardware: | All |
| OS: | Linux |
| Status: | CLOSED WONTFIX |
| Severity: | high |
| Priority: | unspecified |
| Reporter: | Jakub Libosvar <jlibosva> |
| Assignee: | Ayal Baron <abaron> |
| QA Contact: | Dafna Ron <dron> |
| CC: | acathrow, amureini, derez, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul |
| Target Milestone: | --- |
| Target Release: | 3.2.0 |
| Whiteboard: | storage |
| Doc Type: | Bug Fix |
| Type: | Bug |
| oVirt Team: | Storage |
| Last Closed: | 2013-03-19 21:46:32 UTC |
Daniel, have you taken a look at this?

Jakub, on the second attempt to move a disk (step 4), did the operation appear to have completed (in the events log / tasks tab)? According to the log, the second attempt to move a disk came right after the live snapshot step finished, at which point the disk is still locked in memory for the rest of the live migration process.

I asked about this, and the flow is the following:
1) The user starts live migration; live snapshots of all the VM's disks are created. This leaves the disks locked.
2) After the snapshots are created, the disks return to "ok" status and the live migration process starts.
3) The migrated disk goes to "locked" status again.
4) After the process finishes, the disk returns to "ok" and operations on it succeed.
I didn't know the disk is locked twice, so the error in the log is correct. From the user's point of view, though: why are two locks needed? (I understand the disk flow: snapshot, then migration.)
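The timing window described above can be illustrated with a toy model (all class and attribute names here are illustrative, not real ovirt-engine internals): the disk status is persisted and visible through the REST API, while the exclusive VM-level EngineLock lives only in engine memory, so a client polling disk status can see "ok" while the lock is still held.

```python
class EngineLockError(Exception):
    """Stand-in for the engine's 'Failed to Acquire Lock' failure."""

class ToyEngine:
    """Minimal sketch of the two locks described in the flow above."""

    def __init__(self):
        self.disk_status = {}   # API-visible: disk id -> "ok" | "locked"
        self.vm_locks = set()   # in-memory exclusive locks, invisible to the API

    def start_live_migration(self, vm_id, disk_id):
        if vm_id in self.vm_locks:               # canDoAction-style check
            raise EngineLockError("ACTION_TYPE_FAILED_OBJECT_LOCKED")
        self.vm_locks.add(vm_id)
        self.disk_status[disk_id] = "locked"     # step 1: live snapshot

    def finish_snapshot(self, disk_id):
        self.disk_status[disk_id] = "ok"         # step 2: status back to "ok",
                                                 # but the VM lock is still held

    def finish_migration(self, vm_id, disk_id):
        self.disk_status[disk_id] = "ok"
        self.vm_locks.discard(vm_id)             # step 4: lock finally released

engine = ToyEngine()
engine.start_live_migration("vm1", "disk1")
engine.finish_snapshot("disk1")
print(engine.disk_status["disk1"])      # ok -- the API reports the disk as movable
try:
    engine.start_live_migration("vm1", "disk2")  # the second move from the bug report
except EngineLockError as e:
    print(e)                            # ACTION_TYPE_FAILED_OBJECT_LOCKED
```

The second move fails even though both disks report "ok", which matches what the reporter observed via the API.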
(In reply to comment #3)
> Jakub, on the second attempt to move a disk (step 4),
> did the operation appear to be completed (in events log / tasks tab)?
It was probably still in progress, since I hit the window between snapshot creation completing and the migration process starting.
There are two ways of making the flow "simpler":
1. have single disk-level snapshots
2. have an infrastructure that enables keeping the lock across operations.

Currently neither exists. We will implement option 1 in a future version, and this bug will no longer be relevant then.

(In reply to comment #5)
> There are 2 ways of making the flow "simpler" -
> 1. have single disk level snapshots
> 2. have an infrastructure that enables keeping the lock across operations.
> Currently neither exists.
> We will implement 1 above in a future version and this bug will not be
> relevant then.

But it's relevant now, so why close the bug? How am I supposed to track these changes for future use? I suggest re-opening the bug and setting a future milestone.
Created attachment 712066 [details]
engine log

Description of problem:
I have a running VM with two disks. I start live migration of disk1, and both images get locked. After the disk returns to the "ok" state, I start live migration of the second disk, but it fails in canDoAction with a misleading message: Cannot move Virtual Machine Disk:

```
2013-03-18 14:19:52,933 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-2) [2ba6e2f2] Failed to Acquire Lock to object EngineLock [exclusiveLocks= key: 126d352b-da34-406e-b59c-5c2cf5424e6f value: VM , sharedLocks= ]
2013-03-18 14:19:52,933 WARN  [org.ovirt.engine.core.bll.lsm.LiveMigrateVmDisksCommand] (ajp-/127.0.0.1:8702-2) [2ba6e2f2] CanDoAction of action LiveMigrateVmDisks failed. Reasons:VAR__ACTION__MOVE,VAR__TYPE__VM_DISK,ACTION_TYPE_FAILED_OBJECT_LOCKED
2013-03-18 14:19:52,941 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (ajp-/127.0.0.1:8702-2) Operation Failed: [Cannot move Virtual Machine Disk. Related operation is currently in progress. Please try again later.]
```

But before starting the new live migration, I obtained the states of both disks via the API:

```xml
<disks>
    <disk href="/api/vms/126d352b-da34-406e-b59c-5c2cf5424e6f/disks/19d872b3-52c4-4bed-8245-f35699698e1f" id="19d872b3-52c4-4bed-8245-f35699698e1f">
        <name>vm_virtio_cow_hsm_2_Disk1</name>
        ...
        <alias>vm_virtio_cow_hsm_2_Disk1</alias>
        <image_id>af2a6435-0f4b-42ea-b894-39b36b5da69a</image_id>
        ...
        <status>
            <state>ok</state>
        </status>
        ...
    </disk>
    <disk href="/api/vms/126d352b-da34-406e-b59c-5c2cf5424e6f/disks/5698acd3-688b-44d7-8e84-470bb4e2871f" id="5698acd3-688b-44d7-8e84-470bb4e2871f">
        ...
        <name>vm_virtio_cow_hsm_2_Disk2</name>
        ...
        <alias>vm_virtio_cow_hsm_2_Disk2</alias>
        <image_id>659a516f-f4eb-44dc-b05f-aca2df78fdf3</image_id>
        ...
        <status>
            <state>ok</state>
        </status>
        ...
    </disk>
</disks>
```

Version-Release number of selected component (if applicable):
rhevm-3.2.0-10.14.beta1.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have a running VM with two disks
2. Via the API, start live migration of the first disk
3. Wait until the disks get unlocked
4. Start live migration of the second disk

Actual results:
CanDoAction fails because the disks are locked

Expected results:
It shouldn't fail (the disks aren't reported as locked), or the disks should be reported as being in the locked state

Additional info:
I also tried waiting 5 seconds after the disks got unlocked, and it still fails.
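Since the engine's final error message itself suggests retrying ("Please try again later"), and polling disk status is not sufficient (the blocking lock is in-memory only), a client-side workaround is to retry the move operation rather than poll for "ok". A minimal sketch, assuming only that a rejected move raises an exception; `move_disk_with_retry` and its parameters are hypothetical names, not part of the RHEV API:

```python
import time

def move_disk_with_retry(move_disk, attempts=10, delay=5.0):
    """Retry a disk-move callable until it succeeds or attempts run out.

    `move_disk` is any zero-argument callable that raises when the engine
    rejects the move (e.g. with ACTION_TYPE_FAILED_OBJECT_LOCKED). Retrying
    the operation works where status polling does not, because the VM-level
    lock is held in engine memory and never shows up in the disk's status.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return move_disk()
        except Exception as exc:
            last_error = exc
            time.sleep(delay)
    raise last_error

# Demonstration with a fake move that fails twice with the lock error:
calls = {"n": 0}
def fake_move():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("ACTION_TYPE_FAILED_OBJECT_LOCKED")
    return "migration started"

print(move_disk_with_retry(fake_move, attempts=5, delay=0.0))  # migration started
```

This is only a mitigation for the race the reporter hit; it does not address the underlying question of why the lock is not reflected in the API-visible disk state.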