Bug 1313744 - VM is inoperative after power off during Live storage migration
Summary: VM is inoperative after power off during Live storage migration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.5
Target Release: 3.6.5
Assignee: Daniel Erez
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-03-02 09:57 UTC by Eyal Shenitzky
Modified: 2021-08-30 10:40 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-21 14:37:54 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt-3.6.z?
rule-engine: blocker?
rule-engine: planning_ack?
tnisan: devel_ack+
rule-engine: testing_ack+


Attachments
engine logs (10.19 MB, text/plain)
2016-03-02 09:57 UTC, Eyal Shenitzky
vdsm log (2.65 MB, text/plain)
2016-03-02 09:59 UTC, Eyal Shenitzky
new engine log (330.82 KB, application/x-bzip)
2016-03-08 13:37 UTC, Eyal Shenitzky


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHV-43165 | 0 | None | None | None | 2021-08-30 10:40:46 UTC
Red Hat Knowledge Base (Solution) | 2461971 | 0 | None | None | None | 2016-07-25 05:18:37 UTC
oVirt gerrit | 54730 | 0 | master | MERGED | core: LiveMigrateDiskCommand - shared locks are unneeded | 2020-03-13 16:24:58 UTC
oVirt gerrit | 54776 | 0 | ovirt-engine-3.6 | MERGED | core: LiveMigrateDiskCommand - shared locks are unneeded | 2020-03-13 16:24:58 UTC

Description Eyal Shenitzky 2016-03-02 09:57:18 UTC
Created attachment 1132215 [details]
engine logs

Description of problem:

Powering off a VM during Live Storage Migration of a file-based disk, after the snapshot has been created, causes the VM to become inoperative: it cannot be run or removed.

Engine action message:
Cannot <run/remove> VM. Disk <disk name> is being moved or copied

The snapshot cannot be removed either, due to the same message.

Version-Release number of selected component (if applicable):
Engine - 3.6.3.4-0.1.el6
VDSM - 4.17.23-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create VM with file based disk
2. Run the VM
3. Live Migrate the disk
4. Power off the VM after the Live Storage Migration snapshot is created

Actual results:
The VM does power off but becomes inoperative, as mentioned above.

Expected results:
VM should power off nicely and be able to run again.

Additional info:
VDSM and Engine logs attached.

Comment 1 Eyal Shenitzky 2016-03-02 09:59:36 UTC
Created attachment 1132229 [details]
vdsm log

Comment 2 Allon Mureinik 2016-03-02 11:24:55 UTC
Eyal - why is this marked as a regression? Can you attach the logs from a clean run?

Comment 3 Red Hat Bugzilla Rules Engine 2016-03-02 11:24:59 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Raz Tamir 2016-03-03 13:15:42 UTC
Hi Allon,
This is a regression according to bug 1128582.
It used to work, but with an exception.

Comment 6 Yaniv Lavi 2016-03-06 15:11:11 UTC
Any status update on this one?

Comment 7 Daniel Erez 2016-03-07 12:48:01 UTC
Hi Eyal,

Are you referring to a validation message when running the VM (i.e. "Cannot run VM. Disk is being moved or copied.")? If so, there's a workaround for the issue by restarting engine service.

Comment 8 Daniel Erez 2016-03-07 13:19:43 UTC
(In reply to Daniel Erez from comment #7)
> Hi Eyal,
> 
> Are you referring to a validation message when running the VM (i.e. "Cannot
> run VM. Disk is being moved or copied.")? If so, there's a workaround for
> the issue by restarting engine service.

Just reproduced the issue. There is indeed a gap between powering off the VM during live migration and being able to run it again, but this is expected. Since the disk is still being migrated when the VM is powered down, running the VM again is blocked until the operation has completely finished (i.e. until the migration has failed and the in-memory lock is freed). If the disk is small enough, it should finish in a couple of minutes, and then you should be able to run the VM again. Closing the bug since that's the expected behavior; please reopen if the operation hangs indefinitely.
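
To illustrate the behavior described above, here is a minimal sketch of an in-memory disk lock that blocks the run-VM validation until it is freed. It is not the ovirt-engine implementation: the class `InMemoryLockManager`, the function `validate_run_vm`, and the disk ID `disk-1` are all hypothetical names chosen for the example. It also shows why restarting the engine service would clear such a lock, assuming the lock table lives only in process memory.

```python
class InMemoryLockManager:
    """Hypothetical stand-in for an engine-side lock table kept in memory."""

    def __init__(self):
        # disk id -> reason the disk is currently locked
        self._locks = {}

    def acquire(self, disk_id, reason):
        """Take the lock; return False if the disk is already locked."""
        if disk_id in self._locks:
            return False
        self._locks[disk_id] = reason
        return True

    def release(self, disk_id):
        """Free the lock; a no-op if the disk is not locked."""
        self._locks.pop(disk_id, None)

    def reason(self, disk_id):
        """Return the lock reason, or None if the disk is free."""
        return self._locks.get(disk_id)


def validate_run_vm(locks, disk_ids):
    """Return (ok, message), roughly the way a run-VM validation might."""
    for disk_id in disk_ids:
        if locks.reason(disk_id) is not None:
            return False, f"Cannot run VM. Disk {disk_id} is being moved or copied."
    return True, ""


# Live storage migration starts and takes the lock on the disk.
locks = InMemoryLockManager()
locks.acquire("disk-1", "live storage migration")

# The VM is powered off mid-migration; the lock is still held, so run is blocked.
blocked = validate_run_vm(locks, ["disk-1"])

# Once the failure is handled and the lock is freed, the VM can run again.
locks.release("disk-1")
allowed = validate_run_vm(locks, ["disk-1"])

# An engine restart is modeled as a brand-new, empty lock table.
restarted = InMemoryLockManager()
after_restart = validate_run_vm(restarted, ["disk-1"])

print(blocked)
print(allowed)
print(after_restart)
```

If the release step is never reached, the validation keeps failing until the process holding the table is restarted, which would match the symptom of the VM staying blocked.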

Comment 9 Eyal Shenitzky 2016-03-07 13:50:31 UTC
Every time I try this scenario the operation hangs indefinitely, no matter what the disk size is.
Powering off a VM during Live Storage Migration should roll back the operation and then power off the VM; it isn't supposed to wait until the operation is finished.

Comment 10 Daniel Erez 2016-03-07 15:13:37 UTC
(In reply to Eyal Shenitzky from comment #9)
> Every time I try this scenario the operation hangs indefinitely, no matter
> what the disk size is.
> Powering off a VM during Live Storage Migration should roll back the
> operation and then power off the VM; it isn't supposed to wait until the
> operation is finished.

We can't roll back the operation immediately since we don't cancel ongoing tasks. An operation rollback can be performed only when a failure is detected by vdsm.
* Can you please check whether an engine restart resolves the issue, so we can tell if we're referring to the same problem?
* Can you please attach a list of running vdsm tasks after reproducing the scenario ('vdsClient -s 0 getAllTasks')?
* While at it, please attach 'clean' engine logs, i.e. a log containing only the relevant period of time while executing the scenario.

Thanks!

Comment 11 Eyal Shenitzky 2016-03-08 13:34:13 UTC
Engine restart does resolve the problem.

There are no running tasks in VDSM after reproduction.

I attached a new Engine log; please look at the log messages around 8/3/16 15:31.

Comment 12 Eyal Shenitzky 2016-03-08 13:37:07 UTC
Created attachment 1134171 [details]
new engine log

Comment 13 Eyal Shenitzky 2016-03-08 14:04:03 UTC
Please note that the migration does fail when the VM is powered off.

Comment 14 Elad 2016-03-31 12:32:36 UTC
Steps:
1. Create VM with file based disk
2. Run the VM
3. Live Migrate the disk
4. Power off the VM after the Live Storage Migration snapshot is created

VM is operative after live storage migration failure

Verified using:
rhevm-3.6.5-0.1.el6.noarch
vdsm-4.17.25-0.el7ev.noarch

