Bug 1313744

Summary: VM is inoperative after power off during Live storage migration
Product: [oVirt] ovirt-engine
Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage
Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE
QA Contact: Elad <ebenahar>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.6.3.3
CC: amureini, bugs, derez, eshenitz, gchakkar, gklein, ratamir, sbonazzo, tnisan, ylavi
Target Milestone: ovirt-3.6.5
Keywords: Automation, Regression, Reopened
Target Release: 3.6.5
Flags: amureini: ovirt-3.6.z?, rule-engine: blocker?, rule-engine: planning_ack?, tnisan: devel_ack+, rule-engine: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-04-21 14:37:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
Description       Flags
engine logs       none
vdsm log          none
new engine log    none

Description Eyal Shenitzky 2016-03-02 09:57:18 UTC
Created attachment 1132215 [details]
engine logs

Description of problem:

Powering off a VM during live storage migration of a file-based disk, after the snapshot has been created, causes the VM to become inoperative: it can be neither run nor removed.

Engine action message:
Cannot <run\remove> VM. Disk <disk name> is being moved or copied

The snapshot cannot be removed either, due to the same message.

Version-Release number of selected component (if applicable):
Engine - 3.6.3.4-0.1.el6
VDSM - 4.17.23-0.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a file-based disk
2. Run the VM
3. Live migrate the disk
4. Power off the VM after the Live Storage Migration snapshot is created

Actual results:
The VM does power off but becomes inoperative, as described above.

Expected results:
VM should power off nicely and be able to run again.

Additional info:
VDSM and Engine logs attached

Comment 1 Eyal Shenitzky 2016-03-02 09:59:36 UTC
Created attachment 1132229 [details]
vdsm log

Comment 2 Allon Mureinik 2016-03-02 11:24:55 UTC
Eyal - why is this marked as a regression? Can you attach the logs from a clean run?

Comment 3 Red Hat Bugzilla Rules Engine 2016-03-02 11:24:59 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 4 Raz Tamir 2016-03-03 13:15:42 UTC
Hi Allon,
This is a regression according to bug 1128582.
It used to work, but with an exception.

Comment 6 Yaniv Lavi 2016-03-06 15:11:11 UTC
Any status update on this one?

Comment 7 Daniel Erez 2016-03-07 12:48:01 UTC
Hi Eyal,

Are you referring to a validation message when running the VM (i.e. "Cannot run VM. Disk is being moved or copied.")? If so, there's a workaround for the issue by restarting engine service.

Comment 8 Daniel Erez 2016-03-07 13:19:43 UTC
(In reply to Daniel Erez from comment #7)
> Hi Eyal,
> 
> Are you referring to a validation message when running the VM (i.e. "Cannot
> run VM. Disk is being moved or copied.")? If so, there's a workaround for
> the issue by restarting engine service.

Just reproduced the issue. There is indeed a gap between powering off the VM during live migration and being able to run it again, but this is expected. Since the disk is still being migrated when the VM is powered down, running the VM again is blocked until the operation is completely finished (i.e. until the migration has failed and the in-memory lock is freed). If the disk is small enough, it should finish in a couple of minutes, after which you should be able to run the VM again. Closing the bug since that's the expected behavior; please reopen if the operation hangs indefinitely.
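
For illustration only, the blocking behavior described in this comment can be sketched in Python. This is not the engine's code: the class InMemoryLockManager, the function validate_run_vm and the disk ID "disk-1" are hypothetical names, and the idea that an engine restart starts with an empty lock table is an assumption drawn from the restart workaround in comment 7.

```python
class InMemoryLockManager:
    """Hypothetical stand-in for an engine-side, in-memory lock table."""

    def __init__(self):
        self._locked_disks = set()

    def acquire(self, disk_id):
        # Refuse a second lock on a disk that is already locked.
        if disk_id in self._locked_disks:
            return False
        self._locked_disks.add(disk_id)
        return True

    def release(self, disk_id):
        self._locked_disks.discard(disk_id)

    def is_locked(self, disk_id):
        return disk_id in self._locked_disks


def validate_run_vm(locks, disk_ids):
    """Return (ok, message); running is blocked while any disk is locked."""
    for disk_id in disk_ids:
        if locks.is_locked(disk_id):
            return False, "Cannot run VM. Disk %s is being moved or copied." % disk_id
    return True, ""


locks = InMemoryLockManager()
locks.acquire("disk-1")                       # live storage migration starts
blocked = validate_run_vm(locks, ["disk-1"])  # VM powered off, run attempted

locks.release("disk-1")                       # end of migration detected
allowed = validate_run_vm(locks, ["disk-1"])

# Assumption: a restart begins with an empty in-memory lock table.
after_restart = validate_run_vm(InMemoryLockManager(), ["disk-1"])
```

In this sketch, if release() is never called after the migration fails, the validation keeps failing until the lock table is recreated.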

Comment 9 Eyal Shenitzky 2016-03-07 13:50:31 UTC
Every time I try this scenario, the operation hangs indefinitely, regardless of the disk size.
Powering off a VM during Live Storage Migration should roll back the operation and then power off the VM; it is not supposed to wait until the operation is finished.

Comment 10 Daniel Erez 2016-03-07 15:13:37 UTC
(In reply to Eyal Shenitzky from comment #9)
> Every time I try this scenario, the operation hangs indefinitely, regardless
> of the disk size.
> Powering off a VM during Live Storage Migration should roll back the
> operation and then power off the VM; it is not supposed to wait until the
> operation is finished.

We can't roll back the operation immediately, since we don't cancel ongoing tasks. An operation rollback can be performed only when a failure is detected by vdsm.
* Can you please check whether an engine restart resolves the issue, so we can tell if we're referring to the same problem?
* Can you please attach a list of running vdsm tasks after reproducing the scenario ('vdsClient -s 0 getAllTasks')?
* While at it, please attach 'clean' engine logs, i.e. a log containing only the relevant period of time while executing the scenario.

Thanks!
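
As a rough illustration of "rollback only when a failure is detected", here is a minimal Python sketch. It is not the engine or vdsm implementation: poll_migration_task, the status strings 'running', 'failed' and 'finished', and the callbacks are all hypothetical.

```python
def poll_migration_task(get_status, rollback, release_lock, max_polls=10):
    """Poll a hypothetical task status source until the task ends.

    get_status() is assumed to return 'running', 'failed' or 'finished'.
    Rollback is triggered only by an observed 'failed' status; nothing
    here cancels a task that is still running.
    """
    for _ in range(max_polls):
        status = get_status()
        if status == "running":
            continue  # no cancellation: keep waiting for the task to end
        if status == "failed":
            rollback()
        release_lock()  # the lock is freed once the task has ended
        return status
    return "running"  # task never ended within max_polls; lock stays held


events = []
statuses = iter(["running", "running", "failed"])
result = poll_migration_task(
    get_status=lambda: next(statuses),
    rollback=lambda: events.append("rollback"),
    release_lock=lambda: events.append("release_lock"),
)
```

The point of the sketch is that powering off the VM does not by itself trigger the rollback; the rollback and lock release only happen once a terminal status is observed.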

Comment 11 Eyal Shenitzky 2016-03-08 13:34:13 UTC
An engine restart does resolve the problem.

There are no running tasks in VDSM after reproduction.

I attached a new Engine log; please look at the log messages around 8/3/16 15:31.

Comment 12 Eyal Shenitzky 2016-03-08 13:37:07 UTC
Created attachment 1134171 [details]
new engine log

Comment 13 Eyal Shenitzky 2016-03-08 14:04:03 UTC
Please note that the migration does fail when the VM is powered off.

Comment 14 Elad 2016-03-31 12:32:36 UTC
Steps:
1. Create a VM with a file-based disk
2. Run the VM
3. Live migrate the disk
4. Power off the VM after the Live Storage Migration snapshot is created

VM is operative after live storage migration failure

Verified using:
rhevm-3.6.5-0.1.el6.noarch
vdsm-4.17.25-0.el7ev.noarch