Bug 1421417

Summary: Disk remains in 'locked' state when blocking the connection from host to storage-domain during live storage migration
Product: [oVirt] ovirt-engine Reporter: Eyal Shenitzky <eshenitz>
Component: BLL.Storage Assignee: Fred Rolland <frolland>
Status: CLOSED CURRENTRELEASE QA Contact: Eyal Shenitzky <eshenitz>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.0.4 CC: amureini, bugs, eshenitz, stirabos, tnisan
Target Milestone: ovirt-4.1.3 Keywords: Automation
Target Release: 4.1.3.5 Flags: rule-engine: ovirt-4.1+
rule-engine: blocker+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-07-06 13:40:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine and vdsm logs none

Description Eyal Shenitzky 2017-02-12 07:55:06 UTC
Description of problem:

When performing live storage migration of a VM's disk and blocking the connection from the VM's host to the disk's storage-domain during the migration using iptables, the disk remains in the 'locked' state and the migration fails with:

Failed to complete snapshot 'Auto-generated for Live Storage Migration' creation for VM 'xxx'.
 

Version-Release number of selected component (if applicable):
Engine - 4.1.0.4-0.1.el7
VDSM - 4.19.4-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with disk
2. Start the VM
3. Move the disk to another storage-domain
4. Block the connection from the VM's host to the disk's storage-domain using iptables (a sketch of this step follows below)
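For reference, step 4 can be scripted on the host. Below is a minimal Python sketch of that blocking step; the storage-domain address and driving iptables through subprocess are assumptions for illustration, not details from this report.

import subprocess

# Placeholder: assumed address of the disk's storage domain (not from the report).
STORAGE_ADDR = "10.0.0.42"

def block_storage(addr=STORAGE_ADDR):
    # Drop all packets sent from this host to the storage domain.
    subprocess.check_call(["iptables", "-A", "OUTPUT", "-d", addr, "-j", "DROP"])

def unblock_storage(addr=STORAGE_ADDR):
    # Remove the DROP rule added by block_storage().
    subprocess.check_call(["iptables", "-D", "OUTPUT", "-d", addr, "-j", "DROP"])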

Actual results:
Live storage migration fails and the disk remains in the 'locked' state

Expected results:
Live storage migration should fail gracefully and the disk should return to the 'active' state
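As a rough illustration of how this expected behaviour could be checked from automation, here is a hedged sketch using the Python oVirt SDK (ovirtsdk4); the engine URL, credentials, disk ID and timeout are placeholders, not values from this bug.

import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholders: engine URL, credentials and disk ID are assumptions.
connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    insecure=True,
)
disk_service = connection.system_service().disks_service().disk_service(
    "00000000-0000-0000-0000-000000000000"
)

# After the migration fails, the disk should leave 'locked' and report OK
# (shown as 'active' in the UI) within a reasonable timeout.
deadline = time.time() + 600
while time.time() < deadline:
    if disk_service.get().status == types.DiskStatus.OK:
        break
    time.sleep(10)
else:
    raise AssertionError("disk is still locked after the failed migration")

connection.close()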

Additional info:
Engine and VDSM logs are attached

Comment 1 Eyal Shenitzky 2017-02-12 09:09:29 UTC
The bug also happens when live-migrating the disk of a highly available (HA) VM and restarting the SPM host during the migration.

Steps to Reproduce:
1. Create a VM with disk
2. Update the VM to be highly available
3. Start the VM
4. Move the disk to another storage-domain
5. Restart the SPM host during the migration (a sketch of this step follows below)
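A minimal sketch of how step 5 might be triggered from a test machine follows; the SPM host address and passwordless root SSH access are assumptions, not details from this bug.

import subprocess

# Placeholder: assumed address of the current SPM host (not from the report).
SPM_HOST_ADDR = "spm-host.example.com"

def restart_spm_host(addr=SPM_HOST_ADDR):
    # Force an immediate reboot of the SPM host over SSH.
    subprocess.check_call(["ssh", "root@" + addr, "reboot", "-f"])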

Comment 2 Eyal Shenitzky 2017-02-12 10:06:26 UTC
Created attachment 1249482 [details]
engine and vdsm logs

Comment 3 Yaniv Kaul 2017-02-22 11:13:06 UTC
Is this a regression?

Comment 4 Eyal Shenitzky 2017-03-07 12:01:58 UTC
We didn't run this kind of test for a long time, so I don't have any reference to know whether it is a regression or not.

Comment 5 Eyal Shenitzky 2017-03-07 12:02:45 UTC
We didn't run* this kind...

Comment 6 Yaniv Kaul 2017-03-09 09:26:01 UTC
(In reply to Eyal Shenitzky from comment #5)
> We didn't run* this kind...

Well, you have 4.0.7 now - can you check (and, by the way, it makes sense to automate this)?

Comment 7 Eyal Shenitzky 2017-03-15 08:10:44 UTC
I will check it on 4.0 and add a flag in case of regression.
This bug was found when I automated the above scenario.
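For context, the "move the disk" step of that automated scenario could look roughly like the sketch below, again using the Python oVirt SDK (ovirtsdk4); the disk ID and target storage-domain name are placeholders, and this is only an illustration, not the actual test code.

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholders: engine URL, credentials, disk ID and storage-domain name.
connection = sdk.Connection(
    url="https://engine.example.com/ovirt-engine/api",
    username="admin@internal",
    password="password",
    insecure=True,
)
disk_service = connection.system_service().disks_service().disk_service(
    "00000000-0000-0000-0000-000000000000"
)

# Since the VM is running, moving the disk starts a live storage migration.
disk_service.move(storage_domain=types.StorageDomain(name="target_sd"))

connection.close()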

Comment 8 Red Hat Bugzilla Rules Engine 2017-03-21 05:52:58 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 9 Eyal Shenitzky 2017-05-07 09:39:58 UTC
Removing the regression tag; the bug occurs in 4.1 and also in 4.0.

Comment 10 Allon Mureinik 2017-06-15 11:08:54 UTC
Patch isn't ready, and we have an unlocker utility. Pushing out to 4.1.4.

Comment 11 Eyal Shenitzky 2017-06-25 12:12:41 UTC
Verified with the following builds:
----------------------------------

VDSM - 4.19.20-1.el7ev.x86_64
RHEVM - 4.1.3.5-0.1.el7
 
Steps to reproduce:
------------------------
1. Create a VM with disk
2. Start the VM
3. Move the disk to another storage-domain
4. Block the connection from the VM's host to the disk's storage-domain using iptables

Moving to VERIFIED