Bug 1417456

Summary: Disk move stuck after vdsmd restart on SPM during disk move between storage domains
Product: [oVirt] ovirt-engine
Reporter: Avihai <aefrat>
Component: BLL.Storage
Assignee: Liron Aravot <laravot>
Status: CLOSED DUPLICATE
QA Contact: Raz Tamir <ratamir>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0.2
CC: bazulay, bugs, gklein, laravot, lsurette, srevivo, tnisan, ycui, ykaul
Target Milestone: ovirt-4.1.0-rc
Flags: rule-engine: ovirt-4.1?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-30 13:37:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
engine & vdsm logs (flags: none)

Description Avihai 2017-01-29 15:01:42 UTC
Created attachment 1245582 [details]
engine & vdsm logs

Description of problem:
Disk move stuck after vdsmd restart on SPM during disk move between storage domains.

Version-Release number of selected component (if applicable):
Engine = ovirt-engine-4.1.0.2-0.2.el7.noarch
vdsm = 4.19.2-2

How reproducible:
Happened once with a 10G disk; did not happen with a smaller disk (5G).

Steps to Reproduce:
1. Create a 10G preallocated disk.
2. Move the disk to a different storage domain.
   I moved the disk between two iSCSI storage domains: source = iscsi_2, target = iscsi_1.
3. Restart vdsmd on the SPM host (host_mixed_2).
4. A new SPM is selected (host_mixed_3).
5. Check the disk status.

Actual results:
The disk is stuck in "LOCKED" status indefinitely (1h+) with no change, showing 19% progress.

Expected results:
The action should be rolled back: after the new SPM host is up, the disk should not be copied to the target and should remain available on the source storage domain.


Additional info:
From new SPM host vdsm.log:
2017-01-29 15:30:39,015 ERROR (upgrade/c9d819f) [storage.StoragePool] Unhandled exception (utils:371)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 368, in wrapper
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line 180, in run
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 232, in _upgradePoolDomain
    self._finalizePoolUpgradeIfNeeded()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
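For context, the SecureError in the traceback comes from vdsm's "securable object" pattern: methods of the storage pool object are wrapped so that they may only run while the host actually holds the SPM role. The following is a minimal, simplified sketch of that pattern (not the actual vdsm securable.py code; the class and method names here are hypothetical), illustrating why a call made after the host loses SPM status fails this way:

```python
# Simplified sketch of the "securable" decorator pattern seen in the
# traceback. NOT the real vdsm code -- names are illustrative only.
import functools


class SecureError(RuntimeError):
    """Raised when a secured method is called in an unsafe state."""


def secured(method):
    """Allow the call only while the owning object is in a safe state."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self._secured:
            raise SecureError("Secured object is not in safe state")
        return method(self, *args, **kwargs)
    return wrapper


class PoolLike(object):
    """Stand-in for a storage-pool object whose safety tracks SPM role."""

    def __init__(self):
        # False models a host that lost the SPM role, e.g. after a
        # vdsmd restart triggered SPM failover to another host.
        self._secured = False

    @secured
    def finalize_upgrade(self):
        return "upgrade finalized"
```

Calling `PoolLike().finalize_upgrade()` raises `SecureError("Secured object is not in safe state")`, which matches the last frame of the traceback: the upgrade-finalization thread ran after the restarted host was no longer SPM, so the secured call was rejected instead of being cancelled cleanly.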


Timeline (last event first) :

Jan 29, 2017 3:31:03 PM
Status of host host_mixed_2 was set to Up.


Jan 29, 2017 3:30:57 PM
VDSM host_mixed_2 command GetCapabilitiesVDS failed: Client close


Jan 29, 2017 3:30:39 PM
Storage Pool Manager runs on Host host_mixed_3 (Address: storage-ge4-vdsm3.qa.lab.tlv.redhat.com).


Jan 29, 2017 3:30:38 PM
VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()


Jan 29, 2017 3:30:38 PM
Invalid status on Data Center golden_env_mixed. Setting status to Non Responsive.


Jan 29, 2017 3:30:34 PM
Host host_mixed_2 is not responding. Host cannot be fenced automatically because power management for the host is disabled.


Jan 29, 2017 3:30:33 PM


Jan 29, 2017 3:30:18 PM
User admin@internal-authz moving disk preallocated_disk to domain iscsi_1.

Jan 29, 2017 3:30:16 PM
The disk 'preallocated_disk' was successfully added.

Comment 1 Raz Tamir 2017-01-29 17:58:23 UTC
Might be related to the new HSM infrastructure for cold disk move.

Comment 2 Tal Nisan 2017-01-29 18:38:47 UTC
Liron, seems to me like a duplicate of 1415502, isn't it?

Comment 3 Liron Aravot 2017-01-30 13:37:04 UTC
Tal, Indeed.

*** This bug has been marked as a duplicate of bug 1415502 ***