Created attachment 1245582
engine & vdsm logs
Description of problem:
Disk move stuck after vdsmd restart on SPM during disk move between storage domains.
Version-Release number of selected component (if applicable):
Engine = ovirt-engine-18.104.22.168-0.2.el7.noarch
vdsm = 4.19.2-2
Happened once with a 10G disk; did not happen with a smaller (5G) disk.
Steps to Reproduce:
1. Create a 10G preallocated disk
2. Move disk to a different storage domain
I moved the disk between two iSCSI storage domains: source = iscsi_2, target = iscsi_1
3. Restart vdsmd on SPM host (host_mixed_2)
4. New SPM is selected (host_mixed_3)
5. Check the disk status.
Actual results:
The disk stays in "LOCKED" status indefinitely (1H+) with no change; the move shows 19% progress.
Expected results:
The action should be rolled back: the disk should not be copied, and it should be available on the source storage domain once the new SPM host is up.
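For anyone trying to reproduce this, the "stuck in LOCKED" condition can be checked with a timeout rather than by watching the UI. Below is a minimal sketch, not taken from the engine or vdsm code. It assumes a hypothetical `get_status` callable that returns the disk status as a string (for example, a small wrapper around whatever API or SDK query you use); the function name, the timeout and the interval are all illustrative.

```python
import time


def wait_while_locked(get_status, timeout=3600.0, interval=10.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it stops returning "locked" or timeout expires.

    get_status is a hypothetical callable standing in for whatever query
    reports the disk status as a string; it is not a real engine/vdsm API.
    Returns the first non-locked status, or raises TimeoutError.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status.lower() != "locked":
            return status
        if clock() >= deadline:
            raise TimeoutError("disk still locked after %.0f seconds" % timeout)
        sleep(interval)
```

With the behaviour described in this report, a one-hour timeout would end in `TimeoutError`; with a working rollback the call would be expected to return a non-locked status shortly after the new SPM is up.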
From new SPM host vdsm.log:
2017-01-29 15:30:39,015 ERROR (upgrade/c9d819f) [storage.StoragePool] Unhandled exception (utils:371)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 368, in wrapper
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line 180, in run
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 232, in _upgradePoolDomain
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
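For readers unfamiliar with this error: the last frame is a guard in securable.py that refuses the call instead of running it. The sketch below illustrates that kind of guard in general terms. It is a simplified illustration, not vdsm's actual implementation; only the `SecureError` name and its message come from the traceback. `secured`, `Pool`, `set_safe`, `upgrade_domain` and the `_safe` flag are hypothetical, and the idea that the flag follows SPM ownership is an assumption.

```python
from functools import wraps


class SecureError(RuntimeError):
    """Raised when a guarded method is called on an object that is not safe."""


def secured(method):
    # Hypothetical decorator: refuse to run unless the object is marked safe.
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self._safe:
            raise SecureError("Secured object is not in safe state")
        return method(self, *args, **kwargs)
    return wrapper


class Pool(object):
    """Toy stand-in for a storage pool; not the real StoragePool class."""

    def __init__(self):
        # Assumption for illustration: the flag is set only while this
        # host is allowed to perform SPM-only operations.
        self._safe = False

    def set_safe(self, safe):
        self._safe = safe

    @secured
    def upgrade_domain(self, sd_uuid):
        return "upgraded %s" % sd_uuid
```

In this toy model, calling `Pool().upgrade_domain("sd-1")` before `set_safe(True)` raises `SecureError` with the same message as in the log, which is the shape of failure the upgrade thread hit.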
Timeline (last event first):
Jan 29, 2017 3:31:03 PM
Status of host host_mixed_2 was set to Up.
Jan 29, 2017 3:30:57 PM
VDSM host_mixed_2 command GetCapabilitiesVDS failed: Client close
Jan 29, 2017 3:30:39 PM
Storage Pool Manager runs on Host host_mixed_3 (Address: storage-ge4-vdsm3.qa.lab.tlv.redhat.com).
Jan 29, 2017 3:30:38 PM
VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()
Jan 29, 2017 3:30:38 PM
Invalid status on Data Center golden_env_mixed. Setting status to Non Responsive.
Jan 29, 2017 3:30:34 PM
Host host_mixed_2 is not responding. Host cannot be fenced automatically because power management for the host is disabled.
Jan 29, 2017 3:30:33 PM
Jan 29, 2017 3:30:18 PM
User admin@internal-authz moving disk preallocated_disk to domain iscsi_1.
Jan 29, 2017 3:30:16 PM
The disk 'preallocated_disk' was successfully added.
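Since the timeline is listed newest first, it can help to reorder it and measure the gaps between events. The following is a small sketch using only the Python standard library. The timestamps are copied from the timeline above; the messages are abbreviated, and the helper names are illustrative.

```python
from datetime import datetime

TS_FORMAT = "%b %d, %Y %I:%M:%S %p"  # e.g. "Jan 29, 2017 3:30:18 PM"

# (timestamp, abbreviated message) pairs from the timeline, newest first.
EVENTS = [
    ("Jan 29, 2017 3:31:03 PM", "Status of host host_mixed_2 was set to Up."),
    ("Jan 29, 2017 3:30:39 PM", "Storage Pool Manager runs on Host host_mixed_3"),
    ("Jan 29, 2017 3:30:34 PM", "Host host_mixed_2 is not responding."),
    ("Jan 29, 2017 3:30:18 PM", "moving disk preallocated_disk to domain iscsi_1"),
]


def chronological(events):
    """Return (datetime, message) pairs sorted oldest first."""
    parsed = [(datetime.strptime(ts, TS_FORMAT), msg) for ts, msg in events]
    return sorted(parsed, key=lambda pair: pair[0])


def seconds_between(events, first_substring, second_substring):
    """Seconds from the first event matching one substring to the other."""
    ordered = chronological(events)
    start = next(t for t, m in ordered if first_substring in m)
    end = next(t for t, m in ordered if second_substring in m)
    return (end - start).total_seconds()
```

For example, `seconds_between(EVENTS, "moving disk", "not responding")` gives 16.0, and `seconds_between(EVENTS, "moving disk", "Storage Pool Manager")` gives 21.0, so the new SPM was selected about 21 seconds after the move started.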
Might be related to the new HSM infrastructure for cold disk move.
Liron, seems to me like a duplicate of 1415502, isn't it?
*** This bug has been marked as a duplicate of bug 1415502 ***