Bug 1417456 - Disk move stuck after vdsmd restart on SPM during disk move between storage domains
Keywords:
Status: CLOSED DUPLICATE of bug 1415502
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.0.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.1.0-rc
Target Release: ---
Assignee: Liron Aravot
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-29 15:01 UTC by Avihai
Modified: 2017-01-30 13:39 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-30 13:37:04 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.1?


Attachments:
engine & vdsm logs (1.64 MB, application/x-gzip)
2017-01-29 15:01 UTC, Avihai

Description Avihai 2017-01-29 15:01:42 UTC
Created attachment 1245582
engine & vdsm logs

Description of problem:
Disk move stuck after vdsmd restart on SPM during disk move between storage domains.

Version-Release number of selected component (if applicable):
Engine = ovirt-engine-4.1.0.2-0.2.el7.noarch
vdsm = 4.19.2-2

How reproducible:
Happened once with a 10G disk; did not happen with a smaller disk (5G).

Steps to Reproduce:
1. Create a 10G preallocated disk.
2. Move the disk to a different storage domain.
   I moved the disk between two iSCSI storage domains: source = iscsi_2, target = iscsi_1.
3. Restart vdsmd on the SPM host (host_mixed_2).
4. A new SPM is selected (host_mixed_3).
5. Check the disk status.

Actual results:
The disk is stuck in "LOCKED" status indefinitely (more than 1 hour) with no change; it shows 19% progress.

Expected results:
The action should be rolled back: the disk should not be copied, and it should be available on the source storage domain once the new SPM host is up.
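
The check in step 5 could be automated along these lines. This is only a sketch: `get_status` is a hypothetical caller-supplied callable that returns the disk status as a string, and how it queries the engine is deliberately left out. The function name, the timeout, and the polling interval are assumptions, not part of the original report.

```python
import time


def wait_for_unlock(get_status, timeout=3600.0, interval=10.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it stops returning "LOCKED" or the timeout expires.

    get_status is a hypothetical callable supplied by the caller; it must
    return the current disk status as a string.
    Returns the first non-LOCKED status, or None if the disk stayed LOCKED
    for the whole timeout (the behaviour described under "Actual results").
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status != "LOCKED":
            return status
        if clock() >= deadline:
            return None
        sleep(interval)
```

A return value of None after the timeout would correspond to the stuck disk reported here; any other value means the disk left the LOCKED state.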


Additional info:
From new SPM host vdsm.log:
2017-01-29 15:30:39,015 ERROR (upgrade/c9d819f) [storage.StoragePool] Unhandled exception (utils:371)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 368, in wrapper
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line 180, in run
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 232, in _upgradePoolDomain
    self._finalizePoolUpgradeIfNeeded()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
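
To locate similar entries in a large vdsm.log, a small filter like the one below may help. It is a sketch that assumes every log header line has the same shape as the ERROR line quoted above (timestamp, level, thread, logger, message, source location); the regular expression and the function name are illustrative and not taken from vdsm itself.

```python
import re

# Matches header lines shaped like the ERROR line quoted above:
# "<date> <time>,<ms> <LEVEL> (<thread>) [<logger>] <message> (<module>:<line>)"
# The layout is inferred from that single sample line (an assumption).
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"\((?P<thread>[^)]*)\) "
    r"\[(?P<logger>[^\]]*)\] "
    r"(?P<msg>.*) "
    r"\((?P<src>[^()]+:\d+)\)$"
)


def find_errors(lines):
    """Yield a dict of parsed fields for every ERROR header line in lines.

    Traceback lines and anything else that does not match the header
    layout are skipped.
    """
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("level") == "ERROR":
            yield m.groupdict()
```

Passing an open file object for the attached vdsm.log to `find_errors` would yield one dictionary per ERROR header, including the timestamp and logger name, which makes it easier to line the entries up against the engine event timeline.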


Timeline (most recent event first):

Jan 29, 2017 3:31:03 PM
Status of host host_mixed_2 was set to Up.


Jan 29, 2017 3:30:57 PM
VDSM host_mixed_2 command GetCapabilitiesVDS failed: Client close


Jan 29, 2017 3:30:39 PM
Storage Pool Manager runs on Host host_mixed_3 (Address: storage-ge4-vdsm3.qa.lab.tlv.redhat.com).


Jan 29, 2017 3:30:38 PM
VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()


Jan 29, 2017 3:30:38 PM
Invalid status on Data Center golden_env_mixed. Setting status to Non Responsive.


Jan 29, 2017 3:30:34 PM
Host host_mixed_2 is not responding. Host cannot be fenced automatically because power management for the host is disabled.


Jan 29, 2017 3:30:33 PM


Jan 29, 2017 3:30:18 PM
User admin@internal-authz moving disk preallocated_disk to domain iscsi_1.

Jan 29, 2017 3:30:16 PM
The disk 'preallocated_disk' was successfully added.

Comment 1 Raz Tamir 2017-01-29 17:58:23 UTC
Might be related to the new HSM infrastructure used for cold disk move.

Comment 2 Tal Nisan 2017-01-29 18:38:47 UTC
Liron, seems to me like a duplicate of 1415502, isn't it?

Comment 3 Liron Aravot 2017-01-30 13:37:04 UTC
Tal, Indeed.

*** This bug has been marked as a duplicate of bug 1415502 ***

