Bug 1417456

Summary: Disk move stuck after vdsmd restart on SPM during disk move between storage domains
Product: [oVirt] ovirt-engine
Reporter: Avihai <aefrat>
Component: BLL.Storage
Assignee: Liron Aravot <laravot>
Status: CLOSED DUPLICATE
QA Contact: Raz Tamir <ratamir>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.0.2
CC: bazulay, bugs, gklein, laravot, lsurette, srevivo, tnisan, ycui, ykaul
Target Milestone: ovirt-4.1.0-rc
Flags: rule-engine: ovirt-4.1?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-30 13:37:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
engine & vdsm logs (flags: none)

Description Avihai 2017-01-29 15:01:42 UTC
Created attachment 1245582 [details]
engine & vdsm logs

Description of problem:
Disk move stuck after vdsmd restart on SPM during disk move between storage domains.

Version-Release number of selected component (if applicable):
Engine = ovirt-engine-4.1.0.2-0.2.el7.noarch
vdsm = 4.19.2-2

How reproducible:
Happened once with a 10G disk; did not happen with a smaller disk (5G).

Steps to Reproduce:
1. Create a 10G preallocated disk.
2. Move the disk to a different storage domain.
   I moved the disk between two iSCSI storage domains: source = iscsi_2, target = iscsi_1.
3. Restart vdsmd on the SPM host (host_mixed_2).
4. A new SPM is selected (host_mixed_3).
5. Check the disk status.

Actual results:
The disk is stuck in "LOCKED" status indefinitely (1h+) with no change, showing 19% progress.

Expected results:
The action should be rolled back: after the new SPM host is up, the disk should not be copied to the target and should remain available on the source storage domain.


Additional info:
From new SPM host vdsm.log:
2017-01-29 15:30:39,015 ERROR (upgrade/c9d819f) [storage.StoragePool] Unhandled exception (utils:371)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 368, in wrapper
    return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/vdsm/concurrent.py", line 180, in run
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 232, in _upgradePoolDomain
    self._finalizePoolUpgradeIfNeeded()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 77, in wrapper
    raise SecureError("Secured object is not in safe state")
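For context, the SecureError in the traceback comes from vdsm's "securable object" pattern: methods of the storage pool object are wrapped so that they may only run while the host actually holds the SPM role. The following is a minimal, simplified sketch of that pattern (not the actual vdsm securable.py code; the class and method names here are hypothetical), illustrating why a call made after the host loses SPM status fails this way:

```python
# Simplified sketch of the "securable" decorator pattern seen in the
# traceback. NOT the real vdsm code -- names are illustrative only.
import functools


class SecureError(RuntimeError):
    """Raised when a secured method is called in an unsafe state."""


def secured(method):
    """Allow the call only while the owning object is in a safe state."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if not self._secured:
            raise SecureError("Secured object is not in safe state")
        return method(self, *args, **kwargs)
    return wrapper


class PoolLike(object):
    """Stand-in for a storage-pool object whose safety tracks SPM role."""

    def __init__(self):
        # False models a host that lost the SPM role, e.g. after a
        # vdsmd restart triggered SPM failover to another host.
        self._secured = False

    @secured
    def finalize_upgrade(self):
        return "upgrade finalized"
```

Calling `PoolLike().finalize_upgrade()` raises `SecureError("Secured object is not in safe state")`, which matches the last frame of the traceback: the upgrade-finalization thread ran after the restarted host was no longer SPM, so the secured call was rejected instead of being cancelled cleanly.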


Timeline (last event first) :

Jan 29, 2017 3:31:03 PM
Status of host host_mixed_2 was set to Up.


Jan 29, 2017 3:30:57 PM
VDSM host_mixed_2 command GetCapabilitiesVDS failed: Client close


Jan 29, 2017 3:30:39 PM
Storage Pool Manager runs on Host host_mixed_3 (Address: storage-ge4-vdsm3.qa.lab.tlv.redhat.com).


Jan 29, 2017 3:30:38 PM
VDSM host_mixed_2 command HSMGetAllTasksStatusesVDS failed: Not SPM: ()


Jan 29, 2017 3:30:38 PM
Invalid status on Data Center golden_env_mixed. Setting status to Non Responsive.


Jan 29, 2017 3:30:34 PM
Host host_mixed_2 is not responding. Host cannot be fenced automatically because power management for the host is disabled.


Jan 29, 2017 3:30:33 PM


Jan 29, 2017 3:30:18 PM
User admin@internal-authz moving disk preallocated_disk to domain iscsi_1.

Jan 29, 2017 3:30:16 PM
The disk 'preallocated_disk' was successfully added.

Comment 1 Raz Tamir 2017-01-29 17:58:23 UTC
Might be related to the new HSM infrastructure for cold disk move.

Comment 2 Tal Nisan 2017-01-29 18:38:47 UTC
Liron, seems to me like a duplicate of 1415502, isn't it?

Comment 3 Liron Aravot 2017-01-30 13:37:04 UTC
Tal, Indeed.

*** This bug has been marked as a duplicate of bug 1415502 ***