Bug 1218551 - [vdsm] "Error validating master storage domain: ('Version or spm id invalid',)" while host connected to a local storage pool
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.6
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: m1
Target Release: 3.6.0
Assignee: Liron Aravot
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks: 1099412
Reported: 2015-05-05 09:04 UTC by Elad
Modified: 2016-03-10 06:18 UTC

Fixed In Version: v4.17.0.4
Clone Of:
Environment:
Last Closed: 2015-11-04 13:57:33 UTC
oVirt Team: Storage
Embargoed:


Attachments
logs from host and engine (1.09 MB, application/x-gzip)
2015-05-05 09:04 UTC, Elad

Description Elad 2015-05-05 09:04:17 UTC
Created attachment 1022116
logs from host and engine

Description of problem:
On a local DC, the master domain cannot be activated. Querying the SPM status on the host fails:

[root@green-vdsc 5337773e-2135-4005-87a1-a749d139b2b9]# vdsClient -s 0 getSpmStatus c49b02ff-9fa7-480d-a0a0-701f8241b253
Error validating master storage domain: ('Version or spm id invalid',)


Version-Release number of selected component (if applicable):
ovirt-3.6.0-1
vdsm-4.17.0-632.git19a83a2.el7.x86_64
ovirt-engine-3.6.0-0.0.master.20150412172306.git55ba764.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a local DC
2. Create a new local data domain attached to the local DC
3.

Actual results:
The storage domain becomes active and immediately afterwards becomes inactive.

On engine:

2015-05-05 11:37:16,566 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-7) [6d2911b0] Command 'SpmStatusVDSCommand(HostName = green-vdsc, HostId = 13038b02-01c1-4f1e-a045-5d87bd9d0002, storagePoolId = c49b02ff-9fa7-480d-a0a0-701f8241b253)' execution failed: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('Version or spm id invalid',)

On vdsm:

Thread-53978::ERROR::2015-05-05 11:59:42,234::hsm::639::Storage.HSM::(getSpmStatus) Non existent or invalid MD key
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 630, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 624, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 126, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 416, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
ValueError: too many values to unpack
Thread-53978::ERROR::2015-05-05 11:59:42,234::task::863::Storage.TaskManager.Task::(_setError) Task=`782faafa-f2a5-45d8-94f6-7d1b0dbba06f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 870, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 640, in getSpmStatus
    raise se.StorageDomainMasterError("Version or spm id invalid")
StorageDomainMasterError: Error validating master storage domain: ('Version or spm id invalid',)
Thread-53978::DEBUG::2015-05-05 11:59:42,234::task::882::Storage.TaskManager.Task::(_run) Task=`782faafa-f2a5-45d8-94f6-7d1b0dbba06f`::Task._run: 782faafa-f2a5-45d8-94f6-7d1b0dbba06f (u'c49b02ff-9fa7-480d-a0a0-701f8241b253',) {} failed - stopping task
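
The first traceback fails at a two-target tuple unpacking, and the second shows the error being re-raised as StorageDomainMasterError. The following is a minimal sketch of that failure mode, not vdsm code: every name in it is a hypothetical stand-in, and the three-element return value is only an assumption used to trigger the error, since the logs do not show what inquireClusterLock() actually returned on local storage.

```python
# Hypothetical stand-ins for illustration only; this is not vdsm code.

class StorageDomainMasterError(Exception):
    """Stand-in for the storage exception seen in the second traceback."""


def inquire_two_values():
    # The shape the caller expects: (lock version, SPM id).
    return 5, 1


def inquire_three_values():
    # Assumed mismatching shape: one element more than the caller unpacks.
    return 5, 1, "extra"


def get_spm_status(inquire):
    try:
        # Same pattern as "lVer, spmId = ...inquireClusterLock()".
        l_ver, spm_id = inquire()
    except ValueError:
        # Mirrors the re-raise visible at hsm.py line 640 in the log.
        raise StorageDomainMasterError("Version or spm id invalid")
    return l_ver, spm_id


if __name__ == "__main__":
    print(get_spm_status(inquire_two_values))
    try:
        get_spm_status(inquire_three_values)
    except StorageDomainMasterError as exc:
        print("failed:", exc)
```

The sketch only shows why a return-shape mismatch surfaces to the engine as a master domain validation error rather than as the underlying ValueError.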



Expected results:
Local storage pool should be initialized correctly  

Additional info:
logs from host and engine

Comment 1 Sven Kieske 2015-06-02 12:38:00 UTC
This should block the 3.6 release tracker bug, as it is a regression.

Comment 2 Elad 2015-06-17 19:37:15 UTC
Tal, it might be that this bug got fixed by https://gerrit.ovirt.org/#/c/40652/

Comment 3 Liron Aravot 2015-07-01 11:11:23 UTC
Elad, the issue seems to be solved by the provided patch.
Please test again just to be sure, and report back if you manage to reproduce it.

Thanks,
Liron.

Comment 4 Allon Mureinik 2015-07-01 13:40:26 UTC
Moving to ON_QA based on this statement.

Comment 5 Elad 2015-07-08 06:09:35 UTC
The issue is now fixed: the local data domain remains active. Tested basic sanity on it (created disks, ran a VM, and activated another local data domain in the DC).

Verified using 
vdsm-4.17.0-1054.git562e711.el7.noarch

Comment 6 Sandro Bonazzola 2015-11-04 13:57:33 UTC
oVirt 3.6.0 has been released on November 4th, 2015 and should fix this issue.
If problems still persist, please open a new BZ and reference this one.

