Bug 1218325 - All in one: getSpmStatus failing with StorageDomainMasterError: Error validating master storage domain: ('Version or spm id invalid',)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: 3.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: m1
Target Release: 3.6.0
Assignee: Liron Aravot
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks: 1155425
 
Reported: 2015-05-04 15:34 UTC by Sandro Bonazzola
Modified: 2016-03-10 06:17 UTC
CC: 15 users

Fixed In Version: ovirt-3.6.0-alpha1.2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-04 12:58:29 UTC
oVirt Team: Storage
Embargoed:


Attachments
vdsm logs (453.95 KB, application/x-xz), 2015-05-04 15:36 UTC, Sandro Bonazzola
log collector report (12.15 MB, application/x-xz), 2015-05-06 11:57 UTC, Sandro Bonazzola


Links
oVirt gerrit 40652 (master, MERGED): clusterLock: fix returned retval (Last Updated: Never)

Description Sandro Bonazzola 2015-05-04 15:34:44 UTC
Installed all-in-one from a nightly build; the storage domain becomes unstable and the vdsm logs show:

Thread-924::INFO::2015-05-04 13:13:54,531::logUtils::48::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID=u'd9939338-7fbf-451f-a87b-448a50b77685', options=None)
Thread-924::ERROR::2015-05-04 13:13:54,531::hsm::640::Storage.HSM::(getSpmStatus) Non existent or invalid MD key
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 631, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 625, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 126, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 416, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
ValueError: too many values to unpack
Thread-924::ERROR::2015-05-04 13:13:54,531::task::863::Storage.TaskManager.Task::(_setError) Task=`a74ce819-8155-486f-b55c-6887b50538a0`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 870, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 641, in getSpmStatus
    raise se.StorageDomainMasterError("Version or spm id invalid")
StorageDomainMasterError: Error validating master storage domain: ('Version or spm id invalid',)
Thread-924::DEBUG::2015-05-04 13:13:54,531::task::882::Storage.TaskManager.Task::(_run) Task=`a74ce819-8155-486f-b55c-6887b50538a0`::Task._run: a74ce819-8155-486f-b55c-6887b50538a0 (u'd9939338-7fbf-451f-a87b-448a50b77685',) {} failed - stopping task
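
For reference, the ValueError in the first traceback is the generic Python error raised when a callee returns more values than the assignment target unpacks. A minimal standalone illustration of that failure mode (the function name and returned values below are hypothetical, not the actual vdsm objects):

# Standalone illustration of the failure seen in getSpmStatus above.
# All names and values are hypothetical; they only mimic the shape of the data.
def inquire_cluster_lock():
    # Suppose the lock backend starts returning three elements
    # instead of the two the caller was written against.
    return ("lease-resource", 3, 1)  # e.g. (resource, version, host_id)

try:
    lVer, spmId = inquire_cluster_lock()  # the caller unpacks exactly two values
except ValueError as e:
    print(e)  # "too many values to unpack" -- the same error as in the traceback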



ovirt-engine-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-backend-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-cli-3.6.0.0-0.2.20150428.git4f69cc9.el7.centos.noarch
ovirt-engine-dbscripts-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-extensions-api-impl-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-jboss-as-7.1.1-1.el7.x86_64
ovirt-engine-lib-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-restapi-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-sdk-python-3.6.0.0-0.11.20150421.gitf9dc275.el7.centos.noarch
ovirt-engine-setup-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-setup-base-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-setup-plugin-allinone-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-setup-plugin-ovirt-engine-common-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-setup-plugin-websocket-proxy-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-tools-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-userportal-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-webadmin-portal-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-engine-websocket-proxy-3.6.0-0.0.master.20150503172216.git542f29f.el7.centos.noarch
ovirt-host-deploy-1.4.0-0.0.master.20150420172155.gitca4f58b.el7.noarch
ovirt-host-deploy-java-1.4.0-0.0.master.20150420172155.gitca4f58b.el7.noarch
ovirt-host-deploy-offline-1.4.0-0.0.master.20150420172155.gitca4f58b.el7.x86_64
ovirt-image-uploader-3.6.0-0.0.master.20150128151259.git3f60704.el7.noarch
ovirt-iso-uploader-3.6.0-0.0.master.20150410144603.git1a680f9.el7.noarch
ovirt-release35-003-1.noarch
ovirt-release-master-001-0.7.master.noarch
vdsm-4.17.0-743.gite5856da.el7.x86_64
vdsm-cli-4.17.0-743.gite5856da.el7.noarch
vdsm-infra-4.17.0-743.gite5856da.el7.noarch
vdsm-jsonrpc-4.17.0-743.gite5856da.el7.noarch
vdsm-jsonrpc-java-1.1.1-0.0.master.20150430093147.git6efacdc.el7.noarch
vdsm-python-4.17.0-743.gite5856da.el7.noarch
vdsm-python-zombiereaper-4.16.14-0.el7.noarch
vdsm-xmlrpc-4.17.0-743.gite5856da.el7.noarch
vdsm-yajsonrpc-4.17.0-743.gite5856da.el7.noarch

Comment 1 Sandro Bonazzola 2015-05-04 15:36:06 UTC
Created attachment 1021787 [details]
vdsm logs

Comment 2 Allon Mureinik 2015-05-04 15:59:53 UTC
Related to the metadata changes in 3.5?

Comment 3 Sandro Bonazzola 2015-05-06 11:52:24 UTC
Blocking oVirt 3.6.0 Alpha release on this.
Can't get the storage domain up and running, so no testing can be done on the alpha.

Reproduced on a clean CentOS 7.1 setup.
I'll try to attach log collector data.

Comment 4 Sandro Bonazzola 2015-05-06 11:57:37 UTC
Created attachment 1022637 [details]
log collector report

Comment 5 Liron Aravot 2015-05-07 09:09:10 UTC
Sandro, can you please attach the sanlock log as well?

thanks.

Comment 6 Sandro Bonazzola 2015-05-07 09:26:58 UTC
It's included in attachment #1022637 [details].

Comment 7 Sandro Bonazzola 2015-05-07 09:27:28 UTC
Restoring needinfo on Liron.

Comment 8 Liron Aravot 2015-05-07 10:02:36 UTC
The log collector report doesn't include the sanlock log from the time the issue occurred (at least from what I could find there).
Can you please attach the log from the time of the bug?

Comment 9 Liron Aravot 2015-05-07 10:35:02 UTC
As this is an all-in-one setup, LocalLock is used instead of SANLock.
The issue was caused by this change: https://gerrit.ovirt.org/#/c/38552, which added a third element to the returned tuple.
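
A minimal sketch of the mismatch described in this comment, and of the direction the merged fix (gerrit 40652, "clusterLock: fix returned retval") appears to take, namely returning the two-element tuple the SPM backend unpacks. The class and method names and the tuple contents are assumptions for illustration, not the actual vdsm clusterlock/spbackends code:

# Sketch only: hypothetical names and assumed tuple shapes.
class LocalLockBeforeFix(object):
    def inquire(self):
        # After the refactor the local lock returned a third element,
        # which the caller's two-name unpack cannot handle.
        return ("local-lease", 2, 1)  # assumed (resource, version, host_id)

class LocalLockAfterFix(object):
    def inquire(self):
        # Assumed fix direction: return only the two values the caller expects.
        return (2, 1)  # (version, host_id)

def get_spm_status(lock):
    # Mirrors the failing line in spbackends.getSpmStatus():
    # it unpacks exactly two values from the lock's inquire result.
    lVer, spmId = lock.inquire()
    return lVer, spmId

print(get_spm_status(LocalLockAfterFix()))   # works: (2, 1)
# get_spm_status(LocalLockBeforeFix())       # would raise ValueError: too many values to unpack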

Comment 10 Elad 2015-06-16 12:49:49 UTC
Installed all-in-one using ovirt-3.6.0-alpha1.2.
Setup succeeded and the local storage domain is stable. I created images on it and that went fine.

ovirt-engine-setup-plugin-allinone-3.6.0-0.0.master.20150519172222.git9a2e2b3.el7.centos.noarch
vdsm-4.17.0-912.git25a063d.el7.noarch
ovirt-engine-3.6.0-0.0.master.20150519172222.git9a2e2b3.el7.centos.noarch

Comment 11 Sandro Bonazzola 2015-11-04 12:58:29 UTC
oVirt 3.6.0 was released on November 4th, 2015, and should fix this issue.
If the problem persists, please open a new BZ and reference this one.

