Bug 1107585
| Summary: | Failed to activate Storage Domain: SanlockException(5, 'Sanlock lockspace add failure', 'Input/output error') | | |
|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | lily <junli.li> |
| Component: | vdsm | Assignee: | Dan Kenigsberg <danken> |
| Status: | CLOSED WORKSFORME | QA Contact: | Gil Klein <gklein> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.3 | CC: | amureini, bazulay, bugs, gklein, iheim, junli.li, mgoldboi, pedro.aleluia, rbalakri, sabose, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.5.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | gluster | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-10-01 14:26:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Gluster | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | attachment 907500: logs for this bug | | |
Lily, can you please add oVirt Engine, VDSM and sanlock logs?

Created attachment 907500 [details]
logs for this bug
including engine.log, vdsm.log and sanlock.log

This is the typical behavior when the gluster mount either doesn't have quorum or there's a split-brain:

2014-06-10 11:00:55+0800 1627 [6981]: sanlock daemon started 2.8 host ead0bc03-69f7-487f-8460-3865bd9f8637.su04
2014-06-10 11:02:55+0800 1747 [6986]: s1 lockspace c13bf545-2d42-4ee8-8ca1-a1413e01595e:1:/rhev/data-center/mnt/glusterSD/su04:neofs-images/c13bf545-2d42-4ee8-8ca1-a1413e01595e/dom_md/ids:0
2014-06-10 11:02:55+0800 1747 [7419]: c13bf545 aio collect 0 0x7f90840008c0:0x7f90840008d0:0x7f9084001000 result -5:0 match res
2014-06-10 11:02:55+0800 1747 [7419]: read_sectors delta_leader offset 0 rv -5 /rhev/data-center/mnt/glusterSD/su04:neofs-images/c13bf545-2d42-4ee8-8ca1-a1413e01595e/dom_md/ids
2014-06-10 11:02:56+0800 1748 [6986]: s1 add_lockspace fail result -5
2014-06-10 11:03:01+0800 1753 [6985]: s2 lockspace c13bf545-2d42-4ee8-8ca1-a1413e01595e:1:/rhev/data-center/mnt/glusterSD/su04:neofs-images/c13bf545-2d42-4ee8-8ca1-a1413e01595e/dom_md/ids:0
2014-06-10 11:03:01+0800 1753 [7436]: c13bf545 aio collect 0 0x7f90840008c0:0x7f90840008d0:0x7f9084101000 result -5:0 match res
2014-06-10 11:03:01+0800 1753 [7436]: read_sectors delta_leader offset 0 rv -5 /rhev/data-center/mnt/glusterSD/su04:neofs-images/c13bf545-2d42-4ee8-8ca1-a1413e01595e/dom_md/ids
2014-06-10 11:03:02+0800 1754 [6985]: s2 add_lockspace fail result -5

fail result -5 == EIO (Input/output error)

Anyway, we're not able to read from or write to the gluster mount; there's not much we can do (it behaves like any other unreachable storage domain).

If you add more logs and information about gluster, maybe someone from the gluster team can help you.

(In reply to Federico Simoncelli from comment #3)
> Anyway we're not able to read/write from/to the gluster mount, there's
> nothing much that we can do (it's just as any other unreachable storage
> domain)
>
> If you add more logs and information about gluster maybe someone from the
> gluster team can help you.

Moving whiteboard to gluster to take charge of this issue.

Same error, NFS Data (Master):
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 873, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 981, in createStoragePool
leaseParams)
File "/usr/share/vdsm/storage/sp.py", line 580, in create
self._acquireTemporaryClusterLock(msdUUID, leaseParams)
File "/usr/share/vdsm/storage/sp.py", line 512, in _acquireTemporaryClusterLock
msd.acquireHostId(self.id)
File "/usr/share/vdsm/storage/sd.py", line 477, in acquireHostId
self._clusterLock.acquireHostId(hostId, async)
File "/usr/share/vdsm/storage/clusterlock.py", line 237, in acquireHostId
raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: (u'69a52632-5037-400a-8761-276e624743fa', SanlockException(-203, 'Sanlock lockspace add failure', 'Sanlock exception'))
Thread-91::DEBUG::2014-09-07 13:48:40,289::task::885::Storage.TaskManager.Task::(_run) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::Task._run: aea3b71a-e312-4748-862a-28282e940cd6 (None, 'abb1264c-5e4d-48ab-b64b-253f01a0dd7d', 'Branch_Offices', '69a52632-5037-400a-8761-276e624743fa', ['69a52632-5037-400a-8761-276e624743fa'], 19, None, 5, 60, 10, 3) {} failed - stopping task
Thread-91::DEBUG::2014-09-07 13:48:40,290::task::1217::Storage.TaskManager.Task::(stop) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::stopping in state preparing (force False)
Thread-91::DEBUG::2014-09-07 13:48:40,290::task::993::Storage.TaskManager.Task::(_decref) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::ref 1 aborting True
Thread-91::INFO::2014-09-07 13:48:40,291::task::1171::Storage.TaskManager.Task::(prepare) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::aborting: Task is aborted: 'Cannot acquire host id' - code 661
Thread-91::DEBUG::2014-09-07 13:48:40,291::task::1176::Storage.TaskManager.Task::(prepare) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::Prepare: aborted: Cannot acquire host id
Thread-91::DEBUG::2014-09-07 13:48:40,291::task::993::Storage.TaskManager.Task::(_decref) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::ref 0 aborting True
Thread-91::DEBUG::2014-09-07 13:48:40,291::task::928::Storage.TaskManager.Task::(_doAbort) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::Task._doAbort: force False
Thread-91::DEBUG::2014-09-07 13:48:40,292::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-91::DEBUG::2014-09-07 13:48:40,292::task::595::Storage.TaskManager.Task::(_updateState) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::moving from state preparing -> state aborting
Thread-91::DEBUG::2014-09-07 13:48:40,292::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::_aborting: recover policy none
Thread-91::DEBUG::2014-09-07 13:48:40,293::task::595::Storage.TaskManager.Task::(_updateState) Task=`aea3b71a-e312-4748-862a-28282e940cd6`::moving from state aborting -> state failed
Thread-91::DEBUG::2014-09-07 13:48:40,293::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.abb1264c-5e4d-48ab-b64b-253f01a0dd7d': < ResourceRef 'Storage.abb1264c-5e4d-48ab-b64b-253f01a0dd7d', isValid: 'True' obj: 'None'>, 'Storage.69a52632-5037-400a-8761-276e624743fa': < ResourceRef 'Storage.69a52632-5037-400a-8761-276e624743fa', isValid: 'True' obj: 'None'>}
Thread-91::DEBUG::2014-09-07 13:48:40,293::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-91::DEBUG::2014-09-07 13:48:40,294::resourceManager::616::Storage.ResourceManager::(releaseResource) Trying to release resource 'Storage.abb1264c-5e4d-48ab-b64b-253f01a0dd7d'
Thread-91::DEBUG::2014-09-07 13:48:40,294::resourceManager::635::Storage.ResourceManager::(releaseResource) Released resource 'Storage.abb1264c-5e4d-48ab-b64b-253f01a0dd7d' (0 active users)
Thread-91::DEBUG::2014-09-07 13:48:40,295::resourceManager::641::Storage.ResourceManager::(releaseResource) Resource 'Storage.abb1264c-5e4d-48ab-b64b-253f01a0dd7d' is free, finding out if anyone is waiting for it.
Thread-91::DEBUG::2014-09-07 13:48:40,295::resourceManager::649::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.abb1264c-5e4d-48ab-b64b-253f01a0dd7d', Clearing records.
Thread-91::DEBUG::2014-09-07 13:48:40,296::resourceManager::616::Storage.ResourceManager::(releaseResource) Trying to release resource 'Storage.69a52632-5037-400a-8761-276e624743fa'
Thread-91::DEBUG::2014-09-07 13:48:40,296::resourceManager::635::Storage.ResourceManager::(releaseResource) Released resource 'Storage.69a52632-5037-400a-8761-276e624743fa' (0 active users)
Thread-91::DEBUG::2014-09-07 13:48:40,296::resourceManager::641::Storage.ResourceManager::(releaseResource) Resource 'Storage.69a52632-5037-400a-8761-276e624743fa' is free, finding out if anyone is waiting for it.
Thread-91::DEBUG::2014-09-07 13:48:40,297::resourceManager::649::Storage.ResourceManager::(releaseResource) No one is waiting for resource 'Storage.69a52632-5037-400a-8761-276e624743fa', Clearing records.
Thread-91::ERROR::2014-09-07 13:48:40,297::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Cannot acquire host id: (u'69a52632-5037-400a-8761-276e624743fa', SanlockException(-203, 'Sanlock lockspace add failure', 'Sanlock exception'))", 'code': 661}}
Thread-91::DEBUG::2014-09-07 13:48:43,047::BindingXMLRPC::1132::vds::(wrapper) client [10.100.207.8]::call volumesList with () {}
Thread-91::DEBUG::2014-09-07 13:48:43,065::BindingXMLRPC::1139::vds::(wrapper) return volumesList with {'status': {'message': 'Done', 'code': 0}, 'volumes': {}}
Thread-91::DEBUG::2014-09-07 13:48:48,576::BindingXMLRPC::1132::vds::(wrapper) client [10.100.207.8]::call volumesList with () {}
Thread-91::DEBUG::2014-09-07 13:48:48,594::BindingXMLRPC::1139::vds::(wrapper) return volumesList with {'status': {'message': 'Done', 'code': 0}, 'volumes': {}}
Thread-91::DEBUG::2014-09-07 13:48:52,298::task::595::Storage.TaskManager.Task::(_updateState) Task=`1cd58055-d4a8-425b-9b12-86355736dbf1`::moving from state init -> state preparing
Thread-91::INFO::2014-09-07 13:48:52,301::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
Thread-91::INFO::2014-09-07 13:48:52,303::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {}
Thread-91::DEBUG::2014-09-07 13:48:52,305::task::1191::Storage.TaskManager.Task::(prepare) Task=`1cd58055-d4a8-425b-9b12-86355736dbf1`::finished: {}
Thread-91::DEBUG::2014-09-07 13:48:52,307::task::595::Storage.TaskManager.Task::(_updateState) Task=`1cd58055-d4a8-425b-9b12-86355736dbf1`::moving from state preparing -> state finished
Thread-91::DEBUG::2014-09-07 13:48:52,309::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-91::DEBUG::2014-09-07 13:48:52,311::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-91::DEBUG::2014-09-07 13:48:52,314::task::993::Storage.TaskManager.Task::(_decref) Task=`1cd58055-d4a8-425b-9b12-86355736dbf1`::ref 0 aborting False
Thread-91::DEBUG::2014-09-07 13:48:54,119::BindingXMLRPC::1132::vds::(wrapper) client [10.100.207.8]::call volumesList with () {}
Thread-91::DEBUG::2014-09-07 13:48:54,138::BindingXMLRPC::1139::vds::(wrapper) return volumesList with {'status': {'message': 'Done', 'code': 0}, 'volumes': {}}
Thread-91::DEBUG::2014-09-07 13:48:59,658::BindingXMLRPC::1132::vds::(wrapper) client [10.100.207.8]::call volumesList with () {} flowID [69ac36bb]
Thread-91::DEBUG::2014-09-07 13:48:59,677::BindingXMLRPC::1139::vds::(wrapper) return volumesList with {'status': {'message': 'Done', 'code': 0}, 'volumes': {}}
Thread-91::DEBUG::2014-09-07 13:49:05,228::BindingXMLRPC::1132::vds::(wrapper) client [10.100.207.8]::call volumesList with () {}
We finally found that the file /rhev/data-center/mnt/glusterSD/su04:neofs-images/c13bf545-2d42-4ee8-8ca1-a1413e01595e/dom_md/ids was in split-brain. After resolving the split-brain, this bug disappeared.
Description of problem:

The Storage Domain's type is GlusterFS. Today its status changed from Active to Unknown. A manual attempt to activate it then failed.

vdsm.log:

9a3d82da-cc52-4f1a-bf2d-ad43c44555ed::ERROR::2014-06-10 11:03:02,071::task::850::TaskManager.Task::(_setError) Task=`9a3d82da-cc52-4f1a-bf2d-ad43c44555ed`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 857, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/storage/task.py", line 318, in run
return self.cmd(*self.argslist, **self.argsdict)
File "/usr/share/vdsm/storage/sp.py", line 273, in startSpm
self.masterDomain.acquireHostId(self.id)
File "/usr/share/vdsm/storage/sd.py", line 458, in acquireHostId
self._clusterLock.acquireHostId(hostId, async)
File "/usr/share/vdsm/storage/clusterlock.py", line 189, in acquireHostId
raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('c13bf545-2d42-4ee8-8ca1-a1413e01595e', SanlockException(5, 'Sanlock lockspace add failure', 'Input/output error'))

Version-Release number of selected component (if applicable):

How reproducible:
Rare

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
Activates successfully.

Additional info:
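The traceback in the description shows the failure path: `clusterlock.acquireHostId` catches the `SanlockException` raised while joining the lockspace and re-raises it as `AcquireHostIdFailure`, which is what the engine ultimately reports. A simplified sketch of that wrapping pattern (class and function bodies are illustrative, not vdsm's actual implementation):

```python
class SanlockException(Exception):
    """Carries (errno, message, detail), e.g.
    (5, 'Sanlock lockspace add failure', 'Input/output error')."""


class AcquireHostIdFailure(Exception):
    """Raised by the storage layer when the sanlock lockspace
    for a storage domain cannot be joined."""


def acquire_host_id(sd_uuid, host_id, add_lockspace):
    """Join the domain's sanlock lockspace, wrapping low-level errors.

    `add_lockspace` stands in for the sanlock client call. Any
    SanlockException (here EIO, caused by split-brain on dom_md/ids)
    is re-raised with the storage-domain UUID attached, mirroring
    what the clusterlock.py frame in the traceback does.
    """
    try:
        add_lockspace(host_id)
    except SanlockException as e:
        raise AcquireHostIdFailure(sd_uuid, e)
```

This is why the engine-visible error always pairs the domain UUID with the underlying sanlock failure, as in both tracebacks above.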