Bug 1327121 - VDSM reports storage domain as 'either partially accessible or entirely inaccessible'
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.17.23.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Dan Kenigsberg
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On: 1327102 1327516
Blocks: Gluster-HC-1 1361547
 
Reported: 2016-04-14 10:04 UTC by RamaKasturi
Modified: 2016-08-04 07:04 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1327102
Clones: 1361547
Environment:
RHEV RHGS HCI RHEL 7.2
Last Closed: 2016-08-04 07:04:14 UTC
oVirt Team: Gluster
Embargoed:
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?



Description RamaKasturi 2016-04-14 10:04:09 UTC
+++ This bug was initially created as a clone of Bug #1327102 +++

Description of problem:
The Events tab of the UI and the vdsm logs always report that the storage domain is 'either partially accessible or entirely inaccessible', even when all the domains in the UI show as active and functional.

Version-Release number of selected component (if applicable):
vdsm-4.17.23.2-1.1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install HC setup.

Actual results:
The Events tab of the UI and the vdsm logs report 'storage domain is either partially accessible or entirely inaccessible'.

Expected results:
The Events tab of the UI should not get flooded with the message 'either partially accessible or entirely inaccessible', since the storage domains are accessible and in the active state.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-04-14 05:47:46 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from RamaKasturi on 2016-04-14 05:48:50 EDT ---

Thread-12855::DEBUG::2016-04-12
15:14:23,199::task::595::Storage.TaskManager.Task::(_updateState)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::moving from state init ->
state preparing
Thread-12855::INFO::2016-04-12
15:14:23,200::logUtils::48::dispatcher::(wrapper) Run and protect:
getStorageDomainInfo(sdUUID='1c1ce771-e9e9-4a78-ae28-2006442e6cd6',
options=None)
Thread-12855::INFO::2016-04-12
15:14:23,200::fileSD::357::Storage.StorageDomain::(validate)
sdUUID=1c1ce771-e9e9-4a78-ae28-2006442e6cd6
Thread-12855::DEBUG::2016-04-12
15:14:23,201::persistentDict::234::Storage.PersistentDict::(refresh)
read lines (FileMetadataRW)=[]
Thread-12855::DEBUG::2016-04-12
15:14:23,201::persistentDict::252::Storage.PersistentDict::(refresh)
Empty metadata
Thread-12855::ERROR::2016-04-12
15:14:23,201::task::866::Storage.TaskManager.Task::(_setError)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
    sdDom.validate()
  File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or
entirely inaccessible: (u'1c1ce771-e9e9-4a78-ae28-2006442e6cd6',)
Thread-12855::DEBUG::2016-04-12
15:14:23,202::task::885::Storage.TaskManager.Task::(_run)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Task._run:
beb3ab38-a9b2-49c5-ba8c-50bb29caad7f
('1c1ce771-e9e9-4a78-ae28-2006442e6cd6',) {} failed - stopping task
Thread-12855::DEBUG::2016-04-12
15:14:23,202::task::1246::Storage.TaskManager.Task::(stop)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::stopping in state
preparing (force False)
Thread-12855::DEBUG::2016-04-12
15:14:23,202::task::993::Storage.TaskManager.Task::(_decref)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::ref 1 aborting True
Thread-12855::INFO::2016-04-12
15:14:23,202::task::1171::Storage.TaskManager.Task::(prepare)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::aborting: Task is
aborted: 'Domain is either partially accessible or entirely
inaccessible' - code 379
Thread-12855::DEBUG::2016-04-12
15:14:23,202::task::1176::Storage.TaskManager.Task::(prepare)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Prepare: aborted: Domain
is either partially accessible or entirely inaccessible
Thread-12855::DEBUG::2016-04-12
15:14:23,203::task::993::Storage.TaskManager.Task::(_decref)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::ref 0 aborting True
Thread-12855::DEBUG::2016-04-12
15:14:23,203::task::928::Storage.TaskManager.Task::(_doAbort)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Task._doAbort: force
False
Thread-12855::DEBUG::2016-04-12
15:14:23,203::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-12855::DEBUG::2016-04-12
15:14:23,203::task::595::Storage.TaskManager.Task::(_updateState)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::moving from state
preparing -> state aborting
Thread-12855::DEBUG::2016-04-12
15:14:23,203::task::550::Storage.TaskManager.Task::(__state_aborting)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::_aborting: recover policy
none
Thread-12855::DEBUG::2016-04-12
15:14:23,203::task::595::Storage.TaskManager.Task::(_updateState)
Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::moving from state
aborting -> state failed
Thread-12855::DEBUG::2016-04-12
15:14:23,203::resourceManager::943::Storage.ResourceManager.Owner::(releaseAll)
Owner.releaseAll requests {} resources {}
Thread-12855::DEBUG::2016-04-12
15:14:23,203::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll)
Owner.cancelAll requests {}
Thread-12855::ERROR::2016-04-12
15:14:23,204::dispatcher::76::Storage.Dispatcher::(wrapper) {'status':
{'message': "Domain is either partially accessible or entirely
inaccessible: (u'1c1ce771-e9e9-4a78-ae28-2006442e6cd6',)", 'code':
379}}
Thread-12855::INFO::2016-04-12
15:14:23,205::xmlrpc::92::vds.XMLRPCServer::(_process_requests)
Request handler for 127.0.0.1:43377 stopped
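For context, the traceback above shows fileSD validate() raising StorageDomainAccessError (code 379) immediately after PersistentDict refresh() read zero metadata lines ("Empty metadata"). A minimal sketch of that failure path, as a hedged simplification rather than the actual VDSM source (the class layout and the metadata_reader parameter are illustrative):

```python
class StorageDomainAccessError(Exception):
    """Illustrative stand-in for vdsm's storageException class."""
    code = 379  # error code seen in the dispatcher status above

    def __init__(self, sdUUID):
        self.sdUUID = sdUUID
        super().__init__(
            "Domain is either partially accessible or entirely "
            "inaccessible: (%r,)" % sdUUID)


class FileStorageDomain:
    """Sketch of a file-based storage domain; metadata_reader is a
    hypothetical callable standing in for PersistentDict.refresh()."""

    def __init__(self, sdUUID, metadata_reader):
        self.sdUUID = sdUUID
        self._read_metadata = metadata_reader

    def validate(self):
        # In the log, refresh() reads 0 lines from the mount, so the
        # metadata comes back empty and validation fails with code 379.
        metadata = self._read_metadata()
        if not metadata:
            raise StorageDomainAccessError(self.sdUUID)
        return True
```

With an empty metadata read, `FileStorageDomain('1c1ce771-...', lambda: {}).validate()` raises StorageDomainAccessError, matching the ERROR line in the log; a non-empty read validates cleanly.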

Comment 1 SATHEESARAN 2016-04-18 11:10:05 UTC
Update from Simone :

On 04/12/2016 07:38 PM, Simone Tiraboschi wrote:
> Hi,
> in my opinion the issue is here:
> we call getStorageDomainInfo on the hosted-engine storage domain
> ('1c1ce771-e9e9-4a78-ae28-2006442e6cd6'), but for some reason it fails
> within VDSM ("Domain is either partially accessible or entirely
> inaccessible"), hence the error accessing it.
> Now the issue is understanding why VDSM reports it as 'either
> partially accessible or entirely inaccessible'.

Comment 2 Sahina Bose 2016-04-21 06:37:57 UTC
Is there any impact due to this error? Is hosted-engine --vm-status giving error?

Comment 3 RamaKasturi 2016-04-25 06:19:10 UTC
There is no impact due to this error, but it gives a false impression to the user. hosted-engine --vm-status does not give any error; it works fine.

Comment 4 Yaniv Lavi 2016-05-02 11:43:35 UTC
Moving to gluster, since this seems like an HCI-specific issue. If you can reproduce this on non-HCI, please open a different bug with steps to reproduce.

Comment 5 Simone Tiraboschi 2016-05-02 12:01:39 UTC
I think that this simply happens because, in order to avoid the SPOF issue, we try to mount the hosted-engine gluster volume from localhost:/volume.
The issue is that localhost obviously resolves differently on different hosts, resulting in 'either partially accessible or entirely inaccessible' if just one of the VDSM hosts is unable to talk to the gluster daemon running locally.

So using localhost for gluster, instead of resolving the single-point-of-failure issue at the gluster entry point, creates an every-point-of-failure situation, where a single host unable to locally access gluster flags the storage domain as 'either partially accessible or entirely inaccessible'.

Comment 6 Sahina Bose 2016-05-02 12:56:03 UTC
(In reply to Simone Tiraboschi from comment #5)
> I think that this simply happens because, in order to avoid the SPOF issue,
> we try to mount the hosted-engine gluster volume from localhost:/volume.
> The issue is that localhost obviously resolves differently on different
> hosts, resulting in 'either partially accessible or entirely inaccessible'
> if just one of the VDSM hosts is unable to talk to the gluster daemon
> running locally.
> 
> So using localhost for gluster, instead of resolving the
> single-point-of-failure issue at the gluster entry point, creates an
> every-point-of-failure situation, where a single host unable to locally
> access gluster flags the storage domain as 'either partially accessible
> or entirely inaccessible'.

Simone, this error was seen when the HE storage domain was mounted using one of the servers: not localhost:/engine but server1:/engine.

Comment 7 Sahina Bose 2016-07-05 08:20:39 UTC
With 3.6.7 and the backup-volfile-server support for the HE storage domain, I have not been able to reproduce this. Kasturi, can you check if you see this in your setup?
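For reference, backup-volfile-server is a standard glusterfs mount option that lets the client fetch the volume file from fallback servers when the primary server is unreachable, removing the dependency on a single entry-point host. A minimal sketch of composing such a mount command (a hypothetical helper for illustration, not VDSM code):

```python
def gluster_mount_cmd(primary, volume, mountpoint, backup_servers=()):
    """Build a glusterfs mount command as an argv list.

    With backup-volfile-server set, the glusterfs client can fetch the
    volfile from a fallback server if `primary` is down, so the storage
    domain no longer hinges on one entry point.
    """
    cmd = ["mount", "-t", "glusterfs"]
    if backup_servers:
        # glusterfs accepts a colon-separated list of fallback servers
        cmd += ["-o", "backup-volfile-server=" + ":".join(backup_servers)]
    cmd += ["{}:/{}".format(primary, volume), mountpoint]
    return cmd
```

For example, `gluster_mount_cmd("server1", "engine", "/mnt/he", ("server2", "server3"))` yields a mount command that can still fetch the volfile from server2 or server3 if server1 is unreachable.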

Comment 8 Yaniv Kaul 2016-07-11 10:59:56 UTC
(In reply to Sahina Bose from comment #7)
> With 3.6.7 and the backup-volfile-server support for the HE storage domain,
> I have not been able to reproduce this. Kasturi, can you check if you see
> this in your setup?

I'd like to CLOSE-WONTFIX if it is not reproducible. Please promptly reproduce or close.

Comment 9 Sandro Bonazzola 2016-07-29 11:18:09 UTC
3.6 is gone EOL; Please re-target this bug to a 4.0 release.

Comment 10 RamaKasturi 2016-08-04 06:26:05 UTC
I do not see this issue happening with 3.6.7 / 3.6.8. Will reopen if this issue is seen again.

Comment 11 Sahina Bose 2016-08-04 07:04:14 UTC
Closing based on Comment 10.

