+++ This bug was initially created as a clone of Bug #1327102 +++

Description of problem:
The Events tab of the UI and the vdsm logs always report that a storage domain is 'either partially accessible or entirely inaccessible', even when all the domains in the UI show as active and functional.

Version-Release number of selected component (if applicable):
vdsm-4.17.23.2-1.1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install HC setup.
2.
3.

Actual results:
The Events tab of the UI and the vdsm logs report 'storage domain is either partially accessible or entirely inaccessible'.

Expected results:
The Events tab of the UI should not get flooded with the message 'either partially accessible or entirely inaccessible', since the storage domains are accessible and in active state.

Additional info:

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-04-14 05:47:46 EDT ---

This bug is automatically being proposed for the current z-stream release of Red Hat Gluster Storage 3 by setting the release flag 'rhgs-3.1.z' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag.
--- Additional comment from RamaKasturi on 2016-04-14 05:48:50 EDT ---

Thread-12855::DEBUG::2016-04-12 15:14:23,199::task::595::Storage.TaskManager.Task::(_updateState) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::moving from state init -> state preparing
Thread-12855::INFO::2016-04-12 15:14:23,200::logUtils::48::dispatcher::(wrapper) Run and protect: getStorageDomainInfo(sdUUID='1c1ce771-e9e9-4a78-ae28-2006442e6cd6', options=None)
Thread-12855::INFO::2016-04-12 15:14:23,200::fileSD::357::Storage.StorageDomain::(validate) sdUUID=1c1ce771-e9e9-4a78-ae28-2006442e6cd6
Thread-12855::DEBUG::2016-04-12 15:14:23,201::persistentDict::234::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]
Thread-12855::DEBUG::2016-04-12 15:14:23,201::persistentDict::252::Storage.PersistentDict::(refresh) Empty metadata
Thread-12855::ERROR::2016-04-12 15:14:23,201::task::866::Storage.TaskManager.Task::(_setError) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
    sdDom.validate()
  File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'1c1ce771-e9e9-4a78-ae28-2006442e6cd6',)
Thread-12855::DEBUG::2016-04-12 15:14:23,202::task::885::Storage.TaskManager.Task::(_run) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Task._run: beb3ab38-a9b2-49c5-ba8c-50bb29caad7f ('1c1ce771-e9e9-4a78-ae28-2006442e6cd6',) {} failed - stopping task
Thread-12855::DEBUG::2016-04-12 15:14:23,202::task::1246::Storage.TaskManager.Task::(stop) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::stopping in state preparing (force False)
Thread-12855::DEBUG::2016-04-12 15:14:23,202::task::993::Storage.TaskManager.Task::(_decref) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::ref 1 aborting True
Thread-12855::INFO::2016-04-12 15:14:23,202::task::1171::Storage.TaskManager.Task::(prepare) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::aborting: Task is aborted: 'Domain is either partially accessible or entirely inaccessible' - code 379
Thread-12855::DEBUG::2016-04-12 15:14:23,202::task::1176::Storage.TaskManager.Task::(prepare) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Prepare: aborted: Domain is either partially accessible or entirely inaccessible
Thread-12855::DEBUG::2016-04-12 15:14:23,203::task::993::Storage.TaskManager.Task::(_decref) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::ref 0 aborting True
Thread-12855::DEBUG::2016-04-12 15:14:23,203::task::928::Storage.TaskManager.Task::(_doAbort) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::Task._doAbort: force False
Thread-12855::DEBUG::2016-04-12 15:14:23,203::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-12855::DEBUG::2016-04-12 15:14:23,203::task::595::Storage.TaskManager.Task::(_updateState) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::moving from state preparing -> state aborting
Thread-12855::DEBUG::2016-04-12 15:14:23,203::task::550::Storage.TaskManager.Task::(__state_aborting) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::_aborting: recover policy none
Thread-12855::DEBUG::2016-04-12 15:14:23,203::task::595::Storage.TaskManager.Task::(_updateState) Task=`beb3ab38-a9b2-49c5-ba8c-50bb29caad7f`::moving from state aborting -> state failed
Thread-12855::DEBUG::2016-04-12 15:14:23,203::resourceManager::943::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-12855::DEBUG::2016-04-12 15:14:23,203::resourceManager::980::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-12855::ERROR::2016-04-12 15:14:23,204::dispatcher::76::Storage.Dispatcher::(wrapper) {'status': {'message': "Domain is either partially accessible or entirely inaccessible: (u'1c1ce771-e9e9-4a78-ae28-2006442e6cd6',)", 'code': 379}}
Thread-12855::INFO::2016-04-12 15:14:23,205::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:43377 stopped
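[Editorial note] The traceback above walks getStorageDomainInfo -> validateSdUUID -> fileSD.validate, and the log shows an empty metadata read ("read lines (FileMetadataRW)=[]" then "Empty metadata") immediately before the error. A minimal sketch of that check, assuming the empty read is the trigger; class and function names mirror the log but are simplified, not vdsm's actual API:

```python
# Hedged sketch of the validate() path seen in the traceback.
# Assumption: an empty metadata read is what trips the error.
class StorageDomainAccessError(Exception):
    code = 379  # matches "code 379" in the log

    def __init__(self, sdUUID):
        self.sdUUID = sdUUID
        super().__init__(
            "Domain is either partially accessible or entirely "
            "inaccessible: (%r,)" % (sdUUID,))


def validate(sdUUID, metadata_lines):
    # persistentDict.refresh() logged zero lines ("Empty metadata"),
    # so the domain is treated as inaccessible even though it is
    # mounted and shows as active in the UI.
    if not metadata_lines:
        raise StorageDomainAccessError(sdUUID)
```

This would explain why the event fires on an otherwise healthy setup: the check conflates "could not read metadata right now" with "domain inaccessible".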
Update from Simone:

On 04/12/2016 07:38 PM, Simone Tiraboschi wrote:
> Hi,
> in my opinion the issue is here:
> we call getStorageDomainInfo on the hosted-engine storage domain
> ('1c1ce771-e9e9-4a78-ae28-2006442e6cd6'), but for some reason it fails
> within VDSM ("Domain is either partially accessible or entirely
> inaccessible"), and hence the error accessing it.
> Now the issue is understanding why VDSM reports it as 'either
> partially accessible or entirely inaccessible'.
Is there any impact due to this error? Is hosted-engine --vm-status giving error?
There is no impact due to this error, but it gives a false impression to the user. hosted-engine --vm-status does not give any error; it works fine.
Moving to gluster since this seems like an HCI-specific issue. If you can reproduce this on non-HCI, please open a different bug with steps to reproduce.
I think this simply happens because, in order to avoid the SPOF issue, we try to mount the hosted-engine gluster volume from localhost:/volume. The issue is that localhost obviously resolves differently on different hosts, resulting in 'either partially accessible or entirely inaccessible' if just one of the VDSM hosts is not able to talk with the gluster daemon running locally.

So using localhost for gluster, instead of resolving the single point of failure on the gluster entry point, creates an 'every point of failure': a single host unable to locally access gluster flags the storage domain as 'either partially accessible or entirely inaccessible'.
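[Editorial note] Simone's hypothesis above can be sketched as follows. Host names and the glusterd reachability map are illustrative; this only models the claim that a localhost:/ mount spec ties each host's view of the domain to its own local daemon:

```python
# Hedged sketch of the "every point of failure" hypothesis: with a
# localhost:/ mount spec, each host's access to the domain depends on
# its own local glusterd.
def domain_accessible_from(host, glusterd_up, mount_spec="localhost:/engine"):
    server = mount_spec.split(":", 1)[0]
    if server == "localhost":
        # localhost resolves to the mounting host itself, so access
        # hinges on that host's local gluster daemon.
        return glusterd_up[host]
    # A fixed server name instead makes every host depend on that one
    # server (the original single point of failure).
    return glusterd_up[server]

# Illustrative reachability map: host3's local glusterd is down.
glusterd_up = {"host1": True, "host2": True, "host3": False}
status = {h: domain_accessible_from(h, glusterd_up) for h in glusterd_up}
# Under this model, only host3 would flag the domain as 'either
# partially accessible or entirely inaccessible'.
```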
(In reply to Simone Tiraboschi from comment #5)
> I think this simply happens because, in order to avoid the SPOF issue,
> we try to mount the hosted-engine gluster volume from localhost:/volume.
> The issue is that localhost obviously resolves differently on different
> hosts, resulting in 'either partially accessible or entirely inaccessible'
> if just one of the VDSM hosts is not able to talk with the gluster daemon
> running locally.
>
> So using localhost for gluster, instead of resolving the single point of
> failure on the gluster entry point, creates an 'every point of failure':
> a single host unable to locally access gluster flags the storage domain
> as 'either partially accessible or entirely inaccessible'.

Simone, this error was seen when the HE storage domain was mounted using one of the servers - not localhost:/engine but server1:/engine.
With 3.6.7 and the backup-volfile-server support for the HE storage domain, I have not been able to reproduce this. Kasturi, can you check if you see this in your setup?
(In reply to Sahina Bose from comment #7)
> With 3.6.7 and the backup-volfile-server support for the HE storage domain,
> I have not been able to reproduce this. Kasturi, can you check if you see
> this in your setup?

I'd like to close this as WONTFIX if it is not reproducible. Please promptly reproduce or close.
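[Editorial note] For readers unfamiliar with the mitigation mentioned in comment #7: backup-volfile-server is a GlusterFS native-client mount option listing additional servers to fetch the volfile from if the primary is unreachable. A hedged sketch of what such a mount would look like; the server names and mount path are illustrative, and the mount command itself is only echoed, not executed:

```shell
# Hedged example of a hosted-engine gluster mount with fallback
# volfile servers. All names below are illustrative assumptions.
PRIMARY=server1.example.com
BACKUPS=server2.example.com:server3.example.com
MNT_OPTS="backup-volfile-server=${BACKUPS}"
# The actual mount (shown, not run):
echo "mount -t glusterfs -o ${MNT_OPTS} ${PRIMARY}:/engine /rhev/data-center/mnt/glusterSD/engine"
```

This removes the dependency on a single volfile server without tying each host to its own local glusterd the way a localhost:/ mount spec would.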
3.6 has gone EOL; please re-target this bug to a 4.0 release.
I do not see this issue happening with 3.6.7 / 3.6.8. Will reopen in case this issue is seen again.
Closing based on Comment 10.