Description of problem:
------------------------
After deploying the hosted engine on one host, vdsm.log on that host fills with error messages and tracebacks.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHEV 3.6.5.3-1
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.3-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.5.0-1.el7ev.noarch

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. On RHEL 7.2, deploy the hosted engine (hosted-engine --deploy)
2. Use a glusterfs volume (replica 3, sharded) as the hosted-engine storage backend
3. After the hosted-engine deployment succeeds, check the vdsm logs

Actual results:
---------------
vdsm.log is populated with errors and tracebacks

Expected results:
------------------
There should not be any errors or tracebacks

Additional info:
----------------
Thread-16900::ERROR::2016-04-28 22:01:25,673::monitor::394::Storage.Monitor::(_releaseHostId) Error releasing host id 1 for domain c75896b9-76d7-4327-b8bc-d2aa593e5e1b
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 391, in _releaseHostId
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 536, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 249, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'c75896b9-76d7-4327-b8bc-d2aa593e5e1b', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

<--cut-->

Thread-18998::DEBUG::2016-04-28 22:20:26,544::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`bbfca596-dfb3-45b3-b259-6612e5b0c0ba`::_resourcesAcquired: Storage.c75896b9-76d7-4327-b8bc-d2aa593e5e1b (shared)
Thread-18998::DEBUG::2016-04-28 22:20:26,544::task::993::Storage.TaskManager.Task::(_decref) Task=`bbfca596-dfb3-45b3-b259-6612e5b0c0ba`::ref 1 aborting False
jsonrpc.Executor/2::ERROR::2016-04-28 22:20:26,552::API::1869::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1849, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 105, in get_all_stats
    stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
RequestError: Request failed: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/localhost:_enginevol/c75896b9-76d7-4327-b8bc-d2aa593e5e1b/ha_agent/hosted-engine.metadata'

<--cut-->

Thread-19064::ERROR::2016-04-28 22:21:06,079::API::1869::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1849, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': 'c75896b9-76d7-4327-b8bc-d2aa593e5e1b'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>

<--cut-->

Thread-19088::ERROR::2016-04-28 22:21:40,409::monitor::394::Storage.Monitor::(_releaseHostId) Error releasing host id 1 for domain c75896b9-76d7-4327-b8bc-d2aa593e5e1b
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 391, in _releaseHostId
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 536, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 249, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'c75896b9-76d7-4327-b8bc-d2aa593e5e1b', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

<--cut-->

Thread-19126::ERROR::2016-04-28 22:21:40,907::task::866::Storage.TaskManager.Task::(_setError) Task=`72eece33-b9b9-4a0a-80f0-a83dd7fbea0c`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
    sdDom.validate()
  File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'c75896b9-76d7-4327-b8bc-d2aa593e5e1b',)

<--cut-->

Thread-19140::ERROR::2016-04-28 22:21:41,801::task::866::Storage.TaskManager.Task::(_setError) Task=`5091d821-479b-4c1c-8908-5a6cdba3550f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3310, in getImagesList
    images = dom.getAllImages()
  File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
    self.getPools()[0],
IndexError: list index out of range

<--cut-->

Thread-19140::ERROR::2016-04-28 22:21:41,803::dispatcher::79::Storage.Dispatcher::(wrapper) list index out of range
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 104, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1179, in prepare
    raise self.error
IndexError: list index out of range
But there are no functional issues; hosted-engine is up and running on this host:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : cambridge.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 7cfd99bf
Host timestamp                     : 34339
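The status above is assembled from the same HA stats that the failed _getHaInfo calls try to read. As a cross-check, a sketch of reading them directly through the client module named in the tracebacks, run on the HA host itself; the HAClient class name and the dict-of-hosts return shape are assumptions, hedged accordingly:

# Sketch: read the HA stats that back "hosted-engine --vm-status" via the
# client module from the tracebacks. Assumes HAClient and a dict keyed by
# host id; while hosted-engine.metadata is missing this raises RequestError,
# which is exactly what vdsm's _getHaInfo logs.
from ovirt_hosted_engine_ha.client import client

def print_ha_stats():
    ha = client.HAClient()
    try:
        stats = ha.get_all_stats()  # same call as API.py line 1849 above
    except Exception as e:
        print('HA stats unavailable: %s' % e)
        return
    for host_id, host in stats.items():
        print('%s %s score=%s' % (host_id, host.get('hostname'), host.get('score')))

if __name__ == '__main__':
    print_ha_stats()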
Created attachment 1151954 [details]
vdsm.log
Isn't this a duplicate of bug 1331526?
(In reply to Sahina Bose from comment #3)
> Isn't this a duplicate of bug 1331526?

There are multiple error messages in vdsm.log. One is related to sanlock (unable to add the lockspace), while another says "IndexError: list index out of range". I am not able to figure out whether they all share one cause, so I filed separate bugs for the error messages in broker.log [1] and agent.log [2]. If all the issues turn out to stem from the same root cause, this bug can be marked as a duplicate of the other.

The other issues are:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1331526
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1331514
Created attachment 1152171 [details]
sosreport from the node

Created attachment 1152198 [details]
supervdsm.log

Created attachment 1152199 [details]
broker.log

Created attachment 1152200 [details]
agent.log
Moving to the first RC, since nothing should be targeted to the second one at this point.
The "IndexError: list index out of range" on getImagesList is just a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1274622

*** This bug has been marked as a duplicate of bug 1274622 ***
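For context, the duplicate covers the pattern visible in the fileSD.py frame above: getAllImages() indexes the first pool the domain is attached to, and the hosted-engine storage domain belongs to no pool, so getPools() returns an empty list. A minimal sketch of the failure mode and a defensive variant follows; BLANK_POOL is an assumed placeholder id, not necessarily what the actual fix uses:

# Illustration of the failure mode, not vdsm's actual patch.
BLANK_POOL = '00000000-0000-0000-0000-000000000000'  # assumed placeholder id

def first_pool(domain):
    pools = domain.getPools()  # [] for a domain attached to no pool
    # Buggy pattern from the traceback: pools[0] with no emptiness check
    # raises "IndexError: list index out of range".
    # Defensive variant: fall back to a blank pool id instead.
    return pools[0] if pools else BLANK_POOL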