Description of problem:
------------------------
After deploying the hosted engine on one host, vdsm.log on that host fills with error messages and tracebacks.

Version-Release number of selected component (if applicable):
--------------------------------------------------------------
RHEV 3.6.5.3-1
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.3-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.5.0-1.el7ev.noarch

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. On RHEL 7.2, deploy the hosted engine (hosted-engine --deploy)
2. Use a glusterfs volume (replica 3, sharded) as the hosted-engine storage backend
3. After the hosted-engine deployment succeeds, check the vdsm logs

Actual results:
---------------
vdsm.log is populated with errors and tracebacks

Expected results:
------------------
There should not be any errors or tracebacks

Additional info:
----------------
Thread-16900::ERROR::2016-04-28 22:01:25,673::monitor::394::Storage.Monitor::(_releaseHostId) Error releasing host id 1 for domain c75896b9-76d7-4327-b8bc-d2aa593e5e1b
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 391, in _releaseHostId
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 536, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 249, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'c75896b9-76d7-4327-b8bc-d2aa593e5e1b', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

<--cut-->

Thread-18998::DEBUG::2016-04-28 22:20:26,544::task::827::Storage.TaskManager.Task::(resourceAcquired) Task=`bbfca596-dfb3-45b3-b259-6612e5b0c0ba`::_resourcesAcquired: Storage.c75896b9-76d7-4327-b8bc-d2aa593e5e1b (shared)
Thread-18998::DEBUG::2016-04-28 22:20:26,544::task::993::Storage.TaskManager.Task::(_decref) Task=`bbfca596-dfb3-45b3-b259-6612e5b0c0ba`::ref 1 aborting False
jsonrpc.Executor/2::ERROR::2016-04-28 22:20:26,552::API::1869::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1849, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 105, in get_all_stats
    stats = broker.get_stats_from_storage(service)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 232, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 260, in _checked_communicate
    .format(message or response))
RequestError: Request failed: failed to read metadata: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/localhost:_enginevol/c75896b9-76d7-4327-b8bc-d2aa593e5e1b/ha_agent/hosted-engine.metadata'

<--cut-->

Thread-19064::ERROR::2016-04-28 22:21:06,079::API::1869::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1849, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats
    self._configure_broker_conn(broker)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn
    dom_type=dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 176, in set_storage_domain
    .format(sd_type, options, e))
RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'glusterfs', 'sd_uuid': 'c75896b9-76d7-4327-b8bc-d2aa593e5e1b'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>

<--cut-->

Thread-19088::ERROR::2016-04-28 22:21:40,409::monitor::394::Storage.Monitor::(_releaseHostId) Error releasing host id 1 for domain c75896b9-76d7-4327-b8bc-d2aa593e5e1b
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/monitor.py", line 391, in _releaseHostId
    self.domain.releaseHostId(self.hostId, unused=True)
  File "/usr/share/vdsm/storage/sd.py", line 536, in releaseHostId
    self._clusterLock.releaseHostId(hostId, async, unused)
  File "/usr/share/vdsm/storage/clusterlock.py", line 249, in releaseHostId
    raise se.ReleaseHostIdFailure(self._sdUUID, e)
ReleaseHostIdFailure: Cannot release host id: (u'c75896b9-76d7-4327-b8bc-d2aa593e5e1b', SanlockException(16, 'Sanlock lockspace remove failure', 'Device or resource busy'))

<--cut-->

Thread-19126::ERROR::2016-04-28 22:21:40,907::task::866::Storage.TaskManager.Task::(_setError) Task=`72eece33-b9b9-4a0a-80f0-a83dd7fbea0c`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2835, in getStorageDomainInfo
    dom = self.validateSdUUID(sdUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 278, in validateSdUUID
    sdDom.validate()
  File "/usr/share/vdsm/storage/fileSD.py", line 360, in validate
    raise se.StorageDomainAccessError(self.sdUUID)
StorageDomainAccessError: Domain is either partially accessible or entirely inaccessible: (u'c75896b9-76d7-4327-b8bc-d2aa593e5e1b',)

<--cut-->

Thread-19140::ERROR::2016-04-28 22:21:41,801::task::866::Storage.TaskManager.Task::(_setError) Task=`5091d821-479b-4c1c-8908-5a6cdba3550f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 49, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3310, in getImagesList
    images = dom.getAllImages()
  File "/usr/share/vdsm/storage/fileSD.py", line 373, in getAllImages
    self.getPools()[0],
IndexError: list index out of range

<--cut-->

Thread-19140::ERROR::2016-04-28 22:21:41,803::dispatcher::79::Storage.Dispatcher::(wrapper) list index out of range
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 71, in wrapper
    result = ctask.prepare(func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 104, in wrapper
    return m(self, *a, **kw)
  File "/usr/share/vdsm/storage/task.py", line 1179, in prepare
    raise self.error
IndexError: list index out of range
But there are no functional issues; hosted-engine is up and running on this host:

# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : cambridge.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 7cfd99bf
Host timestamp                     : 34339
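The status above is assembled from the same HA stats that the failed _getHaInfo calls try to read. As a cross-check, a sketch of reading them directly through the client module named in the tracebacks, run on the HA host itself; the HAClient class name and the dict-of-hosts return shape are assumptions, hedged accordingly:

# Sketch: read the HA stats that back "hosted-engine --vm-status" via the
# client module from the tracebacks. Assumes HAClient and a dict keyed by
# host id; while hosted-engine.metadata is missing this raises RequestError,
# which is exactly what vdsm's _getHaInfo logs.
from ovirt_hosted_engine_ha.client import client

def print_ha_stats():
    ha = client.HAClient()
    try:
        stats = ha.get_all_stats()  # same call as API.py line 1849 above
    except Exception as e:
        print('HA stats unavailable: %s' % e)
        return
    for host_id, host in stats.items():
        print('%s %s score=%s' % (host_id, host.get('hostname'), host.get('score')))

if __name__ == '__main__':
    print_ha_stats()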
Created attachment 1151954 [details]
vdsm.log
Isn't this a duplicate of bug 1331526?
(In reply to Sahina Bose from comment #3)
> Isn't this a duplicate of bug 1331526?

There are multiple error messages in vdsm.log. One is related to sanlock (unable to add the lockspace), while another says "IndexError: list index out of range". I am not able to figure out whether they all share one cause, so I filed separate bugs for the error messages in broker.log [1] and agent.log [2]. If all the issues turn out to stem from the same root cause, this bug can be marked as a duplicate of the other.

The other issues are:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1331526
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1331514
Created attachment 1152171 [details]
sosreport from the node

Created attachment 1152198 [details]
supervdsm.log

Created attachment 1152199 [details]
broker.log

Created attachment 1152200 [details]
agent.log
Moving to the first RC, since nothing should be targeted to the second one at this point.
The "IndexError: list index out of range" on getImagesList is just a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1274622

*** This bug has been marked as a duplicate of bug 1274622 ***
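For context, the duplicate covers the pattern visible in the fileSD.py frame above: getAllImages() indexes the first pool the domain is attached to, and the hosted-engine storage domain belongs to no pool, so getPools() returns an empty list. A minimal sketch of the failure mode and a defensive variant follows; BLANK_POOL is an assumed placeholder id, not necessarily what the actual fix uses:

# Illustration of the failure mode, not vdsm's actual patch.
BLANK_POOL = '00000000-0000-0000-0000-000000000000'  # assumed placeholder id

def first_pool(domain):
    pools = domain.getPools()  # [] for a domain attached to no pool
    # Buggy pattern from the traceback: pools[0] with no emptiness check
    # raises "IndexError: list index out of range".
    # Defensive variant: fall back to a blank pool id instead.
    return pools[0] if pools else BLANK_POOL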