Bug 1331526 - Multiple error messages seen in broker.log post hosted-engine deployment on one host
Summary: Multiple error messages seen in broker.log post hosted-engine deployment on o...
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 1.3.5.0
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Sandro Bonazzola
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1327516
Blocks: Gluster-HC-2
TreeView+ depends on / blocked
 
Reported: 2016-04-28 17:36 UTC by SATHEESARAN
Modified: 2017-05-11 09:30 UTC (History)
7 users (show)

Fixed In Version:
Clone Of:
Environment:
RHEV RHGS HCI RHEV 3.6.5 RHEL 7.2
Last Closed: 2016-06-20 09:17:34 UTC
oVirt Team: Gluster
Embargoed:
sasundar: planning_ack?
sasundar: devel_ack?
sasundar: testing_ack?


Attachments (Terms of Use)
broker.log (1.06 MB, text/plain)
2016-04-28 17:39 UTC, SATHEESARAN
no flags Details
sosreport from the node (12.22 MB, application/x-xz)
2016-04-29 08:29 UTC, SATHEESARAN
no flags Details
agent.log from the node (1.07 MB, text/plain)
2016-04-29 08:30 UTC, SATHEESARAN
no flags Details
vdsm.log (5.67 MB, text/plain)
2016-04-29 08:31 UTC, SATHEESARAN
no flags Details
supervdsm.log (5.24 MB, text/plain)
2016-04-29 08:32 UTC, SATHEESARAN
no flags Details

Description SATHEESARAN 2016-04-28 17:36:55 UTC
Description of problem:
------------------------
After hosted-engine deployment, lots of errors are seen in the broker.log

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHEV 3.6.5.3-1
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.3-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.5.0-1.el7ev.noarch

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. On RHEL 7.2 perform deploy hosted-engine ( hosted-engine --deploy )
2. Use glusterfs volume ( replica 3 + sharded volume ) for the hosted-engine backend
3. After hosted-engine deployment is successful, check for agent logs ( /var/log/ovirt-hosted-engine/ha/broker.log )

Actual results:
---------------
There are lots of error messages seen in broker.log

Expected results:
------------------
There shouldn't be any error messages seen in the broker.log

Additional info:
----------------

<---cut--->

Thread-4059::ERROR::2016-04-28 22:58:06,177::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /rhev/data-center/mnt/g
lusterSD/localhost:_enginevol/c75896b9-76d7-4327-b8bc-d2aa593e5e1b/ha_agent/hosted-engine.metadata
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type
    f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/localhost:_enginevol/c75896b9-76d7-4327-b8bc-d2aa593e5e1b/ha_agent/hosted-engine.metadata'


<---cut--->

Thread-4072::ERROR::2016-04-28 22:58:45,473::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=gluster
fs sd_uuid=c75896b9-76d7-4327-b8bc-d2aa593e5e1b'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 456, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain c75896b9-76d7-4327-b8bc-d2aa593e5e1b not found in /rhev/data-center/mnt/glusterSD
Thread-4072::INFO::2016-04-28 22:58:45,475::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed

<---cut--->

[root@ ~]# grep ERROR /var/log/ovirt-hosted-engine-ha/broker.log  | wc -l
59

Comment 1 SATHEESARAN 2016-04-28 17:39:44 UTC
Created attachment 1152006 [details]
broker.log

Comment 2 Sahina Bose 2016-04-29 08:14:58 UTC
Please attach agent and vdsm logs as well.
What's the impact - does hosted-engine fail to deploy?

Comment 3 SATHEESARAN 2016-04-29 08:29:42 UTC
Created attachment 1152163 [details]
sosreport from the node

Comment 4 SATHEESARAN 2016-04-29 08:30:29 UTC
Created attachment 1152164 [details]
agent.log from the node

Comment 5 SATHEESARAN 2016-04-29 08:31:16 UTC
Created attachment 1152166 [details]
vdsm.log

Comment 6 SATHEESARAN 2016-04-29 08:32:01 UTC
Created attachment 1152167 [details]
supervdsm.log

Comment 7 SATHEESARAN 2016-04-29 08:42:10 UTC
I have also found various error messages in vdsm.log[1] and and also in agent.log[2]. I have raise separate bugs for those

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1331503
[2] - https://bugzilla.redhat.com/show_bug.cgi?id=1331514

Comment 8 SATHEESARAN 2016-04-29 09:14:21 UTC
There is no functional loss, and so lowering the severity

Comment 9 Sahina Bose 2016-05-03 08:57:47 UTC
Is this related to periodic umount and mount of engine volume seen in gluster client logs? Does the error occur due to the umount or vice versa?

Comment 10 Simone Tiraboschi 2016-05-03 09:03:10 UTC
Yes, I think so

Comment 11 Yaniv Lavi 2016-05-09 10:53:29 UTC
Moving to first RC, since things should not be targeted to second one at this point.

Comment 12 Sahina Bose 2016-06-02 08:48:54 UTC
Note that with 3.6.7RC1 I'm not seeing this issue.
Kasturi, can you check in your environment too?

Comment 13 RamaKasturi 2016-06-03 14:15:16 UTC
Hi Sahina,

  I am testing with 3.6.7RC1 build and in the agent.log i see the error message described in description plus some additional errors.

Thread-21842::ERROR::2016-06-03 15:37:36,165::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /var/run/vdsm/storage/c78bf0e8-ff81-4e98-ba76-068613d3cefe/ab9ed2e8-f830-4b0a-9646-3f75ed60dd32/8dc79e6e-c7a2-4926-a1dd-13be01c14852
Thread-21846::ERROR::2016-06-03 15:37:39,932::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid'
Thread-80473::ERROR::2016-06-03 15:37:44,667::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe'
Thread-80474::ERROR::2016-06-03 15:37:54,733::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe'
Thread-80475::ERROR::2016-06-03 15:38:04,785::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe'
Thread-1::ERROR::2016-06-03 15:46:01,392::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /rhev/data-center/mnt/glusterSD/10.70.34.35:_engine/c78bf0e8-ff81-4e98-ba76-068613d3cefe/ha_agent/hosted-engine.metadata

Log from dhcp35-194.lab.eng.blr.redhat.com:
==================================================
[root@dhcp35-194 ovirt-hosted-engine-ha]# grep ERROR /var/log/ovirt-hosted-engine-ha/broker.log
[root@dhcp35-194 ovirt-hosted-engine-ha]# 


Log from zod.lab.eng.blr.redhat.com:
=============================================

grep ERROR /var/log/ovirt-hosted-engine-ha/broker.log
Thread-57120::ERROR::2016-06-03 16:08:44,770::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /var/run/vdsm/storage/c78bf0e8-ff81-4e98-ba76-068613d3cefe/ab9ed2e8-f830-4b0a-9646-3f75ed60dd32/8dc79e6e-c7a2-4926-a1dd-13be01c14852
Thread-58234::ERROR::2016-06-03 16:08:51,502::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe'

Comment 14 Doron Fediuck 2016-06-05 07:49:59 UTC
"Failed to read metadata from" indicates the mount or the whole setup was broken.

Comment 15 Sahina Bose 2016-06-20 07:59:53 UTC
Sas, are you seeing these messages after using backup-volfile-server option and moving to latest 3.6.7 and gluster builds?

Comment 16 SATHEESARAN 2016-06-20 08:12:04 UTC
(In reply to Sahina Bose from comment #15)
> Sas, are you seeing these messages after using backup-volfile-server option
> and moving to latest 3.6.7 and gluster builds?

I have tested with RHEV 3.6.7.4 and I am not seeing any of the mentioned error messages.

For the worth of information.
I do see only a different error messages  as follows- 
"/var/log/ovirt-hosted-engine-ha/broker.log.2016-06-15:Thread-5::ERROR::2016-06-16 10:32:44,721::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid'" for fewer times ( 215 times to be exact )

I am not sure that the above reported error message is seen consistently. We can take that in a different bug, if there is any other problem related to that.
So far the error messages reported in comment0, is no longer seen with RHEV 3.6.7.4

Comment 17 SATHEESARAN 2016-06-20 09:04:26 UTC
(In reply to SATHEESARAN from comment #16)
> (In reply to Sahina Bose from comment #15)
> > Sas, are you seeing these messages after using backup-volfile-server option
> > and moving to latest 3.6.7 and gluster builds?
> 
> I have tested with RHEV 3.6.7.4 and I am not seeing any of the mentioned
> error messages.
> 
> For the worth of information.
> I do see only a different error messages  as follows- 
> "/var/log/ovirt-hosted-engine-ha/broker.log.2016-06-15:Thread-5::ERROR::2016-
> 06-16
> 10:32:44,721::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::
> (update_stat_file) Failed to getVmStats: 'pid'" for fewer times ( 215 times
> to be exact )
> 
> I am not sure that the above reported error message is seen consistently. We
> can take that in a different bug, if there is any other problem related to
> that.
> So far the error messages reported in comment0, is no longer seen with RHEV
> 3.6.7.4

Missed the other part of the information.
I have used backup-volfile-server option while configuring hosted-engine and also using latest RHGS 3.1.3 RC ( glusterfs -3.7.9-10 )

Comment 18 Sahina Bose 2016-06-20 09:17:34 UTC
Thank, Sas! Closing this as per comment 16 & 17


Note You need to log in before you can comment on or make changes to this bug.