Description of problem: ------------------------ After hosted-engine deployment, lots of errors are seen in the broker.log Version-Release number of selected component (if applicable): ------------------------------------------------------------- RHEV 3.6.5.3-1 ovirt-host-deploy-1.4.1-1.el7ev.noarch ovirt-hosted-engine-ha-1.3.5.3-1.el7ev.noarch ovirt-hosted-engine-setup-1.3.5.0-1.el7ev.noarch How reproducible: ----------------- Always Steps to Reproduce: ------------------- 1. On RHEL 7.2 perform deploy hosted-engine ( hosted-engine --deploy ) 2. Use glusterfs volume ( replica 3 + sharded volume ) for the hosted-engine backend 3. After hosted-engine deployment is successful, check for agent logs ( /var/log/ovirt-hosted-engine/ha/broker.log ) Actual results: --------------- There are lots of error messages seen in broker.log Expected results: ------------------ There shouldn't be any error messages seen in the broker.log Additional info: ---------------- <---cut---> Thread-4059::ERROR::2016-04-28 22:58:06,177::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /rhev/data-center/mnt/g lusterSD/localhost:_enginevol/c75896b9-76d7-4327-b8bc-d2aa593e5e1b/ha_agent/hosted-engine.metadata Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 129, in get_raw_stats_for_service_type f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC) OSError: [Errno 2] No such file or directory: '/rhev/data-center/mnt/glusterSD/localhost:_enginevol/c75896b9-76d7-4327-b8bc-d2aa593e5e1b/ha_agent/hosted-engine.metadata' <---cut---> Thread-4072::ERROR::2016-04-28 22:58:45,473::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=gluster fs sd_uuid=c75896b9-76d7-4327-b8bc-d2aa593e5e1b' Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle data) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch .set_storage_domain(client, sd_type, **options) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain self._backends[client].connect() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 456, in connect self._dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path " in {1}".format(sd_uuid, parent)) BackendFailureException: path to storage domain c75896b9-76d7-4327-b8bc-d2aa593e5e1b not found in /rhev/data-center/mnt/glusterSD Thread-4072::INFO::2016-04-28 22:58:45,475::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed <---cut---> [root@ ~]# grep ERROR /var/log/ovirt-hosted-engine-ha/broker.log | wc -l 59
Created attachment 1152006 [details] broker.log
Please attach agent and vdsm logs as well. What's the impact - does hosted-engine fail to deploy?
Created attachment 1152163 [details] sosreport from the node
Created attachment 1152164 [details] agent.log from the node
Created attachment 1152166 [details] vdsm.log
Created attachment 1152167 [details] supervdsm.log
I have also found various error messages in vdsm.log[1] and and also in agent.log[2]. I have raise separate bugs for those [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1331503 [2] - https://bugzilla.redhat.com/show_bug.cgi?id=1331514
There is no functional loss, and so lowering the severity
Is this related to periodic umount and mount of engine volume seen in gluster client logs? Does the error occur due to the umount or vice versa?
Yes, I think so
Moving to first RC, since things should not be targeted to second one at this point.
Note that with 3.6.7RC1 I'm not seeing this issue. Kasturi, can you check in your environment too?
Hi Sahina, I am testing with 3.6.7RC1 build and in the agent.log i see the error message described in description plus some additional errors. Thread-21842::ERROR::2016-06-03 15:37:36,165::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /var/run/vdsm/storage/c78bf0e8-ff81-4e98-ba76-068613d3cefe/ab9ed2e8-f830-4b0a-9646-3f75ed60dd32/8dc79e6e-c7a2-4926-a1dd-13be01c14852 Thread-21846::ERROR::2016-06-03 15:37:39,932::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid' Thread-80473::ERROR::2016-06-03 15:37:44,667::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe' Thread-80474::ERROR::2016-06-03 15:37:54,733::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe' Thread-80475::ERROR::2016-06-03 15:38:04,785::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe' Thread-1::ERROR::2016-06-03 15:46:01,392::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /rhev/data-center/mnt/glusterSD/10.70.34.35:_engine/c78bf0e8-ff81-4e98-ba76-068613d3cefe/ha_agent/hosted-engine.metadata Log from dhcp35-194.lab.eng.blr.redhat.com: ================================================== [root@dhcp35-194 ovirt-hosted-engine-ha]# grep ERROR /var/log/ovirt-hosted-engine-ha/broker.log [root@dhcp35-194 ovirt-hosted-engine-ha]# Log from zod.lab.eng.blr.redhat.com: ============================================= grep ERROR /var/log/ovirt-hosted-engine-ha/broker.log Thread-57120::ERROR::2016-06-03 16:08:44,770::storage_broker::138::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /var/run/vdsm/storage/c78bf0e8-ff81-4e98-ba76-068613d3cefe/ab9ed2e8-f830-4b0a-9646-3f75ed60dd32/8dc79e6e-c7a2-4926-a1dd-13be01c14852 Thread-58234::ERROR::2016-06-03 16:08:51,502::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=c78bf0e8-ff81-4e98-ba76-068613d3cefe'
"Failed to read metadata from" indicates the mount or the whole setup was broken.
Sas, are you seeing these messages after using backup-volfile-server option and moving to latest 3.6.7 and gluster builds?
(In reply to Sahina Bose from comment #15) > Sas, are you seeing these messages after using backup-volfile-server option > and moving to latest 3.6.7 and gluster builds? I have tested with RHEV 3.6.7.4 and I am not seeing any of the mentioned error messages. For the worth of information. I do see only a different error messages as follows- "/var/log/ovirt-hosted-engine-ha/broker.log.2016-06-15:Thread-5::ERROR::2016-06-16 10:32:44,721::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth::(update_stat_file) Failed to getVmStats: 'pid'" for fewer times ( 215 times to be exact ) I am not sure that the above reported error message is seen consistently. We can take that in a different bug, if there is any other problem related to that. So far the error messages reported in comment0, is no longer seen with RHEV 3.6.7.4
(In reply to SATHEESARAN from comment #16) > (In reply to Sahina Bose from comment #15) > > Sas, are you seeing these messages after using backup-volfile-server option > > and moving to latest 3.6.7 and gluster builds? > > I have tested with RHEV 3.6.7.4 and I am not seeing any of the mentioned > error messages. > > For the worth of information. > I do see only a different error messages as follows- > "/var/log/ovirt-hosted-engine-ha/broker.log.2016-06-15:Thread-5::ERROR::2016- > 06-16 > 10:32:44,721::cpu_load_no_engine::156::cpu_load_no_engine.EngineHealth:: > (update_stat_file) Failed to getVmStats: 'pid'" for fewer times ( 215 times > to be exact ) > > I am not sure that the above reported error message is seen consistently. We > can take that in a different bug, if there is any other problem related to > that. > So far the error messages reported in comment0, is no longer seen with RHEV > 3.6.7.4 Missed the other part of the information. I have used backup-volfile-server option while configuring hosted-engine and also using latest RHGS 3.1.3 RC ( glusterfs -3.7.9-10 )
Thank, Sas! Closing this as per comment 16 & 17