Created attachment 1321473 [details] Sosreport and setup log from alma03 Description of problem: [ ERROR ] Failed to execute stage 'Closing up': 'NoneType' object has no attribute 'values' [ ERROR ] Failed to execute stage 'Clean up': [ ERROR ] Hosted Engine deployment failed: this system is not reliable, please check the issue,fix and redeploy 20**FILTERED**7-09-03 **FILTERED**2:00:34,635+0300 DEBUG otopi.context context._executeMethod:**FILTERED**42 method ex ception Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/otopi/context.py", line **FILTERED**32, in _executeMethod method['method']() File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/storage/storage.py", line **FILTERED**044, in _cleanup self._stopMonitoringDomain() File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/storage/storage.py", line 70**FILTERED**, in _stopMonitoringDomain self.environment[ohostedcons.StorageEnv.HOST_ID] File "/usr/lib/python2.7/site-packages/vdsm/jsonrpcvdscli.py", line **FILTERED**70, in _callMethod req, timeout=self._timeouts.get(method_name, timeout)) File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 386, in call call = self.call_async(*reqs) File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 392, in call_async self.call_cb(call.callback, *reqs) File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 408, in call_cb self._transport.send(ctx.encode()) File "/usr/lib/python2.7/site-packages/yajsonrpc/stompreactor.py", line 6**FILTERED**4, in send headers, File "/usr/lib/python2.7/site-packages/yajsonrpc/stompreactor.py", line 448, in send raise stomp.Disconnected() Disconnected 20**FILTERED**7-09-03 **FILTERED**2:00:34,637+0300 ERROR otopi.context context._executeMethod:**FILTERED**5**FILTERED* * Failed to execute stage 'Clean up': Version-Release number of selected component (if applicable): ovirt-hosted-engine-setup-2.2.0-0.0.master.20170831115238.git6d9b02e.el7.centos.noarch How reproducible: 100% Reproduced on several different clean RHEL7.4 hosts. Steps to Reproduce: 1.Deploy SHE over NFS on RHEL7.4 over NFS. Actual results: Failed to execute stage 'Closing up': 'NoneType' object has no attribute 'values' Expected results: Deployment should succeed. Additional info: Components and deployment details available from http://pastebin.test.redhat.com/513207. Sosreport and setup log attached.
Does it work on iSCSI? Did it work in a previous build? Was it caught in tier 1 automation as well?
(In reply to Yaniv Kaul from comment #1) > Does it work on iSCSI? I did not tested it on it yet, only on NFS. > Did it work in a previous build? Yes. > Was it caught in tier 1 automation as well? Kobi, can you please provide us with the info regarding this?
Looking at the logs it happens when waiting for engine to come up- [ INFO ] Waiting for engine to start... [ ERROR ] Failed to execute stage 'Closing up': 'NoneType' object has no attribute 'values' so it may be an issue with the daemon response.
Last time I ran tier1 it was with upgrade flow [1] and it runs as expected. Today I ran cleaner, reprovision and deploy on hosted engine environment [2] and got the same error [1] - https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/rhv-4.2-ge-upgrade-flow-tier1/14/console [2] - https://rhv-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/4.2_dev/job/rhv-4.2-ge-flow-infra/212/console
The root error is hosted-engine-setup logs is here: 2017-09-03 12:00:34,084+0300 INFO otopi.plugins.gr_he_common.core.misc misc.engine_vm_up_check:161 Waiting for engine to start... Cannot connect to the HA daemon, please check the logs. [Errno 2] No such file or directory Cannot connect to the HA daemon, please check the logs. 2017-09-03 12:00:34,089+0300 DEBUG otopi.context context._executeMethod:142 method exception Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod method['method']() File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/core/misc.py", line 164, in engine_vm_up_check if vmstatus.get_status()['engine_vm_up']: File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 170, in get_status for host in status['all_host_stats'].values(): AttributeError: 'NoneType' object has no attribute 'values' So hosted-engine-setup fails getting VM status from ovirt-ha-agent. If we check ovirt-ha-broker logs indeed we see that's full of: Listener::ERROR::2017-09-03 12:13:46,083::storage_broker::147::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats_for_service_type) Failed to read metadata from /var/run/vdsm/storage/699db583-f5b0-43bd-b769-91ab0627f5d5/a8318a91-f70a-4456-9778-34931a913541/aa45d041-350d-4023-9d70-040f3e4a1395 Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 138, in get_raw_stats_for_service_type f = os.open(path, direct_flag | os.O_RDONLY | os.O_SYNC)
It seems to be a race condition between setup and broker: 2017-09-04 14:51:21,925+0200 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/sys temctl', 'start', 'ovirt-ha-broker.service') stderr: 2017-09-04 14:51:21,927+0200 DEBUG otopi.context context._executeMethod:128 Stage closeup METHOD otopi.plugins.gr_he_ common.core.misc.Plugin._persist_files_start 2017-09-04 14:51:21,928+0200 DEBUG otopi.context context._executeMethod:134 condition False 2017-09-04 14:51:21,929+0200 DEBUG otopi.context context._executeMethod:128 Stage closeup METHOD otopi.plugins.gr_he_ common.core.misc.Plugin.engine_vm_up_check 2017-09-04 14:51:21,930+0200 INFO otopi.plugins.gr_he_common.core.misc misc.engine_vm_up_check:161 Waiting for engine to start... Cannot connect to the HA daemon, please check the logs. [Errno 2] No such file or directory Cannot connect to the HA daemon, please check the logs. 2017-09-04 14:51:21,934+0200 DEBUG otopi.context context._executeMethod:142 method exception Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod method['method']() File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-common/core/misc.py", line 164, in engine_vm_up_check if vmstatus.get_status()['engine_vm_up']: File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/vm_status.py", line 172, in get_status for host in status['all_host_stats'].values(): AttributeError: 'NoneType' object has no attribute 'values' 2017-09-04 14:51:21,935+0200 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Closing up': 'NoneType' object has no attribute 'values' Setup starts broker and immediately makes a request for it. Unfortunately broker validates storage mounts, symlinks and other stuff on start-up, so it make take up to several seconds before broker will be operational. I would recommend to make several (3-5 or 10) retries with 5 second delays between them.
With https://gerrit.ovirt.org/#/c/81405/ hosted-engine-setup doesn't fail and it continue to poll but ovirt-ha-agent still fails connecting the broker: MainThread::ERROR::2017-09-04 18:24:23,334::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent return action(he) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper return he.start_monitoring() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 434, in start_monitoring if self._broker and self._broker.is_connected(): AttributeError: 'BrokerLink' object has no attribute 'is_connected' MainThread::ERROR::2017-09-04 18:24:23,335::agent::145::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent MainThread::INFO::2017-09-04 18:24:23,335::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down MainThread::INFO::2017-09-04 18:24:33,692::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.2.0-0.0.master.20170901100749 started
It actually fails to disconnect. This should not be a problem for setup and normal operation. Anyway, i'm going to fix it today.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Target release should be placed once a package build is known to fix a issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for a oVirt release.
Successfully deployed ovirt-hosted-engine-setup-2.2.0-0.0.master.20170913082146.git120d800.el7.centos.noarch with pre-installed rhvm-appliance-1:4.1.20170811.0-1.el7. Moving to verified as initial issue not being reproduced anymore and works fine.
Refining my previous comment #11: Deployment was a success with ovirt-engine-appliance-4.2-20170916.1.el7.centos.noarch installed on host running with RHEL7.4 and ovirt-hosted-engine-setup-2.2.0-0.0.master.20170913082146.git120d800.el7.centos.noarch.
This bugzilla is included in oVirt 4.2.0 release, published on Dec 20th 2017. Since the problem described in this bug report should be resolved in oVirt 4.2.0 release, published on Dec 20th 2017, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.