+++ This bug is an upstream to downstream clone. The original bug is: +++
+++ bug 1375573 +++
======================================================================

Description of problem:
1.-Using PXE, I deployed a new, clean Red Hat Virtualization Manager, version 4.0.4.2-0.1.el7ev, over NFS storage on a 3.6.9 el7.2 host.
2.-Got this:
[ INFO ] Still waiting for VDSM host to become operational...
The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
3.-From the web UI I tried to activate the host, and it went to "Unassigned" status.
4.-From the shell I see this:
hosted-engine --vm-status
Failed to connect to broker, the number of errors has exceeded the limit (1)
Cannot connect to the HA daemon, please check the logs.
In /var/log/messages I see:
Sep 13 15:34:14 alma03 journal: vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012 File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo#012 stats = instance.get_all_stats()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats#012 with broker.connection(self._retries, self._wait):#012 File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__#012 return self.gen.next()#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection#012 self.connect(retries, wait)#012 File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect#012 raise BrokerConnectionError(error_msg)#012BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)

5.-Both the ovirt-ha-agent and ovirt-ha-broker services were dead, so I started them manually:
# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

# systemctl status ovirt-ha-broker
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

# systemctl start ovirt-ha-agent
# systemctl start ovirt-ha-broker
# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-09-13 15:43:33 IDT; 1min 47s ago
 Main PID: 57265 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─57265 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: WARNING:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to find OVF_STORE
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR 'version' is not stored in the HE configuration image
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:'version' is not stored in the HE configuration image
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineUp (score: 3400)

# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-09-13 15:43:33 IDT; 1min 58s ago
 Main PID: 57264 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─57264 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:mem_free.MemFree:memFree: 27052
Sep 13 15:45:23 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:mem_free.MemFree:memFree: 27054

6.-Got these from vdsm:
# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2016-09-13 15:31:01 IDT;
15min ago
 Main PID: 55199 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─55199 /usr/bin/python /usr/share/vdsm/vdsm
           ├─55319 /usr/libexec/ioprocess --read-pipe-fd 33 --write-pipe-fd 32 --max-threads 10 --max-queued-requests 10
           ├─55328 /usr/libexec/ioprocess --read-pipe-fd 42 --write-pipe-fd 41 --max-threads 10 --max-queued-requests 10
           ├─57294 /usr/libexec/ioprocess --read-pipe-fd 67 --write-pipe-fd 66 --max-threads 10 --max-queued-requests 10
           └─57322 /usr/libexec/ioprocess --read-pipe-fd 53 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10

Sep 13 15:32:49 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:32:49 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:10 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:10 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:30 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:30 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:51 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:51 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:34:14 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:34:14 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)

7.-Deployment failed after I had waited for some time:
[ INFO ] Still waiting for VDSM host to become operational...
The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
Please try to activate it via the engine webadmin UI.
Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add alma03.qa.lab.tlv.redhat.com to the manager
[ INFO ] Saving hosted-engine configuration on the shared storage domain
Please shutdown the VM allowing the system to launch it as a monitored service.
The system will wait until the VM is down.

Sosreport from host and the engine are attached.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1.Deploy hosted engine 4.0 using PXE over NFS on 3.6.9 host.
2.
3.

Actual results:
Deployment failed.

Expected results:
Deployment should succeed.

Additional info:
Sosreports from engine and host.

(Originally by Nikolai Sednev)
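The /var/log/messages record quoted in step 4 is hard to read because every newline of the Python traceback appears as "#012", the octal code of the line-feed character, which is how the syslog daemon escapes control characters. A minimal sketch of turning such a record back into a multi-line traceback, assuming that escaping convention; the helper name unescape_syslog is hypothetical and not part of any oVirt or syslog tool:

```python
import re

# "#" followed by exactly three octal digits, e.g. "#012" (line feed).
_ESCAPE = re.compile(r"#([0-7]{3})")


def _decode(match):
    """Decode one escape, but only if it denotes a control character."""
    code = int(match.group(1), 8)
    # Leave sequences such as "#123" alone: they are ordinary text,
    # not escaped control characters (codes below 0x20).
    return chr(code) if code < 0x20 else match.group(0)


def unescape_syslog(record):
    """Replace #ooo control-character escapes with the real characters."""
    return _ESCAPE.sub(_decode, record)


# Shortened from the record in the report; the middle frames are omitted.
sample = (
    "vdsm vds ERROR failed to retrieve Hosted Engine HA info"
    "#012Traceback (most recent call last):"
    "#012BrokerConnectionError: Failed to connect to broker, "
    "the number of errors has exceeded the limit (1)"
)

if __name__ == "__main__":
    print(unescape_syslog(sample))
```

Restricting the substitution to codes below 0x20 is a design choice of this sketch, so that text like "comment #123" in a message is not altered.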
This bug was opened from https://bugzilla.redhat.com/show_bug.cgi?id=1356127#c11.

Adding components from host:
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
sanlock-3.2.4-3.el7_2.x86_64
libvirt-client-1.2.17-13.el7_2.5.x86_64
mom-0.5.6-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
vdsm-4.17.35-1.el7ev.noarch
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
rhev-release-3.6.9-2-001.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
Linux version 3.10.0-327.28.3.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Aug 12 13:21:05 EDT 2016
Linux 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

(Originally by Nikolai Sednev)
Created attachment 1200497 sosreport from host (Originally by Nikolai Sednev)
Created attachment 1200499 sosreport from engine (Originally by Nikolai Sednev)
Dan, here Nikolai is trying to deploy hosted-engine using a 4.0 engine on a 3.6 host, and the host is not coming up due to:

2016-09-13 15:31:06,523 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] HostName = alma03.qa.lab.tlv.redhat.com
2016-09-13 15:31:06,524 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] Failed in 'CollectVdsNetworkDataAfterInstallationVDS' method, for vds: 'alma03.qa.lab.tlv.redhat.com'; host: 'alma03.qa.lab.tlv.redhat.com': Required SwitchType is not reported.
2016-09-13 15:31:06,525 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] Command 'CollectVdsNetworkDataAfterInstallationVDSCommand(HostName = alma03.qa.lab.tlv.redhat.com, CollectHostNetworkDataVdsCommandParameters:{runAsync='true', hostId='cef7ba08-33c3-4a9f-9646-d2b5a42d347f', vds='Host[alma03.qa.lab.tlv.redhat.com,cef7ba08-33c3-4a9f-9646-d2b5a42d347f]'})' execution failed: Required SwitchType is not reported.
2016-09-13 15:31:06,526 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (org.ovirt.thread.pool-6-thread-1) [7c16adab] transaction rolled back
2016-09-13 15:31:06,526 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: java.lang.IllegalStateException: Required SwitchType is not reported. (Failed with error ENGINE and code 5001)

Is it somehow related to the OVS integration since 4.0?
hosted-engine-setup is not lowering the cluster compatibility level since, AFAIK, that requires a host to be active.

(Originally by Simone Tiraboschi)
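To pull the relevant failures out of a longer engine.log, the entries above can be split into fields and filtered by level. This is only a sketch: the regular expression is inferred from the five entries quoted in this comment, not from an official description of the log format, and parse_engine_line is a hypothetical helper name.

```python
import re

# Field layout inferred from the quoted entries:
# timestamp, level, [logger], (thread), [correlation id], message.
ENGINE_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<corr>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)


def parse_engine_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = ENGINE_LINE.match(line.strip())
    return match.groupdict() if match else None


# Two entries copied from the comment above.
SAMPLE = [
    "2016-09-13 15:31:06,524 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker."
    "CollectVdsNetworkDataAfterInstallationVDSCommand] "
    "(org.ovirt.thread.pool-6-thread-1) [7c16adab] Failed in "
    "'CollectVdsNetworkDataAfterInstallationVDS' method, for vds: "
    "'alma03.qa.lab.tlv.redhat.com'; host: 'alma03.qa.lab.tlv.redhat.com': "
    "Required SwitchType is not reported.",
    "2016-09-13 15:31:06,526 INFO [org.ovirt.engine.core.utils.transaction."
    "TransactionSupport] (org.ovirt.thread.pool-6-thread-1) [7c16adab] "
    "transaction rolled back",
]

if __name__ == "__main__":
    for entry in map(parse_engine_line, SAMPLE):
        if entry and entry["level"] == "ERROR":
            print(entry["ts"], entry["corr"], entry["msg"])
```

Grouping by the bracketed correlation id (7c16adab here) would be one way to keep all entries of a single failed host installation together.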
Which Engine is it, exactly? Using switchType=Legacy as a default should have been fixed by bug 1373112 in ovirt-engine-4.0.2.7-0.1.el7ev (Originally by danken)
(In reply to Dan Kenigsberg from comment #5) > Which Engine is it, exactly? Using switchType=Legacy as a default should > have been fixed by bug 1373112 in ovirt-engine-4.0.2.7-0.1.el7ev ovirt-engine-4.0.4.2-0.1.el7ev.noarch, so it seems the issue is still there. (Originally by Simone Tiraboschi)
Martin, could you see how this can be? (Originally by danken)
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone. (Originally by rule-engine)
(In reply to Dan Kenigsberg from comment #5, which had a wrong bz# mentioned) Very odd, since 1367483 has been verified, and Ib09c1c826919c8e5084148dd9599612f20f11938 is in ovirt-engine-4.0.4 branch. (Originally by danken)
Looking at the master code (with limited logs), this function can produce this error:
private static SwitchType getSwitchType(Version clusterVersion, Map<String, Object> networkProperties) {
    Object switchType = networkProperties.get(VdsProperties.SWITCH_KEY);
    boolean switchTypeShouldBeReportedByVdsm = FeatureSupported.ovsSupported(clusterVersion);
    if (switchTypeShouldBeReportedByVdsm && switchType == null) {
        throw new IllegalStateException("Required SwitchType is not reported.");
    }
    return SwitchType.parse(Objects.toString(switchType, SwitchType.LEGACY.getOptionValue()));
}
So it means this: for the given 'clusterVersion', switchType is supported; we know that by reading FeatureSupported.ovsSupported(clusterVersion). If OVS is supported, an adequate vdsm (one that also supports it) should be used, and in that case switchType should be reported. So the reason here can be that an incompatible engine and vdsm are used at the same time.
Comparing master to 4.0.4, it seems there are no differences in this method; all related code was backported. (Originally by Martin Mucha)
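To make the failure path above concrete, here is a minimal standalone sketch of the same check. Everything in it except the error message and the overall logic is a hypothetical stand-in: the class name, the "switch" key, and the rule that OVS is expected from cluster level 4.0 onward are assumptions for illustration, not the real ovirt-engine definitions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins for the engine classes quoted above; NOT the real ovirt-engine code.
final class SwitchTypeSketch {
    // Assumed key name; the real value of VdsProperties.SWITCH_KEY is not shown in this bug.
    static final String SWITCH_KEY = "switch";

    enum SwitchType {
        LEGACY("legacy"), OVS("ovs");

        private final String optionValue;

        SwitchType(String optionValue) {
            this.optionValue = optionValue;
        }

        String getOptionValue() {
            return optionValue;
        }

        static SwitchType parse(String value) {
            for (SwitchType type : values()) {
                if (type.optionValue.equals(value)) {
                    return type;
                }
            }
            throw new IllegalArgumentException("Unknown switch type: " + value);
        }
    }

    // Assumption: the cluster level expects VDSM to report a switch type from 4.0 onward.
    static boolean ovsSupported(int clusterMajor) {
        return clusterMajor >= 4;
    }

    // Same decision as the quoted method: fail if the type is required but missing,
    // otherwise fall back to LEGACY when nothing was reported.
    static SwitchType getSwitchType(int clusterMajor, Map<String, Object> networkProperties) {
        Object switchType = networkProperties.get(SWITCH_KEY);
        if (ovsSupported(clusterMajor) && switchType == null) {
            throw new IllegalStateException("Required SwitchType is not reported.");
        }
        return SwitchType.parse(Objects.toString(switchType, SwitchType.LEGACY.getOptionValue()));
    }

    public static void main(String[] args) {
        Map<String, Object> reportedByOldVdsm = new HashMap<>(); // a 3.6 VDSM reports no switch type
        System.out.println(getSwitchType(3, reportedByOldVdsm)); // 3.x cluster: falls back to LEGACY
        try {
            getSwitchType(4, reportedByOldVdsm); // 4.0 cluster: the case hit in this bug
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, an empty capabilities map is harmless in a 3.x cluster but throws in a 4.0 cluster, which matches the engine log in this report.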
This is not a supported flow. We should add an error. But overall there must be a match between appliance major and minor version and host major and minor version. (Originally by Yaniv Dary)
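As a rough illustration of the matching rule described in the comment above, the following sketch compares only the major and minor components of two dotted version strings. The class and method names are hypothetical, and this is not the check that hosted-engine-setup actually performs.

```java
// Hypothetical helper illustrating a "major.minor must match" rule; not hosted-engine-setup code.
final class VersionMatchSketch {
    // Returns the "major.minor" prefix of a dotted version such as "4.0.4.2".
    static String majorMinor(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    // True when both versions share the same major and minor numbers.
    static boolean sameMajorMinor(String applianceVersion, String hostVersion) {
        return majorMinor(applianceVersion).equals(majorMinor(hostVersion));
    }

    public static void main(String[] args) {
        // Versions taken from this report: a 4.0.4.2 engine deployed on a 3.6.9 host.
        System.out.println(sameMajorMinor("4.0.4.2", "3.6.9"));
    }
}
```

With the versions from this report the check returns false, which is the situation where an error would be raised instead of continuing the deployment.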
(In reply to Martin Mucha from comment #10) > So the reason here can be, that incompatible engine and vdsm are used at the > same time. The engine was at 4.0.4 and VDSM at 3.6.9 (Originally by Simone Tiraboschi)
(In reply to Yaniv Dary from comment #11) > This is not a supported flow. We should add an error. > But overall there must be a match between appliance major and minor version > and host major and minor version. Indeed we have one, at least on master and 4.0; that's why Nikolai had to force it by deploying the engine VM via PXE. (Originally by Simone Tiraboschi)
(In reply to Simone Tiraboschi from comment #13) > (In reply to Yaniv Dary from comment #11) > > This is not a supported flow. We should add an error. > > But overall there must be a match between appliance major and minor version > > and host major and minor version. > > Indeed we have, at least on master and 4.0, that's why Nikolai had to force > it deploying the engine VM via PXE. PXE is not supported anymore, only appliance. Let's add a warning in the engine as well then when you add a 3.6 host to 4.0 cluster. (Originally by Yaniv Dary)
(In reply to Yaniv Dary from comment #14) > (In reply to Simone Tiraboschi from comment #13) > > (In reply to Yaniv Dary from comment #11) > > > This is not a supported flow. We should add an error. > > > But overall there must be a match between appliance major and minor version > > > and host major and minor version. > > > > Indeed we have, at least on master and 4.0, that's why Nikolai had to force > > it deploying the engine VM via PXE. > > PXE is not supported anymore, only appliance. Let's add a warning in the > engine as well then when you add a 3.6 host to 4.0 cluster. If PXE is not supported, then why is it still exposed to customers? (Originally by Nikolai Sednev)
*** Bug 1375240 has been marked as a duplicate of this bug. *** (Originally by Yaniv Dary)
(In reply to Nikolai Sednev from comment #15) > (In reply to Yaniv Dary from comment #14) > > (In reply to Simone Tiraboschi from comment #13) > > > (In reply to Yaniv Dary from comment #11) > > > > This is not a supported flow. We should add an error. > > > > But overall there must be a match between appliance major and minor version > > > > and host major and minor version. > > > > > > Indeed we have, at least on master and 4.0, that's why Nikolai had to force > > > it deploying the engine VM via PXE. > > > > PXE is not supported anymore, only appliance. Let's add a warning in the > > engine as well then when you add a 3.6 host to 4.0 cluster. > > If PXE is not supported, then why it still exposed to customer? It will be removed in the next version. We have put a message on deprecation in the docs. (Originally by Yaniv Dary)
Removing flag until doc text is provided. (Originally by Yaniv Dary)
4.0.6 has been the last oVirt 4.0 release, please re-target this bug. (Originally by Sandro Bonazzola)
Could you please provide the target release?
Verified on:
# Host
# rpm -qa | grep vdsm
vdsm-jsonrpc-4.17.37-1.el7ev.noarch
vdsm-xmlrpc-4.17.37-1.el7ev.noarch
vdsm-cli-4.17.37-1.el7ev.noarch
vdsm-4.17.37-1.el7ev.noarch
vdsm-infra-4.17.37-1.el7ev.noarch
vdsm-yajsonrpc-4.17.37-1.el7ev.noarch
vdsm-python-4.17.37-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.37-1.el7ev.noarch
# Engine
# rpm -qa | grep rhevm
rhevm-doc-4.0.7-1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhevm-branding-rhev-4.0.0-7.el7ev.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-guest-agent-common-1.0.12-4.el7ev.noarch
rhevm-4.0.7.3-0.1.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
When the HE deployment adds the host, I can see the expected message in the engine log:
2017-02-28 16:20:07,365 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [480bdf83] Correlation ID: 480bdf83, Job ID: c7d066be-3e5c-4c1a-8962-057c9f41d0b9, Call Stack: null, Custom Event ID: -1, Message: Host hosted_engine_1 is compatible with versions (3.4,3.5,3.6) and cannot join Cluster Default which is set to version 4.0.
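The verified behavior can be pictured with a small sketch of a cluster-level compatibility check. The class and method names below are hypothetical and the logic is only an assumption about how such a check might look; only the message wording is taken from the engine log above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a cluster-level compatibility check; not the actual engine implementation.
final class ClusterCompatibilitySketch {
    // Returns an error message when the cluster level is not among the levels the host supports,
    // or an empty Optional when the host may join.
    static Optional<String> checkHostCanJoin(String hostName, List<String> hostSupportedLevels,
                                             String clusterName, String clusterLevel) {
        if (hostSupportedLevels.contains(clusterLevel)) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Host %s is compatible with versions (%s) and cannot join Cluster %s which is set to version %s.",
                hostName, String.join(",", hostSupportedLevels), clusterName, clusterLevel));
    }

    public static void main(String[] args) {
        // Values taken from the verification log: a 3.6 host added to a 4.0 cluster.
        List<String> supported = Arrays.asList("3.4", "3.5", "3.6");
        checkHostCanJoin("hosted_engine_1", supported, "Default", "4.0")
                .ifPresent(System.out::println);
    }
}
```

Run with the values from the log, the sketch rejects the host with the same message text, while a host that lists 4.0 among its supported levels would pass.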
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2017-0542.html