Bug 1375573 - engine should fail nicely when adding a 3.6 host to a 4.0 cluster.
Summary: engine should fail nicely when adding a 3.6 host to a 4.0 cluster.
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.HostedEngine
Version: 3.6.7
Hardware: x86_64
OS: Linux
medium
low
Target Milestone: ovirt-4.0.7
: ---
Assignee: Dominik Holler
QA Contact: Nikolai Sednev
URL:
Whiteboard:
: 1375240 (view as bug list)
Depends On:
Blocks: 1356127 1421149
Reported: 2016-09-13 13:14 UTC by Nikolai Sednev
Modified: 2017-05-11 11:10 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1421149 (view as bug list)
Environment:
Last Closed: 2017-02-11 07:27:50 UTC
oVirt Team: Network
Embargoed:
rule-engine: ovirt-4.0.z+


Attachments (Terms of Use)
sosreport from host (6.94 MB, application/x-xz)
2016-09-13 13:18 UTC, Nikolai Sednev
no flags Details
sosreport from engine (6.57 MB, application/x-xz)
2016-09-13 13:19 UTC, Nikolai Sednev
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 67596 0 master MERGED engine: improve handling of 3.6 hosts in 4.x clusters 2016-12-13 09:54:41 UTC
oVirt gerrit 68339 0 ovirt-engine-4.0 MERGED engine: improve handling of 3.6 hosts in 4.x clusters 2016-12-14 11:58:33 UTC

Description Nikolai Sednev 2016-09-13 13:14:01 UTC
Description of problem:
1.-Using PXE, I deployed a new, clean Red Hat Virtualization Manager (version 4.0.4.2-0.1.el7ev) over NFS storage on a 3.6.9 el7.2 host.
2.-Got this:
[ INFO  ] Still waiting for VDSM host to become operational...
          The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
          Please try to activate it via the engine webadmin UI.
          Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
          The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
          Please try to activate it via the engine webadmin UI.
          Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]? 

3.-From the web UI I tried to activate the host, and it ended up in "Unassigned" status.

4.-From the shell I see the following:

hosted-engine --vm-status
Failed to connect to broker, the number of errors has exceeded the limit (1)
Cannot connect to the HA daemon, please check the logs.
In /var/log/messages I see:
Sep 13 15:34:14 alma03 journal: vdsm vds ERROR failed to retrieve Hosted Engine HA info#012Traceback (most recent call last):#012  File "/usr/shar
e/vdsm/API.py", line 1856, in _getHaInfo#012    stats = instance.get_all_stats()#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_h
a/client/client.py", line 102, in get_all_stats#012    with broker.connection(self._retries, self._wait):#012  File "/usr/lib64/python2.7/contextl
ib.py", line 17, in __enter__#012    return self.gen.next()#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
 line 99, in connection#012    self.connect(retries, wait)#012  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
line 78, in connect#012    raise BrokerConnectionError(error_msg)#012BrokerConnectionError: Failed to connect to broker, the number of errors has 
exceeded the limit (1)
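
The syslog entry above is hard to read because rsyslog escapes control characters as "#" followed by a three-digit octal code, so every "#012" stands for a newline. A minimal Python sketch for restoring the traceback, assuming the wrapped lines have first been joined back into one string (the helper name unescape_syslog is made up for this illustration, and the sample is a shortened copy of the entry above):

```python
import re


def unescape_syslog(line):
    """Replace rsyslog-style #ooo octal escapes with the characters they encode."""
    return re.sub(r"#([0-7]{3})", lambda m: chr(int(m.group(1), 8)), line)


# Shortened copy of the /var/log/messages entry quoted above.
sample = (
    "vdsm vds ERROR failed to retrieve Hosted Engine HA info"
    "#012Traceback (most recent call last):"
    '#012  File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo'
    "#012    stats = instance.get_all_stats()"
)

print(unescape_syslog(sample))
```

A message that contains a literal "#" followed by three octal digits would be decoded as well, so this is only suitable for reading logs by eye.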

5.-Both the ovirt-ha-agent and ovirt-ha-broker services were dead, so I started them manually:
# systemctl status ovirt-ha-agent
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
# systemctl status ovirt-ha-broker
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

# systemctl start ovirt-ha-agent
# systemctl start ovirt-ha-broker
# systemctl status ovirt-ha-agent -l
● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-09-13 15:43:33 IDT; 1min 47s ago
 Main PID: 57265 (ovirt-ha-agent)
   CGroup: /system.slice/ovirt-ha-agent.service
           └─57265 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon

Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Preparing images
Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.lib.image.Image:Preparing images
Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Reloading vm.conf from the shared storage domain
Sep 13 15:45:13 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Trying to get a fresher copy of vm configuration from the OVF_STORE
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: WARNING:ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore:Unable to find OVF_STORE
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:Unable to get vm.conf from OVF_STORE, falling back to initial vm.conf
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config ERROR 'version' is not stored in the HE configuration image
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: ERROR:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config:'version' is not stored in the HE configuration image
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-agent[57265]: INFO:ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:Current state EngineUp (score: 3400)

# systemctl status ovirt-ha-broker -l
● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
   Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2016-09-13 15:43:33 IDT; 1min 58s ago
 Main PID: 57264 (ovirt-ha-broker)
   CGroup: /system.slice/ovirt-ha-broker.service
           └─57264 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon

Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:14 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:mem_free.MemFree:memFree: 27052
Sep 13 15:45:23 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection established
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:ovirt_hosted_engine_ha.broker.listener.ConnectionHandler:Connection closed
Sep 13 15:45:24 alma03.qa.lab.tlv.redhat.com ovirt-ha-broker[57264]: INFO:mem_free.MemFree:memFree: 27054
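
The manual check in step 5 could be scripted. A small sketch, assuming Python 3 on the host and using only "systemctl is-active", which prints the unit state on stdout; the function name inactive_units and the injectable run parameter are illustrative and not part of any oVirt tool:

```python
import subprocess

# The two HA services named in step 5.
HA_UNITS = ("ovirt-ha-agent", "ovirt-ha-broker")


def inactive_units(units=HA_UNITS, run=subprocess.run):
    """Return the units for which `systemctl is-active` does not print 'active'."""
    down = []
    for unit in units:
        result = run(
            ["systemctl", "is-active", unit],
            stdout=subprocess.PIPE,
            universal_newlines=True,
        )
        if result.stdout.strip() != "active":
            down.append(unit)
    return down
```

Calling inactive_units() on the host before the manual start would be expected to list both services, since both were reported as inactive (dead).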

6.-Got the following from vdsm:
# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2016-09-13 15:31:01 IDT; 15min ago
 Main PID: 55199 (vdsm)
   CGroup: /system.slice/vdsmd.service
           ├─55199 /usr/bin/python /usr/share/vdsm/vdsm
           ├─55319 /usr/libexec/ioprocess --read-pipe-fd 33 --write-pipe-fd 32 --max-threads 10 --max-queued-requests 10
           ├─55328 /usr/libexec/ioprocess --read-pipe-fd 42 --write-pipe-fd 41 --max-threads 10 --max-queued-requests 10
           ├─57294 /usr/libexec/ioprocess --read-pipe-fd 67 --write-pipe-fd 66 --max-threads 10 --max-queued-requests 10
           └─57322 /usr/libexec/ioprocess --read-pipe-fd 53 --write-pipe-fd 51 --max-threads 10 --max-queued-requests 10

Sep 13 15:32:49 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:32:49 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                          Traceback (most recent call last):
                                                            File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
                                                              stats = instance.get_all_stats()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                              with broker.connection(self._retries, self._wait):
                                                            File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                              return self.gen.next()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                              self.connect(retries, wait)
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                              raise BrokerConnectionError(error_msg)
                                                          BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:10 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:10 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                          Traceback (most recent call last):
                                                            File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
                                                              stats = instance.get_all_stats()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                              with broker.connection(self._retries, self._wait):
                                                            File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                              return self.gen.next()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                              self.connect(retries, wait)
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                              raise BrokerConnectionError(error_msg)
                                                          BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:30 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:30 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                          Traceback (most recent call last):
                                                            File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
                                                              stats = instance.get_all_stats()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                              with broker.connection(self._retries, self._wait):
                                                            File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                              return self.gen.next()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                              self.connect(retries, wait)
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                              raise BrokerConnectionError(error_msg)
                                                          BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:51 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:33:51 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                          Traceback (most recent call last):
                                                            File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
                                                              stats = instance.get_all_stats()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                              with broker.connection(self._retries, self._wait):
                                                            File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                              return self.gen.next()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                              self.connect(retries, wait)
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                              raise BrokerConnectionError(error_msg)
                                                          BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:34:14 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:34:14 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                          Traceback (most recent call last):
                                                            File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
                                                              stats = instance.get_all_stats()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                              with broker.connection(self._retries, self._wait):
                                                            File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                              return self.gen.next()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                              self.connect(retries, wait)
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                              raise BrokerConnectionError(error_msg)
                                                          BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
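
The five vdsm errors above are spaced fairly evenly, which is consistent with a periodic retry rather than a single failure. A short Python sketch that computes the gaps from the timestamps quoted in the journal output (the function name gaps_in_seconds is only illustrative):

```python
from datetime import datetime

# Timestamps of the "Failed to connect to broker" entries quoted above.
stamps = ["15:32:49", "15:33:10", "15:33:30", "15:33:51", "15:34:14"]


def gaps_in_seconds(times):
    """Return the number of seconds between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(parsed, parsed[1:])]


print(gaps_in_seconds(stamps))  # [21, 20, 21, 23]
```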

7.-The deployment failed after I had waited for some time:
[ INFO  ] Still waiting for VDSM host to become operational...
          The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
          Please try to activate it via the engine webadmin UI.
          Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
          The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
          Please try to activate it via the engine webadmin UI.
          Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add alma03.qa.lab.tlv.redhat.com to the manager
[ INFO  ] Saving hosted-engine configuration on the shared storage domain
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
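
For comparison with the long wait in step 7, here is a hedged Python sketch of a polling loop that gives up as soon as the host reaches a state it cannot leave on its own, instead of waiting for the full timeout. This is not how hosted-engine-setup is implemented; the status strings, TERMINAL_STATES, and wait_for_host are assumptions made up for this example:

```python
import time

# Hypothetical status names; the real engine states may be spelled differently.
TERMINAL_STATES = {"non_operational", "install_failed"}


def wait_for_host(get_status, timeout=600.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until the host is 'up', fails for good, or time runs out."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "up":
            return status
        if status in TERMINAL_STATES:
            # Fail early with the reason instead of looping until the timeout.
            raise RuntimeError("host entered state %r; not waiting further" % status)
        if clock() >= deadline:
            raise TimeoutError("host did not come up within %s seconds" % timeout)
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays.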


Sosreports from the host and the engine are attached.
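
To illustrate the kind of up-front check the summary asks for, here is a purely hypothetical Python sketch that rejects a host whose supported cluster levels do not include the cluster's compatibility version. It is not the change merged in gerrit; the function names and the idea of passing the levels in as strings are assumptions for this example:

```python
def parse_level(text):
    """Turn a version string such as '3.6' into a comparable (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def check_host_fits_cluster(host_levels, cluster_level):
    """Raise ValueError with a readable message if the host cannot join the cluster."""
    wanted = parse_level(cluster_level)
    supported = sorted(parse_level(level) for level in host_levels)
    if wanted not in supported:
        newest = "%d.%d" % supported[-1] if supported else "none"
        raise ValueError(
            "host supports cluster levels up to %s, but the cluster is at %s"
            % (newest, cluster_level)
        )


# A host limited to 3.x levels being added to a 4.0 cluster, as in this report.
try:
    check_host_fits_cluster(["3.4", "3.5", "3.6"], "4.0")
except ValueError as exc:
    print(exc)
```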
                                                              return self.gen.next()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                              self.connect(retries, wait)
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                              raise BrokerConnectionError(error_msg)
                                                          BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:34:14 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Sep 13 15:34:14 alma03.qa.lab.tlv.redhat.com vdsm[55199]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                          Traceback (most recent call last):
                                                            File "/usr/share/vdsm/API.py", line 1856, in _getHaInfo
                                                              stats = instance.get_all_stats()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                              with broker.connection(self._retries, self._wait):
                                                            File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                              return self.gen.next()
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                              self.connect(retries, wait)
                                                            File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                              raise BrokerConnectionError(error_msg)
                                                          BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)

7.-Deployment failed after I had waited for some time:
[ INFO  ] Still waiting for VDSM host to become operational...
          The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
          Please try to activate it via the engine webadmin UI.
          Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
          The host alma03.qa.lab.tlv.redhat.com is in non-operational state.
          Please try to activate it via the engine webadmin UI.
          Retry checking host status or ignore this and continue (Retry, Ignore)[Retry]?
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add alma03.qa.lab.tlv.redhat.com to the manager
[ INFO  ] Saving hosted-engine configuration on the shared storage domain
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.


Sosreports from the host and the engine are attached.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Deploy hosted engine 4.0 using PXE over NFS on a 3.6.9 host.

Actual results:
Deployment failed.

Expected results:
Deployment should succeed.

Additional info:
Sosreports from engine and host.

Comment 1 Nikolai Sednev 2016-09-13 13:17:38 UTC
This bug was opened from https://bugzilla.redhat.com/show_bug.cgi?id=1356127#c11.
Adding components from the host:
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
ovirt-hosted-engine-setup-1.3.7.3-1.el7ev.noarch
sanlock-3.2.4-3.el7_2.x86_64
libvirt-client-1.2.17-13.el7_2.5.x86_64
mom-0.5.6-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
vdsm-4.17.35-1.el7ev.noarch
rhevm-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.8-1.el7ev.noarch
rhev-release-3.6.9-2-001.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
Linux version 3.10.0-327.28.3.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Fri Aug 12 13:21:05 EDT 2016
Linux 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Comment 2 Nikolai Sednev 2016-09-13 13:18:25 UTC
Created attachment 1200497 [details]
sosreport from host

Comment 3 Nikolai Sednev 2016-09-13 13:19:53 UTC
Created attachment 1200499 [details]
sosreport from engine

Comment 4 Simone Tiraboschi 2016-09-13 14:42:12 UTC
Dan,
here Nikolai is trying to deploy hosted-engine using a 4.0 engine on a 3.6 host and the host is not coming up due to:

2016-09-13 15:31:06,523 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] HostName = alma03.qa.lab.tlv.redhat.com
2016-09-13 15:31:06,524 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] Failed in 'CollectVdsNetworkDataAfterInstallationVDS' method, for vds: 'alma03.qa.lab.tlv.redhat.com'; host: 'alma03.qa.lab.tlv.redhat.com': Required SwitchType is not reported.
2016-09-13 15:31:06,525 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CollectVdsNetworkDataAfterInstallationVDSCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] Command 'CollectVdsNetworkDataAfterInstallationVDSCommand(HostName = alma03.qa.lab.tlv.redhat.com, CollectHostNetworkDataVdsCommandParameters:{runAsync='true', hostId='cef7ba08-33c3-4a9f-9646-d2b5a42d347f', vds='Host[alma03.qa.lab.tlv.redhat.com,cef7ba08-33c3-4a9f-9646-d2b5a42d347f]'})' execution failed: Required SwitchType is not reported.
2016-09-13 15:31:06,526 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (org.ovirt.thread.pool-6-thread-1) [7c16adab] transaction rolled back
2016-09-13 15:31:06,526 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (org.ovirt.thread.pool-6-thread-1) [7c16adab] Exception: org.ovirt.engine.core.common.errors.EngineException: EngineException: java.lang.IllegalStateException: Required SwitchType is not reported. (Failed with error ENGINE and code 5001)

Is it somehow related to the OVS integration since 4.0?

hosted-engine-setup is not lowering the cluster compatibility level since AFAIK it requires a host to be active.

Comment 5 Dan Kenigsberg 2016-09-13 15:34:31 UTC
Which Engine is it, exactly? Using switchType=Legacy as a default should have been fixed by bug 1373112 in ovirt-engine-4.0.2.7-0.1.el7ev

Comment 6 Simone Tiraboschi 2016-09-13 16:39:47 UTC
(In reply to Dan Kenigsberg from comment #5)
> Which Engine is it, exactly? Using switchType=Legacy as a default should
> have been fixed by bug 1373112 in ovirt-engine-4.0.2.7-0.1.el7ev

ovirt-engine-4.0.4.2-0.1.el7ev.noarch, so it seems it is still there.

Comment 7 Dan Kenigsberg 2016-09-14 05:49:59 UTC
Martin, could you see how this can be?

Comment 8 Red Hat Bugzilla Rules Engine 2016-09-14 05:50:06 UTC
Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone.

Comment 9 Dan Kenigsberg 2016-09-14 08:08:00 UTC
(In reply to Dan Kenigsberg from comment #5, which had a wrong bz# mentioned)

Very odd, since 1367483 has been verified, and Ib09c1c826919c8e5084148dd9599612f20f11938 is in ovirt-engine-4.0.4 branch.

Comment 10 Martin Mucha 2016-09-14 08:18:45 UTC
Looking at the master code (with limited logs), this function can produce this error:

private static SwitchType getSwitchType(Version clusterVersion, Map<String, Object> networkProperties) {
    Object switchType = networkProperties.get(VdsProperties.SWITCH_KEY);
    boolean switchTypeShouldBeReportedByVdsm = FeatureSupported.ovsSupported(clusterVersion);

    if (switchTypeShouldBeReportedByVdsm && switchType == null) {
        throw new IllegalStateException("Required SwitchType is not reported.");
    }

    return SwitchType.parse(Objects.toString(switchType, SwitchType.LEGACY.getOptionValue()));
}

So it means this: for the given 'clusterVersion', switchType is supported; we know that by reading FeatureSupported.ovsSupported(clusterVersion). If OVS is supported, an adequate (also supporting) vdsm should be used, and in that case switchType should be reported.

So the reason here may be that an incompatible engine and vdsm are used at the same time.
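
To make the failure mode concrete, here is a minimal self-contained sketch of the same check. It is not the engine code: the class name, the "switch" key, and the rule that switchType is expected from cluster level 4.0 on are assumptions made for illustration. It only shows how a host report without a switch type, which is what the logs in this bug suggest a 3.6 vdsm sends, passes for a 3.6 cluster but is rejected for a 4.0 cluster.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative stand-in for the logic quoted above; not the real ovirt-engine classes.
final class SwitchTypeSketch {
    // Assumption: the key under which vdsm reports the switch type.
    static final String SWITCH_KEY = "switch";
    static final String LEGACY = "legacy";

    // Assumption: switchType reporting is expected from cluster level 4.0 on.
    static boolean ovsSupported(int major, int minor) {
        return major >= 4;
    }

    static String getSwitchType(int major, int minor, Map<String, Object> networkProperties) {
        Object switchType = networkProperties.get(SWITCH_KEY);
        if (ovsSupported(major, minor) && switchType == null) {
            throw new IllegalStateException("Required SwitchType is not reported.");
        }
        // Older cluster levels fall back to the legacy default.
        return Objects.toString(switchType, LEGACY);
    }

    public static void main(String[] args) {
        // Host report without any switch type entry.
        Map<String, Object> report = new HashMap<>();

        // 3.6 cluster: the missing value is replaced by the legacy default.
        System.out.println(getSwitchType(3, 6, report));

        try {
            // 4.0 cluster: the same report is rejected.
            getSwitchType(4, 0, report);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The exception is thrown deep inside the network data collection, which matches the unhelpful "Failed with error ENGINE and code 5001" seen in the engine log.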

——
Comparing master to 4.0.4, it seems there are no differences in this method; all related code was backported.

Comment 11 Yaniv Lavi 2016-09-14 08:21:45 UTC
This is not a supported flow. We should add an error.
But overall there must be a match between appliance major and minor version and host major and minor version.

Comment 12 Simone Tiraboschi 2016-09-14 08:22:34 UTC
(In reply to Martin Mucha from comment #10)
> So the reason here can be, that incompatible engine and vdsm are used at the
> same time.

The engine was at 4.0.4 and VDSM at 3.6.9

Comment 13 Simone Tiraboschi 2016-09-14 08:23:52 UTC
(In reply to Yaniv Dary from comment #11)
> This is not a supported flow. We should add an error.
> But overall there must be a match between appliance major and minor version
> and host major and minor version.

Indeed we have, at least on master and 4.0; that's why Nikolai had to force it by deploying the engine VM via PXE.

Comment 14 Yaniv Lavi 2016-09-14 08:27:42 UTC
(In reply to Simone Tiraboschi from comment #13)
> (In reply to Yaniv Dary from comment #11)
> > This is not a supported flow. We should add an error.
> > But overall there must be a match between appliance major and minor version
> > and host major and minor version.
> 
> Indeed we have, at least on master and 4.0, that's why Nikolai had to force
> it deploying the engine VM via PXE.

PXE is not supported anymore, only the appliance. Let's also add a warning in the engine for when a 3.6 host is added to a 4.0 cluster.
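
One possible shape for such a check, as a sketch only: compare the cluster levels the host supports with the level of the target cluster before installation, and return a readable message instead of throwing. `HostAddCheck`, `validate`, the example level lists, and the message text are hypothetical and are not taken from the merged patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical pre-check; names and message text are illustrative only.
final class HostAddCheck {
    // Returns a user-facing message when the host cannot join the cluster, otherwise empty.
    static Optional<String> validate(String hostName, List<String> hostSupportedLevels, String clusterLevel) {
        if (hostSupportedLevels.contains(clusterLevel)) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Host %s supports cluster levels %s but the cluster is at level %s.",
                hostName, hostSupportedLevels, clusterLevel));
    }

    public static void main(String[] args) {
        // Assumed example of the levels a 3.6 host might report.
        List<String> levels36 = Arrays.asList("3.4", "3.5", "3.6");

        // Prints the warning for a 4.0 cluster and "OK" for a 3.6 cluster.
        System.out.println(validate("host1", levels36, "4.0").orElse("OK"));
        System.out.println(validate("host1", levels36, "3.6").orElse("OK"));
    }
}
```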

Comment 15 Nikolai Sednev 2016-09-14 08:31:10 UTC
(In reply to Yaniv Dary from comment #14)
> (In reply to Simone Tiraboschi from comment #13)
> > (In reply to Yaniv Dary from comment #11)
> > > This is not a supported flow. We should add an error.
> > > But overall there must be a match between appliance major and minor version
> > > and host major and minor version.
> > 
> > Indeed we have, at least on master and 4.0, that's why Nikolai had to force
> > it deploying the engine VM via PXE.
> 
> PXE is not supported anymore, only appliance. Let's add a warning in the
> engine as well then when you add a 3.6 host to 4.0 cluster.

If PXE is not supported, then why is it still exposed to customers?

Comment 16 Yaniv Lavi 2016-09-14 12:11:44 UTC
*** Bug 1375240 has been marked as a duplicate of this bug. ***

Comment 17 Yaniv Lavi 2016-09-14 15:06:07 UTC
(In reply to Nikolai Sednev from comment #15)
> (In reply to Yaniv Dary from comment #14)
> > (In reply to Simone Tiraboschi from comment #13)
> > > (In reply to Yaniv Dary from comment #11)
> > > > This is not a supported flow. We should add an error.
> > > > But overall there must be a match between appliance major and minor version
> > > > and host major and minor version.
> > > 
> > > Indeed we have, at least on master and 4.0, that's why Nikolai had to force
> > > it deploying the engine VM via PXE.
> > 
> > PXE is not supported anymore, only appliance. Let's add a warning in the
> > engine as well then when you add a 3.6 host to 4.0 cluster.
> 
> If PXE is not supported, then why is it still exposed to customers?

It will be removed in the next version.
We have put a deprecation notice in the docs.

Comment 18 Yaniv Lavi 2016-09-14 22:40:04 UTC
Removing the flag until doc text is provided.

Comment 19 Sandro Bonazzola 2017-01-25 07:54:52 UTC
4.0.6 has been the last oVirt 4.0 release, please re-target this bug.

