Bug 1301571

Summary: [hosted-engine-ha] Over iSCSI, VM doesn't start automatically; "failed to retrieve Hosted Engine HA info"
Product: [oVirt] ovirt-hosted-engine-ha Reporter: Elad <ebenahar>
Component: Broker Assignee: Martin Sivák <msivak>
Status: CLOSED WORKSFORME QA Contact: Elad <ebenahar>
Severity: high Docs Contact:
Priority: high    
Version: 1.3.3.6 CC: acanan, bugs, dfediuck, ebenahar, stirabos, ylavi
Target Milestone: ovirt-3.6.3 Keywords: Regression
Target Release: --- Flags: ylavi: ovirt-3.6.z?
rule-engine: blocker?
ebenahar: planning_ack?
ebenahar: devel_ack?
ebenahar: testing_ack?
Hardware: x86_64   
OS: Unspecified   
Whiteboard: sla
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-26 12:27:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
HE logs, vdsm.log, messages none

Description Elad 2016-01-25 12:08:40 UTC
Created attachment 1117959
HE logs, vdsm.log, messages

Description of problem:
I deployed hosted-engine over iSCSI. During deployment, the following error messages appeared in setup.log:

Jan 25 12:10:14 green-vdsc.qa.lab.tlv.redhat.com vdsm[1005]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                             Traceback (most recent call last):
                                                               File "/usr/share/vdsm/API.py", line 1842, in _getHaInfo
                                                                 stats = instance.get_all_stats()
                                                               File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                                 with broker.connection(self._retries, self._wait):
                                                               File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                                 return self.gen.next()
                                                               File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                                 self.connect(retries, wait)
                                                               File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                                 raise BrokerConnectionError(error_msg)
                                                             BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)
Jan 25 12:10:31 green-vdsc.qa.lab.tlv.redhat.com vdsm[1005]: vdsm ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to broker, the number of errors has exceeded the limit (1)
Jan 25 12:10:31 green-vdsc.qa.lab.tlv.redhat.com vdsm[1005]: vdsm vds ERROR failed to retrieve Hosted Engine HA info
                                                             Traceback (most recent call last):
                                                               File "/usr/share/vdsm/API.py", line 1842, in _getHaInfo
                                                                 stats = instance.get_all_stats()
                                                               File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
                                                                 with broker.connection(self._retries, self._wait):
                                                               File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
                                                                 return self.gen.next()
                                                               File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
                                                                 self.connect(retries, wait)
                                                               File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
                                                                 raise BrokerConnectionError(error_msg)
                                                             BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (1)


HE VM did not start automatically.
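
The traceback above shows a connect call that takes a retry count and a wait interval and raises BrokerConnectionError once the error count reaches the limit. A minimal Python sketch of that general pattern follows. It is not the ovirt_hosted_engine_ha code: `connect_with_retries`, the `opener` callable, and the local `BrokerConnectionError` class are hypothetical stand-ins, and the real limit semantics in brokerlink.py may differ.

```python
import time


class BrokerConnectionError(Exception):
    """Hypothetical stand-in for the exception named in the traceback."""


def connect_with_retries(opener, retries=1, wait=0.0):
    """Call opener() until it succeeds or `retries` attempts have failed.

    `opener` is any callable that returns a connection object or raises
    OSError (assumption: the real broker link uses its own socket logic).
    """
    errors = 0
    while True:
        try:
            return opener()
        except OSError:
            errors += 1
            if errors >= retries:
                # Message text copied from the log in this report.
                raise BrokerConnectionError(
                    "Failed to connect to broker, the number of errors "
                    "has exceeded the limit (%d)" % retries
                )
            time.sleep(wait)
```

With a limit of 1, as in the logged message, the first failed attempt is already fatal, so a broker that is briefly unavailable would produce this error without any retry.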

Version-Release number of selected component (if applicable):
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
libgovirt-0.3.3-1.el7_2.1.x86_64
ovirt-hosted-engine-setup-1.3.2.3-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.3.7-1.el7ev.noarch
vdsm-xmlrpc-4.17.18-0.el7ev.noarch
vdsm-4.17.18-0.el7ev.noarch
vdsm-python-4.17.18-0.el7ev.noarch
vdsm-hook-vmfex-dev-4.17.18-0.el7ev.noarch
vdsm-jsonrpc-4.17.18-0.el7ev.noarch
vdsm-yajsonrpc-4.17.18-0.el7ev.noarch
vdsm-cli-4.17.18-0.el7ev.noarch
vdsm-infra-4.17.18-0.el7ev.noarch


How reproducible:
Over iSCSI - Always

Steps to Reproduce:
1. Deploy hosted engine over iSCSI


Actual results:
At the end of the deployment, HE VM does not start automatically:

[root@green-vdsc ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : green-vdsc.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : 2e0a207d
Host timestamp                     : 73063


Both the ha-broker and ha-agent services are active.
Starting the VM manually with hosted-engine --vm-start succeeded.
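
The status output above can be checked programmatically. The following is a sketch only: it assumes the single-host "Key : value" layout shown in this report (other versions or multi-host output may differ), and `parse_vm_status` is a hypothetical helper, not part of the hosted-engine tooling.

```python
import json

# Sample copied from the status output in this report.
SAMPLE = """\
--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : green-vdsc.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
"""


def parse_vm_status(text):
    """Collect 'Key : value' lines into a dict; other lines are skipped.

    Handles one host block only: repeated keys would overwrite each other.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # banner or blank line
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    fields = parse_vm_status(SAMPLE)
    engine = json.loads(fields["Engine status"])
    print("Local maintenance:", fields["Local maintenance"])
    print("Engine VM state:", engine["vm"])
```

Splitting on the first " : " keeps the JSON value of "Engine status" intact, because the colons inside it are not preceded by a space.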

Expected results:
The HE VM should start automatically at the end of the deployment.

Additional info:
HE logs, vdsm.log, messages

Comment 1 Red Hat Bugzilla Rules Engine 2016-01-25 12:44:08 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 2 Martin Sivák 2016-01-25 13:12:02 UTC
Elad, did you use clean storage? How did you install the host?

The tracebacks seem to be of no consequence as the reason for not starting the VM automatically is actually pretty simple:

Local maintenance                  : True

I do not think this is a bug since you were able to start the VM manually and get the status correctly.

Can you please explain how the maintenance mode happened? And try hosted-engine --set-maintenance --mode=none to see whether it will start the vm automatically (it might take a minute or so to initiate the start)?
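
The check suggested above (clear maintenance mode, then wait to see whether the VM starts) could be scripted roughly as follows. This is a sketch under assumptions: `clear_maintenance_and_wait` and its parameters are hypothetical, the two hosted-engine invocations are the ones quoted in this report, and the `"vm": "up"` marker is assumed by analogy with the `"vm": "down"` value in the status output.

```python
import subprocess
import time


def clear_maintenance_and_wait(run=None, timeout=120.0, interval=10.0,
                               sleep=time.sleep):
    """Leave maintenance mode, then poll --vm-status until the VM is up.

    `run` takes an argument list and returns the command's stdout; it is
    injectable so the logic can be exercised without a real host.
    Returns True if the VM was reported up before `timeout` seconds.
    """
    if run is None:
        def run(args):
            return subprocess.run(
                args, check=True, capture_output=True, text=True
            ).stdout

    run(["hosted-engine", "--set-maintenance", "--mode=none"])
    waited = 0.0
    while waited <= timeout:
        status = run(["hosted-engine", "--vm-status"])
        # Assumption: an up VM is reported as "vm": "up" in Engine status.
        if '"vm": "up"' in status:
            return True
        sleep(interval)
        waited += interval
    return False
```

The default timeout is deliberately longer than the "minute or so" mentioned above, since the agent needs time to notice the mode change before it initiates the start.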

Comment 3 Elad 2016-01-26 09:17:10 UTC
(In reply to Martin Sivák from comment #2)
> Elad, did you use clean storage? How did you install the host?

The storage I'm using is clean: I create a new LUN for each HE deployment and clean up the old ones.

> The tracebacks seem to be of no consequence as the reason for not starting
> the VM automatically is actually pretty simple:
> 
> Local maintenance                  : True
> 
> I do not think this is a bug since you were able to start the VM manually
> and get the status correctly.
> 
> Can you please explain how the maintenance mode happened? And try
> hosted-engine --set-maintenance --mode=none to see whether it will start the
> vm automatically (it might take a minute or so to initiate the start)?


I did not do anything to make this happen; it was just a regular deployment over iSCSI.

I tested this 4 times on 2 different hosts and reproduced it 4/4.

Comment 4 Martin Sivák 2016-01-26 10:20:29 UTC
Hmm, were the hosts clean? No old hosted engine config files or so?

Simone: are we setting the maintenance mode during deploy somehow?

Comment 5 Simone Tiraboschi 2016-01-26 10:23:33 UTC
(In reply to Martin Sivák from comment #4)
> Simone: are we setting the maintenance mode during deploy somehow?

No, we don't.

Comment 6 Elad 2016-01-26 12:27:51 UTC
Martin, following your comment #4, I re-installed my host before deploying. At the end of the deployment, the VM started automatically.
Closing as WORKSFORME.