Bug 1123285 - After hosted-engine --deploy process finished ovirt-ha-broker and agent still down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-hosted-engine-ha
Version: 3.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Jiri Moskovcak
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On:
Blocks: 1036683 1076944 1093366 1093621 1093638 1123006
Reported: 2014-07-25 08:52 UTC by Artyom
Modified: 2016-02-10 19:42 UTC
CC List: 10 users

Fixed In Version: ovirt-3.5.0_rc1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-10-17 12:21:47 UTC
oVirt Team: SLA


Attachments
logs (285.37 KB, application/zip)
2014-07-25 08:52 UTC, Artyom


Links
oVirt gerrit 30696
oVirt gerrit 30844 (master, MERGED): init: Propagate status return value to start and stop functions
oVirt gerrit 30873 (ovirt-hosted-engine-ha-1.2, MERGED): init: Propagate status return value to start and stop functions

Description Artyom 2014-07-25 08:52:49 UTC
Created attachment 920912
logs

Description of problem:
After the hosted-engine --deploy process finished, ovirt-ha-broker and ovirt-ha-agent are still down, and they can't be started via:
service ovirt-ha-broker start && service ovirt-ha-agent start
The only errors I can find in the messages log:
Jul 25 11:44:40 rose05 vdsm vds ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1637, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 97, in get_all_stats
    with broker.connection():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (0)

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.2.1-0.2.master.20140724142825.el6.noarch

How reproducible:
Always

Steps to Reproduce:
1. Finish the hosted-engine --deploy process.

Actual results:
The ovirt-ha-agent and ovirt-ha-broker services are not running and cannot be started with the service command.

Expected results:
After hosted-engine --deploy finishes, both services should start without any problems.

Additional info:

Comment 1 Artyom 2014-07-27 12:14:46 UTC
I'm not sure this is the only problem, because I got this error after the change:
Jul 27 14:33:40 master-vds10 vdsm vds ERROR failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1637, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", line 97, in get_all_stats
    with broker.connection():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect()
  File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)
BrokerConnectionError: Failed to connect to broker, the number of errors has exceeded the limit (5)
Also, start does not work for these services, and after boot the services are down as well.
The only way to get these services running is:
service ovirt-ha-broker restart && service ovirt-ha-agent restart

Comment 2 Jiri Moskovcak 2014-07-28 13:40:57 UTC
Artyom, can you please be more specific about your steps? I understand you've tried to manually apply the patch as I suggested. What were the next steps you did?

Comment 3 Greg Padgett 2014-07-29 23:07:29 UTC
Looks like there is a second issue here.  During test day, I discovered that 'systemctl start ovirt-ha-broker.service' didn't start the service.  After some digging, I found http://gerrit.ovirt.org/#/c/29574/ which has the side-effect of causing the 'start' command to not execute the code to actually start the service.  In my case, reverting the commit (allowing the rh_status() exit code to propagate) fixed the issue and allowed the broker to start up.

Comment 4 Jiri Moskovcak 2014-07-30 07:23:33 UTC
(In reply to Greg Padgett from comment #3)
> Looks like there is a second issue here.  During test day, I discovered that
> 'systemctl start ovirt-ha-broker.service' didn't start the service.  After
> some digging, I found http://gerrit.ovirt.org/#/c/29574/ which has the
> side-effect of causing the 'start' command to not execute the code to
> actually start the service.  In my case, reverting the commit (allowing the
> rh_status() exit code to propagate) fixed the issue and allowed the broker
> to start up.

- My patch actually makes the status exit code propagate; otherwise it always returns 0.

- The weird thing here is that, despite having a .service file, systemctl is influenced by the init file. There is something fishy with it...

Comment 5 Greg Padgett 2014-07-30 13:48:03 UTC
(In reply to Jiri Moskovcak from comment #4)
> - my patch actually makes the status exit code to propagate, otherwise it
> always returns 0

Agreed, status was broken and your patch fixed it. It looks like it needs not only to store the return status in RETVAL but also to return it as the function's return value, so that stop and start can use it. I've just submitted a simple patch for this [1]; please have a look and see what you think.

> - the weird thing here is that depsite we have .service file systemctl is
> influenced by the init file, there is something fishy with it...

Ahh, it's because the HA service files use the init files' start and stop routines.

[1] http://gerrit.ovirt.org/#/c/30844/
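The RETVAL point above can be illustrated with a minimal bash sketch. The actual ovirt-ha-broker init script is not reproduced in this report, so fake_status, rh_status_broken, rh_status_fixed and start_with below are hypothetical stand-ins. The sketch assumes only that start skips launching the daemon when the status function reports success. In bash, a function's return value is the exit status of its last command, and an assignment without a command substitution, such as RETVAL=$?, always exits with 0. Storing the code without returning it therefore makes status look like "running".

```shell
#!/bin/bash
# Hypothetical stand-in for the real status check: it always reports
# "program is not running" (LSB status exit code 3).
fake_status() {
    return 3
}

# Assumed broken pattern: the code is saved in RETVAL, but the function
# returns the status of its last command, the assignment, which is 0.
rh_status_broken() {
    fake_status
    RETVAL=$?
}

# Fixed pattern: also hand the saved code back to the caller.
rh_status_fixed() {
    fake_status
    RETVAL=$?
    return $RETVAL
}

# Hypothetical start routine: skips starting when status returns 0.
start_with() {
    if "$1"; then
        echo "already running, not starting"
    else
        echo "starting daemon"
    fi
}

start_with rh_status_broken
start_with rh_status_fixed
```

Running the script prints "already running, not starting" for the broken variant and "starting daemon" for the fixed one. The start skip is the symptom described in comment 3.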

Comment 6 Artyom 2014-08-03 08:24:16 UTC
I changed the value to 5 and tried to start ovirt-ha-broker and ovirt-ha-agent:
service ovirt-ha-broker start && service ovirt-ha-agent start
It did not succeed; it failed with the same error shown above.
So I tried restart, and it worked. For some reason only restart works for these services, while a plain start does not. The problem is that the end of the deployment runs start on the services, and so does rebooting the host.

Comment 7 Artyom 2014-08-07 13:29:42 UTC
I checked the build ovirt-hosted-engine-setup-1.2.0-0.1.master.el6.noarch; it already includes the patch and works fine.
Can you move it to ON_QA?

Comment 8 Artyom 2014-08-11 08:00:59 UTC
Verified on ovirt-hosted-engine-ha-1.2.1-0.2.master.20140805072346.el6.noarch

Comment 9 Sandro Bonazzola 2014-10-17 12:21:47 UTC
oVirt 3.5 has been released and should include the fix for this issue.

