Bug 1517500

Summary: OPS Tools | Availability Monitoring | Octavia dockers monitoring support
Product: Red Hat OpenStack Reporter: Alexander Stafeyev <astafeye>
Component: openstack-tripleo-common    Assignee: Martin Magr <mmagr>
Status: CLOSED ERRATA QA Contact: Alexander Stafeyev <astafeye>
Severity: medium Docs Contact:
Priority: high    
Version: 12.0 (Pike)    CC: amuller, apannu, astafeye, bcafarel, bschmaus, cgoncalves, emacchi, ihrachys, jamsmith, jbadiapa, jlibosva, jschluet, lars, lpeer, majopela, mburns, mmagr, mrunge, nyechiel, rlopez, rmccabe, scorcora, slinaber
Target Milestone: z2    Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.0.4-4.el7ost openstack-tripleo-common-8.6.3-3.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-08-29 16:34:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1613662    
Bug Blocks: 1433523    

Description Alexander Stafeyev 2017-11-26 09:15:34 UTC
RFE:

We would like to have monitoring for Octavia containers. 

1. Octavia API 
2. Octavia worker
3. Octavia health manager
4. Octavia HouseKeeper manager

A setup is deployed with Octavia when we add the following to the overcloud deploy command: 

-e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/octavia.yaml
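As a minimal sketch (assuming the default template location and omitting any site-specific environment files), the deploy invocation would look like:

```shell
# Hedged sketch of an overcloud deploy enabling Octavia; any other
# site-specific -e environment files would be appended as usual.
openstack overcloud deploy \
  --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/octavia.yaml
```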

Comment 2 Matthias Runge 2017-12-06 18:41:18 UTC
The health check is maintained by the DFG maintaining the component. 
A health check would probably look like the ones here: https://review.openstack.org/#/q/If5b77481330fa697f1bab16696acb70075052d4f

Comment 3 Martin Magr 2018-01-26 19:53:01 UTC
I started adding health checks for containers that are missing them, so I can write patches for the Octavia containers. The only problem is that I have no idea how to correctly check whether each service is alive, so please answer the following questions:

1. Octavia API
 - on which port does the API server listen? Is there a special URL to get health status?
2. Octavia worker
 - does the service connect to other services or listen on any port?
 - is there any way to get health status from the service?
3. Octavia health manager
 - does the service connect to other services / listen on any port?
 - is there any way to get health status from the service?
4. Octavia HouseKeeper manager
 - does the service connect to other services / listen on any port?
 - is there any way to get health status from the service?

Comment 4 Martin Magr 2018-02-01 14:51:51 UTC
Moving this BZ under DFG. This does not mean I'm not willing to work on this task.

Comment 5 Martin Magr 2018-02-01 14:52:54 UTC
*under proper

Comment 7 Carlos Goncalves 2018-02-06 11:28:19 UTC
1. Octavia API 
  - listens on TCP 9876 (internal and public endpoints)

2. Octavia worker
  - connects to oslo messaging (AMQP internal:5672)
  - REST API calls to nova, neutron, glance

3. Octavia health manager
  -  listens on UDP 5555 (get IP from 'o-hm0' host iface)

4. Octavia HouseKeeper manager
  - connects to DB server (MySQL)


None of the Octavia services provide a special URL to get health status.
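Since there is no health URL, a check could fall back to plain port probes against the endpoints listed above. A minimal sketch (the controller IP is a placeholder, and bash's /dev/tcp pseudo-device is assumed to be available):

```shell
# Minimal TCP probe; relies on bash's /dev/tcp redirection and the
# coreutils `timeout` command. UDP (health manager, 5555) is not covered,
# and an open port only proves liveness, not service health.
tcp_probe() {            # usage: tcp_probe HOST PORT
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example probes (IP is a placeholder for a controller's internal API address):
#   tcp_probe 172.17.1.17 9876 && echo "octavia-api listening"
#   tcp_probe 172.17.1.17 5672 && echo "AMQP reachable"
```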

Comment 8 Carlos Goncalves 2018-02-06 14:26:05 UTC
As per comment 4, assigning to Martin.

Comment 21 Carlos Goncalves 2018-07-06 12:13:49 UTC
I'm flipping status to POST as I believe all required patches have been merged and backported upstream to stable/queens.

Martin, please ACK/NACK.

Comment 22 Martin Magr 2018-07-09 11:23:37 UTC
NACK. We still need https://review.openstack.org/#/c/555252/ to be backported to stable/queens. This went off my radar. Sorry for the delay.

Comment 23 Bernard Cafarelli 2018-07-20 14:41:21 UTC
*** Bug 1603240 has been marked as a duplicate of this bug. ***

Comment 34 Alexander Stafeyev 2018-08-07 07:01:45 UTC
Hi, 
What would be the proper verification steps, please?

Thanks

Comment 35 Martin Magr 2018-08-07 12:50:30 UTC
After deployment, the Octavia containers report a healthy or unhealthy status (depending on the actual health of those containers) in the output of 'docker ps --all | grep octavia'.
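That check can be condensed into a one-liner. The sample lines below are stand-ins for real `docker ps --all` output; in practice you would pipe the docker command in instead:

```shell
# Count unhealthy Octavia containers. The sample text stands in for real
# `docker ps --all` output; the grep pipeline is the part that matters.
sample='1b13b1974797  img  "kolla_start"  Up 20 hours (unhealthy)  octavia_health_manager
d47a8ded4b82  img  "kolla_start"  Up 20 hours (healthy)    octavia_worker'

printf '%s\n' "$sample" | grep octavia | grep -c '(unhealthy)'   # -> 1
```

A count of 0 would indicate all Octavia containers pass their healthchecks.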

Comment 36 Alexander Stafeyev 2018-08-08 05:46:15 UTC
[root@overcloud-controller-1 ~]# docker ps | grep octa
1b13b1974797        registry.access.redhat.com/rhosp13/openstack-octavia-health-manager:latest   "kolla_start"            20 hours ago        Up 20 hours (unhealthy)                       octavia_health_manager
2c6502229b83        registry.access.redhat.com/rhosp13/openstack-octavia-api:latest              "kolla_start"            20 hours ago        Up 20 hours (unhealthy)                       octavia_api
5a08540e9372        registry.access.redhat.com/rhosp13/openstack-octavia-housekeeping:latest     "kolla_start"            20 hours ago        Up 20 hours (unhealthy)                       octavia_housekeeping
d47a8ded4b82        registry.access.redhat.com/rhosp13/openstack-octavia-worker:latest           "kolla_start"            20 hours ago        Up 20 hours (healthy)                         octavia_worker
[root@overcloud-controller-1 ~]# docker exec octavia_health_manager /openstack/healthcheck
There is no octavia-health- process with opened RabbitMQ ports (5671,5672) running in the container
[root@overcloud-controller-1 ~]# docker exec octavia_api /openstack/healthcheck
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/openstack/healthcheck\": stat /openstack/healthcheck: no such file or directory"

[root@overcloud-controller-1 ~]# docker exec octavia_housekeeping /openstack/healthcheck
There is no octavia-houseke process with opened RabbitMQ ports (5671,5672) running in the container
[root@overcloud-controller-1 ~]# docker exec octavia_worker /openstack/healthcheck
172.17.1.17:5672 - users:(("octavia-worker:",pid=23,fd=8))


(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep openstack | grep trip | grep temp
openstack-tripleo-heat-templates-8.0.4-10.el7ost.noarch

Comment 39 Joanne O'Flynn 2018-08-15 07:39:19 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 40 Nir Magnezi 2018-08-16 12:37:37 UTC
*** Bug 1613662 has been marked as a duplicate of this bug. ***

Comment 41 Carlos Goncalves 2018-08-16 12:42:12 UTC
On my OSP13 environment, all 4 Octavia containers are healthy.


[root@controller-0 heat-admin]# docker ps | grep octavia
95db292d2efc        192.168.24.1:8787/rhosp13/openstack-octavia-health-manager:2018-08-08.2   "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_health_manager
838ede76313c        192.168.24.1:8787/rhosp13/openstack-octavia-api:2018-08-08.2              "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_api
66396fecbe7e        192.168.24.1:8787/rhosp13/openstack-octavia-housekeeping:2018-08-08.2     "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_housekeeping
37e8ee2bf056        192.168.24.1:8787/rhosp13/openstack-octavia-worker:2018-08-08.2           "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_worker

Comment 42 Bernard Cafarelli 2018-08-22 17:57:49 UTC
In comment #36 the healthcheck output message (mentioning RabbitMQ) is the old one (before the fix in openstack-tripleo-common-8.6.3-3.el7ost):
https://github.com/openstack/tripleo-common/commit/dc342858a74c5c89df22343b5931f821bd61e7b9#diff-437c9b0a7f17cb0002622a959732d7f6
So the container apparently did not have the needed version.

Given that, plus comment #41 showing healthy containers, I am moving this bug back to the verification step.

Comment 43 Alexander Stafeyev 2018-08-23 10:30:56 UTC
[heat-admin@controller-1 ~]$ sudo -i 
[root@controller-1 ~]# docker ps | grep octav
af135e52359d        192.168.24.1:8787/rhosp13/openstack-octavia-health-manager:2018-08-22.2      "kolla_start"            33 minutes ago      Up 26 minutes (healthy)                         octavia_health_manager
a484685b53b9        192.168.24.1:8787/rhosp13/openstack-octavia-api:2018-08-22.2                 "kolla_start"            33 minutes ago      Up 25 minutes (healthy)                         octavia_api
cb0f7ee7f8c5        192.168.24.1:8787/rhosp13/openstack-octavia-housekeeping:2018-08-22.2        "kolla_start"            33 minutes ago      Up 25 minutes (healthy)                         octavia_housekeeping
7c22d3c633a1        192.168.24.1:8787/rhosp13/openstack-octavia-worker:2018-08-22.2              "kolla_start"            33 minutes ago      Up 25 minutes (healthy)                         octavia_worker
[root@controller-1 ~]# cat /etc/yum.repos.d/latest-installed 
13   -p 2018-08-22.2
[root@controller-1 ~]#

Comment 45 errata-xmlrpc 2018-08-29 16:34:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574