Bug 1517500 - OPS Tools | Availability Monitoring | Octavia dockers monitoring support
Summary: OPS Tools | Availability Monitoring | Octavia dockers monitoring support
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: z2
Target Release: 13.0 (Queens)
Assignee: Martin Magr
QA Contact: Alexander Stafeyev
URL:
Whiteboard:
Duplicates: 1603240 1613662
Depends On: 1613662
Blocks: 1433523
 
Reported: 2017-11-26 09:15 UTC by Alexander Stafeyev
Modified: 2019-09-10 14:09 UTC (History)
23 users

Fixed In Version: openstack-tripleo-heat-templates-8.0.4-4.el7ost openstack-tripleo-common-8.6.3-3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 16:34:51 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 550508 0 None MERGED Activate another set of healthchecks 2020-10-23 17:59:59 UTC
OpenStack gerrit 554946 0 None MERGED Add and fix healthcheck scripts for Octavia services 2020-10-23 17:59:59 UTC
OpenStack gerrit 555252 0 None MERGED Enable octavia-api health check 2020-10-23 17:59:58 UTC
OpenStack gerrit 563022 0 None MERGED Add and fix healthcheck scripts for Octavia services 2020-10-23 17:59:58 UTC
OpenStack gerrit 563024 0 None MERGED Activate another set of healthchecks 2020-10-23 18:00:12 UTC
OpenStack gerrit 581008 0 None MERGED Enable octavia-api health check 2020-10-23 18:00:12 UTC
Red Hat Product Errata RHBA-2018:2574 0 None None None 2018-08-29 16:35:49 UTC

Description Alexander Stafeyev 2017-11-26 09:15:34 UTC
RFE:

We would like to have monitoring for the Octavia containers:

1. Octavia API 
2. Octavia worker
3. Octavia health manager
4. Octavia HouseKeeper manager

Octavia is deployed when the following environment file is added to the overcloud deploy command:

-e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/octavia.yaml

Comment 2 Matthias Runge 2017-12-06 18:41:18 UTC
Health checks are maintained by the DFG that maintains the component.
A health check would probably look like the ones here: https://review.openstack.org/#/q/If5b77481330fa697f1bab16696acb70075052d4f

Comment 3 Martin Magr 2018-01-26 19:53:01 UTC
I have started adding health checks for containers that are missing them, so I can write patches for the Octavia containers. The only problem is that I have no idea how to correctly check whether each service is alive, so please answer the following questions:

1. Octavia API
 - on which port the api server listens, is there a special url to get health status
2. Octavia worker
 - does the service connect to other service or listen on any port?
 - is there any way to get health status from the service?
3. Octavia health manager
 - does the service connect to other service/listen on any port?
 - is there any way to get health status from the service?
4. Octavia HouseKeeper manager
 - does the service connect to other service/listen on any port?
 - is there any way to get health status from the service?

Comment 4 Martin Magr 2018-02-01 14:51:51 UTC
Moving this BZ under DFG. This does not mean I'm not willing to work on this task.

Comment 5 Martin Magr 2018-02-01 14:52:54 UTC
*under proper

Comment 7 Carlos Goncalves 2018-02-06 11:28:19 UTC
1. Octavia API 
  - listens on TCP 9876 (internal and public endpoints)

2. Octavia worker
  - connects to oslo messaging (AMQP internal:5672)
  - REST API calls to nova, neutron, glance

3. Octavia health manager
  -  listens on UDP 5555 (get IP from 'o-hm0' host iface)

4. Octavia HouseKeeper manager
  - connects to DB server (MySQL)


None of the Octavia services provide a special URL to get health status.
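Since no service exposes a health URL, the answers above boil down to one connection to probe per container. A small shell helper can summarize that mapping (a sketch: the function name is made up, and the MySQL port 3306 is an assumption, since only "connects to DB server (MySQL)" is stated):

```shell
# Sketch of what each container's healthcheck should probe, per the list above.
# The mapping is illustrative, not the shipped tripleo-common implementation.
octavia_check_target() {
    case "$1" in
        octavia_api)            echo "tcp listen 9876"  ;;  # REST API, internal/public endpoints
        octavia_worker)         echo "tcp connect 5672" ;;  # AMQP (oslo.messaging)
        octavia_health_manager) echo "udp listen 5555"  ;;  # heartbeats via o-hm0 iface
        octavia_housekeeping)   echo "tcp connect 3306" ;;  # MySQL (assumed default port)
        *) return 1 ;;                                      # unknown container name
    esac
}
```

A worker check, for instance, would then look for an established TCP connection to the AMQP port rather than a listening socket.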

Comment 8 Carlos Goncalves 2018-02-06 14:26:05 UTC
As per comment 4, assigning to Martin.

Comment 21 Carlos Goncalves 2018-07-06 12:13:49 UTC
I'm flipping status to POST as I believe all required patches have been merged and backported upstream to stable/queens.

Martin, please ACK/NACK.

Comment 22 Martin Magr 2018-07-09 11:23:37 UTC
NACK. We still need https://review.openstack.org/#/c/555252/ to be backported to stable/queens. This went off my radar. Sorry for the delay.

Comment 23 Bernard Cafarelli 2018-07-20 14:41:21 UTC
*** Bug 1603240 has been marked as a duplicate of this bug. ***

Comment 34 Alexander Stafeyev 2018-08-07 07:01:45 UTC
Hi, 
What would the proper verification steps be, please?

Thanks

Comment 35 Martin Magr 2018-08-07 12:50:30 UTC
After deployment, the Octavia containers report a healthy or unhealthy status (depending on their actual health) in the output of 'docker ps --all | grep octavia'.
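For scripted verification, the same check can be expressed as a filter over the `docker ps` output (a sketch; the helper name is made up, and on a live controller you would pipe real `docker ps --all` output into it):

```shell
# Sketch: read `docker ps`-style lines on stdin and print the names of
# Octavia containers that are not reporting "(healthy)".
check_octavia_health() {
    grep 'octavia' | grep -v '(healthy)' | awk '{print $NF}'
}

# On a controller:
#   docker ps --all | check_octavia_health
```

An empty result means all Octavia containers are healthy; any printed name is a container to inspect with `docker exec <name> /openstack/healthcheck`.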

Comment 36 Alexander Stafeyev 2018-08-08 05:46:15 UTC
[root@overcloud-controller-1 ~]# docker ps | grep octa
1b13b1974797        registry.access.redhat.com/rhosp13/openstack-octavia-health-manager:latest   "kolla_start"            20 hours ago        Up 20 hours (unhealthy)                       octavia_health_manager
2c6502229b83        registry.access.redhat.com/rhosp13/openstack-octavia-api:latest              "kolla_start"            20 hours ago        Up 20 hours (unhealthy)                       octavia_api
5a08540e9372        registry.access.redhat.com/rhosp13/openstack-octavia-housekeeping:latest     "kolla_start"            20 hours ago        Up 20 hours (unhealthy)                       octavia_housekeeping
d47a8ded4b82        registry.access.redhat.com/rhosp13/openstack-octavia-worker:latest           "kolla_start"            20 hours ago        Up 20 hours (healthy)                         octavia_worker
[root@overcloud-controller-1 ~]# docker exec octavia_health_manager /openstack/healthcheck
There is no octavia-health- process with opened RabbitMQ ports (5671,5672) running in the container
[root@overcloud-controller-1 ~]# docker exec octavia_api /openstack/healthcheck
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/openstack/healthcheck\": stat /openstack/healthcheck: no such file or directory"

[root@overcloud-controller-1 ~]# docker exec octavia_housekeeping /openstack/healthcheck
There is no octavia-houseke process with opened RabbitMQ ports (5671,5672) running in the container
[root@overcloud-controller-1 ~]# docker exec octavia_worker /openstack/healthcheck
172.17.1.17:5672 - users:(("octavia-worker:",pid=23,fd=8))


(undercloud) [stack@undercloud-0 ~]$ rpm -qa | grep openstack | grep trip | grep temp
openstack-tripleo-heat-templates-8.0.4-10.el7ost.noarch

Comment 39 Joanne O'Flynn 2018-08-15 07:39:19 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory, please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 40 Nir Magnezi 2018-08-16 12:37:37 UTC
*** Bug 1613662 has been marked as a duplicate of this bug. ***

Comment 41 Carlos Goncalves 2018-08-16 12:42:12 UTC
On my OSP13 environment, all 4 Octavia containers are healthy.


[root@controller-0 heat-admin]# docker ps | grep octavia
95db292d2efc        192.168.24.1:8787/rhosp13/openstack-octavia-health-manager:2018-08-08.2   "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_health_manager
838ede76313c        192.168.24.1:8787/rhosp13/openstack-octavia-api:2018-08-08.2              "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_api
66396fecbe7e        192.168.24.1:8787/rhosp13/openstack-octavia-housekeeping:2018-08-08.2     "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_housekeeping
37e8ee2bf056        192.168.24.1:8787/rhosp13/openstack-octavia-worker:2018-08-08.2           "kolla_start"            47 hours ago        Up 47 hours (healthy)                         octavia_worker

Comment 42 Bernard Cafarelli 2018-08-22 17:57:49 UTC
In comment #36 the healthcheck output message (mentioning RabbitMQ) is the old one, from before the fix in openstack-tripleo-common-8.6.3-3.el7ost:
https://github.com/openstack/tripleo-common/commit/dc342858a74c5c89df22343b5931f821bd61e7b9#diff-437c9b0a7f17cb0002622a959732d7f6
So apparently the container did not have the needed version.

Given that, plus comment #41 showing healthy containers, I am moving this bug back to the verification step.
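To rule out stale images, the installed openstack-tripleo-common version can be compared against the fixed one from the Fixed In Version field (a sketch using `sort -V`; the helper name is illustrative):

```shell
# Sketch: succeed when the given openstack-tripleo-common version-release is at
# least 8.6.3-3 (the fixed version noted above), using GNU sort's -V compare.
tripleo_common_fixed() {
    fixed="8.6.3-3"
    [ "$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n1)" = "$fixed" ]
}

# Example on a controller (querying the package inside the container image is
# an assumption about how these kolla images are built):
#   docker exec octavia_api rpm -q --qf '%{VERSION}-%{RELEASE}\n' openstack-tripleo-common
```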

Comment 43 Alexander Stafeyev 2018-08-23 10:30:56 UTC
[heat-admin@controller-1 ~]$ sudo -i 
[root@controller-1 ~]# docker ps | grep octav
af135e52359d        192.168.24.1:8787/rhosp13/openstack-octavia-health-manager:2018-08-22.2      "kolla_start"            33 minutes ago      Up 26 minutes (healthy)                         octavia_health_manager
a484685b53b9        192.168.24.1:8787/rhosp13/openstack-octavia-api:2018-08-22.2                 "kolla_start"            33 minutes ago      Up 25 minutes (healthy)                         octavia_api
cb0f7ee7f8c5        192.168.24.1:8787/rhosp13/openstack-octavia-housekeeping:2018-08-22.2        "kolla_start"            33 minutes ago      Up 25 minutes (healthy)                         octavia_housekeeping
7c22d3c633a1        192.168.24.1:8787/rhosp13/openstack-octavia-worker:2018-08-22.2              "kolla_start"            33 minutes ago      Up 25 minutes (healthy)                         octavia_worker
[root@controller-1 ~]# cat /etc/yum.repos.d/latest-installed 
13   -p 2018-08-22.2
[root@controller-1 ~]#

Comment 45 errata-xmlrpc 2018-08-29 16:34:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574

