Bug 1755683 - octavia_api container stuck in unhealthy state
Summary: octavia_api container stuck in unhealthy state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: Carlos Goncalves
QA Contact: Michael Johnson
URL:
Whiteboard:
Depends On: 1800847
Blocks:
 
Reported: 2019-09-26 02:39 UTC by chrisbro@redhat.com
Modified: 2021-02-08 11:48 UTC
CC List: 14 users

Fixed In Version: openstack-tripleo-common-8.7.1-3.el7ost
Doc Type: Bug Fix
Doc Text:
Before this update, an error caused the `octavia-api` container to always report an unhealthy state. This update resolves the issue.
Clone Of:
Environment:
Last Closed: 2020-03-10 11:22:02 UTC
Target Upstream Version:
Embargoed:




Links:
- OpenStack gerrit 686190 (MERGED): Fixup octavia-api healthcheck (last updated 2021-02-08 11:41:39 UTC)
- Red Hat Product Errata RHBA-2020:0760 (last updated 2020-03-10 11:22:53 UTC)

Description chrisbro@redhat.com 2019-09-26 02:39:36 UTC
Description of problem:
octavia_api container stuck in unhealthy state

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Deploy OpenStack RHOSP 13. The octavia_api container ends up in an unhealthy state.



Additional info:
Information from the customer environment, with hostnames changed.

The octavia_api container still reports as unhealthy.

The following warning still appears in /etc/httpd/logs/octavia_wsgi_error_ssl.log:
[ssl:warn] [pid 1] AH01909: RSA certificate configured for os13.lab.redhat.com:443 does NOT include an ID which matches the server name

While the vhost ServerName is os13.lab.redhat.com, the certificate subject name is os13.internalapi.lab.redhat.com:

[root@os13-service14 ~]# docker exec -it octavia_api egrep -R ServerName /etc/httpd/conf.d/                                                                           
/etc/httpd/conf.d/10-octavia_wsgi.conf:  ServerName os13.lab.redhat.com

[root@os13-service14 ~]# docker exec -it octavia_api openssl x509 -in  /etc/pki/tls/certs/httpd/httpd-internal_api.crt -text -noout | grep -A 2 "X509v3 Subject Alt"                                                                           
            X509v3 Subject Alternative Name: 
                DNS:os13.internalapi.lab.redhat.com, othername:<unsupported>, othername:<unsupported>

This mismatch is probably why the health check is failing.
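To illustrate the failure mode, the following is a simplified, hypothetical sketch of RFC 6125-style hostname matching (not the actual mod_ssl or healthcheck code): TLS verification compares the name being connected to against the certificate's Subject Alternative Name (SAN) DNS entries, and fails when none match.

```python
def hostname_matches_san(hostname: str, san_dns_names: list[str]) -> bool:
    """Return True if hostname matches any SAN DNS entry.

    Simplified: supports only a single wildcard in the leftmost label,
    as in '*.example.com'.
    """
    host_labels = hostname.lower().split(".")
    for san in san_dns_names:
        san_labels = san.lower().split(".")
        # Labels must align one-to-one for a match.
        if len(san_labels) != len(host_labels):
            continue
        leftmost, rest = san_labels[0], san_labels[1:]
        if rest != host_labels[1:]:
            continue
        if leftmost == "*" or leftmost == host_labels[0]:
            return True
    return False

# The certificate in this report only lists the internalapi name:
san = ["os13.internalapi.lab.redhat.com"]

# octavia_api vhost ServerName: no SAN match, hence the AH01909 warning
print(hostname_matches_san("os13.lab.redhat.com", san))             # False
# nova_api vhost ServerName: matches the SAN
print(hostname_matches_san("os13.internalapi.lab.redhat.com", san)) # True
```

This is only a sketch of the comparison; the real check is performed by the TLS library during the handshake.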

Compared with nova_api, for example, the vhost ServerName matches the certificate:

[root@os13-service14 ~]# docker exec -it nova_api egrep -R ServerName /etc/httpd/conf.d/
/etc/httpd/conf.d/10-nova_api_wsgi.conf:  ServerName os13.internalapi.lab.redhat.com

[root@os13-service14 ~]# docker exec -it nova_api openssl x509 -in  /etc/pki/tls/certs/httpd/httpd-internal_api.crt -text -noout | grep -A 2 "X509v3 Subject Alt"
            X509v3 Subject Alternative Name: 
                DNS:os13.internalapi.lab.redhat.com, othername:<unsupported>, othername:<unsupported>

So it seems the octavia_api vhost ServerName is being set incorrectly. Or rather, it is being set according to the following overrides in overcloud/octavia_env.yaml, which were applied in response to BZ#1693529 (https://bugzilla.redhat.com/show_bug.cgi?id=1693529):
 
EndpointMap:
  # Address a bug where admin/internal endpoints default to IP address
  OctaviaAdmin: {protocol: 'https', port: '9876', host: 'CLOUDNAME'}
  OctaviaInternal: {protocol: 'https', port: '9876', host: 'CLOUDNAME'}
  OctaviaPublic: {protocol: 'https', port: '13876', host: 'CLOUDNAME'}

Comment 13 Michael Johnson 2020-02-18 18:06:26 UTC
I have verified that the API health checks are now successful on a TLS enabled deployment.

(undercloud) [stack@undercloud-0 ~]$ ansible -i /usr/bin/tripleo-ansible-inventory -b -m shell -a "docker ps | grep octavia" Controller

controller-2 | SUCCESS | rc=0 >>
11dbafe2991c        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-health-manager:20200213.1      "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_health_manager
00a448765f11        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-api:20200213.1                 "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_api
4ed6ebc51137        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-housekeeping:20200213.1        "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_housekeeping
209f18c90b8b        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-worker:20200213.1              "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_worker

controller-0 | SUCCESS | rc=0 >>
23bcc3e5ebf1        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-health-manager:20200213.1      "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_health_manager
917c63dda58a        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-api:20200213.1                 "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_api
b03de348778d        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-housekeeping:20200213.1        "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_housekeeping
08fdab77ec65        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-worker:20200213.1              "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_worker

controller-1 | SUCCESS | rc=0 >>
59ccac8b8f14        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-health-manager:20200213.1      "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_health_manager
6cbce7532a08        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-api:20200213.1                 "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_api
dad32c238bd5        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-housekeeping:20200213.1        "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_housekeeping
6cd76bbfad7e        192.168.24.1:8787/rh-osbs/rhosp13-openstack-octavia-worker:20200213.1              "dumb-init --singl..."   3 hours ago         Up 3 hours (healthy)                         octavia_worker

I have also checked octavia_wsgi_access_ssl.log on the controllers and see successful HTTPS-based container health checks:

172.17.1.13 - - [18/Feb/2020:14:00:23 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.28 - - [18/Feb/2020:14:00:23 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.14 - - [18/Feb/2020:14:00:23 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.28 - - [18/Feb/2020:14:00:30 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.14 - - [18/Feb/2020:14:00:30 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.13 - - [18/Feb/2020:14:00:29 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.28 - - [18/Feb/2020:14:00:32 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.14 - - [18/Feb/2020:14:00:32 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.13 - - [18/Feb/2020:14:00:33 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.28 - - [18/Feb/2020:14:00:34 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.14 - - [18/Feb/2020:14:00:34 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.13 - - [18/Feb/2020:14:00:35 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.28 - - [18/Feb/2020:14:00:36 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"
172.17.1.14 - - [18/Feb/2020:14:00:36 +0000] "OPTIONS / HTTP/1.0" 200 - "-" "-"

Comment 15 errata-xmlrpc 2020-03-10 11:22:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760

