Bug 1701195 - [OSP15] failed healthcheck for cinder_scheduler container
Summary: [OSP15] failed healthcheck for cinder_scheduler container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 15.0 (Stein)
Assignee: Alan Bishop
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-18 10:48 UTC by Artem Hrechanychenko
Modified: 2019-09-26 10:49 UTC
CC: 5 users

Fixed In Version: openstack-tripleo-heat-templates-10.5.1-0.20190514103211.038d887.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:21:34 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1825342 0 None None None 2019-04-18 10:48:55 UTC
OpenStack gerrit 658108 0 None MERGED Use RpcPort for container healthchecks 2021-01-30 15:31:13 UTC
OpenStack gerrit 658360 0 None MERGED Use RpcPort for container healthchecks 2021-01-30 15:31:57 UTC
Red Hat Product Errata RHEA-2019:2811 0 None None None 2019-09-21 11:21:48 UTC

Description Artem Hrechanychenko 2019-04-18 10:48:56 UTC
Description of problem:
[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-04-18 10:41:00 UTC; 48s ago
  Process: 599072 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck null (code=exited, status=1/FAILURE)
 Main PID: 599072 (code=exited, status=1/FAILURE)

Apr 18 10:41:00 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Apr 18 10:41:00 controller-0 podman[599072]: There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container
Apr 18 10:41:00 controller-0 podman[599072]: exit status 1
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Failed with result 'exit-code'.
Apr 18 10:41:00 controller-0 systemd[1]: Failed to start cinder_scheduler healthcheck.
[heat-admin@controller-0 ~]$ sudo podman inspect cinder_scheduler |grep healthcheck
                "config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=37c5752bb7a8713cb7bf28d9c72c5e39\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck null\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190411.1\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",



Version-Release number of selected component (if applicable):
OSP15 - RHOS_TRUNK-15.0-RHEL-8-20190412.n.0
python3-tripleoclient-heat-installer-1.4.1-0.20190411190358.0ca816d.el8ost.noarch
python3-tripleo-common-10.6.2-0.20190412150355.0ec6518.el8ost.noarch
python3-tripleoclient-11.4.1-0.20190411190358.0ca816d.el8ost.noarch
openstack-tripleo-heat-templates-10.4.1-0.20190412000410.b934fdd.el8ost.noarch
openstack-cinder-scheduler:20190411.1

How reproducible:
always

Steps to Reproduce:
1. Deploy the undercloud
2. Check the healthcheck status for cinder_scheduler (see the command sketch after this list)
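
A command sketch for step 2, using the unit and container names from the logs above:

  $ sudo systemctl start tripleo_cinder_scheduler_healthcheck.service
  $ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
  $ sudo podman inspect cinder_scheduler | grep healthcheck   # shows the rendered test command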

Actual results:
There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container

Expected results:
rc == 0 and the healthcheck process exits with status=0/SUCCESS

Additional info:

Comment 1 Alan Bishop 2019-04-24 15:40:45 UTC
It seems a bunch of services' health checks were affected when [1] merged. See my comment [2].

[1] https://review.opendev.org/565086
[2] https://bugs.launchpad.net/tripleo/+bug/1825342/comments/1

Comment 2 Alan Bishop 2019-05-15 18:11:43 UTC
No doc text required. This was a regression introduced and fixed in Stein prior to the release of OSP-15.

Comment 8 Tzach Shefi 2019-08-08 08:57:33 UTC
Hey Alan, 

I'm not sure about these results, so I'm running them by you to be sure.
Would the output below be sufficient to verify?

Tested on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch

[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2019-08-07 09:09:49 UTC; 30s ago
  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)     -> notice SUCCESS rather than the FAILURE in the original report, which is good.
 Main PID: 399812 (code=exited, status=0/SUCCESS)

Aug 07 09:09:49 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Aug 07 09:09:49 controller-0 systemd[1]: Started cinder_scheduler healthcheck.


However, I get a return code of 3, not 0.
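
(For reference: per systemctl(1), "systemctl status" exits 3 when a unit is not active, and a healthcheck run that has already completed is "inactive (dead)", so 3 is expected here regardless of the check's own result. The check's exit status is recorded in ExecMainStatus:)

  $ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service; echo $?
  ...
  3
  $ sudo systemctl show -p ExecMainStatus tripleo_cinder_scheduler_healthcheck.service
  ExecMainStatus=0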


sudo podman inspect cinder_scheduler |grep healthcheck
                "config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=22b28bf6014b355e8c0d83c112d965a3\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck 5672\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190801.2\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",

Here I do indeed get a return code of 0.
[root@controller-0 ~]# echo $?
0
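
(A grep-free alternative for pulling just the healthcheck command, sketched with podman's Go-template inspect format:)

  $ sudo podman inspect cinder_scheduler --format '{{.Config.Healthcheck.Test}}'
  [CMD-SHELL /openstack/healthcheck 5672]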


I guess this looks good to verify, correct?
If not, why not, and/or what else should I check?
Thanks

Comment 9 Alan Bishop 2019-08-08 13:50:26 UTC
Tzach,

The original problem has definitely been fixed. Note this line from your text above:

  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)     -> notice SUCCESS rather than the FAILURE in the original report, which is good.

The original problem was that the "5672" portion of the command (the RabbitMQ port number) was missing, so the health checks for this (and several other!) containers constantly failed because of the malformed command (missing port number).
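
For context, a minimal sketch of the kind of check /openstack/healthcheck performs (hypothetical; the real check lives in tripleo-common's healthcheck helpers and differs in detail):

  #!/bin/bash
  # Hypothetical sketch: succeed only if a $process PID holds a connection
  # on one of the given port(s); otherwise print the error seen in this bug.
  process='cinder-scheduler'
  ports="${*:-null}"   # the broken templates rendered this argument as "null"

  pids=$(pgrep -d '|' -f "$process")
  if [ -n "$pids" ] && ss -ntp | grep -qE ":(${ports// /|})\b.*pid=(${pids}),"; then
      exit 0
  fi
  echo "There is no $process process with opened RabbitMQ ports ($ports) running in the container"
  exit 1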

Comment 10 Tzach Shefi 2019-08-10 20:53:19 UTC
Verified on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch
See comments 8 and 9 above for testing details.

Comment 12 errata-xmlrpc 2019-09-21 11:21:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

