Bug 1701195

Summary:	[OSP15] failed healthcheck for cinder_scheduler container
Product:	Red Hat OpenStack	Reporter:	Artem Hrechanychenko <ahrechan>
Component:	openstack-tripleo-heat-templates	Assignee:	Alan Bishop <abishop>
Status:	CLOSED ERRATA	QA Contact:	Tzach Shefi <tshefi>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	15.0 (Stein)	CC:	abishop, dprince, mburns, pgrist, tshefi
Target Milestone:	beta	Keywords:	Triaged
Target Release:	15.0 (Stein)
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-10.5.1-0.20190514103211.038d887.el8ost	Doc Type:	If docs needed, set a value
Last Closed:	2019-09-21 11:21:34 UTC	Type:	Bug

Description Artem Hrechanychenko 2019-04-18 10:48:56 UTC

Description of problem:
[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-04-18 10:41:00 UTC; 48s ago
  Process: 599072 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck null (code=exited, status=1/FAILURE)
 Main PID: 599072 (code=exited, status=1/FAILURE)

Apr 18 10:41:00 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Apr 18 10:41:00 controller-0 podman[599072]: There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container
Apr 18 10:41:00 controller-0 podman[599072]: exit status 1
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Failed with result 'exit-code'.
Apr 18 10:41:00 controller-0 systemd[1]: Failed to start cinder_scheduler healthcheck.
[heat-admin@controller-0 ~]$ sudo podman inspect cinder_scheduler |grep healthcheck
                "config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=37c5752bb7a8713cb7bf28d9c72c5e39\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck null\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190411.1\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",



Version-Release number of selected component (if applicable):
OSP15 - RHOS_TRUNK-15.0-RHEL-8-20190412.n.0
python3-tripleoclient-heat-installer- 1.4.1-0.20190411190358.0ca816d.el8ost.noarch
python3-tripleo-common-10.6.2-0.20190412150355.0ec6518.el8ost.noarch
python3-tripleoclient-11.4.1-0.20190411190358.0ca816d.el8ost.noarch
openstack-tripleo-heat-templates-10.4.1-0.20190412000410.b934fdd.el8ost.noarch
openstack-cinder-scheduler:20190411.1

How reproducible:
always

Steps to Reproduce:
1. Deploy the undercloud.
2. Check the healthcheck status for cinder_scheduler.

Actual results:
There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container

Expected results:
The healthcheck exits with return code 0 and a passing status.

Additional info:

Comment 1 Alan Bishop 2019-04-24 15:40:45 UTC

It seems a bunch of services' health checks were affected when [1] merged. See my comment [2].

[1] https://review.opendev.org/565086
[2] https://bugs.launchpad.net/tripleo/+bug/1825342/comments/1

Comment 2 Alan Bishop 2019-05-15 18:11:43 UTC

No doc text required. This was a regression that was introduced and fixed in Stein prior to the release of OSP-15.

Comment 8 Tzach Shefi 2019-08-08 08:57:33 UTC

Hey Alan, 

I'm not sure about the results, so I'm running them by you to be sure.
Would the below be sufficient to verify?

Tested on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch

[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2019-08-07 09:09:49 UTC; 30s ago
  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)     -> note SUCCESS here rather than the FAILURE in the original description, which is good.
 Main PID: 399812 (code=exited, status=0/SUCCESS)

Aug 07 09:09:49 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Aug 07 09:09:49 controller-0 systemd[1]: Started cinder_scheduler healthcheck.


However, the return code is 3, not 0.


sudo podman inspect cinder_scheduler |grep healthcheck
                "config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=22b28bf6014b355e8c0d83c112d965a3\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck 5672\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190801.2\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",

Here I do indeed get a return code of 0:
[root@controller-0 ~]# echo $?
0


I guess this looks good to verify, correct?
If not, why not, and what else should I check?
Thanks
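
A note on the return code of 3 mentioned above, offered as a possible reading rather than a confirmed one: the comment does not say which command produced it. Assuming it was the exit status of `systemctl status` itself, that would be consistent with the LSB-style exit codes documented in systemctl(1), where 3 means the unit is not active. A healthcheck unit that ran and finished is shown as "inactive (dead)", so 3 would not by itself indicate a failed check. The helper name `describe_status_rc` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: describe the exit status of `systemctl status UNIT`
# using the LSB-style codes listed in systemctl(1).
describe_status_rc() {
    case $1 in
        0) echo "unit is active" ;;
        3) echo "unit is not active" ;;
        4) echo "no such unit" ;;
        *) echo "other status ($1)" ;;
    esac
}

# Example: interpret the value reported in this comment.
describe_status_rc 3
```

On a controller this could be used as `sudo systemctl status tripleo_cinder_scheduler_healthcheck.service; describe_status_rc $?`. The `Process:` line in the status output remains the more direct evidence of whether the healthcheck command itself succeeded.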

Comment 9 Alan Bishop 2019-08-08 13:50:26 UTC

Tzach,

The original problem has definitely been fixed. Note this line from your text above:

  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)

The original problem was that the "5672" portion of the command (the RabbitMQ port number) was missing, so the health check for this (and several other!) containers constantly failed because of a syntax error (the missing port number).
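
The failure mode described here can be caught with a small check on the healthcheck command string. This is a minimal sketch and not part of the actual fix. The function name `healthcheck_port_ok` is hypothetical, and it assumes the port is the last word of the command, as in the `ExecStart` lines quoted in this bug.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the last word of a healthcheck
# command is a valid TCP port number (1-65535).
healthcheck_port_ok() {
    cmd=$1
    port=${cmd##* }                # last space-separated word
    case $port in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric, e.g. "null"
    esac
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# The fixed and the broken command strings from this bug report.
healthcheck_port_ok "/openstack/healthcheck 5672" && echo "port argument looks valid"
healthcheck_port_ok "/openstack/healthcheck null" || echo "port argument is missing or invalid"
```

The check only validates the shape of the argument. It does not confirm that the service actually has a connection open on that port, which is what the healthcheck script inside the container tests.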

Comment 10 Tzach Shefi 2019-08-10 20:53:19 UTC

Verified on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch
See comments 8 and 9 above for the testing details.
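
For repeating this verification on other containers, the `podman inspect ... | grep healthcheck` step from comment 8 could be narrowed to print only the healthcheck command. This is a rough sketch under stated assumptions: the function name `extract_healthcheck_test` is hypothetical, and the `sed` pattern assumes the escaped `\"healthcheck\": {\"test\": \"...\"}` layout seen in the `config_data` lines pasted in this bug.

```shell
#!/bin/sh
# Hypothetical helper: read `podman inspect` output on stdin and print the
# healthcheck test command embedded in the escaped "config_data" string.
extract_healthcheck_test() {
    sed -n 's/.*\\"healthcheck\\": {\\"test\\": \\"\([^\\]*\)\\".*/\1/p'
}

# Usage on a controller (needs podman and an existing container):
#   sudo podman inspect cinder_scheduler | extract_healthcheck_test
#
# Self-contained example using a shortened sample of the pasted output.
sample='"config_data": "{\"healthcheck\": {\"test\": \"/openstack/healthcheck 5672\"}, \"net\": \"host\"}",'
printf '%s\n' "$sample" | extract_healthcheck_test
```

A result ending in a port number such as 5672 matches the fixed behavior, while a result ending in "null" matches the original failure.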

Comment 12 errata-xmlrpc 2019-09-21 11:21:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811