Bug 1701195 - [OSP15] failed healthcheck for cinder_scheduler container
Summary: [OSP15] failed healthcheck for cinder_scheduler container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 15.0 (Stein)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 15.0 (Stein)
Assignee: Alan Bishop
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-18 10:48 UTC by Artem Hrechanychenko
Modified: 2019-09-26 10:49 UTC
CC: 5 users

Fixed In Version: openstack-tripleo-heat-templates-10.5.1-0.20190514103211.038d887.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-21 11:21:34 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1825342 0 None None None 2019-04-18 10:48:55 UTC
OpenStack gerrit 658108 0 None MERGED Use RpcPort for container healthchecks 2021-01-30 15:31:13 UTC
OpenStack gerrit 658360 0 None MERGED Use RpcPort for container healthchecks 2021-01-30 15:31:57 UTC
Red Hat Product Errata RHEA-2019:2811 0 None None None 2019-09-21 11:21:48 UTC

Description Artem Hrechanychenko 2019-04-18 10:48:56 UTC
Description of problem:
[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-04-18 10:41:00 UTC; 48s ago
  Process: 599072 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck null (code=exited, status=1/FAILURE)
 Main PID: 599072 (code=exited, status=1/FAILURE)

Apr 18 10:41:00 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Apr 18 10:41:00 controller-0 podman[599072]: There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container
Apr 18 10:41:00 controller-0 podman[599072]: exit status 1
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Failed with result 'exit-code'.
Apr 18 10:41:00 controller-0 systemd[1]: Failed to start cinder_scheduler healthcheck.
[heat-admin@controller-0 ~]$ sudo podman inspect cinder_scheduler |grep healthcheck
                "config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=37c5752bb7a8713cb7bf28d9c72c5e39\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck null\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190411.1\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",



Version-Release number of selected component (if applicable):
OSP15 - RHOS_TRUNK-15.0-RHEL-8-20190412.n.0
python3-tripleoclient-heat-installer-1.4.1-0.20190411190358.0ca816d.el8ost.noarch
python3-tripleo-common-10.6.2-0.20190412150355.0ec6518.el8ost.noarch
python3-tripleoclient-11.4.1-0.20190411190358.0ca816d.el8ost.noarch
openstack-tripleo-heat-templates-10.4.1-0.20190412000410.b934fdd.el8ost.noarch
openstack-cinder-scheduler:20190411.1

How reproducible:
always

Steps to Reproduce:
1. Deploy the undercloud
2. Check the healthcheck status for cinder_scheduler (see the command sketch after this list)
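
A command sketch for step 2, using the unit and container names from the logs above:

  $ sudo systemctl start tripleo_cinder_scheduler_healthcheck.service
  $ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
  $ sudo podman inspect cinder_scheduler | grep healthcheck   # shows the rendered test command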

Actual results:
There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container

Expected results:
rc == 0 and the healthcheck process exits with status=0/SUCCESS

Additional info:

Comment 1 Alan Bishop 2019-04-24 15:40:45 UTC
It seems a bunch of services' health checks were affected when [1] merged. See my comment [2].

[1] https://review.opendev.org/565086
[2] https://bugs.launchpad.net/tripleo/+bug/1825342/comments/1

Comment 2 Alan Bishop 2019-05-15 18:11:43 UTC
No doc text required. This was a regression introduced and fixed in Stein prior to the release of OSP-15.

Comment 8 Tzach Shefi 2019-08-08 08:57:33 UTC
Hey Alan, 

I'm not sure about these results, so I'm running them by you to be sure.
Would the output below be sufficient to verify?

Tested on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch

[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2019-08-07 09:09:49 UTC; 30s ago
  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)     -> notice SUCCESS rather than the FAILURE in the original report, which is good.
 Main PID: 399812 (code=exited, status=0/SUCCESS)

Aug 07 09:09:49 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Aug 07 09:09:49 controller-0 systemd[1]: Started cinder_scheduler healthcheck.


However, I get a return code of 3, not 0.
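
(For reference: per systemctl(1), "systemctl status" exits 3 when a unit is not active, and a healthcheck run that has already completed is "inactive (dead)", so 3 is expected here regardless of the check's own result. The check's exit status is recorded in ExecMainStatus:)

  $ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service; echo $?
  ...
  3
  $ sudo systemctl show -p ExecMainStatus tripleo_cinder_scheduler_healthcheck.service
  ExecMainStatus=0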


sudo podman inspect cinder_scheduler |grep healthcheck
                "config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=22b28bf6014b355e8c0d83c112d965a3\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck 5672\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190801.2\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",

Here I do indeed get a return code of 0.
[root@controller-0 ~]# echo $?
0
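
(A grep-free alternative for pulling just the healthcheck command, sketched with podman's Go-template inspect format:)

  $ sudo podman inspect cinder_scheduler --format '{{.Config.Healthcheck.Test}}'
  [CMD-SHELL /openstack/healthcheck 5672]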


I guess this looks good to verify, correct?
If not, why not, and/or what else should I check?
Thanks

Comment 9 Alan Bishop 2019-08-08 13:50:26 UTC
Tzach,

The original problem has definitely been fixed. Note this line from your text above:

  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)     -> notice SUCCESS rather than the FAILURE in the original report, which is good.

The original problem was that the "5672" portion of the command (the RabbitMQ port number) was missing, so the health checks for this (and several other!) containers constantly failed because of the malformed command (missing port number).
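
For context, a minimal sketch of the kind of check /openstack/healthcheck performs (hypothetical; the real check lives in tripleo-common's healthcheck helpers and differs in detail):

  #!/bin/bash
  # Hypothetical sketch: succeed only if a $process PID holds a connection
  # on one of the given port(s); otherwise print the error seen in this bug.
  process='cinder-scheduler'
  ports="${*:-null}"   # the broken templates rendered this argument as "null"

  pids=$(pgrep -d '|' -f "$process")
  if [ -n "$pids" ] && ss -ntp | grep -qE ":(${ports// /|})\b.*pid=(${pids}),"; then
      exit 0
  fi
  echo "There is no $process process with opened RabbitMQ ports ($ports) running in the container"
  exit 1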

Comment 10 Tzach Shefi 2019-08-10 20:53:19 UTC
Verified on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch
See comments 8 and 9 above for testing details.

Comment 12 errata-xmlrpc 2019-09-21 11:21:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811

