Description of problem:

[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-04-18 10:41:00 UTC; 48s ago
  Process: 599072 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck null (code=exited, status=1/FAILURE)
 Main PID: 599072 (code=exited, status=1/FAILURE)

Apr 18 10:41:00 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Apr 18 10:41:00 controller-0 podman[599072]: There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container
Apr 18 10:41:00 controller-0 podman[599072]: exit status 1
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Failed with result 'exit-code'.
Apr 18 10:41:00 controller-0 systemd[1]: Failed to start cinder_scheduler healthcheck.
[heat-admin@controller-0 ~]$ sudo podman inspect cinder_scheduler |grep healthcheck
"config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=37c5752bb7a8713cb7bf28d9c72c5e39\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck null\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190411.1\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",

Version-Release number of selected component (if applicable):
OSP15 - RHOS_TRUNK-15.0-RHEL-8-20190412.n.0
python3-tripleoclient-heat-installer- 1.4.1-0.20190411190358.0ca816d.el8ost.noarch
python3-tripleo-common-10.6.2-0.20190412150355.0ec6518.el8ost.noarch
python3-tripleoclient-11.4.1-0.20190411190358.0ca816d.el8ost.noarch
openstack-tripleo-heat-templates-10.4.1-0.20190412000410.b934fdd.el8ost.noarch
openstack-cinder-scheduler:20190411.1

How reproducible:
always

Steps to Reproduce:
1. Deploy Undercloud
2. Check healthcheck status for cinder_scheduler
3.

Actual results:
There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container

Expected results:
rc == 0 and exited status Passed

Additional info:
It seems a bunch of services' health checks were affected when [1] merged. See my comment [2].

[1] https://review.opendev.org/565086
[2] https://bugs.launchpad.net/tripleo/+bug/1825342/comments/1
No doc text required. This was a regression introduced and fixed in Stein prior to the release of OSP-15.
Hey Alan,

I'm not sure about the results, so I'm running them by you. Would the below be sufficient to verify?

Tested on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch

[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2019-08-07 09:09:49 UTC; 30s ago
  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)
    -> Note SUCCESS here rather than the FAILURE in the original comment, which is good.
 Main PID: 399812 (code=exited, status=0/SUCCESS)

Aug 07 09:09:49 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Aug 07 09:09:49 controller-0 systemd[1]: Started cinder_scheduler healthcheck.

However, the return code is 3, not 0.

sudo podman inspect cinder_scheduler |grep healthcheck
"config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=22b28bf6014b355e8c0d83c112d965a3\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck 5672\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190801.2\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",

Here I do get a return code of 0:

[root@controller-0 ~]# echo $?
0

I guess this looks good to verify, correct? If not, why not, and what else should I check?

Thanks
Tzach,

The original problem has definitely been fixed. Note this line from your text above:

  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)

The original problem was that the "5672" portion of the command (the RabbitMQ port number) was missing, so the health check for this (and several other!) containers constantly failed because of a syntax error (missing port number).
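For anyone who wants to check other containers for the same regression, here is a minimal Python sketch of that check. It assumes the "config_data" value from "podman inspect" has already been decoded to a plain JSON string, that "healthcheck.test" is a single command string as in the output above, and that the first argument after the script path is the port. The function name healthcheck_port is hypothetical and is not part of TripleO or podman.

```python
import json
import shlex
from typing import Optional


def healthcheck_port(config_data: str) -> Optional[int]:
    """Return the port passed to the healthcheck script, or None if it is absent or not numeric.

    Assumption: config_data is the decoded "config_data" JSON string and the
    healthcheck test has the form "<script> <port>", as in this bug report.
    """
    config = json.loads(config_data)
    test = config.get("healthcheck", {}).get("test", "")
    if not isinstance(test, str):
        # Other shapes (for example a list) are outside the scope of this sketch.
        return None
    args = shlex.split(test)
    # args[0] is the script path; args[1], if present, should be the port.
    if len(args) < 2 or not args[1].isdigit():
        return None
    return int(args[1])


if __name__ == "__main__":
    broken = '{"healthcheck": {"test": "/openstack/healthcheck null"}}'
    fixed = '{"healthcheck": {"test": "/openstack/healthcheck 5672"}}'
    print("broken:", healthcheck_port(broken))
    print("fixed:", healthcheck_port(fixed))
```

A result of None for a container that talks to RabbitMQ would correspond to the failing "healthcheck null" case from the original report.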
Verified on:
openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch

See comments 8 and 9 above for testing.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2811