Bug 1701195
Summary: | [OSP15] failed healthcheck for cinder_scheduler container | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Artem Hrechanychenko <ahrechan> |
Component: | openstack-tripleo-heat-templates | Assignee: | Alan Bishop <abishop> |
Status: | CLOSED ERRATA | QA Contact: | Tzach Shefi <tshefi> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 15.0 (Stein) | CC: | abishop, dprince, mburns, pgrist, tshefi |
Target Milestone: | beta | Keywords: | Triaged |
Target Release: | 15.0 (Stein) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | openstack-tripleo-heat-templates-10.5.1-0.20190514103211.038d887.el8ost | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-09-21 11:21:34 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Artem Hrechanychenko
2019-04-18 10:48:56 UTC
It seems a bunch of services' health checks were affected when [1] merged. See my comment [2].

[1] https://review.opendev.org/565086
[2] https://bugs.launchpad.net/tripleo/+bug/1825342/comments/1

No doc text required. This was a regression introduced and fixed in Stein prior to the release of OSP-15.

Hey Alan,

I'm not sure about the results, so I'm running them by you to be sure. Would the output below be sufficient to verify?

Tested on: openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch

[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2019-08-07 09:09:49 UTC; 30s ago
  Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)
 Main PID: 399812 (code=exited, status=0/SUCCESS)

Aug 07 09:09:49 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Aug 07 09:09:49 controller-0 systemd[1]: Started cinder_scheduler healthcheck.

Notice the SUCCESS on the Process line, rather than the FAILURE in the original comment, which is good. However, the return code is 3, not 0.
sudo podman inspect cinder_scheduler |grep healthcheck
"config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=22b28bf6014b355e8c0d83c112d965a3\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck 5672\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190801.2\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",

Here I do indeed get a return code of 0:

[root@controller-0 ~]# echo $?
0

I guess this looks good to verify, correct? If not, why not, and what else should I check?

Thanks Tzach,

The original problem has definitely been fixed. Note this line from your output above:

Process: 399812 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck 5672 (code=exited, status=0/SUCCESS)

The original problem was that the "5672" portion of the command (the RabbitMQ port number) was missing, so the health check for this (and several other!) containers constantly failed because of a syntax error (the missing port number).

Verified on: openstack-tripleo-heat-templates-10.6.1-0.20190801110459.7fbedf0.el8ost.noarch

See comments 8 and 9 above for testing.
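The root cause above was a healthcheck test command that lost its trailing port argument (the fixed form is "/openstack/healthcheck 5672"). As a minimal sketch, assuming bash, the string taken from the "healthcheck": {"test": ...} field of the podman inspect output can be checked for that argument. The function name healthcheck_has_port is hypothetical and not part of TripleO; it only inspects the command string and does not run the health check or contact RabbitMQ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TripleO): succeeds only when a healthcheck
# test command string ends with a numeric argument, such as the RabbitMQ
# port 5672 that was missing from the broken command.
healthcheck_has_port() {
  local cmd="$1"
  # Remove everything up to and including the last space to get the final word.
  local last="${cmd##* }"
  # Require a command plus an argument, and an all-digit final word.
  [[ "$cmd" == *" "* && "$last" =~ ^[0-9]+$ ]]
}

# Usage: compare the fixed command with the broken form from this bug.
for test_cmd in "/openstack/healthcheck 5672" "/openstack/healthcheck"; do
  if healthcheck_has_port "$test_cmd"; then
    echo "ok:      $test_cmd"
  else
    echo "missing: $test_cmd"
  fi
done
```

The loop reports the first command as "ok" and the second as "missing". Only the shape of the command is checked here; whether the port is the correct one for a given service is outside the scope of this sketch.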
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811