Bug 1693196
| Summary: | [OSP16][Undercloud][healthcheck] failed healthcheck for ceilometer_agent_compute | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Artem Hrechanychenko <ahrechan> |
| Component: | openstack-tripleo-common | Assignee: | Martin Magr <mmagr> |
| Status: | CLOSED DUPLICATE | QA Contact: | Leonid Natapov <lnatapov> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.1 (Train) | CC: | aschultz, cschimid, csibbitt, dhill, jbadiapa, jveiraca, lmadsen, mburns, mmagr, mrunge, slinaber |
| Target Milestone: | z8 | Keywords: | Triaged, ZStream |
| Target Release: | 16.1 (Train on RHEL 8.2) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-common-11.4.1-1.20210407183435.el8ost | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-08-24 11:23:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Artem Hrechanychenko
2019-03-27 10:28:00 UTC
Hello! Pretty sure this one is linked to https://bugzilla.redhat.com/show_bug.cgi?id=1689671
The following patch will probably solve this issue: https://review.openstack.org/648027
I'm taking this BZ.
Cheers,
C.

Hi, I've installed OSP15
core_puddle=RHOS_TRUNK-15.0-RHEL-8-20190604.n.2
undercloud:1,controller:1,compute:1
(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin.24.12
Warning: Permanently added '192.168.24.12' (ECDSA) to the list of known hosts.
Last login: Mon Jun 10 11:34:33 2019 from 192.168.24.254
[heat-admin@compute-0 ~]$
[heat-admin@compute-0 ~]$ sudo systemctl status tripleo_ceilometer_agent_compute_healthcheck.service
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2019-06-11 08:56:21 UTC; 50s ago
Process: 252186 ExecStart=/usr/bin/podman exec ceilometer_agent_compute /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
Main PID: 252186 (code=exited, status=1/FAILURE)
Jun 11 08:56:21 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Jun 11 08:56:21 compute-0 podman[252186]: There is no ceilometer-polling process with opened RabbitMQ ports (5672) running in the container
Jun 11 08:56:21 compute-0 podman[252186]: exit status 1
Jun 11 08:56:21 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Jun 11 08:56:21 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Jun 11 08:56:21 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.
[heat-admin@compute-0 ~]$ sudo podman logs ceilometer_agent_compute
#The logs are empty
[heat-admin@compute-0 ~]$ sudo podman ls
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
2a8e9ba993d3  192.168.24.1:8787/rhosp15/openstack-nova-compute:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  nova_compute
7533c83f18b7  192.168.24.1:8787/rhosp15/openstack-neutron-metadata-agent-ovn:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  ovn_metadata_agent
8abed1793fad  192.168.24.1:8787/rhosp15/openstack-ovn-controller:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  ovn_controller
79f3ac5bb7b0  192.168.24.1:8787/rhosp15/openstack-nova-compute:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  nova_migration_target
cdbff47e1aa0  192.168.24.1:8787/rhosp15/openstack-cron:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  logrotate_crond
47c9bef560e4  192.168.24.1:8787/rhosp15/openstack-ceilometer-compute:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  ceilometer_agent_compute
be547c98832f  192.168.24.1:8787/rhosp15/openstack-iscsid:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  iscsid
fb30bb4e95ce  192.168.24.1:8787/rhosp15/openstack-nova-libvirt:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  nova_libvirt
739e51d60b33  192.168.24.1:8787/rhosp15/openstack-nova-libvirt:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago  nova_virtlogd
It seems that the problem hasn't been resolved.
Nataf

Confirm that I got the same issue.

OK, will put it back on the bench and work out a solution then :).

Setting right DFG(s) - they should take care of the ceilometer healthchecks.

According to our records, this should be resolved by openstack-tripleo-common-10.8.3-0.20200113210450.0e559fc.el8ost. This build is available now.

This is still an issue on OSP16. From the output below we can see that the HC script has been fixed, but the correct default value is still being overridden: the healthcheck service still passes 5672 as an argument, while the script now defaults to 6379.
That probably happens during deploy. Further investigation is required.
[root@compute-0 ~]# systemctl status tripleo_ceilometer_agent_compute_healthcheck.service
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2020-09-24 10:13:12 UTC; 53s ago
Process: 309837 ExecStart=/usr/bin/podman exec --user root ceilometer_agent_compute /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
Main PID: 309837 (code=exited, status=1/FAILURE)
Sep 24 10:13:11 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Sep 24 10:13:12 compute-0 podman[309837]: 2020-09-24 10:13:12.092599591 +0000 UTC m=+0.322381794 container exec f04e88e773d3d4941877dbb20acbfd0ea6971b4f3e68bfde157bc72487271186 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Sep 24 10:13:12 compute-0 healthcheck_ceilometer_agent_compute[309837]: There is no ceilometer-polling process with opened Redis ports (5672) running in the container
Sep 24 10:13:12 compute-0 healthcheck_ceilometer_agent_compute[309837]: Error: non zero exit code: 1: OCI runtime error
Sep 24 10:13:12 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Sep 24 10:13:12 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Sep 24 10:13:12 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.
[root@compute-0 ~]#
[root@compute-0 ~]# podman exec -it ceilometer_agent_compute bash
()[root@compute-0 /]# cat /openstack/healthcheck
#!/bin/bash
. ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh
process='ceilometer-polling'
args="${@:-6379}"
if healthcheck_port $process $args; then
exit 0
else
ports=${args// /,}
echo "There is no $process process with opened Redis ports ($ports) running in the container"
exit 1
fi
()[root@compute-0 /]#
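
To illustrate why the fixed script still fails, here is a minimal sketch of how the `args="${@:-6379}"` line behaves. The function name `resolve_ports` is hypothetical and only wraps the same parameter expansion used in the script above; it is not part of tripleo-common. The default 6379 applies only when the script is called with no arguments, so the `5672` passed in the ExecStart line replaces it.

```shell
#!/bin/bash
# Hypothetical helper (not from tripleo-common): mirrors the argument
# handling of /openstack/healthcheck shown above.
resolve_ports() {
    local args ports
    # Use the positional arguments if any were given, otherwise fall back to 6379.
    args="${@:-6379}"
    # Same substitution the script uses to build its error message.
    ports=${args// /,}
    echo "$ports"
}

resolve_ports            # no arguments: prints 6379 (the script default)
resolve_ports 5672       # prints 5672 (what the systemd unit passes today)
resolve_ports 5672 6379  # prints 5672,6379
```

Under this reading, the script default is never used as long as the generated unit keeps passing a port argument explicitly.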
Targeting to OSP16 since OSP15 is EOL.
The original issue from the description was fixed, and the new issue is being handled in bug #1910939. Closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1910939 ***