Description of problem:
After Overcloud installation, the healthcheck for the ceilometer_agent_compute container on the Compute node fails:

[heat-admin@compute-0 ~]$ sudo systemctl status tripleo_ceilometer_agent_compute_healthcheck.service
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-03-27 10:21:44 UTC; 1min 13s ago
  Process: 136470 ExecStart=/usr/bin/podman exec ceilometer_agent_compute /openstack/healthcheck (code=exited, status=1/FAILURE)
 Main PID: 136470 (code=exited, status=1/FAILURE)

Mar 27 10:21:44 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Mar 27 10:21:44 compute-0 podman[136470]: There is no ceilometer-poll process with opened RabbitMQ ports (5671,5672) running in the container
Mar 27 10:21:44 compute-0 podman[136470]: exit status 1
Mar 27 10:21:44 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 10:21:44 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Mar 27 10:21:44 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.
The container itself is running:

f7624d8ba4f0  192.168.24.1:8787/rhosp15/openstack-ceilometer-compute:20190325.1  kolla_start  13 hours ago  Up 13 hours ago  ceilometer_agent_compute

[heat-admin@compute-0 ~]$ sudo podman logs ceilometer_agent_compute
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/ceilometer/ceilometer.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/ceilometer/ceilometer.conf to /etc/ceilometer/ceilometer.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='/usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log'
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ CEILOMETER_LOG_DIR=/var/log/kolla/ceilometer
++ [[ ! -d /var/log/kolla/ceilometer ]]
++ mkdir -p /var/log/kolla/ceilometer
+++ stat -c %U:%G /var/log/kolla/ceilometer
++ [[ root:kolla != \c\e\i\l\o\m\e\t\e\r\:\k\o\l\l\a ]]
++ chown ceilometer:kolla /var/log/kolla/ceilometer
+++ stat -c %a /var/log/kolla/ceilometer
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/ceilometer
++ . /usr/local/bin/kolla_ceilometer_extend_start
+ echo 'Running command: '\''/usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log'\'''
Running command: '/usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log'
+ exec /usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log

Version-Release number of selected component (if applicable):
OSP15 compose RHOS_TRUNK-15.0-RHEL-8-20190326.n.0
container image openstack-ceilometer-compute:20190325.1

How reproducible:
Always

Steps to Reproduce:
1. Deploy undercloud OSP15
2. Deploy Overcloud OSP15
3. Check the healthcheck status for the container on an overcloud compute node

Actual results:
There is no ceilometer-poll process with opened RabbitMQ ports (5671,5672) running in the container

Expected results:
The healthcheck service exits with exit code 0.

Additional info:
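For triage, the check the healthcheck performs (is there a process of a given name holding a connection on one of the expected ports?) can be sketched stand-alone. This is a hypothetical simplification, not the real common.sh helper: the function name `check_sockets` and the inlined socket-table sample are made up for illustration, with the socket table passed in as text so no container is needed.

```shell
#!/bin/sh
# Hypothetical simplification of the healthcheck logic: succeed only when a
# process matching $process has a connection on one of $ports.
check_sockets() {
    process=$1; ports=$2; socktab=$3
    for p in $ports; do
        # a line must mention both the port and the process name
        if printf '%s\n' "$socktab" | grep ":$p" | grep -q "$process"; then
            return 0
        fi
    done
    return 1
}

# sample ss-style socket table line (made up for this sketch)
socktab='tcp ESTAB 0 0 10.0.0.5:43210 10.0.0.1:5672 users:(("ceilometer-polli",pid=7,fd=3))'

if check_sockets ceilometer-polli "5671 5672" "$socktab"; then
    echo "healthcheck OK"
else
    echo "There is no process with opened ports running in the container"
fi
```

On a real node the equivalent evidence would come from the socket table inside the container, which is what the failing service reports on.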
Hello! I'm pretty sure this one is linked to https://bugzilla.redhat.com/show_bug.cgi?id=1689671
The following patch will probably solve this issue: https://review.openstack.org/648027
I'm taking this BZ.
Cheers,
C.
Hi,
I've installed OSP15 core_puddle=RHOS_TRUNK-15.0-RHEL-8-20190604.n.2 with undercloud:1, controller:1, compute:1.

(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.12
Warning: Permanently added '192.168.24.12' (ECDSA) to the list of known hosts.
Last login: Mon Jun 10 11:34:33 2019 from 192.168.24.254
[heat-admin@compute-0 ~]$ sudo systemctl status tripleo_ceilometer_agent_compute_healthcheck.service
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2019-06-11 08:56:21 UTC; 50s ago
  Process: 252186 ExecStart=/usr/bin/podman exec ceilometer_agent_compute /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
 Main PID: 252186 (code=exited, status=1/FAILURE)

Jun 11 08:56:21 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Jun 11 08:56:21 compute-0 podman[252186]: There is no ceilometer-polling process with opened RabbitMQ ports (5672) running in the container
Jun 11 08:56:21 compute-0 podman[252186]: exit status 1
Jun 11 08:56:21 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Jun 11 08:56:21 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Jun 11 08:56:21 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.

[heat-admin@compute-0 ~]$ sudo podman logs ceilometer_agent_compute
# The logs are empty

[heat-admin@compute-0 ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                                      COMMAND               CREATED       STATUS           PORTS  NAMES
2a8e9ba993d3  192.168.24.1:8787/rhosp15/openstack-nova-compute:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_compute
7533c83f18b7  192.168.24.1:8787/rhosp15/openstack-neutron-metadata-agent-ovn:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago         ovn_metadata_agent
8abed1793fad  192.168.24.1:8787/rhosp15/openstack-ovn-controller:20190604.1              dumb-init --singl...  22 hours ago  Up 22 hours ago         ovn_controller
79f3ac5bb7b0  192.168.24.1:8787/rhosp15/openstack-nova-compute:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_migration_target
cdbff47e1aa0  192.168.24.1:8787/rhosp15/openstack-cron:20190604.1                        dumb-init --singl...  22 hours ago  Up 22 hours ago         logrotate_crond
47c9bef560e4  192.168.24.1:8787/rhosp15/openstack-ceilometer-compute:20190604.1          dumb-init --singl...  22 hours ago  Up 22 hours ago         ceilometer_agent_compute
be547c98832f  192.168.24.1:8787/rhosp15/openstack-iscsid:20190604.1                      dumb-init --singl...  22 hours ago  Up 22 hours ago         iscsid
fb30bb4e95ce  192.168.24.1:8787/rhosp15/openstack-nova-libvirt:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_libvirt
739e51d60b33  192.168.24.1:8787/rhosp15/openstack-nova-libvirt:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_virtlogd

It seems that the problem hasn't been resolved.
Nataf
Confirming that I hit the same issue.
OK, will put it back on the bench and work out a solution then :).
Setting the right DFG(s) - they should take care of the ceilometer healthchecks.
According to our records, this should be resolved by openstack-tripleo-common-10.8.3-0.20200113210450.0e559fc.el8ost. This build is available now.
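To confirm a deployed node actually carries at least that build, one generic approach (not an official verification step from this BZ) is to compare the installed version against the fixed one with `sort -V`. The version strings below are stand-ins; on a real node the installed version would come from `rpm -q openstack-tripleo-common`.

```shell
#!/bin/bash
# Hypothetical check: is the installed openstack-tripleo-common at least
# the fixed build (10.8.3 per the comment above)?
fixed="10.8.3"
installed="10.8.5"   # stand-in value; on a real node use:
                     #   rpm -q --qf '%{VERSION}\n' openstack-tripleo-common

# sort -V orders version strings numerically; if the lowest of the two is
# the fixed version, the installed one is new enough.
lowest=$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | head -n 1)
if [ "$lowest" = "$fixed" ]; then
    echo "installed version includes the fix"
else
    echo "installed version predates the fix"
fi
```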
This is still an issue on OSP16. From the output below we can see that the healthcheck script itself has been fixed, but the correct default port value is still being overridden, probably during deploy. Further investigation is required.

[root@compute-0 ~]# systemctl status tripleo_ceilometer_agent_compute_healthcheck.service
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-09-24 10:13:12 UTC; 53s ago
  Process: 309837 ExecStart=/usr/bin/podman exec --user root ceilometer_agent_compute /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
 Main PID: 309837 (code=exited, status=1/FAILURE)

Sep 24 10:13:11 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Sep 24 10:13:12 compute-0 podman[309837]: 2020-09-24 10:13:12.092599591 +0000 UTC m=+0.322381794 container exec f04e88e773d3d4941877dbb20acbfd0ea6971b4f3e68bfde157bc72487271186 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Sep 24 10:13:12 compute-0 healthcheck_ceilometer_agent_compute[309837]: There is no ceilometer-polling process with opened Redis ports (5672) running in the container
Sep 24 10:13:12 compute-0 healthcheck_ceilometer_agent_compute[309837]: Error: non zero exit code: 1: OCI runtime error
Sep 24 10:13:12 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Sep 24 10:13:12 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Sep 24 10:13:12 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.
[root@compute-0 ~]#
[root@compute-0 ~]# podman exec -it ceilometer_agent_compute bash
()[root@compute-0 /]# cat /openstack/healthcheck
#!/bin/bash
. ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh

process='ceilometer-polling'
args="${@:-6379}"

if healthcheck_port $process $args; then
  exit 0
else
  ports=${args// /,}
  echo "There is no $process process with opened Redis ports ($ports) running in the container"
  exit 1
fi
()[root@compute-0 /]#

Targeting to OSP16 since OSP15 is EOL.
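The "Redis ports" wording and the stray default both come from the script's argument handling: with no arguments it falls back to 6379 (a Redis port), while the systemd unit passes 5672. A standalone sketch of just those two expansions (bash only, no container needed; the `set --` line simulates the arguments the unit passes):

```shell
#!/bin/bash
# Mirrors args="${@:-6379}" and ports=${args// /,} from the healthcheck script.
set -- 5671 5672      # simulate the ports passed on the command line
args="${@:-6379}"     # with no arguments this would fall back to 6379 (Redis)
ports=${args// /,}    # space-separated list -> comma list for the message
echo "args:  $args"   # -> args:  5671 5672
echo "ports: $ports"  # -> ports: 5671,5672
```

The `${var// /,}` pattern substitution is a bashism, which is consistent with the script's `#!/bin/bash` shebang.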
The original issue from the description was fixed, and the new issue is being handled in bug #1910939. Closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1910939 ***