Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1693196

Summary: [OSP16][Undercloud][healthcheck] failed healthcheck for ceilometer_agent_compute
Product: Red Hat OpenStack
Reporter: Artem Hrechanychenko <ahrechan>
Component: openstack-tripleo-common
Assignee: Martin Magr <mmagr>
Status: CLOSED DUPLICATE
QA Contact: Leonid Natapov <lnatapov>
Severity: medium
Docs Contact:
Priority: medium
Version: 16.1 (Train)
CC: aschultz, cschimid, csibbitt, dhill, jbadiapa, jveiraca, lmadsen, mburns, mmagr, mrunge, slinaber
Target Milestone: z8
Keywords: Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-11.4.1-1.20210407183435.el8ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-24 11:23:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Artem Hrechanychenko 2019-03-27 10:28:00 UTC
Description of problem:
After Overcloud installation, check the healthcheck status of the ceilometer_agent_compute container on the Compute node:

[heat-admin@compute-0 ~]$ sudo systemctl status tripleo_ceilometer_agent_compute_healthcheck.service 
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-03-27 10:21:44 UTC; 1min 13s ago
  Process: 136470 ExecStart=/usr/bin/podman exec ceilometer_agent_compute /openstack/healthcheck (code=exited, status=1/FAILURE)
 Main PID: 136470 (code=exited, status=1/FAILURE)

Mar 27 10:21:44 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Mar 27 10:21:44 compute-0 podman[136470]: There is no ceilometer-poll process with opened RabbitMQ ports (5671,5672) running in the container
Mar 27 10:21:44 compute-0 podman[136470]: exit status 1
Mar 27 10:21:44 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 10:21:44 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Mar 27 10:21:44 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.


The container itself is running:
f7624d8ba4f0  192.168.24.1:8787/rhosp15/openstack-ceilometer-compute:20190325.1          kolla_start  13 hours ago  Up 13 hours ago         ceilometer_agent_compute

[heat-admin@compute-0 ~]$ sudo podman logs ceilometer_agent_compute
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Copying service configuration files
INFO:__main__:Deleting /etc/ceilometer/ceilometer.conf
INFO:__main__:Copying /var/lib/kolla/config_files/src/etc/ceilometer/ceilometer.conf to /etc/ceilometer/ceilometer.conf
INFO:__main__:Writing out command to execute
++ cat /run_command
+ CMD='/usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log'
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ CEILOMETER_LOG_DIR=/var/log/kolla/ceilometer
++ [[ ! -d /var/log/kolla/ceilometer ]]
++ mkdir -p /var/log/kolla/ceilometer
+++ stat -c %U:%G /var/log/kolla/ceilometer
++ [[ root:kolla != \c\e\i\l\o\m\e\t\e\r\:\k\o\l\l\a ]]
++ chown ceilometer:kolla /var/log/kolla/ceilometer
+++ stat -c %a /var/log/kolla/ceilometer
++ [[ 2755 != \7\5\5 ]]
++ chmod 755 /var/log/kolla/ceilometer
++ . /usr/local/bin/kolla_ceilometer_extend_start
+ echo 'Running command: '\''/usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log'\'''
Running command: '/usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log'
+ exec /usr/bin/ceilometer-polling --polling-namespaces compute --logfile /var/log/ceilometer/compute.log

Version-Release number of selected component (if applicable):
OSP15 compose RHOS_TRUNK-15.0-RHEL-8-20190326.n.0

container image openstack-ceilometer-compute:20190325.1 

How reproducible:
Always

Steps to Reproduce:
1. Deploy the OSP15 undercloud.
2. Deploy the OSP15 overcloud.
3. Check the healthcheck status of the container on the overcloud Compute node.

Actual results:
There is no ceilometer-poll process with opened RabbitMQ ports (5671,5672) running in the container

Expected results:
The healthcheck service exits with exit code 0.
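
For context, the error message suggests that the check looks for a connection from the named process to one of the listed ports. The sketch below illustrates that idea only; it is not the `healthcheck_port` function from common.sh. The name `has_port_connection` is hypothetical, and the input is assumed to be `ss -ntp`-style lines on stdin.

```bash
#!/bin/bash
# Hypothetical sketch, NOT the real healthcheck_port from common.sh.
# Reads `ss -ntp`-style lines on stdin and succeeds if a process whose name
# starts with $1 has a connection to one of the remaining arguments (ports).
has_port_connection() {
    local process="$1"
    shift
    local ports="$*"
    awk -v proc="$process" -v ports="$ports" '
        BEGIN { n = split(ports, want, " "); found = 0 }
        {
            # Field 5 is assumed to be the peer address:port column.
            port = $5
            sub(/.*:/, "", port)
            for (i = 1; i <= n; i++)
                if (port == want[i] && index($0, "\"" proc) > 0)
                    found = 1
        }
        END { exit !found }
    '
}
```

Inside the container this could be tried as `ss -ntp | has_port_connection ceilometer-poll 5671 5672`, assuming `ss` is available there.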

Additional info:

Comment 2 Cédric Jeanneret 2019-03-27 12:17:06 UTC
Hello!

I'm pretty sure this one is linked to https://bugzilla.redhat.com/show_bug.cgi?id=1689671
The following patch will probably solve this issue: https://review.openstack.org/648027

I'm taking this BZ.

Cheers,

C.

Comment 6 Nataf Sharabi 2019-06-11 09:12:38 UTC
Hi,

I've installed OSP15 core_puddle=RHOS_TRUNK-15.0-RHEL-8-20190604.n.2

undercloud:1,controller:1,compute:1



(undercloud) [stack@undercloud-0 ~]$ ssh heat-admin@192.168.24.12
Warning: Permanently added '192.168.24.12' (ECDSA) to the list of known hosts.
Last login: Mon Jun 10 11:34:33 2019 from 192.168.24.254
[heat-admin@compute-0 ~]$ 
[heat-admin@compute-0 ~]$ sudo systemctl status tripleo_ceilometer_agent_compute_healthcheck.service
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2019-06-11 08:56:21 UTC; 50s ago
  Process: 252186 ExecStart=/usr/bin/podman exec ceilometer_agent_compute /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
 Main PID: 252186 (code=exited, status=1/FAILURE)

Jun 11 08:56:21 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Jun 11 08:56:21 compute-0 podman[252186]: There is no ceilometer-polling process with opened RabbitMQ ports (5672) running in the container
Jun 11 08:56:21 compute-0 podman[252186]: exit status 1
Jun 11 08:56:21 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Jun 11 08:56:21 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Jun 11 08:56:21 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.

[heat-admin@compute-0 ~]$ sudo podman logs ceilometer_agent_compute
#The logs are empty

[heat-admin@compute-0 ~]$ sudo podman ls
CONTAINER ID  IMAGE                                                                      COMMAND               CREATED       STATUS           PORTS  NAMES
2a8e9ba993d3  192.168.24.1:8787/rhosp15/openstack-nova-compute:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_compute
7533c83f18b7  192.168.24.1:8787/rhosp15/openstack-neutron-metadata-agent-ovn:20190604.1  dumb-init --singl...  22 hours ago  Up 22 hours ago         ovn_metadata_agent
8abed1793fad  192.168.24.1:8787/rhosp15/openstack-ovn-controller:20190604.1              dumb-init --singl...  22 hours ago  Up 22 hours ago         ovn_controller
79f3ac5bb7b0  192.168.24.1:8787/rhosp15/openstack-nova-compute:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_migration_target
cdbff47e1aa0  192.168.24.1:8787/rhosp15/openstack-cron:20190604.1                        dumb-init --singl...  22 hours ago  Up 22 hours ago         logrotate_crond
47c9bef560e4  192.168.24.1:8787/rhosp15/openstack-ceilometer-compute:20190604.1          dumb-init --singl...  22 hours ago  Up 22 hours ago         ceilometer_agent_compute
be547c98832f  192.168.24.1:8787/rhosp15/openstack-iscsid:20190604.1                      dumb-init --singl...  22 hours ago  Up 22 hours ago         iscsid
fb30bb4e95ce  192.168.24.1:8787/rhosp15/openstack-nova-libvirt:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_libvirt
739e51d60b33  192.168.24.1:8787/rhosp15/openstack-nova-libvirt:20190604.1                dumb-init --singl...  22 hours ago  Up 22 hours ago         nova_virtlogd

It seems that the problem hasn't been resolved.

Nataf

Comment 7 Artem Hrechanychenko 2019-06-13 15:11:45 UTC
I confirm that I hit the same issue.

Comment 8 Cédric Jeanneret 2019-06-28 07:16:21 UTC
OK, will put it back on the bench and work out a solution then :).

Comment 9 Cédric Jeanneret 2019-07-05 07:36:46 UTC
Setting the right DFG(s); they should take care of the ceilometer healthchecks.

Comment 21 Lon Hohberger 2020-03-06 11:38:23 UTC
According to our records, this should be resolved by openstack-tripleo-common-10.8.3-0.20200113210450.0e559fc.el8ost.  This build is available now.

Comment 22 Martin Magr 2020-09-24 10:20:46 UTC
This is still an issue on OSP16. The output below shows that the healthcheck (HC) script has been fixed, but its correct default value is still being overridden, probably during deployment. Further investigation is required.

[root@compute-0 ~]# systemctl status tripleo_ceilometer_agent_compute_healthcheck.service
● tripleo_ceilometer_agent_compute_healthcheck.service - ceilometer_agent_compute healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_ceilometer_agent_compute_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2020-09-24 10:13:12 UTC; 53s ago
  Process: 309837 ExecStart=/usr/bin/podman exec --user root ceilometer_agent_compute /openstack/healthcheck 5672 (code=exited, status=1/FAILURE)
 Main PID: 309837 (code=exited, status=1/FAILURE)

Sep 24 10:13:11 compute-0 systemd[1]: Starting ceilometer_agent_compute healthcheck...
Sep 24 10:13:12 compute-0 podman[309837]: 2020-09-24 10:13:12.092599591 +0000 UTC m=+0.322381794 container exec f04e88e773d3d4941877dbb20acbfd0ea6971b4f3e68bfde157bc72487271186 (image=undercloud-0.ctlplane.redhat.local:8787/rh-osb>
Sep 24 10:13:12 compute-0 healthcheck_ceilometer_agent_compute[309837]: There is no ceilometer-polling process with opened Redis ports (5672) running in the container
Sep 24 10:13:12 compute-0 healthcheck_ceilometer_agent_compute[309837]: Error: non zero exit code: 1: OCI runtime error
Sep 24 10:13:12 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Sep 24 10:13:12 compute-0 systemd[1]: tripleo_ceilometer_agent_compute_healthcheck.service: Failed with result 'exit-code'.
Sep 24 10:13:12 compute-0 systemd[1]: Failed to start ceilometer_agent_compute healthcheck.
[root@compute-0 ~]# 
[root@compute-0 ~]# podman exec -it ceilometer_agent_compute bash
()[root@compute-0 /]# cat /openstack/healthcheck 
#!/bin/bash

. ${HEALTHCHECK_SCRIPTS:-/usr/share/openstack-tripleo-common/healthcheck}/common.sh

process='ceilometer-polling'
args="${@:-6379}"

if healthcheck_port $process $args; then
    exit 0
else
    ports=${args// /,}
    echo "There is no $process process with opened Redis ports ($ports) running in the container"
    exit 1
fi
()[root@compute-0 /]#

Retargeting to OSP16, since OSP15 is EOL.
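
The override comes down to how `${@:-6379}` expands in the script above: the default applies only when no arguments are passed, so the `5672` in the unit's ExecStart line replaces it. A minimal sketch of that behaviour follows; `resolve_ports` is a hypothetical helper that mirrors the script's two argument-handling lines and is not part of tripleo-common.

```bash
#!/bin/bash
# Hypothetical helper mirroring the argument handling in /openstack/healthcheck.
resolve_ports() {
    local args ports
    # Use the positional arguments if any were given, otherwise 6379.
    args="${@:-6379}"
    # Join multiple ports with commas, as the script does for its message.
    ports=${args// /,}
    echo "$ports"
}

resolve_ports          # no arguments: the default 6379 is used
resolve_ports 5672     # an explicit argument replaces the default
```

The script's default only takes effect if the healthcheck is invoked without arguments, so the port passed by the systemd unit is what needs to change.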

Comment 30 Martin Magr 2021-08-24 11:23:22 UTC
The original issue from the description was fixed, and the new issue is being handled in bug #1910939. Closing this as a duplicate.

*** This bug has been marked as a duplicate of bug 1910939 ***