Description of problem:
The following data is missing from the STF/collectd health check output, so the customer cannot create alerts based on it in Prometheus:
ovn_controller
ovn_metadata_agent
neutron
ovn
galera
redis
rabbit
haproxy

How reproducible:
1. Deploy RHOSP 16.2.3 and STF 1.4.
   STF 1.4 Deployments: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/service_telemetry_framework_1.4/assembly-preparing-your-ocp-environment-for-stf_assembly
2. Run the following for a few hours, both on one of the controllers (to capture the ovn and neutron health checks) and on one of the computes (to capture nova_libvirt):
   tail -f /var/log/containers/collectd/healthchecks.log > /tmp/debug.log
3. Verify /tmp/debug.log from both nodes.

Actual results:
Data is missing.

Expected results:
Data should be visible.

Additional info:
I'm attaching healthchecks.log from the customer environment.
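The verification in step 3 could be scripted roughly as follows. This is only a sketch: it makes no assumption about the format of healthchecks.log and simply checks whether each expected name appears anywhere in the captured text. The function name `missing_checks` and the `EXPECTED` list are hypothetical helpers for illustration; the names in the list come from this report and should be adjusted per node role.

```python
from pathlib import Path

# Names taken from this report; adjust per node role (controller vs. compute).
EXPECTED = ["ovn_controller", "ovn_metadata_agent", "nova_libvirt"]


def missing_checks(log_text, expected=EXPECTED):
    """Return the expected names that never appear in the captured log text.

    This is a plain substring search, so it does not depend on the log format.
    """
    return [name for name in expected if name not in log_text]


if __name__ == "__main__":
    # /tmp/debug.log is the capture file from step 2 of the reproducer.
    path = Path("/tmp/debug.log")
    if path.exists():
        for name in missing_checks(path.read_text(errors="replace")):
            print(f"no health check data seen for: {name}")
```

Because this is a substring match, short names such as "neutron" would also match longer container names, so the most specific names give the most reliable result.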
Some containers are indeed not reported even though they have health checks associated with them. A fix for that has been submitted upstream. Unfortunately, there are no health checks for "bundle" containers such as ovn-dbs-bundle-podman, haproxy-bundle-podman, redis-bundle-podman, rabbitmq-bundle-podman or galera-bundle-podman. To see which containers report health, run `systemctl list-timers` and look for `tripleo_<container-name>_healthcheck.service` entries; those are the containers on which the customer can create alerts in Prometheus.
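The `systemctl list-timers` check can be automated along these lines. This is a minimal sketch, assuming the unit names follow the `tripleo_<container-name>_healthcheck` pattern mentioned above and appear with a `.timer` or `.service` suffix in the output. The function name `containers_with_healthchecks` is a hypothetical helper, not part of any TripleO tooling.

```python
import re
import shutil
import subprocess

# Matches tripleo_<container-name>_healthcheck.timer or .service and
# captures <container-name>. The naming pattern is taken from the comment above.
UNIT_RE = re.compile(r"\btripleo_([A-Za-z0-9_-]+?)_healthcheck\.(?:timer|service)\b")


def containers_with_healthchecks(list_timers_output):
    """Return the sorted, de-duplicated container names found in the output."""
    return sorted(set(UNIT_RE.findall(list_timers_output)))


if __name__ == "__main__":
    # Only attempt the call on hosts that actually have systemctl.
    if shutil.which("systemctl"):
        result = subprocess.run(
            ["systemctl", "list-timers", "--all", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
        for name in containers_with_healthchecks(result.stdout):
            print(name)
```

Run on a controller or compute node, this prints one container name per line; any container absent from that list has no health check timer and therefore no health data to alert on.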
openstack-tripleo-heat-templates-11.6.1-2.20230320130752.f1322eb.el8ost.noarch tested according to test instructions in comment #7. Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.2.5 (Train) bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:1763