Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2149002

Summary: Metrics are missing in collectd healthchecks
Product: Red Hat OpenStack
Reporter: Abhishek <abhijadh>
Component: openstack-tripleo-heat-templates
Assignee: Martin Magr <mmagr>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Docs Contact: Joanne O'Flynn <joflynn>
Priority: low
Version: 16.2 (Train)
CC: erpeters, jschluet, lmadsen, mburns, mmagr, praveen.k.dubey
Target Milestone: z5
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20230211104940.370c34a
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
: 2149008
Environment:
Last Closed: 2023-04-26 12:17:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2149008    

Comment 1 Abhishek 2022-11-28 13:56:23 UTC
Description of problem:

The following data is missing from the STF/collectd health checks, so the customer cannot create Prometheus alerts based on it:

ovn_controller
ovn_metadata_agent
neutron ovn
galera
redis
rabbit
haproxy


How reproducible:

1. Deploy RHOSP 16.2.3, STF 1.4


**STF 1.4 Deployments:**
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.2/html/service_telemetry_framework_1.4/assembly-preparing-your-ocp-environment-for-stf_assembly

2. Run the following for a few hours on one of the controllers (to capture the OVN and neutron health checks) and on one of the computes (to capture nova_libvirt):
tail -f /var/log/containers/collectd/healthchecks.log > /tmp/debug.log

3. Inspect /tmp/debug.log from both nodes.
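
As a rough aid for step 3, the check could be sketched as below. This is only a minimal sketch: the function name `missing_names` is hypothetical, the list of names is a subset of the ones in this report, and it assumes a reported container shows up as a literal substring somewhere in the log, since the log format is not described here.

```python
from pathlib import Path

# Subset of the container names from this report; adjust as needed.
EXPECTED = ["ovn_controller", "ovn_metadata_agent", "redis", "rabbit", "haproxy"]


def missing_names(log_text, expected=EXPECTED):
    """Return the expected names that never occur in log_text.

    Assumption: a container that reports health appears as a literal
    substring in the log. The real log format is not specified here.
    """
    return [name for name in expected if name not in log_text]


if __name__ == "__main__":
    path = Path("/tmp/debug.log")
    if path.exists():
        # Print the names that were never seen in the captured log.
        print(missing_names(path.read_text(errors="replace")))
```

Running it on each node after the capture window gives a quick list of names that never appeared. A substring match can give false positives (for example, "rabbit" inside an unrelated message), so the result is a hint rather than proof.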

Actual results:
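Health check data for the services listed above is missing from the log.

Expected results:
Health check data for those services should be present in the log.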


Additional info:
I'm attaching healthchecks.log from the customer environment.

Comment 3 Martin Magr 2022-11-29 20:44:59 UTC
There are indeed some containers that are not reported even though they have health checks associated. A fix for that has been submitted upstream.

Unfortunately there are no health checks for "bundle" containers such as ovn-dbs-bundle-podman, haproxy-bundle-podman, redis-bundle-podman, rabbitmq-bundle-podman or galera-bundle-podman. You can check which containers report health by running `systemctl list-timers` and looking for `tripleo_<container-name>_healthcheck.service`; those are the containers the customer can create alerts on in Prometheus.
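
To turn that `systemctl list-timers` lookup into a list of container names, something like the following could work. It is a minimal sketch, not a supported tool: the function name `containers_with_healthcheck` is hypothetical, and it assumes the unit names follow the `tripleo_<container-name>_healthcheck` pattern quoted above with a `.timer` or `.service` suffix.

```python
import re
import shutil
import subprocess

# Unit naming pattern quoted in the comment above:
# tripleo_<container-name>_healthcheck, with a .timer or .service suffix.
UNIT_RE = re.compile(r"tripleo_(\S+?)_healthcheck\.(?:timer|service)")


def containers_with_healthcheck(list_timers_output):
    """Return sorted, de-duplicated container names found in the text."""
    return sorted(set(UNIT_RE.findall(list_timers_output)))


if __name__ == "__main__":
    # Only query systemd when systemctl is actually available on this host.
    if shutil.which("systemctl"):
        result = subprocess.run(
            ["systemctl", "list-timers", "--all", "--no-pager"],
            capture_output=True,
            text=True,
            check=False,
        )
        for name in containers_with_healthcheck(result.stdout):
            print(name)
```

Any container missing from the printed list has no health check timer on that node, so no health check data for it should be expected in collectd.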

Comment 10 Leonid Natapov 2023-03-29 07:57:56 UTC
openstack-tripleo-heat-templates-11.6.1-2.20230320130752.f1322eb.el8ost.noarch

Tested according to the test instructions in comment #7.
Verified.

Comment 11 Erin Peterson 2023-04-18 17:59:16 UTC
If you think customers need a description of this bug in addition to the content of the BZ summary field, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.
 
If this bug does not require an additional Doc Text description, please set the 'requires_doc_text' flag to '-'.

Comment 17 errata-xmlrpc 2023-04-26 12:17:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2.5 (Train) bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1763

Comment 18 Red Hat Bugzilla 2023-09-19 04:30:47 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days