Bug 2223294

Summary: Collectd Sensubility doesn't work on OSP17.1 and RHEL8.
Product: Red Hat OpenStack Reporter: Leonid Natapov <lnatapov>
Component: openstack-tripleo-heat-templates Assignee: Martin Magr <mmagr>
Status: CLOSED ERRATA QA Contact: myadla
Severity: high Docs Contact: mgeary <mgeary>
Priority: high    
Version: 17.1 (Wallaby) CC: gregraka, jamsmith, lmadsen, mariel, mburns, mmagr, mrunge, pgrist
Target Milestone: z2 Keywords: Reopened, Triaged, ZStream
Target Release: 17.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20231103003743.e7c7ce3.el8ost Doc Type: Bug Fix
Doc Text:
This update fixes a bug that caused failure of the collection agent `collectd-sensubility` on RHEL 8 Compute nodes during an in-place upgrade from RHOSP 16.2 to 17.1.
Last Closed: 2024-01-16 14:36:25 UTC Type: Bug

Description Leonid Natapov 2023-07-17 09:49:31 UTC
Collectd Sensubility doesn't work on OSP17.1 and RHEL8.

The latest collectd is collectd-5.12.0-10.el8ost.

This scenario may only happen after FFU in a mixed RHEL environment, when the compute node(s) are on RHEL8. On a clean OSP17.1 deployment this scenario won't happen.


The error that I get in sensubility.log
---------------------------------------

\\\"/scripts/collectd_check_health.py\\\", line 91, in \\u003cmodule\\u003e\\n    rc, status = fetch_container_health(o.decode())\\n  File \\\"/scripts/collectd_check_health.py\\\", line 74, in fetch_container_health\\n    if len(item['healthy']) \\u003e 0 and item['status'] != 'stopped':\\nTypeError: object of type 'NoneType' has no len()\\n\",\"status\":\"1\"}}}"},"startsAt":"2023-07-14T11:12:27Z"}}]
[DEBUG] Requesting execution of check. [check: check-container-health]
[DEBUG] Executed check script. [output: Traceback (most recent call last):
  File "/scripts/collectd_check_health.py", line 91, in <module>
    rc, status = fetch_container_health(o.decode())
  File "/scripts/collectd_check_health.py", line 74, in fetch_container_health
    if len(item['healthy']) > 0 and item['status'] != 'stopped':
TypeError: object of type 'NoneType' has no len()

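The problem is that the healthcheck script uses the podman inspect <container-name> command, whose output has apparently changed. As a result, the 'healthy' value that the script reads is None, and the len() call on line 74 fails.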
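
The failing condition from the traceback can be made tolerant of a missing value. The following is only a minimal sketch, not the shipped fix: the helper name has_health_status is hypothetical, and the item dictionary is assumed to carry the 'healthy' and 'status' keys seen on line 74 of the traceback.

```python
def has_health_status(item):
    """Return True when a container that is not stopped reports a health status.

    Hypothetical helper: 'item' is assumed to be a dict with the
    'healthy' and 'status' keys used on line 74 of the traceback.
    """
    healthy = item.get('healthy')
    # bool() is False for both None and '', so no len() call is needed
    # and a missing health field no longer raises TypeError.
    return bool(healthy) and item.get('status') != 'stopped'
```

With this guard, an item whose 'healthy' value is None is treated as having no health status instead of aborting the whole check.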


Workaround:
-----------

Edit /var/lib/container-config-scripts/collectd_check_health.py and, on line 26, replace "healthy: .State.Health.Status}" with "healthy: .State.Healthcheck.Status}".
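
Because a mixed environment can contain nodes that expose either field name, a lookup that accepts both may be more robust than hardcoding one. This is a rough sketch under assumptions: the function name health_status is hypothetical, and the input is assumed to be one container entry parsed from podman inspect JSON output, with the status nested at State.Health.Status or State.Healthcheck.Status as the two paths in the workaround suggest.

```python
def health_status(container):
    """Return the health status string of one container entry, or ''.

    Hypothetical helper: 'container' is assumed to be a dict parsed from
    the JSON output of 'podman inspect', nested as State -> Health (or
    Healthcheck) -> Status, mirroring the paths named in the workaround.
    """
    state = container.get("State") or {}
    # Try both field names, since which one is present is assumed to
    # depend on the podman version installed on the node.
    for key in ("Health", "Healthcheck"):
        section = state.get(key)
        if isinstance(section, dict) and section.get("Status"):
            return section["Status"]
    # Neither field is populated: report an empty status rather than None.
    return ""
```

Returning an empty string rather than None keeps a later len() or truthiness check from raising.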

Comment 14 Martin Magr 2023-09-22 14:11:13 UTC
When imfile is loaded but no output module is loaded, the rsyslog container fails to start [1]. Failing to configure even a single output module is a configuration issue, not a bug.


[1] rsyslogd: there are no active actions configured. Inputs would run, but no output whatsoever were created. [v8.2102.0-15.el8 try https://www.rsyslog.com/e/2103 ]
rsyslogd: run failed with error -2103 (see rsyslog.h or try https://www.rsyslog.com/e/2103 to learn what that number means)

Comment 15 Martin Magr 2023-09-22 14:12:26 UTC
Wrong BZ, sorry.

Comment 19 myadla 2023-12-04 20:55:53 UTC
The script "collectd_check_health.py" works correctly in an OSP17.1z2 deployment.

We still need to deploy OSP16.2 with RHEL 8.4 and verify that it works there as well.

Comment 33 errata-xmlrpc 2024-01-16 14:36:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 17.1.2 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0185