Bug 2223294 - Collectd Sensubility doesn't work on OSP17.1 and RHEL8.
Summary: Collectd Sensubility doesn't work on OSP17.1 and RHEL8.
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: collectd-sensubility
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 17.1
Assignee: Martin Magr
QA Contact: Leonid Natapov
Docs Contact: mgeary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-07-17 09:49 UTC by Leonid Natapov
Modified: 2023-08-14 14:37 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
There is a known issue when performing an in-place upgrade from RHOSP 16.2 to 17.1 GA: the collection agent, `collectd-sensubility`, fails to run on RHEL 8 Compute nodes. Workaround: On affected nodes, edit the file `/var/lib/container-config-scripts/collectd_check_health.py` and, on line 26, replace `"healthy: .State.Health.Status}"` with `"healthy: .State.Healthcheck.Status}"`.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
mmagr: needinfo-



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-26638 0 None None None 2023-07-17 09:49:44 UTC

Description Leonid Natapov 2023-07-17 09:49:31 UTC
Collectd Sensubility doesn't work on OSP17.1 and RHEL8.

The latest collectd package is collectd-5.12.0-10.el8ost.

This scenario can only happen after an FFU in a mixed RHEL environment where the compute node(s) run RHEL8. It does not happen on a clean OSP17.1 deployment.


The error that I get in sensubility.log
---------------------------------------

\\\"/scripts/collectd_check_health.py\\\", line 91, in \\u003cmodule\\u003e\\n    rc, status = fetch_container_health(o.decode())\\n  File \\\"/scripts/collectd_check_health.py\\\", line 74, in fetch_container_health\\n    if len(item['healthy']) \\u003e 0 and item['status'] != 'stopped':\\nTypeError: object of type 'NoneType' has no len()\\n\",\"status\":\"1\"}}}"},"startsAt":"2023-07-14T11:12:27Z"}}]
[DEBUG] Requesting execution of check. [check: check-container-health]
[DEBUG] Executed check script. [output: Traceback (most recent call last):
  File "/scripts/collectd_check_health.py", line 91, in <module>
    rc, status = fetch_container_health(o.decode())
  File "/scripts/collectd_check_health.py", line 74, in fetch_container_health
    if len(item['healthy']) > 0 and item['status'] != 'stopped':
TypeError: object of type 'NoneType' has no len()

The problem is that the healthcheck script uses the podman inspect <container-name> command, whose output format has apparently changed.
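To illustrate the failure mode, here is a minimal sketch of a lookup that tolerates both key names mentioned in this bug. It is not the code of collectd_check_health.py. The sample JSON strings are hypothetical, heavily trimmed stand-ins for podman inspect output, and the function name health_status is an assumption made for this example. Only the two key paths, State.Health.Status and State.Healthcheck.Status, come from the report itself.

```python
import json

# Hypothetical, trimmed inspect outputs; real output carries many more fields.
SAMPLE_HEALTH = '[{"Name": "collectd", "State": {"Health": {"Status": "healthy"}}}]'
SAMPLE_HEALTHCHECK = '[{"Name": "collectd", "State": {"Healthcheck": {"Status": "healthy"}}}]'


def health_status(inspect_json):
    """Return the health status string of the first inspected container.

    Tries State.Health.Status first, then State.Healthcheck.Status, and
    returns '' (never None) when neither is present, so that a later
    len() call cannot raise TypeError.
    """
    entries = json.loads(inspect_json)
    state = entries[0].get("State") or {}
    for key in ("Health", "Healthcheck"):
        status = (state.get(key) or {}).get("Status")
        if status:
            return status
    return ""


print(health_status(SAMPLE_HEALTH))
print(health_status(SAMPLE_HEALTHCHECK))
```

Returning an empty string instead of None keeps a check such as len(item['healthy']) > 0 from raising the TypeError shown in the log above when the expected key is missing.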


Workaround:
-----------

In /var/lib/container-config-scripts/collectd_check_health.py, on line 26, change "healthy: .State.Health.Status}" to "healthy: .State.Healthcheck.Status}".
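The same edit can be scripted when several compute nodes are affected. The following is a sketch only: the helper names apply_workaround and patch_file and the .orig backup suffix are assumptions made for this example, and it replaces the string wherever it occurs rather than checking that it sits on line 26.

```python
import shutil

# Strings taken from the workaround above.
OLD = "healthy: .State.Health.Status}"
NEW = "healthy: .State.Healthcheck.Status}"


def apply_workaround(source):
    """Return the script text with the health status key swapped.

    Running it twice is harmless because NEW does not contain OLD.
    """
    return source.replace(OLD, NEW)


def patch_file(path):
    """Patch the file in place, keeping a copy at path + '.orig'.

    Returns True if the file was changed, False if nothing needed changing.
    """
    with open(path, encoding="utf-8") as fh:
        original = fh.read()
    patched = apply_workaround(original)
    if patched == original:
        return False
    shutil.copy2(path, path + ".orig")  # hypothetical backup naming
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(patched)
    return True
```

On an affected node this would be called as patch_file("/var/lib/container-config-scripts/collectd_check_health.py") with sufficient privileges to write to that file.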

