Bug 2203785 - Collectd sensubility stops working after overcloud node was rebooted. [NEEDINFO]
Summary: Collectd sensubility stops working after overcloud node was rebooted.
Keywords:
Status: MODIFIED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: collectd-sensubility
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 17.1
Assignee: Martin Magr
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks: 2203787
Reported: 2023-05-15 08:56 UTC by Leonid Natapov
Modified: 2023-08-14 13:43 UTC
CC List: 7 users

Fixed In Version: collectd-sensubility-0.2.1-1.el8ost
Doc Type: Known Issue
Doc Text:
Currently, there is a permission issue that causes collectd sensubility to stop working after you reboot a baremetal node. As a consequence, sensubility stops reporting container health. Workaround: After rebooting an overcloud node, manually run the following command on the node: `sudo podman exec -it collectd setfacl -R -m u:collectd:rwx /run/podman`
Clone Of:
Clones: 2203787
Environment:
Last Closed:
Target Upstream Version:
Embargoed:
ifrangs: needinfo? (mmagr)



Links
System                 ID         Private  Priority  Status  Summary                                 Last Updated
OpenStack gerrit       883447     0        None      NEW     Ensure podman.sock ACL survives reboot  2023-05-17 21:09:49 UTC
Red Hat Issue Tracker  OSP-25054  0        None      None    None                                    2023-05-15 08:57:16 UTC

Description Leonid Natapov 2023-05-15 08:56:03 UTC
Collectd sensubility stops working after overcloud node was rebooted.

This happens because, during overcloud deployment, we use setfacl to grant the collectd user access to /run/podman, and apparently that ACL does not survive a reboot.


After rebooting an overcloud node, I get the following messages in the sensubility log file:

podman machine init` and `podman machine start` to manage a new Linux VM\\n

Error: unable to connect to Podman socket: Get \\\"http://d/v4.4.1/libpod/_ping\\\": dial unix ///run/podman/podman.sock: connect: permission denied\\n\\n\",\"status\":\"1\"}}}"},"startsAt":"2023-05-15T04:01:11Z"}}]
[DEBUG] Sent message ACKed. [id: 136]
[DEBUG] Requesting execution of check. [check: check-container-health]
[DEBUG] Executed check script. [output: Failed to list containers:
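To check whether a node is affected, one can scan the sensubility log for the socket error shown above. A minimal sketch, assuming you know where sensubility writes its log on your node (the location is not given in this report, so it is passed as an argument); `sensubility_socket_denials` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: count podman-socket permission errors in a sensubility log.
# $1 is the log file path; the real location depends on your deployment.
sensubility_socket_denials() {
    # -F: fixed-string match; -c: print the number of matching lines.
    # grep exits 1 when nothing matches, which is not an error here,
    # so only a real grep error (exit status 2, e.g. an unreadable file)
    # makes the function fail.
    grep -Fc 'podman.sock: connect: permission denied' "$1" || [ $? -eq 1 ]
}

# Example: sensubility_socket_denials /path/to/sensubility.log
```

A non-zero count after a reboot suggests the ACL was lost and the workaround needs to be reapplied.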

Comment 2 Leonid Natapov 2023-05-15 09:00:09 UTC
The workaround is to manually run the following command after rebooting an overcloud node:

sudo podman exec -it collectd setfacl -R -m u:collectd:rwx /run/podman
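Because the workaround has to be repeated after every reboot, it can help to make it idempotent: check for the ACL entry first and rerun `setfacl` only when it is missing. A minimal sketch, assuming the container is named `collectd` (as in the command above) and that `getfacl` is available in it alongside `setfacl` (both normally ship in the same `acl` package). `reapply_collectd_acl` and the `CONTAINER_EXEC` override are hypothetical names introduced only for this example. This is a stopgap, not the persistent fix tracked in the linked gerrit change.

```shell
#!/bin/sh
# Sketch: reapply the collectd ACL on /run/podman only when it is missing.
# Assumes the container is named "collectd", as in the workaround command.
# CONTAINER_EXEC is a hypothetical override hook; by default it runs
# commands inside the collectd container via podman.
CONTAINER_EXEC=${CONTAINER_EXEC:-"sudo podman exec collectd"}

reapply_collectd_acl() {
    # getfacl lists a named-user entry such as "user:collectd:rwx" when
    # the grant exists; -p keeps the leading "/" in the "# file:" header.
    if $CONTAINER_EXEC getfacl -p /run/podman/podman.sock 2>/dev/null \
            | grep -q '^user:collectd:rwx'; then
        echo "collectd ACL present on /run/podman/podman.sock"
        return 0
    fi
    echo "collectd ACL missing; reapplying"
    # Same command as the workaround, minus -it (no interactive terminal needed).
    $CONTAINER_EXEC setfacl -R -m u:collectd:rwx /run/podman
}
```

Once the node is back up, source the script and call `reapply_collectd_acl`; on a node where the ACL is still in place it only prints a message and changes nothing.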

Comment 6 Matthias Runge 2023-07-18 10:04:34 UTC
This is a high severity/priority issue, and it is already in MODIFIED status.
Moving it back to z1.

