Bug 2203785

Summary: Collectd sensubility stops working after overcloud node was rebooted.
Product: Red Hat OpenStack Reporter: Leonid Natapov <lnatapov>
Component: collectd-sensubility Assignee: Martin Magr <mmagr>
Status: MODIFIED --- QA Contact: Leonid Natapov <lnatapov>
Severity: high Docs Contact:
Priority: high    
Version: 17.1 (Wallaby) CC: kgilliga, lmadsen, mburns, mmagr, mrunge, pgrist, rheslop
Target Milestone: z1 Keywords: Triaged
Target Release: 17.1 Flags: ifrangs: needinfo? (mmagr)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: collectd-sensubility-0.2.1-1.el8ost Doc Type: Known Issue
Doc Text:
Currently, there is a permission issue that causes collectd sensubility to stop working after you reboot a baremetal node. As a consequence, sensubility stops reporting container health. Workaround: After rebooting an overcloud node, manually run the following command on the node: `sudo podman exec -it collectd setfacl -R -m u:collectd:rwx /run/podman`
Story Points: ---
Clone Of:
: 2203787 Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2203787    

Description Leonid Natapov 2023-05-15 08:56:03 UTC
Collectd sensubility stops working after an overcloud node is rebooted.

It happens because, while deploying the overcloud, we use setfacl to give the collectd user access to /run/podman, and apparently that ACL does not survive a reboot.


After rebooting an overcloud node, I get the following messages in the sensubility log file:

podman machine init` and `podman machine start` to manage a new Linux VM\\n

Error: unable to connect to Podman socket: Get \\\"http://d/v4.4.1/libpod/_ping\\\": dial unix ///run/podman/podman.sock: connect: permission denied\\n\\n\",\"status\":\"1\"}}}"},"startsAt":"2023-05-15T04:01:11Z"}}]
[DEBUG] Sent message ACKed. [id: 136]
[DEBUG] Requesting execution of check. [check: check-container-health]
[DEBUG] Executed check script. [output: Failed to list containers:
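
The symptom in the excerpt above can be recognized by the `permission denied` error on `/run/podman/podman.sock`. As a rough sketch, a small shell function can scan log text for that signature. The function name is hypothetical, and the log file location is left as an argument because its path is not given in this report.

```shell
#!/bin/sh
# Sketch: detect the podman socket permission error seen in this bug.

# Reads log text on stdin; succeeds if the error signature is present.
has_socket_permission_error() {
    grep -q 'podman\.sock: connect: permission denied'
}

# Optional usage: pass the sensubility log file as the first argument.
if [ -n "${1:-}" ]; then
    if has_socket_permission_error < "$1"; then
        echo "podman socket permission error found in $1"
    else
        echo "no podman socket permission error in $1"
    fi
fi
```

The check is kept as a stdin filter so that it can also be fed from other sources, such as a container log stream.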

Comment 2 Leonid Natapov 2023-05-15 09:00:09 UTC
The workaround is to manually run the following command after rebooting an overcloud node:

sudo podman exec -it collectd setfacl -R -m u:collectd:rwx /run/podman
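
Because the workaround has to be repeated after every reboot, it can help to check whether the ACL entry is still present before reapplying it. The following is a minimal sketch, not the fix tracked in this bug. It assumes that `getfacl` is available inside the collectd container alongside `setfacl`. The function names and the `--apply` argument are hypothetical.

```shell
#!/bin/sh
# Sketch: reapply the collectd ACL on /run/podman only when it is missing.
# Assumption: getfacl exists in the collectd container next to setfacl.

# Reads getfacl output on stdin; succeeds if collectd has an rwx entry.
has_collectd_acl() {
    grep -q '^user:collectd:rwx'
}

# Runs the check and, if needed, the workaround command from this bug
# (without -it, since no interactive terminal is needed in a script).
ensure_collectd_acl() {
    if sudo podman exec collectd getfacl -p /run/podman 2>/dev/null | has_collectd_acl; then
        echo "ACL already present"
    else
        sudo podman exec collectd setfacl -R -m u:collectd:rwx /run/podman
    fi
}

# Only touch the system when explicitly asked to (hypothetical argument).
if [ "${1:-}" = "--apply" ]; then
    ensure_collectd_acl
fi
```

If the `getfacl` call fails for any reason, the pipeline yields no matching line and the sketch falls through to the `setfacl` command, which matches the manual workaround.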

Comment 6 Matthias Runge 2023-07-18 10:04:34 UTC
This is a high severity/priority issue and it is already in MODIFIED status.
Moving back to z1.