Bug 2304312 - collectd fails to start
Summary: collectd fails to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.1 (Wallaby)
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: z4
Target Release: 17.1
Assignee: Martin Magr
QA Contact: myadla
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-08-13 12:56 UTC by Siggy Sigwald
Modified: 2025-01-27 10:54 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-17.1.20240919130751.e7c7ce3.el9ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-11-21 09:30:45 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Last Updated             Summary
Red Hat Bugzilla        2249626         0        high      CLOSED  2024-09-04 12:19:45 UTC  FFU | FFFU from OSP16.2 to OSP17.1.2 fails becuase wrong systemd unit file syntax
Red Hat Issue Tracker   OSP-32664       0        None      None    2024-08-26 14:20:01 UTC  None
Red Hat Product Errata  RHSA-2024:9978  0        None      None    2024-11-21 09:30:48 UTC  None

Description Siggy Sigwald 2024-08-13 12:56:11 UTC
Description of problem:
collectd fails to start with the following error:

[tripleo-admin@n1cs1b1-osp1-comp001 ~]$ sudo systemctl status tripleo_podman_collectd_acl.service -l
× tripleo_podman_collectd_acl.service - ACL setting for /var/lib/tripleo-podman/collectd/podman.sock
     Loaded: loaded (/etc/systemd/system/tripleo_podman_collectd_acl.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2024-08-01 14:14:43 IST; 5h 59min ago
   Main PID: 3676 (code=exited, status=255/EXCEPTION)
        CPU: 131ms

Aug 01 14:14:42 n1cs1b1-osp1-comp001 systemd[1]: Starting ACL setting for /var/lib/tripleo-podman/collectd/podman.sock...
Aug 01 14:14:43 n1cs1b1-osp1-comp001 podman[3676]: 2024-08-01 14:14:43.440984869 +0530 IST m=+0.418289924 system refresh
Aug 01 14:14:43 n1cs1b1-osp1-comp001 podman[3676]: Error: can only create exec sessions on running containers: container state improper
Aug 01 14:14:43 n1cs1b1-osp1-comp001 systemd[1]: tripleo_podman_collectd_acl.service: Main process exited, code=exited, status=255/EXCEPTION
Aug 01 14:14:43 n1cs1b1-osp1-comp001 systemd[1]: tripleo_podman_collectd_acl.service: Failed with result 'exit-code'.
Aug 01 14:14:43 n1cs1b1-osp1-comp001 systemd[1]: Failed to start ACL setting for /var/lib/tripleo-podman/collectd/podman.sock.

This looks like https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=2249626
but the package versions mentioned in that bug's errata are older than the ones installed here.

Comment 3 Matthias Runge 2024-08-14 09:05:24 UTC
From reading the customer ticket, restarting the service manually works. Does the service stay up afterwards? 

If the service does not stay up, we need collectd log files and also collectd config files from a compute node. 
Can we please also fetch a collectd service file from a compute node?

Comment 6 Siggy Sigwald 2024-08-15 10:21:53 UTC
(In reply to Matthias Runge from comment #3)
> From reading the customer ticket, restarting the service manually works.
> Does the service stay up afterwards? 
Yes, it does. However, there is clearly a problem, as the service should start automatically with the rest of the services and containers. The current workaround is to restart it by hand, which means manual intervention on every node in the overcloud.
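
For illustration only, the manual workaround could be scripted per node along these lines. This is a minimal sketch, assuming it runs with root privileges on an affected node; the helper name `restart_if_failed` and its `run` parameter are hypothetical and not part of TripleO. It only relies on `systemctl is-failed` and `systemctl restart`.

```python
import subprocess

# Unit name taken from the systemctl output in the description.
UNIT = "tripleo_podman_collectd_acl.service"


def restart_if_failed(unit=UNIT, run=subprocess.run):
    """Restart `unit` only when systemd reports it as failed.

    `run` is injectable (hypothetical parameter) so the logic can be
    exercised without systemd. Returns True if a restart was issued.
    """
    # `systemctl is-failed --quiet` exits 0 when the unit is in the failed state.
    probe = run(["systemctl", "is-failed", "--quiet", unit])
    if probe.returncode != 0:
        return False
    # check=True raises CalledProcessError if the restart itself fails.
    run(["systemctl", "restart", unit], check=True)
    return True
```

Calling `restart_if_failed()` on a node leaves a healthy unit untouched and restarts a failed one. It does not address why the unit failed at boot.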

Comment 8 Martin Magr 2024-08-16 13:58:34 UTC
How often does this happen, and what hardware is the host running on? The service tripleo_podman_collectd_acl.service depends on tripleo_podman_collectd.service, so this is just a timing issue (the collectd container is not spawned fast enough before the ACL procedure starts).
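
One generic way to guard against this kind of race is to poll until the container is actually running before doing anything that needs `podman exec`. The sketch below only illustrates that idea; it is not the change shipped in the erratum. The helper names, the timeout values, and the container name passed to `container_running` are assumptions.

```python
import subprocess
import time


def container_running(name):
    """Return True if podman reports container `name` as running.

    The container name is an assumption; verify it with `podman ps` on the node.
    """
    result = subprocess.run(
        ["podman", "inspect", "--format", "{{.State.Running}}", name],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


def wait_until(check, timeout=60.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns True or `timeout` seconds have passed.

    Returns True on success and False on timeout. `sleep` and `clock`
    are injectable so the loop can be tested without real delays.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would run something like `wait_until(lambda: container_running("collectd"))` and proceed with the ACL step only when it returns True. Polling trades a short startup delay for not failing the unit outright when the container is slow to come up.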

Comment 21 errata-xmlrpc 2024-11-21 09:30:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHOSP 17.1.4 (openstack-tripleo-heat-templates) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:9978

