[2024-01-09 14:09:37] ipmi plugin: Legacy configuration found! Please update your config file.
[2024-01-09 14:09:37] plugin_load: plugin "load" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "memory" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "python" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "thermal" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "unixsock" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "uptime" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "virt" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "vmem" successfully loaded.
[2024-01-09 14:09:37] UNKNOWN plugin: plugin_get_interval: Unable to determine Interval from context.
[2024-01-09 14:09:37] plugin_load: plugin "libpodstats" successfully loaded.
[2024-01-09 14:09:37] ipmi plugin: ipmi_smi_setup_con failed for `main`: OS: Operation not permitted <====
[2024-01-09 14:09:37] ipmi plugin: c_ipmi_thread_init failed.
[2024-01-09 14:09:37] virt plugin: reader virt-0 initialized
[2024-01-09 14:09:37] Initialization complete, entering read-loop.
[2024-01-09 14:09:37] ipmi plugin: c_ipmi_read: I'm not active, returning false.
[2024-01-09 14:09:37] read-function of plugin `ipmi/main' failed. Will suspend it for 120.000 seconds.
[2024-01-09 14:11:37] ipmi plugin: c_ipmi_read: I'm not active, returning false.
[2024-01-09 14:11:37] read-function of plugin `ipmi/main' failed. Will suspend it for 240.000 seconds.
[2024-01-09 14:15:37] ipmi plugin: c_ipmi_read: I'm not active, returning false.
[2024-01-09 14:15:37] read-function of plugin `ipmi/main' failed. Will suspend it for 480.000 seconds.

SELinux?
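As a side note on reading the log above: the suspend time doubles after each failed read (120, 240, 480 seconds), which is why the failures appear at 14:09:37, 14:11:37 and 14:15:37. The following is only a small sketch of that arithmetic, assuming plain doubling with no upper limit; the function name and its parameters are made up for illustration and are not taken from collectd.

```python
from datetime import datetime, timedelta


def retry_schedule(first_failure, initial_suspend=120, failures=3):
    """Return (failure_time, suspend_seconds) pairs.

    Assumption: the suspend time simply doubles after every failed read,
    as the three log entries suggest. Any upper limit is ignored here.
    """
    schedule = []
    when = first_failure
    suspend = initial_suspend
    for _ in range(failures):
        schedule.append((when, suspend))
        # The next read attempt happens once the suspension has elapsed.
        when = when + timedelta(seconds=suspend)
        suspend *= 2
    return schedule


if __name__ == "__main__":
    start = datetime(2024, 1, 9, 14, 9, 37)
    for when, suspend in retry_schedule(start):
        print(f"{when:%H:%M:%S} read failed, suspended for {suspend} s")
```

With the timestamps from the log, the third failure lands at 14:15:37, matching the last entry.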
The call that is failing is in OpenIPMI:
~~~
ipmi_smi_setup_con(int if_num,
                   os_handler_t *handlers,
                   void *user_data,
                   ipmi_con_t **new_con)
{
    int err;

    if (!handlers->add_fd_to_wait_for
        || !handlers->remove_fd_to_wait_for
        || !handlers->alloc_timer
        || !handlers->free_timer)
        return ENOSYS;

    err = setup(if_num, handlers, user_data, new_con);
    return err;
}
~~~
after being called from collectd:
~~~
static int c_ipmi_thread_init(c_ipmi_instance_t *st) {
  ipmi_domain_id_t domain_id;
  int status;

  if (st->connaddr != NULL) {
    status = ipmi_ip_setup_con(
        &st->connaddr, &(char *){IPMI_LAN_STD_PORT_STR}, 1, st->authtype,
        (unsigned int)IPMI_PRIVILEGE_USER, st->username, strlen(st->username),
        st->password, strlen(st->password), os_handler,
        /* user data = */ NULL, &st->connection);
    if (status != 0) {
      c_ipmi_error(st, "ipmi_ip_setup_con", status);
      return -1;
    }
  } else {
    status = ipmi_smi_setup_con(/* if_num = */ 0, os_handler, <====
                                /* user data = */ NULL, &st->connection);
    if (status != 0) {
      c_ipmi_error(st, "ipmi_smi_setup_con", status);
      return -1;
    }
  }
~~~
[dhill@supportshell-1 kernel]$ cat lsmod | grep ipmi
ipmi_ssif              32768  0
acpi_ipmi              16384  0
ipmi_si                65536  2
ipmi_devintf           20480  2
ipmi_msghandler       110592  4 ipmi_devintf,ipmi_si,acpi_ipmi,ipmi_ssif
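Since the kernel modules are loaded, one way to narrow this down might be to check whether the process can open the IPMI device node at all. The sketch below assumes the SMI connection ends up opening a device such as `/dev/ipmi0` (the path mentioned later in this thread); `probe_device` is a hypothetical helper, not part of collectd or OpenIPMI. `Operation not permitted` corresponds to `EPERM`, so seeing that code here from inside the container would point at container confinement rather than a missing driver.

```python
import errno
import os


def probe_device(path="/dev/ipmi0"):
    """Try to open `path` read-write and report the outcome.

    Returns (ok, errno_name, message). The default path is an assumption
    based on the device used elsewhere in this thread.
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EPERM or ENOENT.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return False, name, exc.strerror
    os.close(fd)
    return True, None, "opened successfully"


if __name__ == "__main__":
    ok, code, message = probe_device()
    print(f"ok={ok} errno={code} message={message}")
```

Running it once on the host and once inside the collectd container should show whether the two environments behave differently.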
Yeah, that's what I was considering doing in a remote session: either --privileged or "--cap-add all". Are they the same?
A patch was merged that allows adding capabilities to the container through a config option.
For reference, see https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/managing_overcloud_observability/collectd-plugins_assembly#collectd_plugin_smart
Or see, e.g., https://bugzilla.redhat.com/show_bug.cgi?id=1984556
Step by step verification.

1. Have a BM environment with OSP16.2 deployed.
2. Install the THT RPM with the fix on the undercloud. The RPM can be found here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2989979
3. Install the paunch RPM with the fix on all overcloud nodes. The RPM can be found here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2975960
4. Create a custom template with the following content:

    resource_registry:
      OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
    parameter_defaults:
      ComputeSriovOffloadParameters:
        IpmiMonitor: '/dev/ipmi0'

5. Include the path to this YAML file in overcloud_deploy.sh.
6. Run ./overcloud_deploy.sh
7. After the update has finished successfully, connect to one of the overcloud nodes and check that the collectd container is up and running.
8. Connect to the collectd container by running: podman exec -it collectd /bin/sh
9. Executed the ipmitool sensor command inside the collectd container and got the output.
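Steps 8 and 9 could also be checked non-interactively on an overcloud node. The sketch below assumes that `podman exec collectd ipmitool sensor` (without `-it` and without an intermediate shell) is equivalent to the manual steps; `ipmitool_works` is a hypothetical helper name, and treating "exit code 0 and non-empty output" as success is my own criterion, not one stated in the verification steps.

```python
import subprocess


def ipmitool_works(container="collectd", run=subprocess.run):
    """Return True if `ipmitool sensor` succeeds inside the container.

    `run` is injectable so the check can be exercised without podman.
    Success here means exit code 0 and some output on stdout.
    """
    cmd = ["podman", "exec", container, "ipmitool", "sensor"]
    try:
        result = run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # podman itself is not installed on this machine.
        return False
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    print("ipmitool sensor OK" if ipmitool_works() else "ipmitool sensor FAILED")
```

A failing result does not say why the command failed; the device probe or the collectd log would still be needed for that.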
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.2 bug fix advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2025:0200