Bug 2257395 - [RHOSP 16.2] "ipmi/main" plugin read error in collectd container
Summary: [RHOSP 16.2] "ipmi/main" plugin read error in collectd container
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: async
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: OSP Team
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks: 2274010
 
Reported: 2024-01-09 10:34 UTC by Sukhendu Kar
Modified: 2025-01-09 14:55 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.6.1-2.20230808225222.el8ost python-paunch-5.5.2-2.20240522085137.a0ae7ec.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2274010 (view as bug list)
Environment:
Last Closed: 2025-01-09 14:55:20 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-31105 0 None None None 2024-01-09 10:35:13 UTC
Red Hat Product Errata RHBA-2025:0200 0 None None None 2025-01-09 14:55:27 UTC

Comment 5 David Hill 2024-01-11 22:02:47 UTC
[2024-01-09 14:09:37] ipmi plugin: Legacy configuration found! Please update your config file.
[2024-01-09 14:09:37] plugin_load: plugin "load" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "memory" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "python" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "thermal" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "unixsock" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "uptime" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "virt" successfully loaded.
[2024-01-09 14:09:37] plugin_load: plugin "vmem" successfully loaded.
[2024-01-09 14:09:37] UNKNOWN plugin: plugin_get_interval: Unable to determine Interval from context.
[2024-01-09 14:09:37] plugin_load: plugin "libpodstats" successfully loaded.
[2024-01-09 14:09:37] ipmi plugin: ipmi_smi_setup_con failed for `main`: OS: Operation not permitted <============================================================================
[2024-01-09 14:09:37] ipmi plugin: c_ipmi_thread_init failed.
[2024-01-09 14:09:37] virt plugin: reader virt-0 initialized
[2024-01-09 14:09:37] Initialization complete, entering read-loop.
[2024-01-09 14:09:37] ipmi plugin: c_ipmi_read: I'm not active, returning false.
[2024-01-09 14:09:37] read-function of plugin `ipmi/main' failed. Will suspend it for 120.000 seconds.
[2024-01-09 14:11:37] ipmi plugin: c_ipmi_read: I'm not active, returning false.
[2024-01-09 14:11:37] read-function of plugin `ipmi/main' failed. Will suspend it for 240.000 seconds.
[2024-01-09 14:15:37] ipmi plugin: c_ipmi_read: I'm not active, returning false.
[2024-01-09 14:15:37] read-function of plugin `ipmi/main' failed. Will suspend it for 480.000 seconds.

Could this be SELinux?
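
The suspension intervals in the log above (120, 240, 480 seconds) double after each failed read. A minimal sketch of that doubling pattern follows. The function name and the cap parameter are hypothetical and not taken from collectd's source; the sketch only illustrates the behaviour visible in the log.

```c
#include <stdio.h>

/* Hypothetical helper: given the current suspension interval in seconds,
 * return the next one by doubling it, never exceeding `cap` seconds.
 * Both the name and the cap are assumptions made for illustration. */
double next_suspend_seconds(double current, double cap) {
    double next = current * 2.0;
    return (next > cap) ? cap : next;
}

/* Print `steps` successive intervals starting from `start`. */
void print_suspend_schedule(double start, double cap, int steps) {
    double interval = start;
    for (int i = 0; i < steps; i++) {
        printf("suspend for %.3f seconds\n", interval);
        interval = next_suspend_seconds(interval, cap);
    }
}
```

Calling print_suspend_schedule(120.0, cap, 3) with any cap of at least 480 lists the same three intervals that appear in the log.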

Comment 7 David Hill 2024-01-12 13:07:54 UTC
The call that is failing is in OpenIPMI:
~~~
int
ipmi_smi_setup_con(int          if_num,
                   os_handler_t *handlers,
                   void         *user_data,
                   ipmi_con_t   **new_con)
{
    int err;

    if (!handlers->add_fd_to_wait_for
        || !handlers->remove_fd_to_wait_for
        || !handlers->alloc_timer
        || !handlers->free_timer)
        return ENOSYS;

    err = setup(if_num, handlers, user_data, new_con);
    return err;
}
~~~

after being called from collectd:
~~~
static int c_ipmi_thread_init(c_ipmi_instance_t *st) {
  ipmi_domain_id_t domain_id;
  int status;

  if (st->connaddr != NULL) {
    status = ipmi_ip_setup_con(
        &st->connaddr, &(char *){IPMI_LAN_STD_PORT_STR}, 1, st->authtype,
        (unsigned int)IPMI_PRIVILEGE_USER, st->username, strlen(st->username),
        st->password, strlen(st->password), os_handler,
        /* user data = */ NULL, &st->connection);
    if (status != 0) {
      c_ipmi_error(st, "ipmi_ip_setup_con", status);
      return -1;
    }
  } else {
    status = ipmi_smi_setup_con(/* if_num = */ 0, os_handler, /* <== failing call */
                                /* user data = */ NULL, &st->connection);
    if (status != 0) {
      c_ipmi_error(st, "ipmi_smi_setup_con", status);
      return -1;
    }
  }
~~~


[dhill@supportshell-1 kernel]$ cat lsmod  | grep ipmi
ipmi_ssif              32768  0
acpi_ipmi              16384  0
ipmi_si                65536  2
ipmi_devintf           20480  2
ipmi_msghandler       110592  4 ipmi_devintf,ipmi_si,acpi_ipmi,ipmi_ssif
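
Since the kernel modules are loaded, one way to narrow this down would be to check whether the process can open the IPMI device node at all. The sketch below assumes the SMI setup ends up opening a device node such as /dev/ipmi0 (the path used in the IpmiMonitor setting in comment 32). The helper name probe_device is hypothetical and is not part of OpenIPMI or collectd.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open a device node read-write, as a local
 * IPMI client would. Returns 0 on success, or the errno value set by
 * open(2) on failure (for example EPERM or EACCES). */
int probe_device(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        return errno;
    }
    close(fd);
    return 0;
}
```

Running probe_device("/dev/ipmi0") on the host and again inside the collectd container, and passing any non-zero result to strerror(), should show whether the container is being denied access to the device.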

Comment 9 David Hill 2024-01-12 13:45:29 UTC
Yeah, that's what I was considering doing in a remote session: either --privileged or "--cap-add all". Are they the same?

Comment 10 Matthias Runge 2024-01-15 12:05:23 UTC
A patch was merged that allows adding capabilities to the container as a config option.

Comment 12 Matthias Runge 2024-01-15 12:14:21 UTC
Or see, e.g., https://bugzilla.redhat.com/show_bug.cgi?id=1984556

Comment 32 Leonid Natapov 2024-04-18 14:04:25 UTC
Step-by-step verification:

1. Have a bare-metal (BM) environment with OSP 16.2 deployed.
2. Install the THT RPM with the fix on the undercloud. The RPM can be found here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2989979
3. Install the paunch RPM with the fix on all overcloud nodes. The RPM can be found here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=2975960
4. Create a custom template with the following content:

resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml

parameter_defaults:
  ComputeSriovOffloadParameters:
    IpmiMonitor: '/dev/ipmi0'

5. Include the path to this YAML file in overcloud_deploy.sh.
6. Run ./overcloud_deploy.sh
7. After the update finishes successfully, connect to one of the overcloud nodes and check that the collectd container is up and running.
8. Connect to the collectd container by running: podman exec -it collectd /bin/sh
9. Executed the ipmitool sensor command inside the collectd container and got its output.

Comment 53 errata-xmlrpc 2025-01-09 14:55:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.2 bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:0200

