Bug 1966719 - ipmitool sdr list full fails on Intel S5520HC(Motherboard) and MFSys25(S5520VI) Blade Server since 8.4
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: ipmitool
Version: 8.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: beta
Target Release: ---
Assignee: Pavel Cahyna
QA Contact: Jeff Bastian
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-06-01 18:11 UTC by Nathan Coulson
Modified: 2022-12-01 07:27 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-12-01 07:27:48 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Nathan Coulson 2021-06-01 18:11:35 UTC
Description of problem:
ipmitool sdr list full results in: (taking about a minute to fail)
<pre>
No data available
Get Device ID command failed
No data available
No data available
No valid response received
No data available
Get Device ID command failed
Unable to open SDR for reading
</pre>



Version-Release number of selected component (if applicable):
This was working fine on AlmaLinux 8.4, with vmlinuz-4.18.0-240.22.1.el8_3.x86_64, but after updating (still fine at this point), then rebooting into vmlinuz-4.18.0-305.el8.x86_64, it stopped working after 5-15 min.


How reproducible:
So far tested 3 blade servers (S5520VI), and 1 standalone server (S5520HC) - Same behavior in all servers.

Steps to Reproduce:
1. Upgrade to vmlinuz-4.18.0-305.el8.x86_64
2. Reboot
3. Run ipmitool sdr list full (it fails once the system is in the broken state). Our monitoring systems run this roughly every 5 minutes.
4. Rebooting back to 4.18.0-240.22.1 does not fix it, but it does work again after a full shutdown and startup into that kernel.
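
The periodic check described in step 3 could be sketched as follows. This is a minimal illustration, not the reporter's actual monitoring code: the function names and the 120-second timeout are assumptions, and the error strings are copied from the failing output quoted in this report.

```python
import subprocess

# Error strings copied from the failing output quoted in this report.
FATAL_MARKERS = (
    "Unable to open SDR for reading",
    "Get Device ID command failed",
    "No valid response received",
)


def classify_sdr_output(text: str) -> str:
    """Return 'broken', 'ok', or 'empty' for captured ipmitool output."""
    if any(marker in text for marker in FATAL_MARKERS):
        return "broken"
    # Sensor rows look like "name | reading | status".
    rows = [line for line in text.splitlines() if line.count("|") == 2]
    return "ok" if rows else "empty"


def poll_once(timeout_s: float = 120.0) -> str:
    """Run `ipmitool sdr list full` once (timeout value is an assumption)."""
    try:
        proc = subprocess.run(
            ["ipmitool", "sdr", "list", "full"],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except FileNotFoundError:
        return "ipmitool-missing"
    return classify_sdr_output(proc.stdout + proc.stderr)
```

Keeping the classification separate from the subprocess call means it can be exercised against saved output without touching a BMC.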


Actual results:
ipmitool sdr list full results in: (taking about a minute to fail)
<pre>
No data available
Get Device ID command failed
No data available
No data available
No valid response received
No data available
Get Device ID command failed
Unable to open SDR for reading
</pre>



Expected results:
root@intelu3s2 [/root]# ipmitool sdr list full
BB +12.0V        | 11.78 Volts       | ok
BB+ 5.0V         | 5.23 Volts        | ok
BB+ 3.3V         | 3.20 Volts        | ok
BB +3.3VSTBY     | 3.25 Volts        | ok
BB +5.0VSTBY     | 4.97 Volts        | ok
BB +1.1V P2 vccp | 1.07 Volts        | ok
BB +1.1V P1 Vccp | 0.96 Volts        | ok
P2 DDR3 Voltage  | 1.54 Volts        | ok
P1 DDR3 Voltage  | 1.51 Volts        | ok
BB +1.1V IOH     | 0.49 Volts        | ok
BB +1.8V AUX     | 1.78 Volts        | ok
BB-VBAT          | 2.89 Volts        | ok
Baseboard Temp   | 14 degrees C      | ok
MEM P1 THRM MRGN | -54 degrees C     | ok
MEM P2 THRM MRGN | -55 degrees C     | ok
P1 Therm Ctrl %  | 0 percent         | ok
P2 Therm Ctrl %  | 0 percent         | ok
P2 Therm Margin  | -61 degrees C     | ok
P1 Therm Margin  | -58 degrees C     | ok
Proc Max Therm   | -58 degrees C     | ok
IOH Therm Margin | -64 degrees C     | ok
DIMM Max Temp    | 20 degrees C      | ok
Mezz Temp        | no reading        | ns


Additional info:
Prior to it seizing up, dmesg does show:
[269093.771328] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 5 cmd 2d
[269400.820231] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn 7 cmd 1, got netfn 5 cmd 2d
[269400.847859] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn 5 cmd 27, got netfn 7 cmd 1
[269400.874424] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn 2d cmd 0, got netfn 5 cmd 27
[269400.894762] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 2d cmd 0
[269400.921609] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn 2d cmd 0, got netfn b cmd 23
[269400.942186] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 2d cmd 0
[269400.967634] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn 7 cmd 1, got netfn b cmd 23
[269400.997494] ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 7 cmd 1

But we have seen similar messages on CentOS 7 and on CentOS 8.3 (and earlier), so they may not be directly related to the issue.
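
To compare such kernel messages across systems, the expected and received netfn/cmd pairs can be extracted mechanically. The sketch below assumes the values are hexadecimal (as "2d" and "b" suggest) and uses hypothetical helper names. `count_lagged` counts entries whose received pair equals the pair the previous entry expected.

```python
import re
from typing import List, NamedTuple, Tuple

Pair = Tuple[int, int]  # (netfn, cmd), both parsed from hex


class Mismatch(NamedTuple):
    expected: Pair
    got: Pair


_PATTERN = re.compile(
    r"BMC returned incorrect response, "
    r"expected netfn ([0-9a-f]+) cmd ([0-9a-f]+), "
    r"got netfn ([0-9a-f]+) cmd ([0-9a-f]+)"
)


def parse_mismatches(log_text: str) -> List[Mismatch]:
    """Extract every expected/got pair from dmesg or journal text."""
    out = []
    for m in _PATTERN.finditer(log_text):
        en, ec, gn, gc = (int(x, 16) for x in m.groups())
        out.append(Mismatch((en, ec), (gn, gc)))
    return out


def count_lagged(mismatches: List[Mismatch]) -> int:
    """Count entries whose 'got' pair equals the previous entry's 'expected' pair."""
    return sum(
        1 for prev, cur in zip(mismatches, mismatches[1:])
        if cur.got == prev.expected
    )
```

Checked by hand against the nine dmesg lines above, 7 of the 8 consecutive pairs match in this way. That would be consistent with responses arriving one request late, though nothing in this report establishes that as the cause.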

Comment 1 Nathan Coulson 2021-06-01 18:16:17 UTC
Description of problem:

After upgrading from AlmaLinux 8.3 to 8.4 (things are still fine at this point) and then rebooting into 4.18.0-305, ipmitool sdr list full stops working.

In addition, when locked up, I cannot manage the blade servers via the MFSys25 web interface.
* After rebooting one of these servers while in this state, /dev/ipmi0 was missing, with dmesg saying it couldn't find a BMC at that location. I am unable to test further until I get onsite to pull and reinsert the blade servers, due to the loss of management.

On the standalone board, booting into the old kernel didn't fix the issue, until I did a full shutdown then startup (Done by remote powerbar, due to remote environment.  Unsure if a normal shutdown/startup while power is still applied would fix it)

Comment 2 Pavel Cahyna 2021-06-01 18:51:44 UTC
(In reply to Nathan Coulson from comment #1)

> On the standalone board, booting into the old kernel didn't fix the issue,
> until I did a full shutdown then startup (Done by remote powerbar, due to
> remote environment.  Unsure if a normal shutdown/startup while power is
> still applied would fix it)

Interesting. At this point, the new ipmitool (from 8.4) with the old kernel (from 8.3) keeps working? If so, I suspect a kernel bug more than an ipmitool bug.

Comment 3 Nathan Coulson 2021-06-01 19:14:20 UTC
It appears to be fine with an 8.3 kernel and 8.4 userspace, so I would concur. At least in the case of an 8.3 system updated to 8.4 and running for a few days.

I am running a test on the same hardware, freshly rebooted into the 8.3 kernel with 8.4 userspace, in case something in userspace was not restarted during the yum upgrade. So far it has been fine for at least a couple of hours (whereas this issue popped up 5-15 min after boot on all 4 of the systems I rebooted into the 8.4 kernel).

Comment 4 Nathan Coulson 2021-08-31 19:54:13 UTC
As an update, I tested this on 4.18.0-305.12.1.el8_4.x86_64, and it appears to be fine now.

I am also no longer seeing any instances of 'ipmi_si IPI0001:00: BMC returned incorrect response,' (which were always present on CentOS 5, 6, 7, and 8), so that is an added bonus.

Comment 5 Andrew Schorr 2021-12-28 14:45:54 UTC
I'm not sure if this is the same problem, but I've been having troubles with 'ipmitool sdr list'
on some old Intel S5500WB and S5520UR motherboards ever since upgrading from 7.9 to 8.4/8.5 a couple
of months ago. I get partial results to the command, and lots of errors. Here's one sample run:

[root@ti129 ~]# ipmitool sdr list
BB +1.1V IOH     | 1.09 Volts        | ok
BB +1.1V P1 Vccp | 0.94 Volts        | ok
BB +1.1V P2 Vccp | 0.93 Volts        | ok
BB +1.5V P1 DDR3 | 1.52 Volts        | ok
BB +1.5V P2 DDR3 | 1.52 Volts        | ok
BB +1.8V AUX     | 1.77 Volts        | ok
BB +3.3V         | 3.31 Volts        | ok
BB +3.3V STBY    | 3.28 Volts        | ok
BB +3.3V Vbat    | 3.09 Volts        | ok
BB +5.0V         | 5.02 Volts        | ok
BB +5.0V STBY    | 4.97 Volts        | ok
BB +12.0V        | 11.88 Volts       | ok
BB -12.0V        | -11.84 Volts      | ok
BB +1.35v P1 MEM | disabled          | ns
BB +1.35v P2 MEM | no reading        | ns
Baseboard Temp   | 28 degrees C      | ok
Front Panel Temp | 25 degrees C      | ok
IOH Therm Margin | -59 degrees C     | ok
Mem P1 Thrm Mrgn | -42 degrees C     | ok
Mem P2 Thrm Mrgn | -44 degrees C     | ok
System Fan 1A    | 13108 RPM         | ok
ipmitool: ipmi_sdr_get_record() failed
System Fan 3A    | 13108 RPM         | ok
System Fan 4A    | 12818 RPM         | ok
System Fan 1B    | 16132 RPM         | ok
System Fan 2B    | 15244 RPM         | ok
System Fan 3B    | 15688 RPM         | ok
System Fan 4B    | 16132 RPM         | ok
System Fan 5A    | 12818 RPM         | ok
ipmitool: ipmi_sdr_get_record() failed
P1 Therm Margin  | -66 degrees C     | ok
P2 Therm Margin  | -65 degrees C     | ok
ipmitool: ipmi_sdr_get_record() failed
P2 Therm Ctrl %  | no reading        | ns
HSBP Temperature | 25 degrees C      | ok
Pwr Unit Status  | 0x00              | ok
IPMI Watchdog    | 0x00              | ok
Physical Scrty   | 0x00              | ok
FP NMI Diag Int  | 0x00              | ok
SMI Timeout      | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
System Event     | 0x00              | ok
Button           | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
BB +1.1V P1 Vccp | 0.94 Volts        | ok
BB +1.1V P2 Vccp | 0.93 Volts        | ok
BB +1.5V P1 DDR3 | 1.52 Volts        | ok
BB +1.5V P2 DDR3 | 1.52 Volts        | ok
BB +1.8V AUX     | 1.77 Volts        | ok
BB +3.3V         | 3.31 Volts        | ok
BB +3.3V STBY    | 3.28 Volts        | ok
BB +3.3V Vbat    | 3.09 Volts        | ok
ipmitool: ipmi_sdr_get_record() failed
BB +5.0V STBY    | 4.97 Volts        | ok
BB +12.0V        | 11.88 Volts       | ok
BB -12.0V        | -11.84 Volts      | ok
BB +1.35v P1 MEM | disabled          | ns
BB +1.35v P2 MEM | disabled          | ns
Baseboard Temp   | 28 degrees C      | ok
Front Panel Temp | 25 degrees C      | ok
IOH Therm Margin | -59 degrees C     | ok
Mem P1 Thrm Mrgn | -42 degrees C     | ok
ipmitool: ipmi_sdr_get_record() failed
ipmitool: ipmi_sdr_get_record() failed
System Fan 2A    | 13108 RPM         | ok
System Fan 3A    | 13108 RPM         | ok
System Fan 4A    | 12818 RPM         | ok
System Fan 1B    | 16132 RPM         | ok
System Fan 2B    | 15244 RPM         | ok
System Fan 3B    | 15688 RPM         | ok
System Fan 4B    | 16132 RPM         | ok
ipmitool: ipmi_sdr_get_record() failed
System Fan 5B    | 15984 RPM         | ok
P1 Therm Margin  | -66 degrees C     | ok
P2 Therm Margin  | -65 degrees C     | ok
P1 Therm Ctrl %  | no reading        | ns
P2 Therm Ctrl %  | no reading        | ns
HSBP Temperature | 25 degrees C      | ok
Pwr Unit Status  | 0x00              | ok
IPMI Watchdog    | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
FP NMI Diag Int  | 0x00              | ok
SMI Timeout      | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
System Event     | 0x00              | ok
Button           | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
P2 Status        | 0x00              | ok
P1 VRD Hot       | 0x00              | ok
P2 VRD Hot       | 0x00              | ok
CATERR           | 0x00              | ok
CPU Missing      | 0x00              | ok
IOH Therm Trip   | 0x00              | ok
NM Capabilities  | 0x05              | ok
Drv 0 Stat       | 0x00              | ok
Drv 1 Stat       | 0x00              | ok
Drv 2 Stat       | 0x00              | ok
Get SDR 0037 command failed: Unspecified error
Drv 0 Pres       | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
Drv 2 Pres       | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
ipmitool: ipmi_sdr_get_record() failed
Get SDR 007f command failed: Unspecified error
Get SDR 0089 command failed: Unspecified error
Get SDR 008c command failed: Unspecified error
SDR record id 0x008c: invalid length 0
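
Output like the above mixes sensor rows with error lines, and some sensors (for example "BB +1.1V P1 Vccp") appear more than once. A small sketch for separating the two and spotting repeats follows. The function names are hypothetical, and the only assumption about the format is the three-column "name | reading | status" layout seen here.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]  # (sensor name, reading, status)


def split_sdr_output(text: str) -> Tuple[List[Row], List[str]]:
    """Separate 'name | reading | status' rows from all other non-blank lines."""
    rows: List[Row] = []
    errors: List[str] = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2]))
        elif line.strip():
            errors.append(line.strip())
    return rows, errors


def duplicated_sensors(rows: List[Row]) -> List[str]:
    """Sensor names that appear more than once, in first-seen order."""
    seen: Dict[str, int] = {}
    for name, _, _ in rows:
        seen[name] = seen.get(name, 0) + 1
    return [name for name, n in seen.items() if n > 1]
```

Counting the error lines per run gives a rough number for comparing kernels or ipmitool builds, rather than comparing the raw output by eye.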

And these messages are logged:
[root@ti129 ~]# journalctl --no-pager --since today -g ipmi
Dec 28 09:41:45 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn 5 cmd 2d, got netfn c cmd 2d
Dec 28 09:41:46 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn b cmd 46
Dec 28 09:41:51 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:41:54 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn b cmd 47
Dec 28 09:41:56 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:41:58 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:41:59 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn b cmd 47
Dec 28 09:42:02 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:04 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn b cmd 48
Dec 28 09:42:04 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:05 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:07 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:07 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:09 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:13 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:14 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:15 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Dec 28 09:42:15 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23

I even got an ipmitool core dump from one host yesterday morning with this error message:
free(): invalid next size (fast)

Here's the gdb backtrace:

backtrace
#0  0x00007fab533e937f in raise () from /lib64/libc.so.6
#1  0x00007fab533d3db5 in abort () from /lib64/libc.so.6
#2  0x00007fab5342c4e7 in __libc_message () from /lib64/libc.so.6
#3  0x00007fab534335ec in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fab53434f88 in _int_free () from /lib64/libc.so.6
#5  0x00005651f1e92b00 in ipmi_sdr_get_record ()
#6  0x00005651f1e92cc5 in ipmi_sdr_print_sdr ()
#7  0x00005651f1e94edd in ipmi_sdr_main ()
#8  0x00005651f1ec80c5 in ipmi_main ()
#9  0x00005651f1e8a898 in main ()

So are these kernel bugs or ipmitool bugs or a combination? It's pretty frustrating.

Comment 6 Andrew Schorr 2022-01-05 14:01:59 UTC
I run ipmitool on these old Intel mobos twice per day, and it always gives lots of error messages and
kernel errors.

Today, I got a new error from ipmitool:

malloc(): corrupted top size

So that's new and exciting. It looks like there are bugs in both the kernel ipmi code
and in the ipmitool user-space code.

Is there any workaround for this or any hope of getting any of this fixed? Or is the response
simply that these old systems must be replaced? They're still working fine otherwise...

Comment 7 Andrew Schorr 2022-01-05 14:03:15 UTC
To be clear, I am seeing these problems using ipmitool-1.8.18-18.el8.x86_64
and kernel 4.18.0-348.2.1.el8_5.x86_64.

Comment 8 Pavel Cahyna 2022-01-05 14:22:18 UTC
Hello Andrew, if the problems started appearing since upgrade from 7.9 to 8.5, can you try ipmitool from 7.9 on the 8.5 system?

Comment 9 Andrew Schorr 2022-01-05 14:42:47 UTC
I copied the ipmitool-1.8.18-9.el7_7.x86_64 binary to an 8.5 system, but it doesn't run:

[root@ti129 ~]# /tmp/ipmitool sdr list
/tmp/ipmitool: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory
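
The loader stops at the first library it cannot find, so more may be missing than the one it names. `ldd` marks each unresolved dependency as `name => not found`, and the sketch below collects them all in one pass. The function names are hypothetical.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def ldd_missing(binary_path: str) -> List[str]:
    """Run ldd on a binary and list its unresolved shared libraries."""
    proc = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return missing_libraries(proc.stdout)
```

For example, `ldd_missing("/tmp/ipmitool")` would list every library the copied binary still lacks on the 8.5 host.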

I then tried to rebuild the RHEL7 version on RHEL8:

rpmbuild --rebuild ipmitool-1.8.18-9.el7_7.src.rpm

It crapped out with these errors:

/bin/sh ../../../libtool --silent  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I. -I../../..  -I../../../include   -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-strict-aliasing -Wall -Wextra -std=gnu99 -pedantic -Wformat -Wformat-nonliteral -c -o lanplus_crypt_impl.lo lanplus_crypt_impl.c
...
lanplus_crypt_impl.c: In function 'lanplus_encrypt_aes_cbc_128':
lanplus_crypt_impl.c:167:17: error: storage size of 'ctx' isn't known
  EVP_CIPHER_CTX ctx;
                 ^~~
lanplus_crypt_impl.c:167:17: warning: unused variable 'ctx' [-Wunused-variable]
lanplus_crypt_impl.c: In function 'lanplus_decrypt_aes_cbc_128':
lanplus_crypt_impl.c:242:17: error: storage size of 'ctx' isn't known
  EVP_CIPHER_CTX ctx;
                 ^~~
lanplus_crypt_impl.c:242:17: warning: unused variable 'ctx' [-Wunused-variable]
lanplus.c: In function 'check_sol_packet_for_new_data':
lanplus.c:2587:29: warning: unused parameter 'intf' [-Wunused-parameter]
          struct ipmi_intf * intf,
          ~~~~~~~~~~~~~~~~~~~^~~~
make[4]: *** [Makefile:454: lanplus_crypt_impl.lo] Error 1

Regards,
Andy

Comment 10 Andrew Schorr 2022-01-05 14:43:58 UTC
And FYI, when ipmitool gave that 'malloc(): corrupted top size', I got a core dump
with this backtrace:

Core was generated by `ipmitool sdr list'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f79cfbef37f in raise () from /lib64/libc.so.6

backtrace
#0  0x00007f79cfbef37f in raise () from /lib64/libc.so.6
#1  0x00007f79cfbd9db5 in abort () from /lib64/libc.so.6
#2  0x00007f79cfc324e7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f79cfc395ec in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f79cfc3ca55 in _int_malloc () from /lib64/libc.so.6
#5  0x00007f79cfc3e8d6 in calloc () from /lib64/libc.so.6
#6  0x00005650d6528950 in ipmi_sdr_get_record ()
#7  0x00005650d6528cc5 in ipmi_sdr_print_sdr ()
#8  0x00005650d652aedd in ipmi_sdr_main ()
#9  0x00005650d655e0c5 in ipmi_main ()
#10 0x00005650d6520898 in main ()
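
Both backtraces in this thread enter libc from the same ipmitool function. A short sketch that pulls the first non-library frame out of a gdb backtrace makes this easy to check across core dumps. It assumes the frame layout shown above (no debuginfo, so no source locations), and the names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    function: str
    library: Optional[str]  # None when gdb printed no "from ..." suffix


_FRAME = re.compile(
    r"^#(\d+)\s+0x[0-9a-f]+ in (\S+) \([^)]*\)(?: from (\S+))?\s*$",
    re.MULTILINE,
)


def parse_frames(backtrace: str) -> List[Frame]:
    """Parse '#N 0xADDR in func () [from lib]' lines into Frame tuples."""
    return [
        Frame(int(idx), func, lib or None)
        for idx, func, lib in _FRAME.findall(backtrace)
    ]


def first_program_frame(frames: List[Frame]) -> Optional[Frame]:
    """Return the innermost frame that is not attributed to a shared library."""
    for frame in frames:
        if frame.library is None:
            return frame
    return None
```

For the backtrace above this yields `ipmi_sdr_get_record`, the same function as in the earlier `free(): invalid next size (fast)` core dump.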

Comment 11 Pavel Cahyna 2022-01-05 17:47:26 UTC
You can copy /lib64/libreadline.so.6 to the new system as well, and do
yum install 'libncurses.so.5()(64bit)'  'libtinfo.so.5()(64bit)' 'libcrypto.so.10()(64bit)'
After that, the old ipmitool should work.

Please also run
yum debuginfo-install ipmitool-1.8.18-18.el8.x86_64

After that, you will be able to get a more informative backtrace from gdb.

Comment 12 Andrew Schorr 2022-01-05 17:58:29 UTC
I copied over libreadline.so.6 and libcrypto.so.10 and put them in LD_LIBRARY_PATH.
With that, I ran it and got some fairly typical errors, consistent with what I see
from the RHEL8 version. Here's one sample run:

[root@ti129 ipmitool]# ./ipmitool sdr list
BB +1.1V IOH     | 1.09 Volts        | ok
BB +1.1V P1 Vccp | 0.94 Volts        | ok
BB +1.1V P2 Vccp | 0.92 Volts        | ok
BB +1.5V P1 DDR3 | 1.52 Volts        | ok
BB +1.5V P2 DDR3 | 1.52 Volts        | ok
BB +1.8V AUX     | 1.77 Volts        | ok
BB +3.3V         | 3.31 Volts        | ok
BB +3.3V STBY    | 3.28 Volts        | ok
BB +3.3V Vbat    | 3.09 Volts        | ok
BB +5.0V         | 5.02 Volts        | ok
BB +5.0V STBY    | 4.97 Volts        | ok
BB +12.0V        | 11.88 Volts       | ok
BB -12.0V        | -11.84 Volts      | ok
BB +1.35v P1 MEM | disabled          | ns
BB +1.35v P2 MEM | disabled          | ns
Baseboard Temp   | 29 degrees C      | ok
Front Panel Temp | 25 degrees C      | ok
IOH Therm Margin | -59 degrees C     | ok
Mem P1 Thrm Mrgn | -42 degrees C     | ok
Mem P2 Thrm Mrgn | -43 degrees C     | ok
System Fan 1A    | 13108 RPM         | ok
System Fan 2A    | 13108 RPM         | ok
System Fan 3A    | 13108 RPM         | ok
System Fan 4A    | 13108 RPM         | ok
System Fan 1B    | 15688 RPM         | ok
System Fan 2B    | 15688 RPM         | ok
System Fan 3B    | 15688 RPM         | ok
System Fan 4B    | 16576 RPM         | ok
System Fan 5A    | 12818 RPM         | ok
System Fan 5B    | 15984 RPM         | ok
P1 Therm Margin  | -67 degrees C     | ok
P2 Therm Margin  | -66 degrees C     | ok
P1 Therm Ctrl %  | no reading        | ns
P2 Therm Ctrl %  | no reading        | ns
HSBP Temperature | 25 degrees C      | ok
Pwr Unit Status  | 0x00              | ok
Get SDR 0025 command failed: Unspecified error
IPMI Watchdog    | 0x00              | ok
Physical Scrty   | 0x00              | ok
FP NMI Diag Int  | 0x00              | ok
SMI Timeout      | 0x00              | ok
System Event Log | 0x00              | ok
System Event     | 0x00              | ok
Button           | 0x00              | ok
P1 Status        | 0x00              | ok
P2 Status        | 0x00              | ok
P1 VRD Hot       | 0x00              | ok
P2 VRD Hot       | 0x00              | ok
CATERR           | 0x00              | ok
CPU Missing      | 0x00              | ok
IOH Therm Trip   | 0x00              | ok
NM Capabilities  | 0x05              | ok
Drv 0 Stat       | 0x00              | ok
Drv 1 Stat       | Not Readable      | ns
Drv 2 Stat       | 0x00              | ok
Drv 0 Pres       | 0x00              | ok
Drv 1 Pres       | 0x00              | ok
Get SDR 0039 command failed: Unspecified error
Drv 2 Pres       | 0x00              | ok
Get SDR 0057 command failed: Unspecified error
ipmitool: ipmi_sdr_get_record() failed
ipmitool: ipmi_sdr_get_record() failed
Get SDR 0089 command failed: Unspecified error
SDR record id 0x008c: invalid length 0

Plus a bunch of kernel complaints in the journal:

Jan 05 12:55:53 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Jan 05 12:55:59 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn b cmd 74
Jan 05 12:56:06 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Jan 05 12:56:06 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Jan 05 12:56:07 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn 9 cmd 23
Jan 05 12:56:09 ti129 kernel: ipmi_si IPI0001:00: BMC returned incorrect response, expected netfn b cmd 23, got netfn b cmd 76

No core dump. That happens only rarely.

Comment 13 Pavel Cahyna 2022-01-05 18:05:22 UTC
> With that, I ran it and got some fairly typical errors, consistent with what I see from the RHEL8 version.

Thanks for the test.

So, if the problem started "since upgrading from 7.9 to 8.4/8.5 a couple of months ago", can we conclude that the new kernel is the likely culprit, not ipmitool, except for the malloc / core dump errors?

Comment 14 Andrew Schorr 2022-01-05 20:01:49 UTC
Agreed. It is my belief that the kernel is the source of trouble, and in addition, ipmitool has some bugs that are tickled by the kernel defects in RHEL 8.

If I run "valgrind ipmitool sdr list", I get a bunch of complaints:

==3201734== Memcheck, a memory error detector
==3201734== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3201734== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==3201734== Command: ipmitool sdr list
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12E370: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x191C7F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x15FF6D: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff6f0 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 15)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12E370: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x191C7F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x15FF6D: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x15E88A: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1603E0: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff750 is on thread 1's stack
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x17FC2A: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1605FC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff750 is on thread 1's stack
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A5BB: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff6b0 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 15)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A5BB: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A6C7: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff6b0 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 14)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A6C7: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A25F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A656: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff680 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 2)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A25F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A656: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A358: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ACA6: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff690 is on thread 1's stack
==3201734== 
Get SDR 0000 command failed: Unspecified error
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 7)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A358: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ACA6: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AA2B: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ACC4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff680 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 35)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AA2B: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ACC4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x126415: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12693F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x127984: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A163: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AD9F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff5e0 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 4)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x126415: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12693F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x127984: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A163: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AD9F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
BB +1.1V IOH     | 1.09 Volts        | ok
BB +1.1V P1 Vccp | 0.95 Volts        | ok
BB +1.1V P2 Vccp | 0.98 Volts        | ok
Get SDR 0004 command failed: Unspecified error
BB +1.5V P1 DDR3 | 1.52 Volts        | ok
ipmitool: ipmi_sdr_get_record() failed
BB +1.8V AUX     | 1.77 Volts        | ok
BB +3.3V         | 3.31 Volts        | ok
BB +3.3V STBY    | 3.28 Volts        | ok
BB +3.3V Vbat    | 3.09 Volts        | ok
BB +5.0V         | 5.02 Volts        | ok
BB +5.0V STBY    | 4.97 Volts        | ok
BB +12.0V        | 11.88 Volts       | ok
BB -12.0V        | -11.86 Volts      | ok
BB +1.35v P1 MEM | disabled          | ns
BB +1.35v P2 MEM | disabled          | ns
Baseboard Temp   | 28 degrees C      | ok
Front Panel Temp | 25 degrees C      | ok
IOH Therm Margin | -59 degrees C     | ok
Mem P1 Thrm Mrgn | -42 degrees C     | ok
ipmitool: ipmi_sdr_get_record() failed
System Fan 1A    | 13108 RPM         | ok
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A25F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AB40: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ACC4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff650 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 2)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A25F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AB40: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12ACC4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
System Fan 2A    | 12586 RPM         | ok
System Fan 3A    | 13108 RPM         | ok
System Fan 4A    | 13108 RPM         | ok
System Fan 1B    | 16576 RPM         | ok
System Fan 2B    | 16576 RPM         | ok
System Fan 3B    | 16576 RPM         | ok
System Fan 4B    | 15688 RPM         | ok
System Fan 5     | 12818 RPM         | ok
BB +1.1V P1 Vccp | 0.94 Volts        | ok
BB +1.1V P2 Vccp | 0.92 Volts        | ok
BB +1.5V P1 DDR3 | 1.52 Volts        | ok
BB +1.5V P2 DDR3 | 1.52 Volts        | ok
BB +1.8V AUX     | 1.77 Volts        | ok
ipmitool: ipmi_sdr_get_record() failed
BB +3.3V STBY    | 3.28 Volts        | ok
BB +3.3V Vbat    | 3.09 Volts        | ok
BB +5.0V         | no reading        | ns
BB +5.0V STBY    | 4.97 Volts        | ok
BB +12.0V        | 11.88 Volts       | ok
BB -12.0V        | -11.86 Volts      | ok
BB +1.35v P1 MEM | disabled          | ns
BB +1.35v P2 MEM | disabled          | ns
Baseboard Temp   | 29 degrees C      | ok
ipmitool: ipmi_sdr_get_record() failed
IOH Therm Margin | -59 degrees C     | ok
Mem P1 Thrm Mrgn | -42 degrees C     | ok
Mem P2 Thrm Mrgn | -44 degrees C     | ok
System Fan 1A    | 13108 RPM         | ok
ipmitool: ipmi_sdr_get_record() failed
System Fan 3A    | 13108 RPM         | ok
System Fan 4A    | 13108 RPM         | ok
System Fan 1B    | 16576 RPM         | ok
System Fan 2B    | 16576 RPM         | ok
System Fan 3B    | 16132 RPM         | ok
System Fan 4B    | 15688 RPM         | ok
System Fan 5A    | 12818 RPM         | ok
System Fan 5B    | 15984 RPM         | ok
P1 Therm Margin  | -65 degrees C     | ok
P2 Therm Margin  | -64 degrees C     | ok
ipmitool: ipmi_sdr_get_record() failed
P2 Therm Ctrl %  | no reading        | ns
==3201734== Syscall param ioctl(generic) points to uninitialised byte(s)
==3201734==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==3201734==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1263A4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12693F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x127984: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A163: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AD9F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734==  Address 0x1ffefff5e0 is on thread 1's stack
==3201734== 
==3201734== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 4)
==3201734==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==3201734==    by 0x192287: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1263A4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12693F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x127984: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12A163: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12AD9F: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x122897: ??? (in /usr/bin/ipmitool)
==3201734==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==3201734== 
HSBP Temperature | 25 degrees C      | ok
Pwr Unit Status  | 0x00              | ok
IPMI Watchdog    | 0x00              | ok
Physical Scrty   | 0x00              | ok
FP NMI Diag Int  | 0x00              | ok
SMI Timeout      | 0x00              | ok
System Event Log | Not Readable      | ns
System Event     | 0x00              | ok
Button           | 0x00              | ok
P1 Status        | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
P1 VRD Hot       | 0x00              | ok
P2 VRD Hot       | 0x00              | ok
CATERR           | 0x00              | ok
CPU Missing      | 0x00              | ok
IOH Therm Trip   | 0x00              | ok
NM Capabilities  | 0x05              | ok
Drv 0 Stat       | 0x00              | ok
Drv 1 Stat       | 0x00              | ok
Drv 2 Stat       | 0x00              | ok
Drv 0 Pres       | 0x00              | ok
Drv 1 Pres       | 0x00              | ok
Drv 2 Pres       | 0x00              | ok
ipmitool: ipmi_sdr_get_record() failed
Get SDR 0045 command failed: Unspecified error
ipmitool: ipmi_sdr_get_record() failed
ipmitool: ipmi_sdr_get_record() failed
SDR record id 0x008c: invalid length 0
==3201734== 
==3201734== HEAP SUMMARY:
==3201734==     in use at exit: 0 bytes in 0 blocks
==3201734==   total heap usage: 328 allocs, 328 frees, 12,101 bytes allocated
==3201734== 
==3201734== All heap blocks were freed -- no leaks are possible
==3201734== 
==3201734== Use --track-origins=yes to see where uninitialised values come from
==3201734== For lists of detected and suppressed errors, rerun with: -s
==3201734== ERROR SUMMARY: 1183 errors from 20 contexts (suppressed: 0 from 0)

Comment 15 Andrew Schorr 2022-01-05 20:06:10 UTC
Note: I am also getting valgrind errors on a brand-new Supermicro X11SCM-F motherboard.
Something in the code appears to be calling memcpy on overlapping memory regions,
which is undefined behavior for memcpy:

==280687== Memcheck, a memory error detector
==280687== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==280687== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==280687== Command: ipmitool sdr list
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12E370: ??? (in /usr/bin/ipmitool)
==280687==    by 0x191C7F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x15FF6D: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffff80 is on thread 1's stack
==280687== 
==280687== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 15)
==280687==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280687==    by 0x192287: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12E370: ??? (in /usr/bin/ipmitool)
==280687==    by 0x191C7F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x15FF6D: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x15E88A: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1603E0: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffffe0 is on thread 1's stack
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x17FC2A: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1605FC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffffe0 is on thread 1's stack
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A5BB: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffff40 is on thread 1's stack
==280687== 
==280687== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 15)
==280687==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280687==    by 0x192287: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A5BB: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A6C7: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffff40 is on thread 1's stack
==280687== 
==280687== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 14)
==280687==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280687==    by 0x192287: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A6C7: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A25F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A656: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffff10 is on thread 1's stack
==280687== 
==280687== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 2)
==280687==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280687==    by 0x192287: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A25F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A656: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ADE9: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A358: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ACA6: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffff20 is on thread 1's stack
==280687== 
==280687== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 7)
==280687==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280687==    by 0x192287: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A358: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ACA6: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12AA2B: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ACC4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffeffff10 is on thread 1's stack
==280687== 
==280687== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 35)
==280687==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280687==    by 0x192287: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12AA2B: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12ACC4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687== 
==280687== Syscall param ioctl(generic) points to uninitialised byte(s)
==280687==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280687==    by 0x19202C: ??? (in /usr/bin/ipmitool)
==280687==    by 0x126415: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12693F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x127984: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A163: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12AD9F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687==  Address 0x1ffefffe70 is on thread 1's stack
==280687== 
CPU Temp         | no reading        | ns
==280687== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 3)
==280687==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280687==    by 0x192287: ??? (in /usr/bin/ipmitool)
==280687==    by 0x126415: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12693F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x127984: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12A163: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12AD9F: ??? (in /usr/bin/ipmitool)
==280687==    by 0x12CEDC: ??? (in /usr/bin/ipmitool)
==280687==    by 0x1600C4: ??? (in /usr/bin/ipmitool)
==280687==    by 0x122897: ??? (in /usr/bin/ipmitool)
==280687==    by 0x5D7B492: (below main) (in /usr/lib64/libc-2.28.so)
==280687== 
PCH Temp         | 28 degrees C      | ok
System Temp      | 22 degrees C      | ok
Peripheral Temp  | 26 degrees C      | ok
VcpuVRM Temp     | 29 degrees C      | ok
M2NVMeSSD Temp1  | no reading        | ns
M2NVMeSSD Temp2  | no reading        | ns
DIMMA1 Temp      | no reading        | ns
DIMMA2 Temp      | 25 degrees C      | ok
DIMMB1 Temp      | no reading        | ns
DIMMB2 Temp      | 26 degrees C      | ok
FAN1             | 8400 RPM          | ok
FAN2             | 8400 RPM          | ok
FAN3             | 9000 RPM          | ok
FAN4             | no reading        | ns
FANA             | 9300 RPM          | ok
FANB             | no reading        | ns
12V              | 11.94 Volts       | ok
5VCC             | 5.15 Volts        | ok
3.3VCC           | 3.28 Volts        | ok
VBAT             | 0x04              | ok
Vcpu             | 1.07 Volts        | ok
Vdimm            | 1.21 Volts        | ok
VCC_SA           | 1.05 Volts        | ok
5VSB             | 5.14 Volts        | ok
3.3VSB           | 3.28 Volts        | ok
VCC_IO           | 0.95 Volts        | ok
1.8V_PCH         | 1.78 Volts        | ok
1.2V_BMC         | 1.20 Volts        | ok
1.05V_PCH        | 1.05 Volts        | ok
Chassis Intru    | 0x01              | ok
==280687== 
==280687== HEAP SUMMARY:
==280687==     in use at exit: 0 bytes in 0 blocks
==280687==   total heap usage: 80 allocs, 80 frees, 3,932 bytes allocated
==280687== 
==280687== All heap blocks were freed -- no leaks are possible
==280687== 
==280687== Use --track-origins=yes to see where uninitialised values come from
==280687== For lists of detected and suppressed errors, rerun with: -s
==280687== ERROR SUMMARY: 279 errors from 16 contexts (suppressed: 0 from 0)

Comment 16 Andrew Schorr 2022-01-05 20:10:47 UTC
With debuginfo:

==280848== Memcheck, a memory error detector
==280848== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==280848== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==280848== Command: ipmitool sdr list
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x12E370: UnknownInlinedFun (ipmi_sel.c:318)
==280848==    by 0x12E370: ipmi_get_oem (ipmi_sel.c:290)
==280848==    by 0x191C7F: ipmi_openipmi_open (open.c:140)
==280848==    by 0x15FF6D: ipmi_main (ipmi_main.c:890)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffff80 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 15)
==280848==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280848==    by 0x192287: UnknownInlinedFun (string_fortified.h:40)
==280848==    by 0x192287: ipmi_openipmi_send_cmd (open.c:428)
==280848==    by 0x12E370: UnknownInlinedFun (ipmi_sel.c:318)
==280848==    by 0x12E370: ipmi_get_oem (ipmi_sel.c:290)
==280848==    by 0x191C7F: ipmi_openipmi_open (open.c:140)
==280848==    by 0x15FF6D: ipmi_main (ipmi_main.c:890)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x15E88A: picmg_discover (ipmi_picmg.c:2358)
==280848==    by 0x1603E0: ipmi_main (ipmi_main.c:901)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffffe0 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x17FC2A: vita_discover (ipmi_vita.c:191)
==280848==    by 0x1605FC: ipmi_main (ipmi_main.c:903)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffffe0 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x12A5BB: ipmi_sdr_start (ipmi_sdr.c:2844)
==280848==    by 0x12ADE9: ipmi_sdr_print_sdr (ipmi_sdr.c:2676)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffff40 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 15)
==280848==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280848==    by 0x192287: UnknownInlinedFun (string_fortified.h:40)
==280848==    by 0x192287: ipmi_openipmi_send_cmd (open.c:428)
==280848==    by 0x12A5BB: ipmi_sdr_start (ipmi_sdr.c:2844)
==280848==    by 0x12ADE9: ipmi_sdr_print_sdr (ipmi_sdr.c:2676)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x12A6C7: ipmi_sdr_start (ipmi_sdr.c:2887)
==280848==    by 0x12ADE9: ipmi_sdr_print_sdr (ipmi_sdr.c:2676)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffff40 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 14)
==280848==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280848==    by 0x192287: UnknownInlinedFun (string_fortified.h:40)
==280848==    by 0x192287: ipmi_openipmi_send_cmd (open.c:428)
==280848==    by 0x12A6C7: ipmi_sdr_start (ipmi_sdr.c:2887)
==280848==    by 0x12ADE9: ipmi_sdr_print_sdr (ipmi_sdr.c:2676)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x12A25F: ipmi_sdr_get_reservation (ipmi_sdr.c:2802)
==280848==    by 0x12A656: ipmi_sdr_start (ipmi_sdr.c:2952)
==280848==    by 0x12ADE9: ipmi_sdr_print_sdr (ipmi_sdr.c:2676)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffff10 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 2)
==280848==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280848==    by 0x192287: UnknownInlinedFun (string_fortified.h:40)
==280848==    by 0x192287: ipmi_openipmi_send_cmd (open.c:428)
==280848==    by 0x12A25F: ipmi_sdr_get_reservation (ipmi_sdr.c:2802)
==280848==    by 0x12A656: ipmi_sdr_start (ipmi_sdr.c:2952)
==280848==    by 0x12ADE9: ipmi_sdr_print_sdr (ipmi_sdr.c:2676)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x12A358: ipmi_sdr_get_header (ipmi_sdr.c:797)
==280848==    by 0x12A358: ipmi_sdr_get_next_header (ipmi_sdr.c:880)
==280848==    by 0x12ACA6: ipmi_sdr_print_sdr (ipmi_sdr.c:2694)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffff20 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 7)
==280848==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280848==    by 0x192287: UnknownInlinedFun (string_fortified.h:40)
==280848==    by 0x192287: ipmi_openipmi_send_cmd (open.c:428)
==280848==    by 0x12A358: ipmi_sdr_get_header (ipmi_sdr.c:797)
==280848==    by 0x12A358: ipmi_sdr_get_next_header (ipmi_sdr.c:880)
==280848==    by 0x12ACA6: ipmi_sdr_print_sdr (ipmi_sdr.c:2694)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x12AA2B: ipmi_sdr_get_record (ipmi_sdr.c:3031)
==280848==    by 0x12ACC4: ipmi_sdr_print_sdr (ipmi_sdr.c:2698)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffeffff10 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
==280848== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 35)
==280848==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280848==    by 0x192287: UnknownInlinedFun (string_fortified.h:40)
==280848==    by 0x192287: ipmi_openipmi_send_cmd (open.c:428)
==280848==    by 0x12AA2B: ipmi_sdr_get_record (ipmi_sdr.c:3031)
==280848==    by 0x12ACC4: ipmi_sdr_print_sdr (ipmi_sdr.c:2698)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848== 
==280848== Syscall param ioctl(generic) points to uninitialised byte(s)
==280848==    at 0x5E4B62B: ioctl (in /usr/lib64/libc-2.28.so)
==280848==    by 0x19202C: ipmi_openipmi_send_cmd (open.c:369)
==280848==    by 0x126415: ipmi_sdr_get_sensor_reading_ipmb (ipmi_sdr.c:601)
==280848==    by 0x12693F: ipmi_sdr_read_sensor_value (ipmi_sdr.c:1434)
==280848==    by 0x12693F: ipmi_sdr_read_sensor_value (ipmi_sdr.c:1398)
==280848==    by 0x127984: ipmi_sdr_print_sensor_fc (ipmi_sdr.c:1528)
==280848==    by 0x12A163: ipmi_sdr_print_rawentry (ipmi_sdr.c:2553)
==280848==    by 0x12AD9F: ipmi_sdr_print_sdr (ipmi_sdr.c:2760)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848==  Address 0x1ffefffe70 is on thread 1's stack
==280848==  in frame #1, created by ipmi_openipmi_send_cmd (open.c:169)
==280848== 
CPU Temp         | no reading        | ns
==280848== Source and destination overlap in memcpy_chk(0x3f45c1, 0x3f45c2, 3)
==280848==    at 0x4C41260: __memcpy_chk (vg_replace_strmem.c:1617)
==280848==    by 0x192287: UnknownInlinedFun (string_fortified.h:40)
==280848==    by 0x192287: ipmi_openipmi_send_cmd (open.c:428)
==280848==    by 0x126415: ipmi_sdr_get_sensor_reading_ipmb (ipmi_sdr.c:601)
==280848==    by 0x12693F: ipmi_sdr_read_sensor_value (ipmi_sdr.c:1434)
==280848==    by 0x12693F: ipmi_sdr_read_sensor_value (ipmi_sdr.c:1398)
==280848==    by 0x127984: ipmi_sdr_print_sensor_fc (ipmi_sdr.c:1528)
==280848==    by 0x12A163: ipmi_sdr_print_rawentry (ipmi_sdr.c:2553)
==280848==    by 0x12AD9F: ipmi_sdr_print_sdr (ipmi_sdr.c:2760)
==280848==    by 0x12CEDC: ipmi_sdr_main (ipmi_sdr.c:4652)
==280848==    by 0x1600C4: ipmi_main (ipmi_main.c:1009)
==280848==    by 0x122897: main (ipmitool.c:136)
==280848== 
PCH Temp         | 28 degrees C      | ok
System Temp      | 23 degrees C      | ok
Peripheral Temp  | 26 degrees C      | ok
VcpuVRM Temp     | 28 degrees C      | ok
M2NVMeSSD Temp1  | no reading        | ns
M2NVMeSSD Temp2  | no reading        | ns
DIMMA1 Temp      | no reading        | ns
DIMMA2 Temp      | 25 degrees C      | ok
DIMMB1 Temp      | no reading        | ns
DIMMB2 Temp      | 26 degrees C      | ok
FAN1             | 8500 RPM          | ok
FAN2             | 8400 RPM          | ok
FAN3             | 9000 RPM          | ok
FAN4             | no reading        | ns
FANA             | 9300 RPM          | ok
FANB             | no reading        | ns
12V              | 11.94 Volts       | ok
5VCC             | 5.15 Volts        | ok
3.3VCC           | 3.28 Volts        | ok
VBAT             | 0x04              | ok
Vcpu             | 0.04 Volts        | ok
Vdimm            | 1.20 Volts        | ok
VCC_SA           | 1.05 Volts        | ok
5VSB             | 5.14 Volts        | ok
3.3VSB           | 3.28 Volts        | ok
VCC_IO           | 0.95 Volts        | ok
1.8V_PCH         | 1.77 Volts        | ok
1.2V_BMC         | 1.20 Volts        | ok
1.05V_PCH        | 1.05 Volts        | ok
Chassis Intru    | 0x01              | ok
==280848== 
==280848== HEAP SUMMARY:
==280848==     in use at exit: 0 bytes in 0 blocks
==280848==   total heap usage: 80 allocs, 80 frees, 3,932 bytes allocated
==280848== 
==280848== All heap blocks were freed -- no leaks are possible
==280848== 
==280848== Use --track-origins=yes to see where uninitialised values come from
==280848== For lists of detected and suppressed errors, rerun with: -s
==280848== ERROR SUMMARY: 279 errors from 16 contexts (suppressed: 0 from 0)

Comment 17 Pavel Cahyna 2022-01-05 22:41:29 UTC
(In reply to Andrew Schorr from comment #14)
> Agreed. It is my belief that the kernel is the source of trouble, and in
> addition, ipmitool has some bugs that are tickled by the kernel defects in
> RHEL 8.

Could you perhaps try to contact the BMC over IP to avoid going through the kernel IPMI driver?

Comment 18 Andrew Schorr 2022-01-06 01:29:13 UTC
That's an interesting idea. Before addressing that, I will note that the valgrind memcpy
errors seem to pertain to these memmove calls in src/plugins/open/open.c:ipmi_openipmi_send_cmd():

        /* save response data for caller */
        if (rsp.ccode == 0 && rsp.data_len > 0) {
           memmove(rsp.data, rsp.data + 1, rsp.data_len);
           rsp.data[rsp.data_len] = 0;
        }

I have no idea why valgrind is flagging them as errors, since overlapping areas
should be OK with memmove. In /usr/include/bits/string_fortified.h, that gets
converted to a call to __builtin___memmove_chk:

__fortify_function void *
__NTH (memmove (void *__dest, const void *__src, size_t __len))
{
  return __builtin___memmove_chk (__dest, __src, __len, __bos0 (__dest));
}

Why does that trigger a valgrind complaint about memcpy? At this point, it seems one would
have to dive into the gcc logic.
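
For reference, here is a minimal sketch of the same shift-down-by-one pattern in isolation. The helper name drop_leading_byte is hypothetical and not part of ipmitool, and unlike the open.c snippet above it moves only len - 1 bytes so that it never reads past the end of the buffer. The C standard defines memmove for overlapping source and destination; memcpy has no such guarantee, so the sketch only shows that the pattern is valid, not why valgrind attributes it to memcpy_chk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ipmitool: drop the first byte of a
 * len-byte buffer by shifting the remaining len - 1 bytes down by one,
 * then zero the vacated last slot. */
void drop_leading_byte(unsigned char *buf, size_t len)
{
    assert(buf != NULL && len >= 1);

    /* Source (buf + 1) and destination (buf) overlap. memmove is
     * specified to copy as if through a temporary buffer, so the
     * overlap is well defined; memcpy would not be. */
    memmove(buf, buf + 1, len - 1);

    /* The last slot no longer holds payload data. */
    buf[len - 1] = 0;
}
```

Calling drop_leading_byte on a four-byte buffer holding a leading status byte followed by 1, 2, 3 leaves 1, 2, 3, 0 in the buffer.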

I tried using "ipmitool -I lanplus" to run the "sdr list" command, and it does at least
eliminate the kernel complaints. But it's really slow, and it terminates like this:

Drv 0 Stat       | 0x00              | ok
Drv 1 Stat       | 0x00              | ok
Drv 2 Stat       | 0x00              | ok
Drv 0 Pres       | 0x00              | ok
Drv 1 Pres       | 0x00              | ok
Drv 2 Pres       | 0x00              | ok
SDR record id 0x008c: invalid length 0

So that still doesn't seem quite normal, and the speed is a problem.
There are also no valgrind complaints when using "-I lanplus".

Comment 19 Andrew Schorr 2022-01-06 01:38:17 UTC
And the results are not consistent when using lanplus. Even more entries seem to be missing.
To take one example, when I run it directly on a certain host, I get 84 records of output
in 23 seconds. When using lanplus from a remote host, it takes 130 seconds, and I
get 57 records of output.

Comment 20 Andrew Schorr 2022-01-06 15:27:47 UTC
FYI, here's the backtrace from that most recent core dump with debuginfo installed:

Reading symbols from /usr/bin/ipmitool...Reading symbols from /usr/lib/debug/usr/bin/ipmitool-1.8.18-18.el8.x86_64.debug...done.
done.
[New LWP 3090452]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `ipmitool sdr list'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f79cfbef37f in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-164.el8.x86_64 ncurses-libs-6.1-9.20180224.el8.x86_64 openssl-libs-1.1.1k-5.el8_5.x86_64 readline-7.0-10.el8.x86_64 zlib-1.2.11-17.el8.x86_64
(gdb) bt
#0  0x00007f79cfbef37f in raise () from /lib64/libc.so.6
#1  0x00007f79cfbd9db5 in abort () from /lib64/libc.so.6
#2  0x00007f79cfc324e7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f79cfc395ec in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f79cfc3ca55 in _int_malloc () from /lib64/libc.so.6
#5  0x00007f79cfc3e8d6 in calloc () from /lib64/libc.so.6
#6  0x00005650d6528950 in ipmi_sdr_get_record (intf=intf@entry=0x5650d67eec60 <ipmi_open_intf>, 
    header=header@entry=0x5650d67efd20 <sdr_rs>, itr=0x5650d81ce2e0) at ipmi_sdr.c:2985
#7  0x00005650d6528cc5 in ipmi_sdr_print_sdr (intf=intf@entry=0x5650d67eec60 <ipmi_open_intf>, type=type@entry=254 '\376')
    at ipmi_sdr.c:2698
#8  0x00005650d652aedd in ipmi_sdr_main (intf=0x5650d67eec60 <ipmi_open_intf>, argc=1, argv=0x7ffe6e27c248) at ipmi_sdr.c:4652
#9  0x00005650d655e0c5 in ipmi_main (argc=<optimized out>, argv=<optimized out>, cmdlist=0x5650d67e2020 <ipmitool_cmd_list>, 
    intflist=0x0) at ipmi_main.c:1009
#10 0x00005650d6520898 in main (argc=<optimized out>, argv=<optimized out>) at ipmitool.c:136
(gdb)

Comment 22 RHEL Program Management 2022-12-01 07:27:48 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

