OCP Version at Install Time: 4.8
RHCOS Version at Install Time: 4.8
Platform: bare metal
Architecture: x86_64

What are you trying to do? What is your use case?

The customer is trying to install OCP 4.8 using the bare metal UPI installation method, booting the machines from an iPXE server. The iPXE config is included in later comments.

The machines fail in the initramfs stage on HP DL360 bare metal hosts (iLO 5) with the following errors:

~~~
Sep 08 08:22:50 xxxx systemd[1838]: rdma-ndd.service: Failed to execute command: No such file or directory
Sep 08 08:22:50 xxxx systemd[1838]: rdma-ndd.service: Failed at step EXEC spawning /usr/sbin/rdma-ndd: No such file or directory
Sep 08 08:22:51 xxxx systemd[1]: Starting Dracut Emergency Shell...
Sep 08 08:22:51 xxxx systemd[1]: rdma-ndd.service: Main process exited, code=exited, status=203/EXEC
Sep 08 08:22:51 xxxx systemd[1]: rdma-ndd.service: Failed with result 'exit-code'.
Sep 08 08:22:51 xxxx systemd[1]: Failed to start RDMA Node Description Daemon.
Sep 08 08:22:52 xxxx systemd[1]: rdma-ndd.service: Service RestartSec=100ms expired, scheduling restart.
Sep 08 08:22:52 xxxx systemd[1]: rdma-ndd.service: Scheduled restart job, restart counter is at 18.
Sep 08 08:22:52 xxxx systemd[1]: Stopped RDMA Node Description Daemon.
Sep 08 08:22:52 xxxx systemd[1]: Starting RDMA Node Description Daemon...
Sep 08 08:22:52 xxxx systemd[1855]: rdma-ndd.service: Failed to execute command: No such file or directory
Sep 08 08:22:52 xxxx systemd[1855]: rdma-ndd.service: Failed at step EXEC spawning /usr/sbin/rdma-ndd: No such file or directory
~~~

If I boot the live ISO of that version on the same machine, the RDMA service is active and running; a screenshot is attached.

I saw a similar issue in RHEL 8 some days back, which could be related: https://bugzilla.redhat.com/show_bug.cgi?id=1946606

What happened? What went wrong or what did you expect?
What are the steps to reproduce your issue? Please try to reduce these steps to something that can be reproduced with a single RHCOS node.

To reproduce the issue:
- Configure an iPXE server with the versions of initrd, rootfs, and kernel mentioned in the comments below.
- Select a physical host such as an HP DL360.
- Boot the machine in iPXE mode.
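For anyone triaging a similar journal capture from the emergency shell, the failure pattern in the log excerpt can be picked out mechanically. The following Python is only a rough sketch: the regular expressions are written against the exact message wording quoted in this report, and the names `summarize` and `SAMPLE` are made up for illustration.

```python
import re

# Patterns follow the message wording quoted in this report; other systemd
# versions may phrase these lines differently (assumption).
MISSING = re.compile(
    r"(?P<unit>\S+\.service): Failed at step EXEC spawning "
    r"(?P<path>\S+): No such file or directory"
)
RESTART = re.compile(
    r"(?P<unit>\S+\.service): Scheduled restart job, "
    r"restart counter is at (?P<count>\d+)\."
)

# Three lines copied from the log excerpt in the description.
SAMPLE = """\
Sep 08 08:22:50 xxxx systemd[1838]: rdma-ndd.service: Failed at step EXEC spawning /usr/sbin/rdma-ndd: No such file or directory
Sep 08 08:22:51 xxxx systemd[1]: rdma-ndd.service: Main process exited, code=exited, status=203/EXEC
Sep 08 08:22:52 xxxx systemd[1]: rdma-ndd.service: Scheduled restart job, restart counter is at 18.
"""


def summarize(journal_text):
    """Return {unit: {"missing": set of paths, "restarts": highest counter seen}}."""
    summary = {}
    for line in journal_text.splitlines():
        m = MISSING.search(line)
        if m:
            entry = summary.setdefault(m["unit"], {"missing": set(), "restarts": 0})
            entry["missing"].add(m["path"])
            continue
        m = RESTART.search(line)
        if m:
            entry = summary.setdefault(m["unit"], {"missing": set(), "restarts": 0})
            entry["restarts"] = max(entry["restarts"], int(m["count"]))
    return summary


if __name__ == "__main__":
    for unit, info in summarize(SAMPLE).items():
        print(unit, sorted(info["missing"]), info["restarts"])
```

A unit that reports a missing executable together with a climbing restart counter matches what is seen here: the service is enabled in the initramfs, but its binary is not present there.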
This looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1946606

Specifically, it has been observed on RHEL 8.4 (https://bugzilla.redhat.com/show_bug.cgi?id=1946606#c5) using `rdma-core-32.0-4.el8`, which is included in RHCOS 4.8.

It appears this is fixed in RHEL 8.5, but since RHCOS 4.8 will continue to use RHEL 8.4 EUS content, a backport of the fix to 8.4.z should be requested.

Please follow the z-stream backport request procedures here: https://source.redhat.com/departments/pnt/pnt_cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Note: while this problem affects OpenShift, you must use the process for requesting a RHEL backport, as the affected package is in RHEL.

---

While the backport is requested, we'll mark this BZ as a tracking BZ to follow the progress of the updated package in RHCOS.
This problem was fixed as part of https://bugzilla.redhat.com/show_bug.cgi?id=2019819 in `rdma-core-32.0-5.el8_4`.

That version of the package was included as part of RHCOS 410.84.202112162002-0 on Dec 12.

Since this was fixed in RHEL 8.4.z, the fixed package also landed in RHCOS 4.9/4.8/4.7.

I'll create backport BZs to track inclusion of the fixed packages in those releases.
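To check whether a given node image already carries the fix, the installed package string (for example the output of `rpm -q rdma-core`) can be compared against the fixed build named above. The sketch below is a simplification and not RPM's real version comparison: it assumes a purely numeric dotted version and compares only the leading number of the release field. The function names are hypothetical.

```python
import re

# Fixed build named in this report.
FIXED = "rdma-core-32.0-5.el8_4"


def parse_nvr(nvr):
    """Split 'name-version-release' into its three parts."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def numeric_key(version, release):
    """Sort key: dotted version as ints, then the leading release number.

    Simplification (assumption): the dist tag such as 'el8_4' is ignored.
    """
    ver = tuple(int(part) for part in version.split("."))
    rel = int(re.match(r"\d+", release).group())
    return ver, rel


def has_fix(installed, fixed=FIXED):
    """Return True if 'installed' is at or above the fixed build."""
    name_i, ver_i, rel_i = parse_nvr(installed)
    name_f, ver_f, rel_f = parse_nvr(fixed)
    if name_i != name_f:
        raise ValueError(f"package mismatch: {name_i} vs {name_f}")
    return numeric_key(ver_i, rel_i) >= numeric_key(ver_f, rel_f)


if __name__ == "__main__":
    print(has_fix("rdma-core-32.0-4.el8"))    # affected build from this report
    print(has_fix("rdma-core-32.0-5.el8_4"))  # fixed build from this report
```

For anything beyond a quick check, the comparison should be left to RPM's own tooling, since real release strings can contain components this sketch does not handle.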