OCP Version at Install Time:

# oc version
Client Version: 4.8.0-rc.0
Server Version: 4.8.0-rc.0
Kubernetes Version: v1.21.0-rc.0+120883f

RHCOS Version at Install Time:

$ sudo cat /etc/os-release
NAME="Red Hat Enterprise Linux CoreOS"
VERSION="48.84.202106130219-0"
ID="rhcos"
ID_LIKE="rhel fedora"
VERSION_ID="4.8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux CoreOS 48.84.202106130219-0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://docs.openshift.com/container-platform/4.8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
REDHAT_BUGZILLA_PRODUCT_VERSION="4.8"
REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
REDHAT_SUPPORT_PRODUCT_VERSION="4.8"
OPENSHIFT_VERSION="4.8"
RHEL_VERSION="8.4"
OSTREE_VERSION='48.84.202106130219-0'

Platform: IBM Power Systems
Architecture: ppc64le

What are you trying to do? What is your use case?

I was trying to configure kdump on the RHCOS worker nodes.

What happened? What went wrong or what did you expect?

kdump.service failed to start:

# systemctl status kdump.service
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2021-06-21 07:10:54 UTC; 5h 8min ago
 Main PID: 1471 (code=exited, status=1/FAILURE)
      CPU: 130ms

Jun 21 07:10:53 worker-0 systemd[1]: Starting Crash recovery kernel arming...
Jun 21 07:10:54 worker-0 kdumpctl[1471]: kdump: No kdump initial ramdisk found.
Jun 21 07:10:54 worker-0 kdumpctl[1471]: kdump: Rebuilding /boot/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/initramfs-4.18.0-305.3.1.el8_4.ppc64lekdump.img
Jun 21 07:10:54 worker-0 kdumpctl[1471]: kdump: /boot/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716 does not have write permission.Can not rebuild /boot/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/initramfs-4.18.0-305.3.1.el8_4.ppc64lekdump.img
Jun 21 07:10:54 worker-0 kdumpctl[1471]: kdump: Starting kdump: [FAILED]
Jun 21 07:10:54 worker-0 systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Jun 21 07:10:54 worker-0 systemd[1]: kdump.service: Failed with result 'exit-code'.
Jun 21 07:10:54 worker-0 systemd[1]: Failed to start Crash recovery kernel arming.
Jun 21 07:10:54 worker-0 systemd[1]: kdump.service: Consumed 130ms CPU time

What are the steps to reproduce your issue? Please try to reduce these steps to something that can be reproduced with a single RHCOS node.

Using this doc for reference: https://docs.openshift.com/container-platform/4.7/support/troubleshooting/troubleshooting-operating-system-issues.html

Other details:

# journalctl -b 0 | grep kdumpctl
Jun 22 06:17:23 worker-0 kdumpctl[1419]: kdump: No kdump initial ramdisk found.
Jun 22 06:17:23 worker-0 kdumpctl[1419]: kdump: Rebuilding /boot/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/initramfs-4.18.0-305.3.1.el8_4.ppc64lekdump.img
Jun 22 06:17:23 worker-0 kdumpctl[1419]: kdump: /boot/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716 does not have write permission. Can not rebuild /boot/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/initramfs-4.18.0-305.3.1.el8_4.ppc64lekdump.img
Jun 22 06:17:23 worker-0 kdumpctl[1419]: kdump: Starting kdump: [FAILED]

# cat /proc/cmdline
BOOT_IMAGE=(ieee1275//vdevice/v-scsi@30000002/disk@8200000000000000,gpt3)/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/vmlinuz-4.18.0-305.3.1.el8_4.ppc64le random.trust_cpu=on console=tty0 console=hvc0,115200n8 ostree=/ostree/boot.1/rhcos/952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/0 ignition.platform.id=openstack root=UUID=80edca18-0f7e-48b4-bd59-336c5d3fe2a8 rw rootflags=prjquota crashkernel=1024M

# dmesg | grep crashkernel
[ 0.000000] Reserving 1024MB of memory at 128MB for crashkernel (System RAM: 32768MB)
[ 0.000000] Kernel command line: BOOT_IMAGE=(ieee1275//vdevice/v-scsi@30000002/disk@8200000000000000,gpt3)/ostree/rhcos-952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/vmlinuz-4.18.0-305.3.1.el8_4.ppc64le random.trust_cpu=on console=tty0 console=hvc0,115200n8 ostree=/ostree/boot.1/rhcos/952b979da3b3785d4154f56213b0c66d327a4f29fd167a9d8121ded93d158716/0 ignition.platform.id=openstack root=UUID=80edca18-0f7e-48b4-bd59-336c5d3fe2a8 rw rootflags=prjquota crashkernel=1024M

# grep ^[^#] /etc/sysconfig/kdump
KDUMP_KERNELVER=""
KDUMP_COMMANDLINE=""
KDUMP_COMMANDLINE_REMOVE="hugepages hugepagesz slub_debug quiet log_buf_len swiotlb"
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 noirqdistrib reset_devices cgroup_disable=memory numa=off udev.children-max=2 ehea.use_mcs=0 panic=10 rootflags=nofail kvm_cma_resv_ratio=0 transparent_hugepage=never novmcoredd"
KEXEC_ARGS="--dt-no-old-root"
KDUMP_IMG="vmlinuz"
KDUMP_IMG_EXT=""

# grep ^[^#] /etc/kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 7 -d 31

# rpm -qa | grep kexec-tools
kexec-tools-2.0.20-46.el8.ppc64le
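
The kdumpctl messages above say the directory under /boot/ostree is not writable. One way to confirm whether that is because /boot itself is mounted read-only is to inspect its mount options. The following is only a sketch: the helper name is_readonly_mount is hypothetical, and it assumes findmnt (util-linux) is available on the node and that /boot is a separate mount.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a comma-separated mount option
# string contains the standalone option "ro".
is_readonly_mount() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Read the mount options of /boot; empty if findmnt is missing
# or /boot is not a separate mount point.
opts="$(findmnt -n -o OPTIONS /boot 2>/dev/null || true)"

if is_readonly_mount "$opts"; then
    echo "/boot is mounted read-only"
else
    echo "/boot is not read-only (or is not a separate mount)"
fi
```

If /boot is reported as read-only, that would be consistent with kdumpctl being unable to rebuild the kdump initramfs there.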
*** This bug has been marked as a duplicate of bug 1971739 ***
Likely to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1971739. The current workaround is to remount /boot read-write before kdump.service starts. You can do that by placing the following drop-in config https://github.com/coreos/fedora-coreos-config/blob/testing-devel/overlay.d/12kdump/usr/lib/systemd/system/kdump.service.d/remount-boot.conf in /etc/systemd/system/kdump.service.d/.
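
The workaround above could be applied on a node roughly as follows. This is a minimal sketch, not the upstream file: the function name write_kdump_dropin is hypothetical, and the drop-in body is an assumption about what a remount-before-start drop-in looks like. The linked remount-boot.conf is authoritative and should be preferred.

```shell
#!/bin/sh
# Hypothetical helper: write a kdump.service drop-in into the given
# directory (default: /etc/systemd/system/kdump.service.d).
write_kdump_dropin() {
    dropin_dir="${1:-/etc/systemd/system/kdump.service.d}"
    mkdir -p "$dropin_dir" || return 1
    # ASSUMPTION: this body approximates the linked remount-boot.conf;
    # copy the upstream file instead if it differs.
    cat > "$dropin_dir/remount-boot.conf" <<'EOF'
[Service]
ExecStartPre=/usr/bin/mount -o remount,rw /boot
EOF
}

# On the affected node, as root:
#   write_kdump_dropin
#   systemctl daemon-reload
#   systemctl restart kdump.service
#   systemctl status kdump.service
```

The function only writes the file, so it can be pointed at a scratch directory to inspect the result before touching /etc. Note that this sketch leaves /boot read-write after the service starts.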