Bug 1971739 - Keep /boot RW when kdump is enabled
Summary: Keep /boot RW when kdump is enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: RHCOS
Version: 4.8
Hardware: All
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Timothée Ravier
QA Contact: Michael Nguyen
URL:
Whiteboard:
Duplicates: 1974639
Depends On: 1971738
Blocks:
Reported: 2021-06-14 16:59 UTC by Timothée Ravier
Modified: 2021-07-27 23:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The kdump tooling places the initrd it generates for kdump support in /boot, which is read-only by default on RHCOS. Consequence: Enabling kdump fails. Fix: /boot is remounted read-write when kdump is in use. Result: Enabling kdump succeeds.
Clone Of: 1971738
Environment:
Last Closed: 2021-07-27 23:12:53 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift os pull 563 | 0 | None | open | overaly: Keep /boot RW when kdump is enabled | 2021-06-14 16:59:12 UTC
Github | openshift os pull 570 | 0 | None | open | [release-4.8] Bug 1971739: overaly: Keep /boot RW when kdump is enabled | 2021-06-23 10:13:56 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:13:09 UTC

Description Timothée Ravier 2021-06-14 16:59:12 UTC
+++ This bug was initially created as a clone of Bug #1971738 +++

Enabling kdump on RHCOS currently requires /boot to be mounted read-write (RW). This remains necessary until https://src.fedoraproject.org/rpms/kexec-tools/c/75bdcb7399b6fe48032a8db534e18b01206601bc?branch=rawhide is backported to RHEL 8.4.
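
For anyone checking a node by hand, the following is a minimal sketch of how the state of /boot could be inspected, assuming findmnt is available on the host. The helper names opts_are_rw and mount_is_rw are hypothetical and are not part of the fix. The remount command in the usage comment is the same one that appears as ExecStartPre in the kdump.service status output in Comment 4.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a comma-separated mount option
# list (as printed by findmnt) contains the "rw" option.
opts_are_rw() {
    case ",$1," in
        *,rw,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical helper: succeed if the given mount point is mounted read-write.
# findmnt -n -o OPTIONS prints only the mount options, without a header line.
mount_is_rw() {
    opts_are_rw "$(findmnt -n -o OPTIONS "$1")"
}

# Usage (as root on the node):
#   mount_is_rw /boot || mount -o remount,rw /boot
```

Matching on ",rw," rather than on the bare substring avoids false positives from other options that merely contain the letters "rw".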

Comment 1 Timothée Ravier 2021-06-22 10:24:27 UTC
*** Bug 1974639 has been marked as a duplicate of this bug. ***

Comment 4 Michael Nguyen 2021-07-01 15:52:32 UTC
Verified on 4.8.0-0.nightly-2021-07-01-043852

Installed kdump on day 1 by placing the following file in the openshift directory used by openshift-installer:
$ cat kdump.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kdump
spec:
  kernelArguments:
  - 'crashkernel=256M'
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - enabled: true
        name: kdump.service


Once the cluster is up, check the kdump service and the kernel arguments:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-07-01-043852   True        False         111s    Cluster version is 4.8.0-0.nightly-2021-07-01-043852
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-133-72.us-west-2.compute.internal    Ready    worker   15m   v1.21.0-rc.0+1622f87
ip-10-0-135-135.us-west-2.compute.internal   Ready    master   25m   v1.21.0-rc.0+1622f87
ip-10-0-164-250.us-west-2.compute.internal   Ready    worker   17m   v1.21.0-rc.0+1622f87
ip-10-0-166-68.us-west-2.compute.internal    Ready    master   25m   v1.21.0-rc.0+1622f87
ip-10-0-213-255.us-west-2.compute.internal   Ready    master   24m   v1.21.0-rc.0+1622f87
ip-10-0-223-239.us-west-2.compute.internal   Ready    worker   16m   v1.21.0-rc.0+1622f87
$ oc debug node/ip-10-0-133-72.us-west-2.compute.internal
Starting pod/ip-10-0-133-72us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl is-enabled kdump
enabled
sh-4.4# systemctl status kdump
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kdump.service.d
           └─remount-boot.conf
   Active: active (exited) since Thu 2021-07-01 15:32:43 UTC; 17min ago
  Process: 1313 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
  Process: 1307 ExecStartPre=/usr/bin/mount -o remount,rw /boot (code=exited, status=0/SUCCESS)
 Main PID: 1313 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 46827)
   Memory: 0B
      CPU: 0
   CGroup: /system.slice/kdump.service

Jul 01 15:32:15 ip-10-0-133-72 dracut[1621]: Stored kernel commandline:
Jul 01 15:32:15 ip-10-0-133-72 dracut[1621]: No dracut internal kernel commandline stored in the initramfs
Jul 01 15:32:15 ip-10-0-133-72 dracut[1621]: *** Install squash loader ***
Jul 01 15:32:15 ip-10-0-133-72 dracut[1621]: *** Squashing the files inside the initramfs ***
Jul 01 15:32:39 ip-10-0-133-72 dracut[1621]: *** Squashing the files inside the initramfs done ***
Jul 01 15:32:39 ip-10-0-133-72 dracut[1621]: *** Creating image file '/boot/ostree/rhcos-29ab1c9b5b6c9982c924c03bf74f5b7554>
Jul 01 15:32:42 ip-10-0-133-72 dracut[1621]: *** Creating initramfs image file '/boot/ostree/rhcos-29ab1c9b5b6c9982c924c03b>
Jul 01 15:32:43 ip-10-0-133-72 kdumpctl[1313]: kdump: kexec: loaded kdump kernel
Jul 01 15:32:43 ip-10-0-133-72 kdumpctl[1313]: kdump: Starting kdump: [OK]
Jul 01 15:32:43 ip-10-0-133-72 systemd[1]: Started Crash recovery kernel arming.
sh-4.4# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-29ab1c9b5b6c9982c924c03bf74f5b7554ce2bf1dad21cd308a8abb8a37ed91f/vmlinuz-4.18.0-305.7.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.0/rhcos/29ab1c9b5b6c9982c924c03bf74f5b7554ce2bf1dad21cd308a8abb8a37ed91f/0 ignition.platform.id=aws root=UUID=b02825d3-82a1-4a69-a70f-31637aa7c512 rw rootflags=prjquota crashkernel=256M
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:549075ee410913efc2a222b1c19ad6653123a526fe4a639f851cf9e0cea8a74e
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.84.202106301921-0 (2021-06-30T19:24:35Z)

  ostree://457db8ff03dda5b3ce1a8e242fd91ddbe6a82f838d1b0047c3d4aeaf6c53f572
                   Version: 48.84.202106091622-0 (2021-06-09T16:25:42Z)
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
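
The manual checks above could be condensed into a small script. This is only a sketch under stated assumptions: the function names has_crashkernel and check_kdump are hypothetical, and the script is meant to be run on the node after `chroot /host`. It relies on /sys/kernel/kexec_crash_loaded, which the kernel sets to 1 once a crash kernel has been loaded.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line string
# contains a crashkernel= argument as a separate word.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *)                 return 1 ;;
    esac
}

# Hypothetical helper: summarize whether kdump is armed on this node.
check_kdump() {
    has_crashkernel "$(cat /proc/cmdline)" \
        || { echo "crashkernel= is not on the kernel command line"; return 1; }
    systemctl is-active --quiet kdump.service \
        || { echo "kdump.service is not active"; return 1; }
    # 1 means a crash kernel is currently loaded via kexec.
    [ "$(cat /sys/kernel/kexec_crash_loaded)" = "1" ] \
        || { echo "no crash kernel is loaded"; return 1; }
    echo "kdump is armed"
}

# Usage (on the node, after chroot /host):
#   check_kdump
```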

Comment 7 errata-xmlrpc 2021-07-27 23:12:53 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
