1971738 – Keep /boot RW when kdump is enabled

Summary: Keep /boot RW when kdump is enabled

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	RHCOS
Sub Component:
Version:	4.9
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	low
Target Milestone:	---
Target Release:	4.9.0
Assignee:	Timothée Ravier
QA Contact:	Michael Nguyen
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1971739

Reported:	2021-06-14 16:56 UTC by Timothée Ravier
Modified:	2021-10-18 17:34 UTC
CC List:	4 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Clones:	1971739
Environment:
Last Closed:	2021-10-18 17:34:12 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift os pull 563	0	None	open	overlay: Keep /boot RW when kdump is enabled	2021-06-14 16:58:24 UTC
Red Hat Bugzilla	1971734	1	unspecified	CLOSED	Backport support for /boot being RO	2021-06-15 07:30:51 UTC
Red Hat Product Errata	RHSA-2021:3759	0	None	None	None	2021-10-18 17:34:38 UTC

Description Timothée Ravier 2021-06-14 16:56:59 UTC

Enabling kdump on RHCOS currently requires /boot to be RW until https://src.fedoraproject.org/rpms/kexec-tools/c/75bdcb7399b6fe48032a8db534e18b01206601bc?branch=rawhide is backported to RHEL 8.4.
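
To see which mode /boot is currently mounted in on a node, something like the following could be used. This is only a minimal sketch: the `mount_mode` helper name is made up for illustration, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options).

```shell
#!/bin/sh
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: prints "rw" or "ro" for MOUNTPOINT, based on the
# options column (4th field) of a /proc/mounts-style file.
# If the mount point is listed more than once, the last entry wins.
# Exits non-zero when the mount point is not listed at all.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "rw" || opts[i] == "ro") mode = opts[i]
        }
        END { if (mode == "") exit 1; print mode }
    ' "${2:-/proc/mounts}"
}
```

On a node, `mount_mode /boot` would then report the current mode of /boot.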

Comment 2 Timothée Ravier 2021-06-23 10:37:44 UTC

Moving back to POST as this needs the second PR.

Comment 4 Michael Nguyen 2021-06-30 14:23:42 UTC

@travier  I wasn't able to get this to work.  I think this requires a boot image bump to enable kdump day-1.

-- Logs begin at Wed 2021-06-30 12:51:02 UTC, end at Wed 2021-06-30 14:18:20 UTC. --
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: No kdump initial ramdisk found.
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: Rebuilding /boot/ostree/rhcos-804feafb00025bb841960dcf2cce555bb8988>
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: /boot/ostree/rhcos-804feafb00025bb841960dcf2cce555bb8988d5b4bda50cc>
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: Starting kdump: [FAILED]
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: kdump.service: Failed with result 'exit-code'.
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: Failed to start Crash recovery kernel arming.
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: kdump.service: Consumed 82ms CPU time


sh-4.4# sudo rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb121c97e974d7372eba219506205f364ab8f7809de74a3ef4c0157c4372938c
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.84.202106162024-0 (2021-06-16T20:28:08Z)

  ostree://92ede04b462bc884de5562062fb45e06d803754cbaa466e3a2d34b4ee5e9634b
                   Version: 48.84.202105190318-0 (2021-05-19T03:22:10Z)

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-06-30-034414   True        False         66m     Cluster version is 4.9.0-0.nightly-2021-06-30-034414

Comment 5 Timothée Ravier 2021-06-30 15:55:43 UTC

This was merged into master on Jun 23, 2021, so 48.84.202106162024-0 will not have it. You can check whether the drop-in file is included with: `systemctl cat kdump.service`
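
That check could be scripted roughly as follows. This is a sketch, not the project's own tooling: `has_boot_remount` is a hypothetical helper name, and the pattern is an assumption based on the `ExecStartPre=/usr/bin/mount -o remount,rw /boot` line visible in the `systemctl status kdump` output later in this report.

```shell
#!/bin/sh
# has_boot_remount
# Hypothetical helper: reads unit-file text (for example the output of
# `systemctl cat kdump.service`) on stdin and succeeds if it contains an
# ExecStartPre line that remounts /boot read-write.
has_boot_remount() {
    grep -Eq '^ExecStartPre=.*mount -o remount,rw /boot'
}
```

Usage on a node would look like `systemctl cat kdump.service | has_boot_remount && echo "remount drop-in present"`.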

Comment 6 Timothée Ravier 2021-06-30 15:56:29 UTC

This should not require a boot image bump AFAIK.

Comment 7 Michael Nguyen 2021-07-01 17:56:46 UTC

Confirmed, this does not need a boot image bump but https://github.com/openshift/os/pull/561 needs to be in the boot image in order for this to work.

Comment 8 Timothée Ravier 2021-07-07 11:24:59 UTC

A boot image bump is only needed for https://github.com/openshift/os/pull/561 if kdump is set up on day-1 via Ignition or a MC, not for manual setup.

Comment 9 Michael Nguyen 2021-07-07 15:23:02 UTC

Thanks for the clarification.  I was able to verify this fix by enabling kdump on day-2.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-07-07-021823   True        False         35m     Cluster version is 4.9.0-0.nightly-2021-07-07-021823
$ cat 99-kdump-worker.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kdump
spec:
  kernelArguments:
  - 'crashkernel=256M'
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
      - enabled: true
        name: kdump.service
$ oc create -f 99-kdump-worker.yaml 
machineconfig.machineconfiguration.openshift.io/99-worker-kdump created
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
00-worker                                          95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-master-container-runtime                        95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-master-kubelet                                  95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-worker-container-runtime                        95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-worker-kubelet                                  95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
99-master-generated-registries                     95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
99-master-ssh                                                                                 3.2.0             63m
99-worker-generated-registries                     95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
99-worker-kdump                                                                               3.2.0             9s
99-worker-ssh                                                                                 3.2.0             63m
rendered-master-38ccce6954aff7d3ae792692e8962bb1   95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
rendered-worker-0b5f5063c2b34c7b307de02d3cb44eaa   95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             4s
rendered-worker-da71f2941166daf4ff90249dc4d88a52   95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-38ccce6954aff7d3ae792692e8962bb1   True      False      False      3              3                   3                     0                      62m
worker   rendered-worker-da71f2941166daf4ff90249dc4d88a52   False     True       False      3              0                   0                     0                      62m
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-38ccce6954aff7d3ae792692e8962bb1   True      False      False      3              3                   3                     0                      72m
worker   rendered-worker-0b5f5063c2b34c7b307de02d3cb44eaa   True      False      False      3              3                   3                     0                      72m
$ oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-148-96.us-west-2.compute.internal    Ready    worker   65m   v1.21.1+0228142
ip-10-0-154-86.us-west-2.compute.internal    Ready    master   72m   v1.21.1+0228142
ip-10-0-164-96.us-west-2.compute.internal    Ready    master   73m   v1.21.1+0228142
ip-10-0-178-111.us-west-2.compute.internal   Ready    worker   64m   v1.21.1+0228142
ip-10-0-209-56.us-west-2.compute.internal    Ready    master   72m   v1.21.1+0228142
ip-10-0-215-192.us-west-2.compute.internal   Ready    worker   65m   v1.21.1+0228142
$ oc debug node/ip-10-0-148-96.us-west-2.compute.internal
Starting pod/ip-10-0-148-96us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl is-enabled kdump
enabled
sh-4.4# systemctl status kdump
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kdump.service.d
           └─remount-boot.conf
   Active: active (exited) since Wed 2021-07-07 15:11:52 UTC; 7min ago
  Process: 1348 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
  Process: 1342 ExecStartPre=/usr/bin/mount -o remount,rw /boot (code=exited, status=0/SUCCESS)
 Main PID: 1348 (code=exited, status=0/SUCCESS)
      CPU: 1min 3.582s

Jul 07 15:10:57 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: Stored kernel commandline:
Jul 07 15:10:57 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: No dracut internal kernel commandline stored in the initramfs
Jul 07 15:10:57 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Install squash loader ***
Jul 07 15:10:59 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Squashing the files inside the initramfs ***
Jul 07 15:11:46 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Squashing the files inside the initramfs done ***
Jul 07 15:11:46 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Creating image file '/boot/ostree/rhcos-4a92af59dcf52f5d7c3a58e>
Jul 07 15:11:50 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Creating initramfs image file '/boot/ostree/rhcos-4a92af59dcf52>
Jul 07 15:11:52 ip-10-0-148-96.us-west-2.compute.internal kdumpctl[1348]: kdump: kexec: loaded kdump kernel
Jul 07 15:11:52 ip-10-0-148-96.us-west-2.compute.internal kdumpctl[1348]: kdump: Starting kdump: [OK]
Jul 07 15:11:52 ip-10-0-148-96.us-west-2.compute.internal systemd[1]: Started Crash recovery kernel arming.
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:19ca82ee51a648e95a6cc2c2fd99260d28a89a9be8467d2575ebcc1491c4383f
              CustomOrigin: Managed by machine-config-operator
                   Version: 49.84.202107051924-0 (2021-07-05T19:28:02Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:19ca82ee51a648e95a6cc2c2fd99260d28a89a9be8467d2575ebcc1491c4383f
              CustomOrigin: Managed by machine-config-operator
                   Version: 49.84.202107051924-0 (2021-07-05T19:28:02Z)
sh-4.4# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-4a92af59dcf52f5d7c3a58e9722f057bf98a16b7625d925326e9af2fdd4060b3/vmlinuz-4.18.0-305.7.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/4a92af59dcf52f5d7c3a58e9722f057bf98a16b7625d925326e9af2fdd4060b3/0 ignition.platform.id=aws root=UUID=daa7e97d-faaf-46e9-995e-1e41f03bd55f rw rootflags=prjquota crashkernel=256M exit
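
The kernel command line check above can also be done mechanically. The following is a minimal sketch with a hypothetical `crashkernel_arg` helper; it assumes the command line is a single space-separated line, as in /proc/cmdline.

```shell
#!/bin/sh
# crashkernel_arg [CMDLINE_FILE]
# Hypothetical helper: prints the value of the last crashkernel= argument
# found in CMDLINE_FILE (default: /proc/cmdline), or nothing if it is absent.
crashkernel_arg() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p' | tail -n 1
}
```

On the node verified above, `crashkernel_arg` would be expected to print the value set by the MachineConfig. Separately, `cat /sys/kernel/kexec_crash_loaded` prints 1 once a crash kernel has been loaded.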

Comment 12 errata-xmlrpc 2021-10-18 17:34:12 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759