Bug 1971738
| Summary: | Keep /boot RW when kdump is enabled | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Timothée Ravier <travier> | |
| Component: | RHCOS | Assignee: | Timothée Ravier <travier> | |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> | |
| Severity: | low | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.9 | CC: | dornelas, jligon, mrussell, nstielau | |
| Target Milestone: | --- | |||
| Target Release: | 4.9.0 | |||
| Hardware: | All | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1971739 (view as bug list) | Environment: | ||
| Last Closed: | 2021-10-18 17:34:12 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1971739 | |||
Description
Timothée Ravier
2021-06-14 16:56:59 UTC
Moving back to POST as this needs the second PR.

@travier I wasn't able to get this to work. I think this requires a boot image bump to enable kdump day-1.
-- Logs begin at Wed 2021-06-30 12:51:02 UTC, end at Wed 2021-06-30 14:18:20 UTC. --
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: Starting Crash recovery kernel arming...
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: No kdump initial ramdisk found.
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: Rebuilding /boot/ostree/rhcos-804feafb00025bb841960dcf2cce555bb8988>
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: /boot/ostree/rhcos-804feafb00025bb841960dcf2cce555bb8988d5b4bda50cc>
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal kdumpctl[312655]: kdump: Starting kdump: [FAILED]
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: kdump.service: Main process exited, code=exited, status=1/FAILURE
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: kdump.service: Failed with result 'exit-code'.
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: Failed to start Crash recovery kernel arming.
Jun 30 14:18:20 ip-10-0-131-173.us-west-2.compute.internal systemd[1]: kdump.service: Consumed 82ms CPU time
sh-4.4# sudo rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:eb121c97e974d7372eba219506205f364ab8f7809de74a3ef4c0157c4372938c
    CustomOrigin: Managed by machine-config-operator
    Version: 48.84.202106162024-0 (2021-06-16T20:28:08Z)

  ostree://92ede04b462bc884de5562062fb45e06d803754cbaa466e3a2d34b4ee5e9634b
    Version: 48.84.202105190318-0 (2021-05-19T03:22:10Z)
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.nightly-2021-06-30-034414 True False 66m Cluster version is 4.9.0-0.nightly-2021-06-30-034414
This was merged in master on Jun 23, 2021, so 48.84.202106162024-0 will not have it. You can check if the file is included with: `systemctl cat kdump.service`. This should not require a boot image bump AFAIK.

Confirmed, this does not need a boot image bump but https://github.com/openshift/os/pull/561 needs to be in the boot image in order for this to work.

A boot image bump is only needed for https://github.com/openshift/os/pull/561 if kdump is set up on day-1 via Ignition or a MC, not for manual setup.

Thanks for the clarification. I was able to verify this fix by enabling kdump day-2.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.nightly-2021-07-07-021823 True False 35m Cluster version is 4.9.0-0.nightly-2021-07-07-021823
$ cat 99-kdump-worker.yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-kdump
spec:
  kernelArguments:
    - 'crashkernel=256M'
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump.service
$ oc create -f 99-kdump-worker.yaml
machineconfig.machineconfiguration.openshift.io/99-worker-kdump created
$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
00-worker                                          95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-master-container-runtime                        95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-master-kubelet                                  95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-worker-container-runtime                        95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
01-worker-kubelet                                  95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
99-master-generated-registries                     95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
99-master-ssh                                                                                 3.2.0             63m
99-worker-generated-registries                     95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
99-worker-kdump                                                                               3.2.0             9s
99-worker-ssh                                                                                 3.2.0             63m
rendered-master-38ccce6954aff7d3ae792692e8962bb1   95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
rendered-worker-0b5f5063c2b34c7b307de02d3cb44eaa   95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             4s
rendered-worker-da71f2941166daf4ff90249dc4d88a52   95dcdb123f1a5fa8887a1fb66a044c9ad74191ce   3.2.0             60m
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-38ccce6954aff7d3ae792692e8962bb1 True False False 3 3 3 0 62m
worker rendered-worker-da71f2941166daf4ff90249dc4d88a52 False True False 3 0 0 0 62m
$ oc get mcp
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-38ccce6954aff7d3ae792692e8962bb1 True False False 3 3 3 0 72m
worker rendered-worker-0b5f5063c2b34c7b307de02d3cb44eaa True False False 3 3 3 0 72m
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-148-96.us-west-2.compute.internal Ready worker 65m v1.21.1+0228142
ip-10-0-154-86.us-west-2.compute.internal Ready master 72m v1.21.1+0228142
ip-10-0-164-96.us-west-2.compute.internal Ready master 73m v1.21.1+0228142
ip-10-0-178-111.us-west-2.compute.internal Ready worker 64m v1.21.1+0228142
ip-10-0-209-56.us-west-2.compute.internal Ready master 72m v1.21.1+0228142
ip-10-0-215-192.us-west-2.compute.internal Ready worker 65m v1.21.1+0228142
$ oc debug node/ip-10-0-148-96.us-west-2.compute.internal
Starting pod/ip-10-0-148-96us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl is-enabled kdump
enabled
sh-4.4# systemctl status kdump
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kdump.service.d
           └─remount-boot.conf
   Active: active (exited) since Wed 2021-07-07 15:11:52 UTC; 7min ago
  Process: 1348 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
  Process: 1342 ExecStartPre=/usr/bin/mount -o remount,rw /boot (code=exited, status=0/SUCCESS)
 Main PID: 1348 (code=exited, status=0/SUCCESS)
      CPU: 1min 3.582s
Jul 07 15:10:57 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: Stored kernel commandline:
Jul 07 15:10:57 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: No dracut internal kernel commandline stored in the initramfs
Jul 07 15:10:57 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Install squash loader ***
Jul 07 15:10:59 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Squashing the files inside the initramfs ***
Jul 07 15:11:46 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Squashing the files inside the initramfs done ***
Jul 07 15:11:46 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Creating image file '/boot/ostree/rhcos-4a92af59dcf52f5d7c3a58e>
Jul 07 15:11:50 ip-10-0-148-96.us-west-2.compute.internal dracut[1643]: *** Creating initramfs image file '/boot/ostree/rhcos-4a92af59dcf52>
Jul 07 15:11:52 ip-10-0-148-96.us-west-2.compute.internal kdumpctl[1348]: kdump: kexec: loaded kdump kernel
Jul 07 15:11:52 ip-10-0-148-96.us-west-2.compute.internal kdumpctl[1348]: kdump: Starting kdump: [OK]
Jul 07 15:11:52 ip-10-0-148-96.us-west-2.compute.internal systemd[1]: Started Crash recovery kernel arming.
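The status output above shows the fix as a drop-in (`remount-boot.conf`) whose `ExecStartPre` step remounts `/boot` read-write before `kdumpctl start` runs. A minimal sketch of scripting the `systemctl cat kdump.service` check suggested earlier in this report follows. The helper name `has_boot_remount` is hypothetical, and the pattern is inferred only from the `ExecStartPre` line shown in this report, not from the shipped drop-in file.

```shell
# Hypothetical helper (not part of RHCOS): reads unit file text on stdin,
# for example the output of `systemctl cat kdump.service`, and succeeds
# only if an ExecStartPre line remounts /boot read-write.
has_boot_remount() {
    grep -Eq '^ExecStartPre=.*mount .*remount,rw[^[:space:]]* /boot([[:space:]]|$)'
}

# Usage on a node (assumes a systemd host):
#   systemctl cat kdump.service | has_boot_remount && echo "remount drop-in present"
```

Reading from stdin keeps the check usable against saved output from a must-gather or a debug pod, not only on a live node.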
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:19ca82ee51a648e95a6cc2c2fd99260d28a89a9be8467d2575ebcc1491c4383f
    CustomOrigin: Managed by machine-config-operator
    Version: 49.84.202107051924-0 (2021-07-05T19:28:02Z)

  pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:19ca82ee51a648e95a6cc2c2fd99260d28a89a9be8467d2575ebcc1491c4383f
    CustomOrigin: Managed by machine-config-operator
    Version: 49.84.202107051924-0 (2021-07-05T19:28:02Z)
sh-4.4# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-4a92af59dcf52f5d7c3a58e9722f057bf98a16b7625d925326e9af2fdd4060b3/vmlinuz-4.18.0-305.7.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/4a92af59dcf52f5d7c3a58e9722f057bf98a16b7625d925326e9af2fdd4060b3/0 ignition.platform.id=aws root=UUID=daa7e97d-faaf-46e9-995e-1e41f03bd55f rw rootflags=prjquota crashkernel=256M
sh-4.4# exit
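The `crashkernel=256M` argument set through the MachineConfig appears in the kernel command line above. A small sketch for extracting that value when scripting the same verification follows. The function name `crashkernel_arg` is hypothetical and not part of any tool mentioned in this report.

```shell
# Hypothetical helper: reads a kernel command line on stdin and prints the
# value of the crashkernel= argument, or nothing if the argument is absent.
crashkernel_arg() {
    tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Usage on a node:
#   crashkernel_arg < /proc/cmdline
```

An empty result would suggest the kernel argument from the MachineConfig has not been applied yet, for example because the node has not rebooted into the new rendered config.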
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:3759