Bug 2102386

Summary: SNO kdump-capture.service oom
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Telco Edge
Assignee: Jim Ramsay <jramsay>
Telco Edge sub component: ZTP
QA Contact: yliu1
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: unspecified
CC: djuran, jramsay, keyoung
Version: 4.10
Target Milestone: ---
Target Release: 4.10.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-23 18:24:05 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2102134
Bug Blocks:

Description OpenShift BugZilla Robot 2022-06-29 20:13:08 UTC
+++ This bug was initially created as a clone of Bug #2102134 +++

+++ This bug was initially created as a clone of Bug #2101356 +++

Description of problem:
SNO 4.10.17 ZTP provisioned.
The kdump service is enabled by default with 256M of memory, the same value as in the docs [1]. After a kernel crash, no core is generated; the console logs show that kdump-capture.service runs out of memory (OOM).

Version-Release number of selected component (if applicable):
4.10.15
4.10.17

How reproducible:


Steps to Reproduce:
1. ZTP SNO installation
2. kdump enabled by default
3. trigger a manual kernel crash with `echo c > /proc/sysrq-trigger` 
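
Before triggering the crash in step 3, it can help to confirm how much memory is actually reserved for the capture kernel. The following is a minimal sketch, assuming a node where `/proc/cmdline` and `/sys/kernel/kexec_crash_size` are readable. The helper name `crashkernel_arg` is hypothetical and not part of any kdump tooling; it only extracts the parameter text from a command-line string.

```shell
#!/bin/sh
# crashkernel_arg (hypothetical helper): print the value of each crashkernel=
# parameter found in a kernel command-line string, or nothing if it is absent.
crashkernel_arg() {
    for word in $1; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
}

# On a live node; both checks are guarded so the block is harmless elsewhere.
if [ -r /proc/cmdline ]; then
    echo "crashkernel argument: $(crashkernel_arg "$(cat /proc/cmdline)")"
fi
if [ -r /sys/kernel/kexec_crash_size ]; then
    # Bytes reserved for the crash kernel; 0 means nothing is reserved.
    echo "reserved bytes: $(cat /sys/kernel/kexec_crash_size)"
fi
```

On the affected nodes described above, the argument would be expected to read 256M.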

Actual results:


Expected results:
kdump configuration or documentation corrected.


Additional info:
[1] https://docs.openshift.com/container-platform/4.10/support/troubleshooting/troubleshooting-operating-system-issues.html#enabling-kdump

--- Additional comment from djuran on 2022-06-27 11:10:00 UTC ---

It looks to me like the crashkernel memory is hardcoded in the template [1].
Maybe we could use the "auto" parameter [2] instead?

[1]
https://github.com/openshift-kni/cnf-features-deploy/blob/7be94f9bff2601d219444c082c88e503dc609dcc/ztp/source-crs/extra-manifest/06-kdump-master.yaml#L18

[2]
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel#configuring-kdump-memory-usage_configuring-kdump-on-the-command-line

Comment 1 Jim Ramsay 2022-06-29 20:18:13 UTC
*** Bug 2102328 has been marked as a duplicate of this bug. ***

Comment 3 yliu1 2022-08-05 20:22:02 UTC
Verified with the latest 4.10 ZTP.

[core@sno ~]$ cat /proc/cmdline 
BOOT_IMAGE=(hd5,gpt3)/ostree/rhcos-0828d150eaed69a98e648cc7cbf4c258eda44b78b2c2a58773a0b8042812d011/vmlinuz-4.18.0-305.57.1.rt7.129.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/0828d150eaed69a98e648cc7cbf4c258eda44b78b2c2a58773a0b8042812d011/0 ip=ens1f0:dhcp6 root=UUID=ae957a9a-bd17-429e-8b72-42056415e1e9 rw rootflags=prjquota boot=UUID=3b00cd8b-68cf-45fd-b012-a1a02aacc0e8 skew_tick=1 nohz=on rcu_nocbs=2-31,34-63 tuned.non_isolcpus=00000003,00000003 intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,2-31,34-63 systemd.cpu_affinity=0,1,32,33 default_hugepagesz=1G hugepagesz=1G hugepages=32 idle=poll rcupdate.rcu_normal_after_boot=0 nohz_full=2-31,34-63 crashkernel=512M

[root@sno core]# echo c > /proc/sysrq-trigger

[core@sno ~]$ ls -la /var/crash/127.0.0.1-2022-08-05-20\:12\:58/
total 1259492
drwxr-xr-x. 2 root root         67 Aug  5 20:13 .
drwxr-xr-x. 3 root root         43 Aug  5 20:12 ..
-rw-r--r--. 1 root root     148698 Aug  5 20:13 kexec-dmesg.log
-rw-------. 1 root root 1288671930 Aug  5 20:13 vmcore
-rw-r--r--. 1 root root     891664 Aug  5 20:12 vmcore-dmesg.txt
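
The manual check above (list the crash directory and look for a non-empty vmcore) can be sketched as a small shell helper. This is an illustrative sketch only: `latest_vmcore` is a hypothetical name, `/var/crash` is assumed as the dump target because that is where the listing above shows it, and parsing `ls` output is acceptable here only because the kdump directory names contain no whitespace.

```shell
#!/bin/sh
# latest_vmcore (hypothetical helper): print the path of the vmcore in the most
# recently modified subdirectory of a crash directory that contains a non-empty
# vmcore file. Returns non-zero if no such file is found.
latest_vmcore() {
    crash_dir=${1:-/var/crash}
    # ls -1t lists entries newest first by modification time.
    for d in $(ls -1t "$crash_dir" 2>/dev/null); do
        if [ -s "$crash_dir/$d/vmcore" ]; then
            printf '%s\n' "$crash_dir/$d/vmcore"
            return 0
        fi
    done
    return 1
}

# Example use after the node has rebooted from the triggered crash.
latest_vmcore /var/crash || echo "no vmcore found under /var/crash"
```

A non-zero return after a triggered crash would match the original failure, where the capture service ran out of memory before writing the core.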

Comment 8 errata-xmlrpc 2022-08-23 18:24:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.28 extras update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6096