Bug 2102134

Summary: SNO kdump-capture.service oom
Product: OpenShift Container Platform
Component: Telco Edge
Telco Edge sub component: ZTP
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Assignee: Jim Ramsay <jramsay>
QA Contact: yliu1
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
CC: djuran, imiller, jramsay, keyoung
Version: 4.10
Target Release: 4.11.0
Hardware: x86_64
OS: Linux
Last Closed: 2022-08-26 16:43:57 UTC
Bug Depends On: 2101356, 2102328
Bug Blocks: 2102386

Description OpenShift BugZilla Robot 2022-06-29 11:26:08 UTC
+++ This bug was initially created as a clone of Bug #2101356 +++

Description of problem:
SNO 4.10.17, provisioned with ZTP.
The kdump service is enabled by default with 256M of crashkernel memory, the same value as in the docs [1]. After a kernel crash, no core is generated. The console logs show that kdump-capture.service runs out of memory (OOM).

Version-Release number of selected component (if applicable):
4.10.15
4.10.17

How reproducible:


Steps to Reproduce:
1. ZTP SNO installation
2. kdump enabled by default
3. trigger a manual kernel crash with `echo c > /proc/sysrq-trigger` 
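
Before running step 3, it can help to record which `crashkernel=` value the node actually booted with. The following is a minimal sketch and not part of the original report: the helper name `crashkernel_arg` is hypothetical, and it only parses a kernel command line string passed as its argument.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): print the value of every
# crashkernel= argument found in a kernel command line string, one per line.
# Prints nothing if the argument is absent.
crashkernel_arg() {
    # Split the command line on spaces, then keep only crashkernel= entries
    # and strip the "crashkernel=" prefix.
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}
```

On a node this could be called as `crashkernel_arg "$(cat /proc/cmdline)"`. The size the kernel actually reserved can also be read, in bytes, from `/sys/kernel/kexec_crash_size`.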

Actual results:


Expected results:
kdump configuration or documentation corrected.


Additional info:
[1] https://docs.openshift.com/container-platform/4.10/support/troubleshooting/troubleshooting-operating-system-issues.html#enabling-kdump

--- Additional comment from djuran on 2022-06-27 11:10:00 UTC ---

It looks to me like the crashkernel memory is hardcoded in the template [1].
Maybe we could use the "auto" parameter [2] instead?

[1]
https://github.com/openshift-kni/cnf-features-deploy/blob/7be94f9bff2601d219444c082c88e503dc609dcc/ztp/source-crs/extra-manifest/06-kdump-master.yaml#L18

[2]
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kdump-on-the-command-line_managing-monitoring-and-updating-the-kernel#configuring-kdump-memory-usage_configuring-kdump-on-the-command-line

Comment 1 yliu1 2022-08-02 15:53:39 UTC
Verified with the latest 4.11 ZTP container image.

[root@test-sno-1 core]# cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-7e7edbd8297c357c29e5cf8afc286cd0f1e2c199867bd90bc4dfa41c74b99ea5/vmlinuz-4.18.0-372.16.1.rt7.173.el8_6.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ignition.platform.id=metal ostree=/ostree/boot.1/rhcos/7e7edbd8297c357c29e5cf8afc286cd0f1e2c199867bd90bc4dfa41c74b99ea5/0 ip=eno1:dhcp root=UUID=fb7ae80f-fd01-4c50-a0bb-b560057e9d3b rw rootflags=prjquota boot=UUID=2c93d2c1-7d4a-4687-aa83-f9e7d676d8d6 intel_iommu=on iommu=pt skew_tick=1 nohz=on rcu_nocbs=2-39,42-79 tuned.non_isolcpus=00000300,00000003 systemd.cpu_affinity=0,1,40,41 intel_iommu=on iommu=pt isolcpus=managed_irq,2-39,42-79 nohz_full=2-39,42-79 tsc=nowatchdog nosoftlockup nmi_watchdog=0 mce=off skew_tick=1 default_hugepagesz=1G hugepagesz=1G hugepages=32 idle=poll rcupdate.rcu_normal_after_boot=0 efi=runtime nohz_full=2-39,42-79 crashkernel=512M

[root@test-sno-1 core]# echo c > /proc/sysrq-trigger

[root@test-sno-1 core]# ls -la /var/crash/127.0.0.1-2022-08-02-15\:47\:38/
total 2066724
drwxr-xr-x. 2 root root         67 Aug  2 15:48 .
drwxr-xr-x. 3 root root         43 Aug  2 15:47 ..
-rw-------. 1 root root     197643 Aug  2 15:48 kexec-dmesg.log
-rw-------. 1 root root 2115704516 Aug  2 15:48 vmcore
-rw-------. 1 root root     415023 Aug  2 15:47 vmcore-dmesg.txt
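
The manual `ls` check above could be scripted along these lines. This is a minimal sketch under stated assumptions, not something taken from the report: the helper name `list_vmcores` is hypothetical, and it assumes each dump lands in its own subdirectory of the crash directory, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): print the path of every
# non-empty vmcore file found one level below the given crash directory.
list_vmcores() {
    for dir in "$1"/*/; do
        # -s is true only if the file exists and has a size greater than zero.
        # If the glob matches nothing, the literal pattern simply fails this test.
        [ -s "${dir}vmcore" ] && printf '%s\n' "${dir}vmcore"
    done
    return 0
}
```

For example, `list_vmcores /var/crash` prints one line per captured core and prints nothing if the capture kernel failed before writing a dump.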