Bug 1510654

Summary: kdump.service fails to start when crashkernel=auto with "No memory reserved for crash kernel"
Product: Red Hat Enterprise Linux 7
Reporter: Afom T. Michael <tmichael>
Component: kexec-tools
Assignee: kdump team <kdump-team-bugs>
Status: CLOSED DUPLICATE
QA Contact: Emma Wu <xiawu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 7.5-Alt
CC: fperalta, kernel-general-qe, ruyang, xiawu
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-09 07:12:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Afom T. Michael 2017-11-07 22:41:16 UTC
Description of problem:
The kdump service fails to start when the kernel is booted with "crashkernel=auto" on 4.14.0-0.rc7.1.el7a.ppc64le.

Version-Release number of selected component (if applicable):
$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 Beta (Maipo)
$ uname -rm
4.14.0-0.rc7.2.el7a.x86_64 x86_64
$ rpm -q kexec-tools
kexec-tools-2.0.15-4.el7.x86_64
$

How reproducible:
Always

Steps to Reproduce:
1. Install a recent build such as RHEL-ALT-7.5-20171106.1.
2. Install kexec-tools and add crashkernel=auto to the kernel command line in grub (as described at https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/kernel_crash_dump_guide/sect-kdump-config-cli).
3. Reboot and check the kdump status (systemctl status kdump -l).

Actual results:
The kdump service fails to start and logs "...kdumpctl[2041]: No memory reserved for crash kernel..."

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          32228         363       31647           9         217       31450
Swap:         16127           0       16127
$ cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-4.14.0-0.rc7.2.el7a.x86_64 root=/dev/mapper/rhelaa_rdma--dev--05-root ro console=tty0 rd_NO_PLYMOUTH crashkernel=auto rd.lvm.lv=rhelaa_rdma-dev-05/root rd.lvm.lv=rhelaa_rdma-dev-05/swap console=ttyS1,115200
$ systemctl is-enabled kdump
enabled
$ systemctl is-active kdump
failed
$ systemctl status kdump -l
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2017-11-07 21:09:08 EST; 47min ago
  Process: 970 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 970 (code=exited, status=1/FAILURE)

Nov 07 21:09:08 localhost.localdomain systemd[1]: Starting Crash recovery kernel arming...
Nov 07 21:09:08 localhost.localdomain kdumpctl[970]: No memory reserved for crash kernel
Nov 07 21:09:08 localhost.localdomain kdumpctl[970]: Starting kdump: [FAILED]
Nov 07 21:09:08 localhost.localdomain systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Nov 07 21:09:08 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.
Nov 07 21:09:08 localhost.localdomain systemd[1]: Unit kdump.service entered failed state.
Nov 07 21:09:08 localhost.localdomain systemd[1]: kdump.service failed.
$ systemctl start kdump
Job for kdump.service failed because the control process exited with error code. See "systemctl status kdump.service" and "journalctl -xe" for details.
$ 

Expected results:
The kdump service starts and runs when the kernel is booted with crashkernel=auto.

Additional info:
Seen on both x86_64 and ppc64le.

Comment 3 Afom T. Michael 2017-11-07 22:46:09 UTC
The service starts and runs when crashkernel is set to a numeric value (a memory size), e.g. 128M.
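
The difference between crashkernel=auto and a numeric value can be checked on a running system by comparing what was requested on the kernel command line with what the kernel actually reserved. The following is a minimal sketch, not part of kexec-tools: the function names are hypothetical, and it assumes the kernel exposes the size of the reservation in /sys/kernel/kexec_crash_size. A value of 0 there would be consistent with the "No memory reserved for crash kernel" message.

```shell
# Print the crashkernel= value(s) found in a kernel command line file.
# $1: file to read (defaults to /proc/cmdline).
crashkernel_param() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p'
}

# Print the number of bytes reserved for the crash kernel.
# $1: file to read (defaults to /sys/kernel/kexec_crash_size).
crash_reserved_bytes() {
    cat "${1:-/sys/kernel/kexec_crash_size}"
}

# Print the requested and reserved values.
# Returns non-zero when nothing is reserved.
# $1, $2: optional overrides for the two files above.
check_crash_reservation() {
    requested=$(crashkernel_param "$1")
    reserved=$(crash_reserved_bytes "$2")
    echo "requested: crashkernel=${requested:-<not set>}"
    echo "reserved:  ${reserved} bytes"
    [ "${reserved:-0}" -gt 0 ]
}
```

Calling check_crash_reservation with no arguments reads the live files. With crashkernel=128M the reserved size would be expected to be non-zero (128M is 134217728 bytes, although the kernel may round or align the region), so the function should return success in that case.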

Comment 5 Dave Young 2017-11-09 07:12:37 UTC

*** This bug has been marked as a duplicate of bug 1431982 ***

Comment 6 Francisco Peralta 2019-11-08 13:09:34 UTC
(In reply to Dave Young from comment #5)
> 
> *** This bug has been marked as a duplicate of bug 1431982 ***

Dear Dave and Team,
 I have a customer on RHEL 7.7 who still sees this issue, and we do not agree that it is a duplicate of the bug you mentioned (which was fixed, but actually concerns a fixed crashkernel value).

 Would you agree to unmark it as a duplicate and follow up on this one, if that is possible and makes sense? Otherwise I can open a new bug against 7.7.

 My customer accepts that the value =auto is going to disappear in RHEL 8, but in RHEL 7 it exists and is documented. What's more, all their servers already use auto, (presumably) because that is what a default installation configures. Changing this now to a large-enough hard-coded value is a major issue.

 What is your opinion?

Thanks in advance,
 Cisco.

Comment 7 Emma Wu 2019-11-12 05:47:40 UTC
Dave may be better placed to answer this request.
Redirecting needinfo to Dave Young.

Comment 8 Dave Young 2019-11-12 07:35:59 UTC
In the early phase of RHEL-alt, the crashkernel=auto code was not enabled in the 4.14 kernel, so this failure is expected. We later ported crashkernel=auto to RHEL-alt and simplified it considerably; that work is bug 1431982. So marking this bug as a duplicate is appropriate.

The similar issue your customer has is probably not the same one: the product is RHEL 7.7, not RHEL-alt, and the kernel is different too, since RHEL 7 uses the 3.10 kernel. I would suggest collecting the kernel log from a boot where crashkernel=auto failed. Then we can see what happened.

Thanks
Dave
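
The kernel log collection suggested in comment 8 could be narrowed to the relevant lines along the following lines. This is a sketch under assumptions: the function name is hypothetical, and the match pattern is only a guess at how the messages are worded ("crashkernel" or "crash kernel"), so the full kernel log should be attached as well in case the pattern misses something.

```shell
# Keep only kernel log lines that mention the crash kernel.
# Reads log text on stdin, so it works with dmesg, journalctl -k,
# or a saved log file. Prints nothing (and still succeeds) on no match.
filter_crashkernel_log() {
    grep -i -E 'crashkernel|crash kernel' || true
}
```

For example, `dmesg | filter_crashkernel_log` or `journalctl -k -b | filter_crashkernel_log` on the affected RHEL 7.7 host, run after a boot with crashkernel=auto.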