Bug 220906

Summary: kdump init script fails without a useful error message when the crash kernel's command line string is too long
Product: Red Hat Enterprise Linux 5
Component: kexec-tools
Version: 5.0
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: low
Priority: medium
Keywords: Reopened
Reporter: Amul Shah <amul.shah>
Assignee: Neil Horman <nhorman>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: jarod, mohan, qcai, vgoyal
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2007-02-13 17:05:53 UTC

Attachments:
Attachment 144491: This would be an example update to the sysconfig kdump config file (flags: none)
Attachment 144492: update the init script for kdump to handle the added configuration variable (flags: none)
Attachment 146330: /etc/sysconfig/kdump (flags: none)
Attachment 146345: patch to log output of kexec on error (flags: none)

Description Amul Shah 2006-12-28 18:16:25 UTC
Description of problem:
The /etc/init.d/kdump script will silently fail if /sbin/kexec cannot load the
crash kernel because the command line string passed to crash kernel is too long.
 For example, the crash kernel string 
"ro root=/dev/VolGroup00/LogVol00  console=tty0 console=ttyS0,115200n8 irqpoll
maxcpus=1 lpj=2999715 earlyprintk=serial,ttyS0,115200n8 memmap=exactmap
memmap=640K@0K memmap=5452K@16384K memmap=518180K@22476K elfcorehdr=540656K
memmap=412K#915904K memmap=164K#916316K"

is too long, but the following is not

"ro root=/dev/VolGroup00/LogVol00  console=tty0 console=ttyS0,115200 irqpoll
maxcpus=1 lpj=2999715 earlyprintk=ttyS0,115200 memmap=exactmap memmap=640K@0K
memmap=5452K@16384K memmap=518180K@22476K elfcorehdr=540656K memmap=412K#915904K
memmap=164K#916316K"
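
A quick way to compare a candidate command line against a length limit is the shell's `${#var}` expansion. The sketch below is only illustrative: `cmdline_too_long` is a hypothetical helper, not part of kexec-tools, and the limit is passed in as an argument because the real value depends on the architecture and the kexec-tools build.

```shell
#!/bin/sh
# Hypothetical helper, not part of kexec-tools: returns 0 (true) when the
# command line in $1 is longer than the limit in $2, and non-zero otherwise.
cmdline_too_long() {
    cmdline=$1
    limit=$2
    [ "${#cmdline}" -gt "$limit" ]
}

# Illustrative values only; substitute the real command line and the limit
# that applies to the kexec build in use.
if cmdline_too_long "ro root=/dev/sda1 irqpoll maxcpus=1" 16; then
    echo "command line exceeds the assumed limit"
fi
```

Such a check could run before the kexec load so that the failure reason is known up front.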

It took executing the kexec command by hand without the "2> /dev/null" in the
script to see what the real problem is.
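
The effect of the redirection can be reproduced without kexec. In this minimal sketch, `fake_kexec` is a stand-in (an assumption, not the real /sbin/kexec) that fails and writes its reason to stderr only, using the "Command line overflow" text quoted elsewhere in this report.

```shell
#!/bin/sh
# Stand-in for /sbin/kexec so the sketch runs anywhere: it fails and reports
# the reason on stderr only.
fake_kexec() {
    echo "Command line overflow" >&2
    return 1
}

# As in the init script: stderr is discarded, so only the exit status survives.
fake_kexec 2> /dev/null || echo "discarded: status=$?"

# Alternative: fold stderr into stdout and keep the text in a variable.
reason=$(fake_kexec 2>&1) || echo "captured: status=$? reason=$reason"
```

With the first form the caller learns only that the load failed; with the second the reason is available to print or log.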

Version-Release number of selected component (if applicable):
RHEL5 RC5

How reproducible:


Steps to Reproduce:
1. You need a system like an ES7000 that will have a long string for the exact
memmap provided to the crash kernel.
2. Add the line "console=tty0 console=ttyS0,115200n8 selinux=0
earlyprintk=serial,ttyS0,115200n8" to /etc/sysconfig/kdump (or the GRUB boot
line which is what I was doing)
3. If you have an ES7000, you will need to use the parameter lpj=SOME_VALUE
where you can get SOME_VALUE from "dmesg | grep lpj | head -n1"
4. run /etc/init.d/kdump

Actual results:
/etc/init.d/kdump fails to load kdump and all I see in /var/log/messages is that
it failed to load the crash kernel.

Expected results:
Ideally, I would like to know why kexec choked.  I can modify the script myself
to find the answer, but having a debug option in /etc/sysconfig/kdump that I can
tell a customer to change is preferable.

Additional info:

Comment 1 Amul Shah 2006-12-28 21:43:36 UTC
Created attachment 144491 [details]
This would be an example update to the sysconfig kdump config file

Comment 2 Amul Shah 2006-12-28 21:44:30 UTC
Created attachment 144492 [details]
update the init script for kdump to handle the added configuration variable

Comment 3 M. Mohan Kumar 2007-01-23 17:09:09 UTC
I tested passing such a long parameter list to kdump, but kdump throws an error
saying "Command line overflow" and exits. The kexec-tools level is
kexec-tools-1.101-112.el5.

So can this bug be closed?

Comment 4 Neil Horman 2007-01-23 18:00:09 UTC
yep

Comment 5 Amul Shah 2007-01-23 18:25:28 UTC
This bug still exists with kexec-tools-1.101-163.el5, so no, you may not close it.

Please re-read the bug.  To make our lives simpler and to keep us on the same
page, I am attaching my /etc/sysconfig/kdump file for you to use.  You may only
test this feature with /etc/init.d/kdump.  You need to look at /var/log/messages
to see the error messages for the kdump script failure, not the command line. 
DO NOT execute /sbin/kexec by hand.

The kexec tools work flawlessly.

Comment 6 Amul Shah 2007-01-23 18:33:41 UTC
[continued] The kexec tools work flawlessly.  This problem pokes a hole in an
otherwise excellent integration of the kdump feature.

In the series of commands below, nowhere do I see a report of why the kdump
kernel failed to load.  The customer needs to see why the failure occurred.

[root@localhost ~]# grep APPEND /etc/sysconfig/kdump
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 lpj=3001000i console=tty0
console=ttyS0,115200n8 earlyprintk=serial,ttyS0,115200n8,keep debug acpi=debug"

[root@localhost ~]# /etc/init.d/kdump start
Starting kdump:                                            [FAILED]

[root@localhost ~]# tail -n 2 /var/log/messages
Jan 23 14:24:14 localhost kdump: kexec: failed to load kdump kernel
Jan 23 14:24:14 localhost kdump: failed to start up


Comment 7 Amul Shah 2007-01-23 18:43:10 UTC
Created attachment 146330 [details]
/etc/sysconfig/kdump

Please use this file for your /etc/sysconfig/kdump.  The parameters are
arbitrary and there just to make sure that your system's command line will
overflow.  The ES7000's long exactmap helps me see this problem more easily.

Comment 8 M. Mohan Kumar 2007-01-23 19:04:32 UTC
Okay, when I run 'init 3' to load the kdump kernel using the init scripts with a
very long parameter list, I get

Starting portmap: [  OK  ]
Starting kdump:  Command line overflow
Starting kdump:[FAILED]
[FAILED]

Then I tried kexec-tools-1.101-163 on a POWER machine with a long parameter list.
The kdump init script does not give the reason for the kdump load failure, and I
need to check /var/log/messages.

So the problem still exists in 163 level of kexec-tools.

Comment 9 Amul Shah 2007-01-23 19:19:49 UTC
Mohan, mind sharing your /etc/init.d/kdump or at least doing a diff of it to see
what went wrong between now and then?

If the scripts are the same, then maybe /sbin/kexec was printing the error
message on STDOUT and not STDERR. In the original bug report, I noted that the
script gets rid of any /sbin/kexec error messages by redirecting STDERR to
/dev/null ("2> /dev/null").

Comment 10 Neil Horman 2007-01-23 19:44:05 UTC
Ok, it does appear that the addition of the /dev/null redirection on kexec
regressed this.  I don't think we need a whole separate log facility to catch
this though.  The following patch should work just fine.  Please test and confirm.

Comment 11 Neil Horman 2007-01-23 19:47:07 UTC
Created attachment 146345 [details]
patch to log output of kexec on error
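
The attachment itself is not reproduced in this report. As a rough illustration of the general idea only (not the actual patch), the sketch below captures a command's output and reports it only when the command fails. `run_logged` and the `LOG` variable are hypothetical names; in an init script `LOG` might be set to something like `logger -t kdump`, while the default here is `echo` so the sketch runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: run the command given as arguments, capture both output
# streams, and hand them to the command named in LOG (default: echo) only when
# the command fails. The original exit status is passed back to the caller.
run_logged() {
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        ${LOG:-echo} "kexec: failed to load kdump kernel: $output"
    fi
    return "$status"
}

# Demo with a failing stand-in command instead of /sbin/kexec.
run_logged sh -c 'echo "Command line overflow" >&2; exit 1' \
    || echo "load failed with status $?"
```

Nothing is written on success, so the normal startup path stays quiet, and the failure message carries the reason kexec gave.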

Comment 12 Amul Shah 2007-01-23 20:49:38 UTC
Neil, you're right we don't need the added logging complexity. Thanks for the
simpler fix. The change works, I can see the error in /var/log/messages.

I assume this change will make it into 5.1, correct?

Comment 13 Jay Turner 2007-01-26 02:20:04 UTC
Retargeting for 5.1.  Also throwing back into Assigned, as this patch hasn't
been incorporated into a package build.

Comment 14 Neil Horman 2007-01-26 19:24:51 UTC
fixed in -164.el5.  Thanks!

Comment 15 Jay Turner 2007-02-13 17:05:53 UTC
kexec-tools-1.101-164.el5 included in 20070208.0 trees.