Bug 220906 - kdump init script fails when the crash kernel's commandline string is too long without a good error message
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kexec-tools
Version: 5.0
Hardware: All Linux
Priority: medium Severity: low
Assigned To: Neil Horman
QA Contact: Red Hat Kernel QE team
Keywords: Reopened
Reported: 2006-12-28 13:16 EST by Amul Shah
Modified: 2009-09-09 01:14 EDT (History)
Fixed In Version: 5.0.0
Doc Type: Bug Fix
Last Closed: 2007-02-13 12:05:53 EST


Attachments
This would be an example update to the sysconfig kdump config file (392 bytes, patch)
2006-12-28 16:43 EST, Amul Shah
update the init script for kdump to handle the added configuration variable (1.93 KB, patch)
2006-12-28 16:44 EST, Amul Shah
/etc/sysconfig/kdump (1.32 KB, text/plain)
2007-01-23 13:43 EST, Amul Shah
patch to log output of kexec on error (916 bytes, patch)
2007-01-23 14:47 EST, Neil Horman

Description Amul Shah 2006-12-28 13:16:25 EST
Description of problem:
The /etc/init.d/kdump script will silently fail if /sbin/kexec cannot load the
crash kernel because the command line string passed to the crash kernel is too
long. For example, the crash kernel string
"ro root=/dev/VolGroup00/LogVol00  console=tty0 console=ttyS0,115200n8 irqpoll
maxcpus=1 lpj=2999715 earlyprintk=serial,ttyS0,115200n8 memmap=exactmap
memmap=640K@0K memmap=5452K@16384K memmap=518180K@22476K elfcorehdr=540656K
memmap=412K#915904K memmap=164K#916316K"

is too long, but the following is not

"ro root=/dev/VolGroup00/LogVol00  console=tty0 console=ttyS0,115200 irqpoll
maxcpus=1 lpj=2999715 earlyprintk=ttyS0,115200 memmap=exactmap memmap=640K@0K
memmap=5452K@16384K memmap=518180K@22476K elfcorehdr=540656K memmap=412K#915904K
memmap=164K#916316K"

It took executing the kexec command by hand without the "2> /dev/null" in the
script to see what the real problem is.
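
For illustration only: one way to check whether a command line like the ones above is too long is to measure it before it reaches kexec. This is a minimal sketch, not part of the kdump script. `cmdline_fits` is a hypothetical helper, and the limit is supplied by the caller because the real maximum depends on the architecture and kernel build; the 256 in the usage comment is an assumed value, not one taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the command line string in $1 is no longer
# than the limit in $2 (in characters), otherwise print the sizes to stderr
# and fail. The limit is an input, not a documented kexec constant.
cmdline_fits() {
    cmdline=$1
    limit=$2
    len=${#cmdline}    # length of the string in characters
    if [ "$len" -gt "$limit" ]; then
        echo "command line is $len characters, limit is $limit" >&2
        return 1
    fi
    return 0
}

# Usage (256 is an assumed limit):
#   cmdline_fits "ro root=/dev/VolGroup00/LogVol00 irqpoll maxcpus=1" 256
```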

Version-Release number of selected component (if applicable):
RHEL5 RC5

How reproducible:


Steps to Reproduce:
1. You need a system like an ES7000 that will have a long string for the exact
memmap provided to the crash kernel.
2. Add the line "console=tty0 console=ttyS0,115200n8 selinux=0
earlyprintk=serial,ttyS0,115200n8" to /etc/sysconfig/kdump (or the GRUB boot
line, which is what I was doing).
3. If you have an ES7000, you will need to use the parameter lpj=SOME_VALUE,
where you can get SOME_VALUE from "dmesg | grep lpj | head -n1".
4. run /etc/init.d/kdump
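
The lpj lookup in step 3 can be sketched as a small filter that pulls the number out of the first matching line. This is an illustrative sketch only: `extract_lpj` is a hypothetical name, and it assumes the kernel log contains a token of the form lpj=<digits>, as in the lpj=2999715 value quoted in the description.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and print the number from the
# first line that contains an lpj=<digits> token. Prints nothing if no
# line matches.
extract_lpj() {
    sed -n 's/.*lpj=\([0-9][0-9]*\).*/\1/p' | head -n1
}

# Usage:
#   dmesg | extract_lpj
```

The printed value can then be placed after "lpj=" in the crash kernel command line.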

Actual results:
/etc/init.d/kdump fails to load kdump and all I see in /var/log/messages is that
it failed to load the crash kernel.

Expected results:
Ideally, I would like to know why kexec choked.  I can modify the script myself
to find the answer, but having a debug option in /etc/sysconfig/kdump that I can
tell a customer to change is preferable.

Additional info:
Comment 1 Amul Shah 2006-12-28 16:43:36 EST
Created attachment 144491 [details]
This would be an example update to the sysconfig kdump config file
Comment 2 Amul Shah 2006-12-28 16:44:30 EST
Created attachment 144492 [details]
update the init script for kdump to handle the added configuration variable
Comment 3 M. Mohan Kumar 2007-01-23 12:09:09 EST
I tested passing such a long parameter list to kdump, but kdump throws an error
saying "Command line overflow" and exits. The kexec-tools level is
kexec-tools-1.101-112.el5.

So can this bug be closed?
Comment 4 Neil Horman 2007-01-23 13:00:09 EST
yep
Comment 5 Amul Shah 2007-01-23 13:25:28 EST
This bug still exists with kexec-tools-1.101-163.el5, so no, you may not close it.

Please re-read the bug.  To make our lives simpler and to keep us on the same
page, I am attaching my /etc/sysconfig/kdump file for you to use.  You may only
test this feature with /etc/init.d/kdump.  You need to look at /var/log/messages
to see the error messages for the kdump script failure, not the command line. 
DO NOT execute /sbin/kexec by hand.

The kexec tools work flawlessly.
Comment 6 Amul Shah 2007-01-23 13:33:41 EST
[continued] The kexec tools work flawlessly.  This problem pokes a hole in an
otherwise excellent integration of the kdump feature.

In the series of commands below, nowhere do I see a report of why the kdump
kernel failed to load.  The customer needs to see why the failure occurred.

[root@localhost ~]# grep APPEND /etc/sysconfig/kdump
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1 lpj=3001000i console=tty0
console=ttyS0,115200n8 earlyprintk=serial,ttyS0,115200n8,keep debug acpi=debug"

[root@localhost ~]# /etc/init.d/kdump start
Starting kdump:                                            [FAILED]

[root@localhost ~]# tail -n 2 /var/log/messages
Jan 23 14:24:14 localhost kdump: kexec: failed to load kdump kernel
Jan 23 14:24:14 localhost kdump: failed to start up
Comment 7 Amul Shah 2007-01-23 13:43:10 EST
Created attachment 146330 [details]
/etc/sysconfig/kdump

Please use this file for your /etc/sysconfig/kdump.  The parameters are
arbitrary and are there just to make sure that your system's command line will
overflow.  The ES7000's long exactmap helps me see this problem more easily.
Comment 8 M. Mohan Kumar 2007-01-23 14:04:32 EST
Okay, when I run 'init 3' to load the kdump kernel using the init scripts with a
very long parameter list, I get

Starting portmap: [  OK  ]
Starting kdump:  Command line overflow
Starting kdump:[FAILED]
[FAILED]

Then I tried kexec-tools-1.101-163 on a POWER machine with a long parameter list.
The kdump init script does not give the reason for the kdump load failure, and I
need to check /var/log/messages.

So the problem still exists in the 163 level of kexec-tools.
Comment 9 Amul Shah 2007-01-23 14:19:49 EST
Mohan, mind sharing your /etc/init.d/kdump or at least doing a diff of it to see
what went wrong between now and then?

If the scripts are the same, then maybe /sbin/kexec was printing the error
message on STDOUT and not STDERR. In the original bug report, I noted that the
script gets rid of any /sbin/kexec error messages by redirecting STDERR to
/dev/null ("2> /dev/null").
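
The STDOUT versus STDERR distinction can be demonstrated with a stand-in command. This is a sketch only: `fake_loader` is a hypothetical function, not /sbin/kexec, and which stream the real tool writes "Command line overflow" to is exactly the open question here, so the stream choice below is an assumption.

```shell
#!/bin/sh
# Stand-in for a failing loader; NOT the real kexec. It writes one line to
# each stream and returns a non-zero status.
fake_loader() {
    echo "diagnostic on stdout"
    echo "Command line overflow" >&2
    return 1
}

# "2> /dev/null" discards only what was written to stderr, so the stdout
# line is all that remains. "|| true" ignores the non-zero exit status.
visible=$(fake_loader 2>/dev/null) || true

# "2>&1" folds stderr into stdout, so both lines are captured.
both=$(fake_loader 2>&1) || true

echo "with 2> /dev/null: $visible"
echo "with 2>&1: $both"
```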
Comment 10 Neil Horman 2007-01-23 14:44:05 EST
Ok, it does appear the addition of the /dev/null redirection on kexec regressed
this.  I don't think we need a whole separate log facility to catch this though.
The following patch should work just fine.  Please test and confirm.
Comment 11 Neil Horman 2007-01-23 14:47:07 EST
Created attachment 146345 [details]
patch to log output of kexec on error
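
The contents of the attachment are not reproduced here. As a rough illustration of the general idea (capture the command's output instead of discarding it, and log it only on failure), a sketch might look like the following. It is not the attached patch: `run_logged` and the `LOG_CMD` override are hypothetical names, and the default assumes the standard `logger` utility with a "kdump" tag.

```shell
#!/bin/sh
# Hypothetical helper: run the given command, capturing stdout and stderr
# together. On success, return 0 and log nothing. On failure, pipe the
# captured output to the log command and return the original exit status.
# LOG_CMD is an illustrative override; the default sends to syslog.
run_logged() {
    if output=$("$@" 2>&1); then
        return 0
    else
        status=$?
        printf '%s\n' "$output" | ${LOG_CMD:-logger -t kdump}
        return "$status"
    fi
}

# Usage (arguments are placeholders):
#   run_logged /sbin/kexec <args> || echo "failed to load kdump kernel"
```

Capturing the output rather than letting it print keeps the console status line clean while still leaving the reason for the failure in /var/log/messages.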
Comment 12 Amul Shah 2007-01-23 15:49:38 EST
Neil, you're right, we don't need the added logging complexity. Thanks for the
simpler fix. The change works; I can see the error in /var/log/messages.

I assume this change will make it into 5.1, correct?
Comment 13 Jay Turner 2007-01-25 21:20:04 EST
Retargeting for 5.1.  Also throwing back into Assigned, as this patch hasn't
been incorporated into a package build.
Comment 14 Neil Horman 2007-01-26 14:24:51 EST
fixed in -164.el5.  Thanks!
Comment 15 Jay Turner 2007-02-13 12:05:53 EST
kexec-tools-1.101-164.el5 included in 20070208.0 trees.
