466436 – [5.3][RFE] Makedumpfile Error Messages

Summary: [5.3][RFE] Makedumpfile Error Messages

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-10-10 09:48 UTC by Qian Cai
Modified:	2009-09-02 09:13 UTC
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	600585
Environment:
Last Closed:	2009-09-02 09:13:36 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
patch to allow users to specify makedumpfile level (1.50 KB, patch) 2009-07-02 10:42 UTC, Neil Horman	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:1258	0	normal	SHIPPED_LIVE	kexec-tools bug fix and enhancement update	2009-09-01 09:09:40 UTC

Description Qian Cai 2008-10-10 09:48:30 UTC

Description of problem:
When makedumpfile fails on a vmcore, we need to know why. Otherwise, it is painful to debug the failure afterwards from the serial console log. For example,

EXT3-fs: mounted filesystem with ordered data mode.
[  0 %][ 14 %][ 23 %][ 33 %][ 42 %][ 51 %][ 55 %][ 57 %][ 59 %][ 69 %]dropping to initramfs shell
exiting this shell will reboot your system
root:/>

In fact, makedumpfile failed because there was not enough disk space.


Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-46.el5

How reproducible:
always

Steps to Reproduce:
1. Configure kdump with the following options:

  ext3 <small partition without enough disk space to save vmcore>
  core_collector makedumpfile -E

2. Trigger a crash with SysRq-C.
3. Check the serial console log for the reason for the failure.
  
Actual results:
No error message.

Expected results:
Some error messages. For example,

[  0 %]write_buffer: Can't write the dump file(vmcore). Success

makedumpfile Failed.

Comment 1 Neil Horman 2008-10-10 14:05:14 UTC

Cai, can you please elaborate a bit on what you're looking for?  It seems like in the above case, setting the default_action to shell would allow you to recreate and debug the issue (by manually re-running the makedumpfile command with a higher log level).  Is there something more you're looking for?

Comment 2 Qian Cai 2008-10-22 12:35:28 UTC

Yes, that makes sense. I'll close this. Thanks.

Comment 3 Qian Cai 2009-07-02 06:19:43 UTC

I am afraid I need to reopen this. It is such a pain to debug makedumpfile failures afterwards. For example, the configuration file contains,

core_collector makedumpfile --dump-dmesg /proc/vmcore /tmp/dmesg

From the serial console logs I can only see,

...
Saving to the local filesystem /dev/mapper/VolGroup00-LogVol00
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/VolGroup00-LogVol00: recovering journal
/dev/mapper/VolGroup00-LogVol00: clean, 86240/16204320 files, 1176172/16203776 blocks
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
mv: unable to rename `/mnt//var/crasmd: stopping all md devices.
h/127.0.0.1-2009-07-01-12:17:53/vmcore-incomplete': No such file or directory
[0JSaving core complete
megaraid: flushing adapter 0...<6>usb 3-1: new full speed USB device using uhci_hcd and address 2
usb 3-1: not running at top speed; connect to a high speed hub
usb 3-1: configuration #1 chosen from 1 choice
hub 3-1:1.0: USB hub found
hub 3-1:1.0: 2 ports detected
done
Restarting system.

I cannot tell whether the above makedumpfile command failed or not.

Note, the "mv: unable to rename" and "No such file or directory" are expected, since I only want to capture dmesg in this case.

I think there are valid reasons not to use "default_action shell" in many situations. For example, users might set up kdump to continue to init and capture the vmcore in a second attempt when makedumpfile fails.

I think this is a trivial fix that would save a lot of time when debugging makedumpfile failures. The only downside I can think of is that more bug reports might come in, since all the error and warning messages would be exposed to users!

Comment 4 Qian Cai 2009-07-02 07:46:56 UTC

I had to work around this problem manually by changing --message-level to 15 in the following lines,

        core_collector)
            if [ -x /sbin/makedumpfile ]; then
                CORE_COLLECTOR=$config_val
                if [ -e $SYS_VMCOREINFO ]
                then
                    grep -q control_d /proc/xen/capabilities 2>/dev/null
                    if [ $? -eq 0 ]
                    then
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile -X --message-level 1/'`
                    else
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile --message-level 1/'`
                    fi
                else
                    grep -q control_d /proc/xen/capabilities 2>/dev/null
                    if [ $? -eq 0 ]
                    then
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile --xen-vmcoreinfo \/etc\/makedumpfile.config --message-level 1/'`
                    else
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile -i \/etc\/makedumpfile.config --message-level 1/'`
                    fi
                fi
            else

Now it is clear what went wrong,

...
open_dump_memory: Can't open the dump memory((null)). Bad address

makedumpfile Completed.
...

Comment 5 Qian Cai 2009-07-02 08:25:02 UTC

This does not mean that we always need --message-level 15 here, which is quite verbose. To ease debugging efficiently, I think both common and error messages are needed in addition to the progress indicator, so 7 sounds like a good combination.

      Message | progress    common    error     debug
      Level   | indicator   message   message   message
     ---------+-----------------------------------------
            0 |
            1 |     X
            2 |                X
            4 |                          X
          * 7 |     X          X         X
            8 |                                    X
           15 |     X          X         X         X
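
The table above reads as a bitmask (1 = progress indicator, 2 = common message, 4 = error message, 8 = debug message). As a minimal sketch of that reading, the hypothetical helper below (decode_message_level is an illustrative name, not part of makedumpfile or mkdumprd) lists the message classes a given level would enable:

```shell
#!/bin/sh
# Hypothetical helper for illustration only; not part of makedumpfile.
# Bit values are taken from the message-level table in this comment:
# 1 = progress, 2 = common, 4 = error, 8 = debug.
decode_message_level() {
    level=$1
    classes=""
    [ $((level & 1)) -ne 0 ] && classes="$classes progress"
    [ $((level & 2)) -ne 0 ] && classes="$classes common"
    [ $((level & 4)) -ne 0 ] && classes="$classes error"
    [ $((level & 8)) -ne 0 ] && classes="$classes debug"
    # Drop the leading space; print "none" when no bit is set.
    classes=${classes# }
    echo "${classes:-none}"
}

decode_message_level 7    # prints: progress common error
decode_message_level 15   # prints: progress common error debug
```

Level 7 is simply 1 + 2 + 4, which is why it enables everything except debug messages.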

Comment 6 Neil Horman 2009-07-02 10:40:24 UTC

I'm a bit hesitant to do this as makedumpfile gets pretty verbose pretty quickly, and the extra messages will destroy the progress counter that we added.  Also there was a bug a few years ago now that explicitly requested that makedumpfile be silent, although I never really agreed with that too much.  Maybe what I can do is not specify message level at all in mkdumprd, and just let the user set it in kdump.conf.  Then we can change the example core_collector configuration to specify message-level 1 by default.
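
A related variation, sketched here purely as an illustration and not as what mkdumprd or the attached patch actually does, would keep a quiet default but defer to any level the user has already put on the core_collector line. The function name add_default_message_level and its default of 1 are assumptions made for this example:

```shell
#!/bin/sh
# Hypothetical helper, not taken from mkdumprd or the attached patch.
# Adds "--message-level <default>" only when the user has not set one.
add_default_message_level() {
    collector=$1
    default_level=${2:-1}
    case "$collector" in
        *--message-level*)
            # The user already chose a level in kdump.conf; leave it alone.
            echo "$collector"
            ;;
        *)
            # Same style of substitution as the mkdumprd excerpt in comment 4.
            echo "$collector" | sed -e "s/makedumpfile/makedumpfile --message-level $default_level/"
            ;;
    esac
}

add_default_message_level "makedumpfile -E"
# prints: makedumpfile --message-level 1 -E
add_default_message_level "makedumpfile -E --message-level 7"
# prints: makedumpfile -E --message-level 7
```

With this shape the progress counter stays the default, while anyone debugging a failure can opt in to more output without editing mkdumprd.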

Comment 7 Neil Horman 2009-07-02 10:42:55 UTC

Created attachment 350259 [details]
patch to allow users to specify makedumpfile level

Cai, could you please give this patch a try.  It should allow you to specify --message-level in the core_collector line in /etc/kdump.conf.  Thanks!
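
For illustration only, and assuming the patched mkdumprd passes the option through unchanged, a core_collector line in /etc/kdump.conf requesting the level suggested in comment 5 might look like this (the -E option is carried over from the original report; adjust it to your own setup):

```
core_collector makedumpfile -E --message-level 7
```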

Comment 8 Qian Cai 2009-07-02 11:21:44 UTC

Thanks Neil, I agree with your proposal, and I have also tested the patch on an ia64 machine without seeing any problems. Once the patch has been integrated into the packages, I will do more testing.

Comment 13 errata-xmlrpc 2009-09-02 09:13:36 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1258.html
