Bug 466436 - [5.3][RFE] Makedumpfile Error Messages
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kexec-tools
Version: 5.2
Hardware: All Linux
Priority: low, Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Neil Horman
QA Contact: Red Hat Kernel QE team
Keywords: Reopened
Depends On:
Blocks:
Reported: 2008-10-10 05:48 EDT by CAI Qian
Modified: 2009-09-02 05:13 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 600585 (view as bug list)
Environment:
Last Closed: 2009-09-02 05:13:36 EDT


Attachments
patch to allow users to specify makedumpfile level (1.50 KB, patch)
2009-07-02 06:42 EDT, Neil Horman

Description CAI Qian 2008-10-10 05:48:30 EDT
Description of problem:
When makedumpfile fails on a vmcore, we need to know why. Otherwise, it is painful to debug the failure afterwards from the serial console log. For example,

EXT3-fs: mounted filesystem with ordered data mode.
[  0 %][ 14 %][ 23 %][ 33 %][ 42 %][ 51 %][ 55 %][ 57 %][ 59 %][ 69 %]dropping to initramfs shell
exiting this shell will reboot your system
root:/>

In fact, makedumpfile failed because there was not enough disk space.


Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-46.el5

How reproducible:
always

Steps to Reproduce:
1. configure Kdump with the following options,

  ext3 <small partition without enough disk space to save vmcore>
  core_collector makedumpfile -E

2. SysRq-C
3. Check the serial console log to see whether it shows the reason for the failure.
  
Actual results:
No error message.

Expected results:
Some error messages. For example,

[  0 %]write_buffer: Can't write the dump file(vmcore). Success

makedumpfile Failed.
Comment 1 Neil Horman 2008-10-10 10:05:14 EDT
Cai, can you please elaborate a bit on what you're looking for? It seems that in the above case, setting the default_action to shell would allow you to recreate and debug the issue (by manually re-running the makedumpfile command with a higher log level). Is there something more you're looking for?
Comment 2 CAI Qian 2008-10-22 08:35:28 EDT
Yes, that makes sense. I'll close this. Thanks.
Comment 3 CAI Qian 2009-07-02 02:19:43 EDT
I am afraid I'll need to reopen this. It is such a pain to debug makedumpfile failures afterwards. For example, the configuration file contains,

core_collector makedumpfile --dump-dmesg /proc/vmcore /tmp/dmesg

From the serial console logs I can only see,

...
Saving to the local filesystem /dev/mapper/VolGroup00-LogVol00
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/VolGroup00-LogVol00: recovering journal
/dev/mapper/VolGroup00-LogVol00: clean, 86240/16204320 files, 1176172/16203776 blocks
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
mv: unable to rename `/mnt//var/crasmd: stopping all md devices.
h/127.0.0.1-2009-07-01-12:17:53/vmcore-incomplete': No such file or directory
[0JSaving core complete
megaraid: flushing adapter 0...<6>usb 3-1: new full speed USB device using uhci_hcd and address 2
usb 3-1: not running at top speed; connect to a high speed hub
usb 3-1: configuration #1 chosen from 1 choice
hub 3-1:1.0: USB hub found
hub 3-1:1.0: 2 ports detected
done
Restarting system.

I have no idea whether the above makedumpfile command failed or not.

Note, the "mv: unable to rename" and "No such file or directory" are expected, since I only want to capture dmesg in this case.

I think there are valid reasons not to use "default_action shell" in many situations. Users might set up kdump to enter INIT and capture the vmcore in a second attempt when makedumpfile fails, etc.

I think it is a trivial fix that saves a lot of effort when debugging makedumpfile failures. The only downside I can think of is that more bug reports might come in, since all the error and warning messages are exposed to users!
Comment 4 CAI Qian 2009-07-02 03:46:56 EDT
I have to manually work around this problem by changing --message-level to 15 in the following lines,

        core_collector)
            if [ -x /sbin/makedumpfile ]; then
                CORE_COLLECTOR=$config_val
                if [ -e $SYS_VMCOREINFO ]
                then
                    grep -q control_d /proc/xen/capabilities 2>/dev/null
                    if [ $? -eq 0 ]
                    then
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile -X --message-level 1/'`
                    else
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile --message-level 1/'`
                    fi
                else
                    grep -q control_d /proc/xen/capabilities 2>/dev/null
                    if [ $? -eq 0 ]
                    then
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile --xen-vmcoreinfo \/etc\/makedumpfile.config --message-level 1/'`
                    else
                        CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile -i \/etc\/makedumpfile.config --message-level 1/'`
                    fi
                fi
            else

Now, it is all clear what is wrong there,

...
open_dump_memory: Can't open the dump memory((null)). Bad address

makedumpfile Completed.
...
Comment 5 CAI Qian 2009-07-02 04:25:02 EDT
It does not mean that we always need --message-level 15 here, which is quite verbose. I think that to ease debugging efficiently, both common and error messages are needed, so 7 sounds like a good combination.

      Message | progress    common    error     debug
      Level   | indicator   message   message   message
     ---------+-----------------------------------------
            0 |
            1 |     X
            2 |                X
            4 |                          X
          * 7 |     X          X         X
            8 |                                    X
           15 |     X          X         X         X
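The table reads as a bitmask: each message class has its own bit (1, 2, 4, 8), and a level is the OR of the classes wanted. A minimal sketch of that arithmetic follows; the function name message_level is hypothetical and is not part of makedumpfile or mkdumprd.

```shell
#!/bin/sh
# Combine message classes into a --message-level value.
# Bit values come from the table above; the function name is hypothetical.
message_level() {
    level=0
    for class in "$@"; do
        case "$class" in
            progress) bit=1 ;;
            common)   bit=2 ;;
            error)    bit=4 ;;
            debug)    bit=8 ;;
            *) echo "unknown message class: $class" >&2; return 1 ;;
        esac
        # OR the class bit into the accumulated level.
        level=$(( level | bit ))
    done
    echo "$level"
}

message_level progress common error          # prints 7
message_level progress common error debug    # prints 15
```

So level 7 keeps the progress indicator and adds common and error messages, while leaving out the debug output that makes level 15 so verbose.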
Comment 6 Neil Horman 2009-07-02 06:40:24 EDT
I'm a bit hesitant to do this, as makedumpfile gets pretty verbose pretty quickly, and the extra messages will destroy the progress counter that we added. Also, there was a bug a few years ago that explicitly requested that makedumpfile be silent, although I never really agreed with that too much. Maybe what I can do is not specify the message level at all in mkdumprd, and just let the user set it in kdump.conf. Then we can change the example core_collector configuration to specify message-level 1 by default.
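As a rough illustration of letting the user's setting win, the sketch below appends a default --message-level only when the core_collector line does not already carry one. This is a variation on the idea, not the attached patch and not the mkdumprd code; the function name and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not the attached patch: keep a user-supplied
# --message-level, otherwise append a default (1 unless overridden).
with_default_message_level() {
    collector=$1
    default_level=${2:-1}
    case "$collector" in
        *--message-level*) echo "$collector" ;;
        *) echo "$collector --message-level $default_level" ;;
    esac
}

with_default_message_level "makedumpfile -c"
with_default_message_level "makedumpfile -c --message-level 7"
```

The first call yields a line ending in --message-level 1; the second is returned unchanged, so a level chosen in kdump.conf is never overwritten.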
Comment 7 Neil Horman 2009-07-02 06:42:55 EDT
Created attachment 350259 [details]
patch to allow users to specify makedumpfile level

Cai, could you please give this patch a try? It should allow you to specify --message-level in the core_collector line in /etc/kdump.conf. Thanks!
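With the patch applied, a kdump.conf fragment along these lines should presumably work. This is an illustrative sketch only: the partition /dev/sda3 and the -c (compress) option are assumptions chosen for the example, and level 7 is the value suggested in comment 5.

```
# /etc/kdump.conf (illustrative fragment; the device name is hypothetical)
ext3 /dev/sda3
core_collector makedumpfile -c --message-level 7
```

The initrd would need to be rebuilt (for example by restarting the kdump service) before the new core_collector line takes effect.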
Comment 8 CAI Qian 2009-07-02 07:21:44 EDT
Thanks Neil, I agree with your proposal. I have also tested the patch on an ia64 machine without seeing any problems. Once the patch has been integrated into the packages, I will do more testing.
Comment 13 errata-xmlrpc 2009-09-02 05:13:36 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1258.html
