Description of problem:
When makedumpfile fails on a vmcore, we need to know why. Otherwise, it is a pain to debug the failure afterwards from the serial console log. For example,

EXT3-fs: mounted filesystem with ordered data mode.
[ 0 %][ 14 %][ 23 %][ 33 %][ 42 %][ 51 %][ 55 %][ 57 %][ 59 %][ 69 %]dropping to initramfs shell
exiting this shell will reboot your system
root:/>

In fact, makedumpfile failed because there was not enough disk space.

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-46.el5

How reproducible:
always

Steps to Reproduce:
1. Configure Kdump with the following options,
ext3 <small partition without enough disk space to save vmcore>
core_collector makedumpfile -E
2. SysRq-C
3. Check the serial console log to see whether it gives the reason for the failure.

Actual results:
No error message.

Expected results:
Some error messages. For example,
[ 0 %]write_buffer: Can't write the dump file(vmcore). Success
makedumpfile Failed.
Cai, can you please elaborate a bit on what you're looking for? It seems like in the above case, setting the default_action to shell would allow you to recreate and debug the issue (by manually re-running the makedumpfile command with a higher log level). Is there something more you're looking for?
Yes, that makes sense. I'll close this. Thanks.
I am afraid I'll need to re-open it. It is such a pain to debug makedumpfile failures afterwards. For example, the configuration file contains,

core_collector makedumpfile --dump-dmesg /proc/vmcore /tmp/dmesg

From the serial console logs I can only see,

...
Saving to the local filesystem /dev/mapper/VolGroup00-LogVol00
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/VolGroup00-LogVol00: recovering journal
/dev/mapper/VolGroup00-LogVol00: clean, 86240/16204320 files, 1176172/16203776 blocks
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
mv: unable to rename `/mnt//var/crasmd: stopping all md devices.
h/127.0.0.1-2009-07-01-12:17:53/vmcore-incomplete': No such file or directory
[0JSaving core complete
megaraid: flushing adapter 0...<6>usb 3-1: new full speed USB device using uhci_hcd and address 2
usb 3-1: not running at top speed; connect to a high speed hub
usb 3-1: configuration #1 chosen from 1 choice
hub 3-1:1.0: USB hub found
hub 3-1:1.0: 2 ports detected
done
Restarting system.

I have no idea whether the above makedumpfile command failed or not. Note, the "mv: unable to rename" and "No such file or directory" messages are expected, since I only want to capture dmesg in this case.

I think there are valid reasons not to use "default_action shell" in many situations. For example, users might set up kdump to enter INIT and capture the vmcore in a second attempt when makedumpfile fails.

I think it is a trivial fix that would save a lot of time when debugging makedumpfile failures. The only downside I can think of is that more bug reports might come in, since all the error and warning messages would be exposed to the users!
I have to manually work around this problem by changing --message-level to 15 in the following lines,

    core_collector)
        if [ -x /sbin/makedumpfile ]; then
            CORE_COLLECTOR=$config_val
            if [ -e $SYS_VMCOREINFO ]
            then
                grep -q control_d /proc/xen/capabilities 2>/dev/null
                if [ $? -eq 0 ]
                then
                    CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile -X --message-level 1/'`
                else
                    CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile --message-level 1/'`
                fi
            else
                grep -q control_d /proc/xen/capabilities 2>/dev/null
                if [ $? -eq 0 ]
                then
                    CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile --xen-vmcoreinfo \/etc\/makedumpfile.config --message-level 1/'`
                else
                    CORE_COLLECTOR=`echo $CORE_COLLECTOR | sed -e's/makedumpfile/makedumpfile -i \/etc\/makedumpfile.config --message-level 1/'`
                fi
            fi
        else

Now it is clear what is wrong there,

...
open_dump_memory: Can't open the dump memory((null)). Bad address
makedumpfile Completed.
...
It does not mean that we always need --message-level 15 here, which is quite verbose. I think that to ease the debugging pain efficiently, both common and error messages are needed, so 7 sounds like a good combination.

Message  | progress   common    error     debug
Level    | indicator  message   message   message
---------+-----------------------------------------
       0 |
       1 |     X
       2 |                X
       4 |                          X
     * 7 |     X          X         X
       8 |                                    X
      15 |     X          X         X         X
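The table reads as a bitmask: each message class has its own bit (1, 2, 4, 8) and a level is the sum of the classes it enables. A minimal sketch of that reading follows. decode_message_level is a hypothetical helper written for illustration only; it is not part of makedumpfile or kexec-tools, and it assumes the four classes shown in the table above.

```shell
#!/bin/sh
# Decode a makedumpfile --message-level value into the message classes it
# enables, using the bit values from the table above
# (1 = progress indicator, 2 = common, 4 = error, 8 = debug).
# decode_message_level is a hypothetical helper, not part of kexec-tools.
decode_message_level() {
    level=$1
    classes=""
    if [ $((level & 1)) -ne 0 ]; then classes="$classes progress"; fi
    if [ $((level & 2)) -ne 0 ]; then classes="$classes common"; fi
    if [ $((level & 4)) -ne 0 ]; then classes="$classes error"; fi
    if [ $((level & 8)) -ne 0 ]; then classes="$classes debug"; fi
    # Strip the leading space; print "none" when no bit is set (level 0).
    classes=${classes# }
    echo "${classes:-none}"
}
```

For example, `decode_message_level 7` lists the progress, common and error classes, which is the combination suggested above, while `decode_message_level 15` adds debug.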
I'm a bit hesitant to do this as makedumpfile gets pretty verbose pretty quickly, and the extra messages will destroy the progress counter that we added. Also, there was a bug a few years ago that explicitly requested that makedumpfile be silent, although I never really agreed with that too much. Maybe what I can do is not specify the message level at all in mkdumprd, and just let the user set it in kdump.conf. Then we can change the example core_collector configuration to specify message-level 1 by default.
Created attachment 350259
patch to allow users to specify makedumpfile level

Cai, could you please give this patch a try? It should allow you to specify --message-level in the core_collector line in /etc/kdump.conf. Thanks!
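The attachment itself is not reproduced in this report, so the following is not the patch. It is only a rough sketch of one way a script could let a user-specified level win: it adds a default --message-level to the collector command only when the core_collector line does not already contain one. add_default_message_level and its two arguments are hypothetical names chosen for illustration; the real change may simply stop mkdumprd from adding a level at all, as proposed earlier.

```shell
#!/bin/sh
# Sketch only: add a default --message-level to a core_collector command
# unless the user already chose one in /etc/kdump.conf.
# add_default_message_level is a hypothetical helper; the actual change in
# attachment 350259 may be implemented differently.
add_default_message_level() {
    collector=$1
    default_level=$2
    case "$collector" in
        *--message-level*)
            # The user set a level explicitly; leave the command untouched.
            echo "$collector"
            ;;
        *)
            # Insert the default right after the first "makedumpfile",
            # the same sed substitution style mkdumprd uses today.
            echo "$collector" | sed -e "s/makedumpfile/makedumpfile --message-level $default_level/"
            ;;
    esac
}
```

With this sketch, `add_default_message_level "makedumpfile -E" 1` keeps the quiet default, while a line such as `makedumpfile -E --message-level 7` passes through unchanged.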
Thanks Neil, I agree with your proposal, and I have also tested the patch on an ia64 machine without seeing any problems. Once the patch has been integrated into the packages, I will do more testing.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1258.html