Bug 605411 - check raw partition for core dump when starting kdump service
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kexec-tools
Version: 6.1
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Cong Wang
QA Contact: Chao Ye
Depends On:
Blocks: 561978
Reported: 2010-06-17 17:08 EDT by Dave Maley
Modified: 2013-09-29 22:17 EDT (History)

See Also:
Fixed In Version: kexec-tools-2_0_0-155_el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 10:15:15 EDT


Attachments
partner provided patch (869 bytes, patch)
2010-06-17 17:08 EDT, Dave Maley


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0736 normal SHIPPED_LIVE kexec-tools bug fix update 2011-05-18 14:09:18 EDT

Description Dave Maley 2010-06-17 17:08:55 EDT
Created attachment 424944
partner provided patch

Description of problem:
When a "raw" partition is entered in /etc/kdump.conf, kdump does write a dump to the partition but the dump is not automatically recovered at the next reboot.

Judging from the comment in /etc/init.d/kdump:

function start()
{
       #TODO check raw partition for core dump image

this seems to be a missing feature actually.

However, it makes dumping to a raw partition (the most robust setting IMO) impractical for anybody except gurus who are able to recover the dump manually (guessing the size of the dump would be the problem). I am not such a guru and don't feel like figuring out how the size is encoded in the makedumpfile header.
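
For readers in the same position, manual recovery can be sketched in shell roughly as follows. This is a minimal sketch and not the attached patch. It assumes the dump was written in makedumpfile's flattened format (makedumpfile -F); in that case makedumpfile -R rearranges the data it reads on standard input into a regular dump file, so the size should not have to be guessed by hand. The function names raw_device_from_conf and recover_raw_dump are hypothetical.

```shell
#!/bin/bash
# Sketch only: the function names are hypothetical and are not taken
# from the attached patch or from /etc/init.d/kdump.

# Print the device named by the first uncommented "raw" line of a
# kdump.conf-style file (defaults to /etc/kdump.conf).
raw_device_from_conf() {
    awk '$1 == "raw" { print $2; exit }' "${1:-/etc/kdump.conf}"
}

# Rearrange a flattened dump read from a raw device into DIR/vmcore.
# Assumes the dump on the device is in makedumpfile's flattened format.
recover_raw_dump() {
    local dev=$1 dir=$2
    [ -b "$dev" ] || { echo "raw device $dev not found" >&2; return 1; }
    command -v makedumpfile >/dev/null || { echo "makedumpfile not found" >&2; return 1; }
    mkdir -p "$dir" || return 1
    # -R reads flattened-format data on standard input and writes a dump file.
    makedumpfile -R "$dir/vmcore" < "$dev"
}
```

Run as root, a call might look like: recover_raw_dump "$(raw_device_from_conf)" "/var/crash/$(date +%Y-%m-%d-%H:%M)". The timestamped directory layout here is an assumption, not a requirement.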


Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-72.el6


How reproducible:
always


Steps to Reproduce:
1. configure a "raw" device in /etc/kdump.conf
2. crash system
3. vmcore not retrieved from raw partition upon reboot

  
Actual results:
dump is written but not recovered


Expected results:
dump is recovered, as it was with diskdump in the old days


Additional info:
Comment 1 Dave Maley 2010-06-17 17:12:44 EDT
It looks like RHEL5 needs this as well; however, I'll hold off on cloning until we have some feedback on the patch from FJ.
Comment 2 RHEL Product and Program Management 2010-06-17 17:14:10 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 3 Cong Wang 2010-06-22 05:35:22 EDT
This looks strange. If you really want to collect the core from a raw partition, why not just dump to a vmcore file instead? Neil, is this what 'raw XXX' is used for?
Comment 4 Neil Horman 2010-06-23 07:43:54 EDT
 

raw XXX causes the kdump initramfs to dump /proc/vmcore to a raw disk w/o the aid of any file system.

The patch above appears to recover such a dump, although I'm hesitant to take it in this form, as there is no guarantee that:
1) there will be space in /var/crash for such a dump
2) we will know whether we need to use makedumpfile -R on the dump.

That's part of the reason we require by-hand dump recovery.

It might be better to just indicate to the user on startup that a dump is waiting for recovery.
Comment 5 RHEL Product and Program Management 2010-07-15 11:22:05 EDT
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **
Comment 6 Neil Horman 2010-07-19 07:02:24 EDT
Dave, still waiting on feedback from FJ, as per comment 4.  Thanks!
Comment 7 Martin Wilck 2010-07-20 05:19:46 EDT
(In reply to comment #4)

> raw XXX causes the kdump initramfs to dump /proc/vmcore to a raw disk w/o the
> aid of any file system.
> 
> The patch above appears to recover such a dump, although I'm hesitant to take
> it in this form, as there is no guarantee that:
> 1) there will be space in /var/crash for such a dump

You never know that; the same situation exists if you dump straight into a partition in the first place. But AFAICT many customers use DUMPLEVEL 31 anyway, so the space requirements are moderate.

mkdumprd issues a warning if dump is configured to write to a file system partition that it considers too small to capture the dump (well, IMO mkdumprd's required size estimate is pretty rough). The same could be done for raw partitions.
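
A free-space check before recovery could be sketched like this. It is a minimal sketch under stated assumptions: fits_in_dir is a hypothetical helper (not part of mkdumprd), and using the raw partition's size as the amount needed is only a rough, conservative estimate, since a compressed dump may be far smaller.

```shell
#!/bin/bash
# Sketch only: fits_in_dir is a hypothetical helper, not part of mkdumprd.

# Succeed if the filesystem holding DIR has at least NEEDED_KB kilobytes free.
fits_in_dir() {
    local needed_kb=$1 dir=$2 avail_kb
    # With POSIX output (-P) in 1K blocks (-k), the 4th field of the
    # 2nd line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    # An empty value means df failed; treat that as "does not fit".
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example, with the partition size taken from blockdev: size_kb=$(( $(blockdev --getsize64 /dev/sda3) / 1024 )); fits_in_dir "$size_kb" /var/crash || echo "warning: /var/crash may be too small to hold the dump". A warning rather than a hard failure seems the safer choice here, because the estimate is deliberately pessimistic.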

> 2) we will know whether we need to use makedumpfile -R on the
> dump.

I don't understand what you mean here. To my understanding makedumpfile -R will be required because mkdumprd either uses dd or makedumpfile -F, which both produce "flat" format.

> That's part of the reason we require by-hand dump recovery.
> 
> It might be better to just indicate to the user on startup that a dump is
> waiting for recovery.

That should be very clearly documented and indicated, otherwise the risk of losing a dump in a critical situation is very high.

Raw dumping is desirable in many environments because the chance for failure is lower than in all other cases. Besides, it behaves like diskdump in RHEL3/RHEL4 and many users like this behavior.
Comment 8 Neil Horman 2010-07-20 08:02:50 EDT
fine.
Comment 9 Neil Horman 2010-08-02 08:44:36 EDT
Lowering priority/severity, since this has not been accepted for 6.0 and isn't causing any real failures currently.
Comment 11 Martin Wilck 2010-10-25 09:46:51 EDT
Changing into a 6.1 feature request.
Comment 12 Cong Wang 2010-11-09 00:10:49 EST
So, besides this patch, we also need to:

1) warn if the partition is too small
2) document that a dump is waiting for recovery on startup in this case.

Am I missing anything?
Comment 13 Martin Wilck 2010-11-09 04:29:54 EST
With dump compression, how do you know if the partition is too small? With the terabyte memory era approaching, requiring the dump partition to be the same size as physical memory is getting unrealistic.
Comment 14 Neil Horman 2010-11-09 08:50:18 EST
Amerigo, I think all we need to do here honestly is just check to see if a core exists and try to recover it as the attached patch does.  Let's not try to get too fancy with it.
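
One simple way to check whether a core exists on the raw partition might look like the sketch below. It is not the attached patch: has_flat_dump is a hypothetical helper, and it assumes that a dump in makedumpfile's flattened format begins with the ASCII signature "makedumpfile", which should be verified against the makedumpfile version in use.

```shell
#!/bin/bash
# Sketch only: has_flat_dump is a hypothetical helper. It ASSUMES that a
# flattened dump starts with the ASCII signature "makedumpfile"; check
# this against the makedumpfile version in use.

# Succeed if the file or block device given as $1 starts with the signature.
has_flat_dump() {
    local sig
    # Read the first 12 bytes; drop NUL bytes so the shell can hold the result.
    sig=$(head -c 12 "$1" 2>/dev/null | tr -d '\0')
    [ "$sig" = "makedumpfile" ]
}
```

At service start this could be used as: has_flat_dump /dev/sda3 && echo "A dump is waiting on /dev/sda3". The signature stays on the partition after recovery, so a real implementation would also need some way to avoid recovering the same dump on every boot; this sketch does not address that.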
Comment 16 Chao Ye 2011-03-18 05:56:16 EDT
Tested with latest build:
=====================================================
[root@ibm-js22-07 ~]# rpm -q kernel kexec-tools
kernel-2.6.32-122.el6.ppc64
kexec-tools-2.0.0-171.el6.ppc64
[root@ibm-js22-07 ~]# tail /etc/kdump.conf 
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell

raw /dev/sda3
core_collector makedumpfile -c --message-level 1 -d 31
default shell
[root@ibm-js22-07 ~]# touch /etc/kdump.conf 
[root@ibm-js22-07 ~]# service kdump restart
Stopping kdump:[  OK  ]
Detected change(s) the following file(s):

  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-122.el6.ppc64kdump.img
Starting kdump:[  OK  ]
[root@ibm-js22-07 ~]# echo c > /proc/sysrq-trigger
--------------------------------------------------------------------------------------
mdadm: No arrays found in config file or automatically
Free memory/Total memory (free %): 150208 / 231296 ( 64.9419 )
Scanning logical volumes
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_ibmjs2207" using metadata type lvm2
Activating logical volumes
  2 logical volume(s) in volume group "vg_ibmjs2207" now active
Free memory/Total memory (free %): 149312 / 231296 ( 64.5545 )
Saving to partition /dev/sda3
Free memory/Total memory (free %): 149312 / 231296 ( 64.5545 )
Copying data                       : [100 %] 
Saving core complete
Restarting system.
......

Starting RPC idmapd: [  OK  ]
Dump saved to /var/crash/2011-03-18-05:46/vmcore
Starting kdump:[  OK  ]
......
[root@ibm-js22-07 ~]# ls -lsh /var/crash/2011-03-18-05\:46/
total 28M
28M -rw-------. 1 root root 28M Mar 18 05:46 vmcore

Change status to VERIFIED.
Comment 17 errata-xmlrpc 2011-05-19 10:15:15 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html
